Newsle led the global machine learning industry in 2021 with a market share of ***** percent, followed by TensorFlow and Torch. The source indicates that machine learning software is used to apply artificial intelligence (AI), giving systems the ability to automatically or "artificially" learn and improve from experience without being explicitly programmed to do so.
In 2021, improving customer experience was the top artificial intelligence and machine learning use case, cited by ** percent of respondents. The deployment of machine learning and artificial intelligence can advance a variety of business processes.
According to a survey conducted among healthcare providers in the United States in April 2021, ** percent of respondents reported that the artificial intelligence (AI)/machine learning efforts in their hospital or health system were in the pilot stage with rollout yet to be decided, while a further ** percent said they were early-stage initiatives.
On May 21st, 2021, we held the webinar "Covid-19 and AI: unexpected challenges and lessons". This short note presents its highlights.
According to a recent survey, 56 percent of respondents reported experiencing issues with security and auditability requirements when deploying machine learning and artificial intelligence in 2021. Auditability is the degree to which a transaction can be traced from the originator to the approver and final disposition.
In 2021, the AI and machine learning medical device market was valued at around *** billion U.S. dollars globally. By 2032, the market was forecast to increase to a value of **** billion U.S. dollars.
The dataset "California STD Statistics (2001-2021).csv" contains information about reported cases of sexually transmitted diseases (STDs) (chlamydia, gonorrhea, and early syphilis, which includes primary, secondary, and early latent syphilis) across different counties in the United States from the year 2001 to 2021. The data includes details on the number of cases, population estimates, and calculated rates of infection. It is segmented by disease type, county, year, and sex, providing a comprehensive overview of STD prevalence and trends over a 20-year period.
Column Descriptions
Disease: The type of sexually transmitted disease (e.g., Chlamydia, Gonorrhea).
County: The name of the county where the data was collected.
Year: The year when the data was recorded.
Sex: The sex of the population (Female, Male, Total).
Cases: The number of reported cases of the disease.
Population: The estimated population of the county for the given year and sex.
Rate: The rate of infection per 100,000 people (see the sketch after this list).
Lower 95% CI: The lower bound of the 95% confidence interval for the rate.
Upper 95% CI: The upper bound of the 95% confidence interval for the rate.
Annotation Code: Additional annotation codes that are sparsely populated.
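To make the Rate column concrete, it is Cases divided by Population, scaled to 100,000. A minimal MATLAB sketch (purely illustrative and not part of the dataset; the file name is as above, and readtable's parsing of the header row is assumed) recomputes and checks it:

T = readtable('California STD Statistics (2001-2021).csv');
rate = T.Cases ./ T.Population * 1e5;      % cases per 100,000 people (Rate formula)
max(abs(rate - T.Rate), [], 'omitnan')     % largest deviation from the published Rate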
Acknowledgement: All rights reserved by CalHHS.
Usage: CalHHS Open Data Portal Terms of Use.
License: CalHHS reserves all rights to this data under its terms of use.
The terms of use and the original dataset are available at the links below:
https://data.chhs.ca.gov/pages/terms
https://data.chhs.ca.gov/dataset/stds-in-california-by-disease-county-year-and-sex
LAST MODIFIED: June 4, 2024.
https://www.fnfresearch.com/privacy-policy
[197+ Pages Report] The global AI in HIV/AIDS market size & share is expected to reach revenue of USD 400.7 million by 2026, growing at a CAGR of 8.9% during the projected period. AI has been transforming the landscape of technology breakthroughs, with its impact felt across several sectors.
Overview
This is the data archive for the paper "Copula-based synthetic data augmentation for machine-learning emulators". It contains the paper's model outputs (see the results folder) and the Singularity image for (optionally) re-running the experiments.
For the Python tool used to generate synthetic data, please refer to Synthia.
Requirements
Although PBS is not a strict requirement, it is needed to run the helper scripts included in this repository. Please note that, depending on your specific system settings and resource availability, you may need to modify the PBS parameters at the top of the submit scripts stored in the hpc directory (e.g. #PBS -lwalltime=72:00:00).
Usage
To reproduce the results from the experiments described in the paper, first fit all copula models to the reduced NWP-SAF dataset with:
qsub hpc/fit.sh
Then, to generate synthetic data, run all machine learning model configurations, and compute the relevant statistics, use:
qsub hpc/stats.sh
qsub hpc/ml_control.sh
qsub hpc/ml_synth.sh
Finally, to plot all artifacts included in the paper, use:
qsub hpc/plot.sh
Licence
Code released under MIT license. Data from the reduced NWP-SAF dataset released under CC BY 4.0.
According to the survey, ** percent of machine learning, data science, and artificial intelligence developers work with unstructured text data, making it the most popular type of data among developers. Tabular data is the second most popular type, used by ** percent.
https://creativecommons.org/publicdomain/zero/1.0/
Data scraped via https://www.convertcsv.com/html-table-to-csv.htm and converted to .csv by me. Original reddit post: https://www.reddit.com/r/MachineLearning/comments/qzjuvk/discussion_neurips_2021_finally_accepted/
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This database studies the performance inconsistency of biomass HHV models based on ultimate analysis. The research null hypothesis is consistency in the rank of a biomass HHV model. Fifteen biomass models are trained and tested on four datasets. In each dataset, the rank invariability of these 15 models indicates performance consistency.
The database includes the datasets and source code used to analyze the performance consistency of the biomass HHV models. The datasets are stored in tabular form in an Excel workbook. The source code implements the biomass HHV machine learning models through MATLAB object-oriented programming (OOP). These machine learning models consist of eight regression models, four supervised learning models, and three neural networks.
An Excel workbook, "BiomassDataSetUltimate.xlsx," collects the research datasets in six worksheets. The first worksheet, "Ultimate," contains 908 HHV data points from 20 pieces of literature. The worksheet column names indicate the elements of the ultimate analysis on a % dry basis. The HHV column refers to the higher heating value in MJ/kg. The next worksheet, "Full Residuals," stores the residuals from model testing based on 20-fold cross-validation. The article (Kijkarncharoensin & Innet, 2021) verifies performance consistency through these residuals. The remaining worksheets present the literature datasets used to train and test model performance.
A file named "SourceCodeUltimate.rar" collects the MATLAB machine learning models implemented in the article. The folders in this file reflect the class structure of the machine learning models. These classes extend the original MATLAB Statistics and Machine Learning Toolbox to support, e.g., k-fold cross-validation. The MATLAB script "runStudyUltimate.m" is the article's main program for analyzing the performance consistency of the biomass HHV models through ultimate analysis. The script loads the datasets from the Excel workbook and automatically fits the biomass models through the OOP classes.
The first section of the MATLAB script generates the most accurate model by optimizing each model's hyperparameters. The first run takes a few hours to train the machine learning models through this trial-and-error process. The trained models can be saved in a MATLAB .mat file and loaded back into the MATLAB workspace. The remaining script, separated by section breaks, performs the residual analysis to inspect performance consistency. In addition, a 3D scatter plot of the biomass data and box plots of the prediction residuals are produced. Finally, interpretations of these results are examined in the author's article.
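As a rough illustration of the cross-validation workflow described above (a sketch only, not the article's OOP classes; the elemental column names C, H, N, S, O are assumed, not confirmed by the workbook):

T = readtable('BiomassDataSetUltimate.xlsx', 'Sheet', 'Ultimate');
X = T{:, {'C','H','N','S','O'}};   % ultimate analysis on a % dry basis (assumed names)
y = T.HHV;                         % higher heating value [MJ/kg]
cvp = cvpartition(height(T), 'KFold', 20);
res = nan(height(T), 1);
for i = 1:cvp.NumTestSets
    mdl = fitlm(X(training(cvp, i), :), y(training(cvp, i)));   % one regression model
    res(test(cvp, i)) = y(test(cvp, i)) - predict(mdl, X(test(cvp, i), :));
end
boxplot(res)                       % residual spread, cf. the article's box plots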
Reference: Kijkarncharoensin, A., & Innet, S. (2022). Performance inconsistency of the Biomass Higher Heating Value (HHV) Models derived from Ultimate Analysis [Manuscript in preparation]. University of the Thai Chamber of Commerce.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our most comprehensive database of AI models, containing over 800 models that are state of the art, highly cited, or otherwise historically notable. It tracks key factors driving machine learning progress and includes over 300 training compute estimates.
https://www.fnfresearch.com/privacy-policy
[219+ Pages Report] The global artificial intelligence market size & share is projected to reach a value of USD 299.64 billion by 2026, growing at a CAGR of 35.6% during 2021-2026.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This database studies the performance inconsistency of biomass HHV models based on proximate analysis. The research null hypothesis is consistency in the rank of a biomass HHV model. Fifteen biomass models are trained and tested on four datasets. In each dataset, the rank invariability of these 15 models indicates performance consistency.
The database includes the datasets and source code used to analyze the performance consistency of the biomass HHV models. The datasets are stored in tabular form in an Excel workbook. The source code implements the biomass HHV machine learning models through MATLAB object-oriented programming (OOP). These models consist of eight regression models, four supervised learning models, and three neural networks.
An Excel workbook, "BiomassDataSetProximate.xlsx," collects the research datasets in six worksheets. The first worksheet, "Proximate," contains 803 HHV data points from 17 pieces of literature. The worksheet column names indicate the elements of the proximate analysis on a % dry basis. The HHV column refers to the higher heating value in MJ/kg. The next worksheet, "Full Residuals," stores the residuals from model testing based on 20-fold cross-validation. The article verifies performance consistency through these residuals. The remaining worksheets present the literature datasets used to train and test model performance.
A file named "SourceCodeProximate.rar" collects the MATLAB machine learning models implemented in the article. The folders in this file reflect the class structure of the machine learning models. These classes extend the original MATLAB Statistics and Machine Learning Toolbox to support, e.g., k-fold cross-validation. The MATLAB script "runStudyProximate.m" is the article's main program (Kijkarncharoensin & Innet, 2021) for analyzing the performance consistency of the biomass HHV models through proximate analysis. The script loads the datasets from the Excel workbook and automatically fits the biomass models through the OOP classes.
The first section of the MATLAB script generates the most accurate model by optimizing each model's hyperparameters. The first run takes a few hours to train the machine learning models through this trial-and-error process. The trained models can be saved in a MATLAB .mat file and loaded back into the MATLAB workspace. The remaining script, separated by section breaks, performs the residual analysis to inspect performance consistency. In addition, a 3D scatter plot of the biomass data and box plots of the prediction residuals are produced. Finally, interpretations of these results are examined in the author's article.
Reference: Kijkarncharoensin, A., & Innet, S. (2021). Performance inconsistency of the Biomass Higher Heating Value (HHV) Models derived from Proximate Analysis [Manuscript in preparation]. University of the Thai Chamber of Commerce.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The publication of tourism statistics often does not keep up with highly dynamic tourism demand trends, which is especially critical during crises. Alternative data sources such as digital traces and web searches represent an important means to potentially fill this gap, since they are generally timely and available at a detailed spatial scale. In this study we explore the potential of human mobility data from the Google Community Mobility Reports to nowcast the number of monthly nights spent at sub-national scale across 11 European countries in 2020, 2021, and the first half of 2022. Using a machine learning implementation, we find that this novel data source can predict tourism demand with high accuracy, and we compare its potential in the tourism domain to web search and mobile phone data. This result paves the way for more frequent and timely production of tourism statistics by researchers and statistical entities, and for their use in tourism monitoring and management, although privacy and surveillance concerns still hinder an actual data innovation transition.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The first column shows the available countries with ISO 3166-1 alpha-2 country codes (https://www.iso.org/iso-3166-country-codes.html, last accessed 16 May 2022).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of simulated and observed salinity/temperature data which were used in the manuscript "A method for assessment of the general circulation model quality using k-means clustering algorithm" submitted to Geoscientific Model Development.
The model simulation dataset is from long-term 3D circulation model simulation (Maljutenko and Raudsepp 2014, 2019). The observations are from the "Baltic Sea - Eutrophication and Acidity aggregated datasets 1902/2017 v2018" SMHI (2018).
The files are in simple comma-separated table format without headers.
The Dout-t_z_lat_lon_Smod_Sobs_Tmod_Tobs.csv file contains columns with the following variables [units]:
Time [MATLAB datenum units], vertical coordinate [m], latitude [°N], longitude [°E], model salinity [g/kg], observed salinity [g/kg], model temperature [°C], observed temperature [°C].
The Dout-t_z_lat_lon_dS_dT_K1_K2_K3_K4_K5_K6_K7_K8_K9.csv file contains columns with the following variables [units]:
The first 4 columns are the same as in the previous file, followed by salinity error [g/kg] and temperature error [°C]; the remaining columns (K1–K9) are integers indicating the cluster to which each error pair is assigned.
do_clust_valid_DataFig.m is a MATLAB script which reads the two csv files (and, optionally, the mask file Model_mask.mat), performs the clustering analysis, and creates the plots used in the manuscript. The script is organized into %% blocks which can be executed separately (Ctrl+Enter by default).
The k-means function is used from the MATLAB Statistics and Machine Learning Toolbox.
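As a rough sketch of this clustering step (the column layout is taken from the file description above; k = 5 and the Replicates setting are illustrative choices, not the manuscript's):

D = readmatrix('Dout-t_z_lat_lon_dS_dT_K1_K2_K3_K4_K5_K6_K7_K8_K9.csv');
dS = D(:, 5);                                      % salinity error [g/kg]
dT = D(:, 6);                                      % temperature error [°C]
[idx, C] = kmeans([dS dT], 5, 'Replicates', 10);   % Statistics and ML Toolbox
gscatter(dS, dT, idx)                              % error pairs colored by cluster
xlabel('\DeltaS [g/kg]'); ylabel('\DeltaT [°C]')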
Additional software used in do_clust_valid_DataFig.m:
Author's auxiliary formatting scripts in script/:
datetick_cst.m
do_fitfig.m
do_skipticks.m
do_skipticks_y.m
Colormaps are generated using cbrewer.m (Charles, 2021).
Moving average smoothing is performed using nanmoving_average.m (Aguilera, 2021).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data on groundwater nitrate and related variables in the North China Plain (NCP). The data include nitrate concentrations of groundwater collected from more than 4,000 sites (wells) in the NCP from 2005 to 2021. The groundwater samples were collected in 2005–2021, in May (before the rainy season) and October (after the rainy season) of each year for every site. During sampling, basic information about well location, groundwater depth, farmland planting pattern, and soil type was collected. Sampling wells were divided into three types according to depth: shallow (0–30 m), medium (30–100 m), and deep (> 100 m). The planting patterns mainly involved intensive croplands, grain crops, vegetable crops, and orchards. Soil types for each sampling site were obtained from the China soil database (http://vdb3.soil.csdb.cn/). The socio-economic and agricultural information for the study areas (taking the districts of municipalities and prefecture-level cities of provinces as basic units) was acquired from the China Statistical Yearbook (http://www.stats.gov.cn/sj/ndsj/). The data include agricultural planting area, grain crop area, vegetable planting area, orchard planting area, and total facility agricultural area; fertilizer amount, nitrogen fertilizer amount, and nitrogen fertilizer amount per unit area; total output value of agriculture, forestry, animal husbandry, and fishery, along with agricultural, forestry, animal husbandry, and fishery output values; Gross Domestic Product (GDP) and per capita GDP; and total population and rural population.
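For illustration, the depth classification described above maps directly onto MATLAB's discretize (the variable name and values here are hypothetical, not from the dataset):

depth = [12 45 150];   % groundwater depth [m]; hypothetical example values
class = discretize(depth, [0 30 100 Inf], 'categorical', {'shallow','medium','deep'})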
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides evidence supporting the hypothesis that institutional shorting, ETF outflows, whale wallet movements, and media sentiment drive Bitcoin’s volatility and price manipulation. Central to this dataset is the Decker Sentiment-Short Interest Model (DSSIM)—an original equation developed by Nicolin Decker to quantify the relationship between market sentiment and institutional short interest. By combining sentiment scores from Natural Language Processing (NLP) and short positioning data, DSSIM offers a flexible framework for analyzing volatility in Bitcoin and other assets.
The dataset spans January 2021 to December 2024, capturing daily market activity and key price events. Each file aligns with DSSIM’s variables, enabling replication and further analysis of the findings in the doctoral-level thesis The Economic Bomb: A Strategic Financial Warfare Tactic.
Key Components: BTC_Price_Data.csv: Daily BTC/USD closing prices from Binance, Coinbase, and Bitstamp, serving as the baseline for volatility and return calculations.
ETF_Holdings_Over_Time_Thesis.csv: Daily BTC holdings of ETFs (Grayscale, BlackRock, and Fidelity), illustrating cumulative outflows and their liquidity impact.
ETF_Outflows_Price_Impact_Data.csv: Correlates ETF outflows with BTC volatility, highlighting timing and magnitude.
Institutional_Shorting_Data.csv: Daily BTC short positions from Binance, BitMEX, Bybit, and OKX, serving as input for DSSIM’s short interest variable.
Whale_Wallet_Movements.csv: Tracks large BTC wallet movements, revealing sell-offs preceding price crashes and influencing DSSIM’s residual noise component.
Market_Liquidity_Data.csv: Daily BTC trading volume, order book depth, and liquidity ratios, validating DSSIM’s predictive capabilities.
Media_Sentiment_Scores.csv: Daily sentiment from Twitter, Reddit, Google News, and YouTube, forming DSSIM’s sentiment variable.
Monte_Carlo_Simulation_Results.csv: Simulates 1,000 BTC price paths to assess potential volatility under market stress.
VAR_Model_Data.csv: Analyzes ETF outflows’ delayed impact on BTC returns using vector autoregression.
Volatility_Clustering_Data.csv: Tracks daily BTC returns and 30-day rolling volatility, confirming persistent volatility after institutional actions.
GARCH_Model_Data.csv: Models BTC volatility using GARCH, validating volatility clustering during market shocks.
The dataset includes adjustments for major market events, such as the May 2021 Flash Crash, June 2022 Liquidation Crisis, and March 2023 Banking Crisis, ensuring realistic volatility patterns aligned with DSSIM’s modeling of sentiment shifts and institutional shorting.
Researchers can use DSSIM’s structure and data to explore similar dynamics in other cryptocurrencies, equities, commodities, and forex markets, advancing financial analysis and predictive modeling.
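As a small, purely illustrative MATLAB sketch of the rolling-volatility input described above (the 'Date' and 'Close' column names in BTC_Price_Data.csv are assumed, not confirmed by the file list):

T = readtable('BTC_Price_Data.csv');
r = diff(log(T.Close));                            % daily log returns
vol30 = movstd(r, [29 0], 'omitnan') * sqrt(365);  % trailing 30-day window, annualized
plot(T.Date(2:end), vol30)
ylabel('30-day rolling volatility (annualized)')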
Access the full dataset: https://drive.google.com/drive/folders/1pnwqBTMF_QSJoC5QcNAPSQpVtOST2n8c?usp=drive_link