Terms of service: https://cubig.ai/store/terms-of-service
1) Data Introduction • The Cirrhosis Prediction dataset is intended for the advancement of machine learning models to predict the stage of liver cirrhosis. It contains various clinical features, which are vital for prognosis and treatment strategies.
2) Data Utilization
(1) Cirrhosis Prediction data has the following characteristics:
• It includes clinical data such as liver biochemistry, demographic details, and histology grading.
• The dataset aids in developing predictive models for staging liver cirrhosis, potentially improving patient outcomes.
(2) Cirrhosis Prediction data can be used for:
• Medical research: developing algorithms for early detection and progression tracking of liver cirrhosis.
• Healthcare strategy: informing medical interventions and managing treatment plans for patients.
License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Challenge Details: In this data-driven hackathon, participants will develop machine learning models to predict the lean_body_mass based on Lean Body Mass Data.
Submission and Evaluation
Submission Format: Participants must submit their predictions in the format specified in submission.csv.
Evaluation Metric: Submissions will be evaluated based on the R2_Score, measuring how well the model predicts the lean_body_mass.
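The R2_Score metric can be computed directly from its definition; a minimal sketch (the function name and sample values below are illustrative, not part of the challenge):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# A perfect prediction scores 1.0; always predicting the mean scores 0.0.
print(r2_score([50.0, 60.0, 70.0], [50.0, 60.0, 70.0]))  # → 1.0
```

Scores below 0.0 are possible and indicate a model worse than the mean baseline.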
Terms of service: https://cubig.ai/store/terms-of-service
1) Data Introduction
• A patient dataset aimed at developing and validating a prediction model for all-cause in-hospital mortality in hospitalized patients.
2) Data Utilization
(1) Patient data has the following characteristics:
• A CSV file of 85 columns covering variables such as age, BMI, and ethnicity; patient survival is predicted from these factors.
(2) Patient data can be used for:
• Personalized medicine: insights gained from the data can support personalized treatment plans that tailor interventions to individual patient needs based on predicted survival probabilities.
• Healthcare management: the data can help predict patient outcomes, plan for future healthcare needs, and improve overall patient care strategies.
Terms of service: https://cubig.ai/store/terms-of-service
1) Data Introduction • The Orbit Classification For Prediction Dataset focuses on predicting the classes of orbits for celestial objects. This dataset includes parameters such as semi-major axis, eccentricity, inclination, argument of perihelion, and more. It provides a comprehensive overview for orbit classification and prediction.
2) Data Utilization
(1) Orbit data has the following characteristics:
• It allows for detailed analysis and classification of orbits based on several orbital parameters, aiding in the prediction and understanding of celestial objects' orbits.
(2) Orbit data can be used for:
• Astronomy and space research: classifying and predicting the orbits of celestial objects, aiding space exploration and study.
• Educational purposes: supporting academic studies in celestial mechanics and orbital dynamics.
• Technology development: supporting the development of algorithms and AI models for orbit prediction and classification.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
📣 Challenge Details: In this data-driven hackathon, participants will develop machine learning models to predict the BeatsPerMinute based on Music Track BPM Data.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Background
Liver transplantation (LT) is one of the main curative treatments for hepatocellular carcinoma (HCC). The Milan criteria have long been applied to candidate LT patients with HCC. However, the Milan criteria fail to precisely identify patients at risk of recurrence. As a result, we aimed to establish and validate a deep learning model, compare it with the Milan criteria, and better guide post-LT treatment.
Methods
A total of 356 HCC patients who received LT and had complete follow-up data were evaluated. The entire cohort was randomly divided into a training set (n = 286) and a validation set (n = 70). A multi-layer perceptron model provided by the pycox library was first used to construct the recurrence prediction model. Then TabNet, a tabular neural network that combines elements of deep learning with tabular data processing techniques, was utilized to compare against the Milan criteria and verify the performance of the proposed model.
Results
Patients with tumor size over 7 cm, poorer differentiation of tumor grade, and multiple tumors were first classified as being at high risk of recurrence. We trained a classification model with TabNet, and our proposed model performed better than the Milan criteria in terms of accuracy (0.95 vs. 0.86, p < 0.05). In addition, our model showed better performance with improved AUC, NRI, and hazard ratio, proving the robustness of the model.
Conclusion
A prognostic model has been proposed based on the use of TabNet on various parameters from HCC patients. The model performed well in post-LT recurrence prediction and in the identification of high-risk subgroups.
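The study's multi-layer perceptron is built on the pycox library; as a rough illustration of the 286/70 train/validation design on 356 patients, here is a sketch using scikit-learn's MLPClassifier as a stand-in, with entirely synthetic values in place of the patient records:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Synthetic stand-ins for tumor size (cm), differentiation grade, tumor number
X = np.column_stack([rng.uniform(1.0, 12.0, 356),
                     rng.integers(1, 4, 356),
                     rng.integers(1, 4, 356)]).astype(float)
# Mimics the high-risk rule described above: size > 7 cm and multiple tumors
y = ((X[:, 0] > 7.0) & (X[:, 2] > 1)).astype(int)

# 286/70 train/validation split, matching the study design
X_tr, X_va, y_tr, y_va = train_test_split(X, y, train_size=286, random_state=0)
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                  random_state=0)).fit(X_tr, y_tr)
acc = clf.score(X_va, y_va)  # validation accuracy on the synthetic task
```

This is only a structural sketch; the published model uses pycox and TabNet on real clinical features.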
Terms of service: https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Campus Placement Prediction: Binary Classification dataset encapsulates a comprehensive array of attributes for predicting the outcome of candidate selection during campus placement.
2) Data Utilization
(1) Campus Placement Prediction: Binary Classification data has the following characteristics:
• The dataset includes various socioeconomic factors such as serial numbers, gender, secondary and higher education, university education, jobs, and employability.
(2) Campus Placement Prediction: Binary Classification data can be used for:
• Development of predictive models: building machine learning models that predict placement outcomes from a given candidate's attributes.
• Feature importance analysis: determining which candidate attributes have the greatest impact on placement results.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
📣 Challenge Details: Your goal is to build a machine learning model to predict fuel_efficiency_kmpl using used cars data.
Data Description The dataset for this hackathon includes:
train.csv: Contains used cars data.
test.csv: Contains data for testing.
submission.csv: The format in which your predictions should be submitted.
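A minimal sketch of the train.csv → submission.csv workflow (the `id` column, the feature names, and the linear model are assumptions; the real schema is whatever submission.csv specifies):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def make_submission(train: pd.DataFrame, test: pd.DataFrame,
                    target: str = "fuel_efficiency_kmpl") -> pd.DataFrame:
    """Fit on train.csv-style data and build a submission.csv-style frame."""
    features = [c for c in train.columns if c not in (target, "id")]
    model = LinearRegression().fit(train[features], train[target])
    return pd.DataFrame({"id": test["id"],
                         target: model.predict(test[features])})

# In the hackathon this would be:
# train = pd.read_csv("train.csv"); test = pd.read_csv("test.csv")
# make_submission(train, test).to_csv("submission.csv", index=False)
```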
License: Database Contents License 1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
Predict customer churn for a credit card company based on the given features. You can use machine learning as well as deep learning techniques to produce meaningful outputs. This dataset is very basic and can be used for building a basic understanding.
Please review Zhang et al. (2021) for details on study design and datasets (https://doi.org/10.1016/j.watres.2022.118443). In summary, predictor and response variable data were acquired from the Chesapeake Bay Program and USGS. These data were subjected to a trend analysis to estimate the MK linear slope change for both predictor and response variables. After running a cluster analysis on the scaled TN loading time series (the response variable), the cluster assignments were paired with the slope estimates from the suite of predictor variables tied to the nutrient inventory and static geologic and land use variables. From there, an RF analysis was executed to link trends in anthropogenic drivers and other contextual environmental factors to the identified trend cluster types. After calibrating the RF model, the likelihood of improving, relatively static, or degrading catchments across the Chesapeake Bay was identified for the 2007 to 2018 period. Tabular data are available on the journal website and PubMed, and the predictor/response variable data can be downloaded individually from the USGS and Chesapeake Bay Program links listed in the data access section. Portions of this dataset are inaccessible because the data were generated by other federal entities and are housed in their respective data warehouse domains (e.g., USGS and Chesapeake Bay Program). The combined dataset can be accessed on the journal website (https://www.sciencedirect.com/science/article/pii/S0043135422003979?via%3Dihub#ack0001) and on NCBI PubMed (https://pubmed.ncbi.nlm.nih.gov/35461100/).
The predictor variable data can be accessed from the Chesapeake Bay Program (https://cast.chesapeakebay.net/) and USGS (https://pubs.er.usgs.gov/publication/ds948 and https://www.sciencebase.gov/catalog/item/5669a79ee4b08895842a1d47). This dataset is associated with the following publication: Zhang, Q., J. Bostic, and R. Sabo. Regional patterns and drivers of total nitrogen trends in the Chesapeake Bay watershed: Insights from machine learning approaches and management implications. WATER RESEARCH. Elsevier Science Ltd, New York, NY, USA, 218: 1-15, (2022).
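The RF step described above can be sketched as a classifier that maps per-catchment slope estimates to trend-cluster likelihoods; scikit-learn's RandomForestClassifier substitutes here for the original analysis, and all values are synthetic stand-ins:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical stand-ins: MK slope estimates for five predictor variables
# per catchment (200 synthetic catchments)
X = rng.normal(size=(200, 5))
# Trend-cluster labels: 0 = improving, 1 = relatively static, 2 = degrading
y = rng.integers(0, 3, size=200)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Per-catchment likelihood of each trend class, as in the calibrated RF
proba = rf.predict_proba(X[:3])
print(proba.shape)  # (3, 3): three catchments × three trend classes
```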
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Prediction Data of Base Models from Auto-Sklearn 1 on 71 classification datasets from the AutoML Benchmark for Balanced Accuracy and ROC AUC.
The files of this figshare item include data that was collected for the paper:
Q(D)O-ES: Population-based Quality (Diversity) Optimisation for Post Hoc Ensemble Selection in AutoML, Lennart Purucker, Lennart Schneider, Marie Anastacio, Joeran Beel, Bernd Bischl, Holger Hoos, Second International Conference on Automated Machine Learning, 2023.
The data was stored and used with the assembled framework: https://github.com/ISG-Siegen/assembled.
In detail, the data contains the predictions of base models on the validation and test data, as produced by running Auto-Sklearn 1 for 4 hours. Such prediction data is included for each model produced by Auto-Sklearn 1 on each fold of 10-fold cross-validation on the 71 classification datasets from the AutoML Benchmark. The data exists for two metrics (ROC AUC and Balanced Accuracy). More details can be found in the paper.
The data was collected by code created for the paper and is available in its reproducibility repository: https://doi.org/10.6084/m9.figshare.23613624.
Its usage is intended for but not limited to using assembled to evaluate post hoc ensembling methods for AutoML.
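As a toy illustration of what post hoc ensembling over stored base-model predictions looks like, here is a plain weighted average of predicted class probabilities (the array shapes are assumptions for illustration, not the metatask file format):

```python
import numpy as np

def ensemble_average(base_preds, weights=None):
    """Weighted average of base models' predicted class probabilities.

    base_preds: array-like of shape (n_models, n_samples, n_classes),
    e.g. validation predictions loaded from a metatask (shape assumed).
    """
    base_preds = np.asarray(base_preds, dtype=float)
    n_models = base_preds.shape[0]
    w = (np.full(n_models, 1.0 / n_models) if weights is None
         else np.asarray(weights, dtype=float))
    # Sum over the model axis, weighted -> (n_samples, n_classes)
    return np.tensordot(w, base_preds, axes=1)

# Two base models disagreeing on one sample; the ensemble splits the difference.
p = ensemble_average([[[0.9, 0.1]], [[0.5, 0.5]]])
print(p)  # [[0.7 0.3]]
```

Methods like Q(D)O-ES go further by selecting which models enter the ensemble and with what weights; this sketch only shows the aggregation step.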
Details
The link above points to a hosted server that facilitates the download. We opted for a hosted server, as we found no other suitable solution to share these large files (due to file size or storage limits) for a reasonable price. If you want to obtain the data in another way or know of a more suitable alternative, please contact Lennart Purucker.
The link resolves to a directory containing the following:
example_metatasks: contains an example metatask for test purposes before committing to downloading all files.
metatasks_roc_auc.zip: The Metatasks obtained by running Auto-Sklearn 1 for ROC AUC.
metatasks_bacc.zip: The Metatasks obtained by running Auto-Sklearn 1 for Balanced Accuracy.
The sizes after unzipping are:
metatasks_roc_auc.zip: ~450GB
metatasks_bacc.zip: ~330GB
We suggest extracting only files that are of interest from the .zip archive, as these can be much smaller in size and might suffice for experiments.
The metatask .zip files contain 2 subdirectories for Metatasks produced based on TopN or SiloTopN pruning (see paper for details). In each of these subdirectories, 2 files per metatask exist: one .json file with metadata information and one .hdf or .csv file containing the prediction data. The details on how this should be read and used as a Metatask can be found in the assembled framework and the reproducibility repository. To obtain the data without Metatasks, we advise looking at the file content and metadata individually or parsing them by using Metatasks first.
A histogram-based boosted regression tree (HBRT) method was used to predict the depth to the surficial aquifer water table (in feet) throughout the State of Wisconsin. This method used a combination of discrete groundwater levels from the U.S. Geological Survey National Water Information System, continuous groundwater levels from the National Groundwater Monitoring Network, the State of Wisconsin well-construction database, and NHDPlus version 2.1-derived points. The water table depth was predicted using the HBRT model available through scikit-learn in Python version 3.10.10. The HBRT model can predict the surficial water table depth for any latitude and longitude in Wisconsin. A total of 48 predictor variables were used for model development, including basic well characteristics, soil properties, aquifer properties, hydrologic position on the landscape, recharge and evapotranspiration rates, and bedrock characteristics. Model results indicate that the mean surficial water table depth across Wisconsin is 28.3 feet below land surface, with a root mean square error of 7.40 feet for the holdout data of the HBRT model. Aside from the overall HBRT methods contained in the Python script, this data release includes a self-contained model directory for recreating the HBRT model published in this data release. The model directory also includes a model object for the HBRT model used to predict the surficial aquifer water table depth (in feet) for the State of Wisconsin. Three separate directories within this data release define the input predictor variables, water levels, and NHD points for the HBRT model. The 'bedrock-overlay' sub-directory contains geospatial data that define the special selection zones used in the depth-to-water well selection (DTW_well_selection_zones.docx).
The 'water-levels' sub-directory contains input files for the NHDPlus version 2.1 points, the State of Wisconsin well construction spreadsheets, and water level summary files. The 'python-attributes' sub-directory contains predictor variable rasters and vector data that predict the surficial water table depth for Wisconsin and a Jupyter Notebook used for the attribution and input files for well and NHD points.
Introduction
Machine learning (ML) is an effective tool for predicting mental states and is a key technology in digital psychiatry. This study aimed to develop ML algorithms to predict the upper tertile group of various anxiety symptoms based on multimodal data from virtual reality (VR) therapy sessions for social anxiety disorder (SAD) patients, and to evaluate their predictive performance across each data type.
Methods
This study included 32 SAD-diagnosed individuals and finalized a dataset of 132 samples from 25 participants. It utilized multimodal (physiological and acoustic) data from VR sessions simulating social anxiety scenarios. The study employed the extended Geneva minimalistic acoustic parameter set for acoustic feature extraction and extracted statistical attributes from time series-based physiological responses. We developed ML models that predict the upper tertile group for various anxiety symptoms in SAD using Random Forest, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost) models. The best parameters were explored through grid search or random search, and the models were validated using stratified cross-validation and leave-one-out cross-validation.
Results
The CatBoost model, using multimodal features, exhibited high performance, particularly for the Social Phobia Scale, with an area under the receiver operating characteristic curve (AUROC) of 0.852. It also showed strong performance in predicting cognitive symptoms, with the highest AUROC of 0.866 for the Post-Event Rumination Scale. For generalized anxiety, the LightGBM prediction for the State-Trait Anxiety Inventory-trait led to an AUROC of 0.819.
In the same analysis, models using only physiological features had AUROCs of 0.626, 0.744, and 0.671, whereas models using only acoustic features had AUROCs of 0.788, 0.823, and 0.754.
Conclusions
This study showed that an ML algorithm using integrated multimodal data can predict upper tertile anxiety symptoms in patients with SAD with higher performance than acoustic or physiological data alone obtained during a VR session. The results of this study can be used as evidence for personalized VR sessions and to demonstrate the strength of the clinical use of multimodal data.
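The stratified cross-validation loop described in the Methods can be sketched as follows, with scikit-learn's GradientBoostingClassifier standing in for CatBoost/LightGBM and synthetic features standing in for the multimodal data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(1)
# Synthetic stand-ins for multimodal (acoustic + physiological) features
X = rng.normal(size=(120, 6))
# Synthetic upper-tertile label driven by the first feature
y = (X[:, 0] + rng.normal(scale=0.5, size=120) > 0).astype(int)

aucs = []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True,
                              random_state=0).split(X, y):
    clf = GradientBoostingClassifier(random_state=0).fit(X[tr], y[tr])
    aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
mean_auc = float(np.mean(aucs))  # cross-validated AUROC
```

Stratification keeps the class ratio stable across folds, which matters when the upper-tertile class is the minority.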
Terms of service: https://cubig.ai/store/terms-of-service
1) Data Introduction • The Salary data aims to determine whether individuals earn less than or more than $50,000 annually based on their employment, education, and demographic information. It is used widely in analyses that seek to understand income disparities and economic factors influencing earnings.
2) Data Utilization
(1) Salary data has the following characteristics:
• The dataset includes factors such as age, education, job type, hours worked per week, and other socio-economic variables that contribute to predicting salary categories.
(2) Salary data can be used for:
• Workforce analysis: helping employers and policymakers understand wage structures and adjust compensation plans accordingly.
• Economic research: helping researchers analyze economic mobility and the impact of education and employment on income levels.
This dataset contains the predicted prices of the asset TABLE over the next 16 years. The data is calculated initially using a default 5 percent annual growth rate; after page load, a sliding scale component lets the user further adjust the growth rate to their own positive or negative projections. The maximum adjustable growth rate is 100 percent, and the minimum is -100 percent.
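The projection reduces to compound growth, p_t = p_0 · (1 + g/100)^t; a minimal sketch (the function name and two-decimal rounding are illustrative assumptions):

```python
def project_prices(price: float, years: int = 16, growth_pct: float = 5.0):
    """Compound the price annually: p_t = p_0 * (1 + g/100)**t.

    growth_pct is adjustable between -100 and 100, as on the page's slider.
    """
    if not -100.0 <= growth_pct <= 100.0:
        raise ValueError("growth rate must be between -100 and 100 percent")
    g = 1.0 + growth_pct / 100.0
    return [round(price * g ** t, 2) for t in range(1, years + 1)]

print(project_prices(100.0, years=3))  # → [105.0, 110.25, 115.76]
```

At -100 percent the price goes to zero after one year; at the default 5 percent, the 16-year price is roughly 2.18 times the starting price.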
A machine learning streamflow (MLFLOW) model was developed in R (model is in the Rscripts folder) for modeling monthly streamflow from 2012 to 2017 in three watersheds on the Wyoming Range in the upper Green River basin. Geospatial information for 125 site features (vector data are in the Sites.shp file) and discrete streamflow observation data and environmental predictor data were used in fitting the MLFLOW model and predicting with the fitted model. Tabular calibration and validation data are in the Model_Fitting_Site_Data.csv file, totaling 971 discrete observations and predictions of monthly streamflow. Geospatial information for 17,518 stream grid cells (raster data are in the Streams.tif file) and environmental predictor data were used for continuous streamflow predictions with the MLFLOW model. Tabular prediction data for all the study area (17,518 stream grid cells) and study period (72 months; 2012–17) are in the Model_Prediction_Stream_Data.csv file, totaling 1,261,296 predictions of spatially and temporally continuous monthly streamflow. Additional information about the datasets is in the metadata included in the four zipped dataset files, and about the MLFLOW model in the readme included in the zipped model archive folder.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Additional file 1: Table 1 Reference List. Title of data: Journal articles included in the review, ordered alphabetically by first author surname. Description of data: A list of all articles included in the review, including author names, title of publication, journal of publication, volume, pages and DOI number.
This dataset contains the predicted prices of the asset Round Table over the next 16 years. The data is calculated initially using a default 5 percent annual growth rate; after page load, a sliding scale component lets the user further adjust the growth rate to their own positive or negative projections. The maximum adjustable growth rate is 100 percent, and the minimum is -100 percent.
Background
In the context of the rapidly aging global population, sarcopenic obesity (SO) in older adults is associated with significantly higher rates of disability and mortality. SO has become a serious and critical public health concern. This study aimed to develop and validate predictive models using machine learning (ML) to identify SO in patients.
Methods
Data from 386 participants collected at the Affiliated Hospital of Qingdao University were divided in an 8:2 ratio, with 80% used for training and 20% for testing. Univariate analysis was performed to identify the factors correlated with SO, and multivariate logistic regression analysis was performed to determine the independent factors influencing SO. A Shapley Additive exPlanations (SHAP) diagram was used to illustrate the importance of variables in the model. To develop a predictive model for SO, we used five models and applied internal five-fold cross-validation to determine the most suitable hyperparameters for each model.
Results
Among the 386 participants, 61 were diagnosed with sarcopenic obesity (15.8%). We identified four independent predictive factors, namely BMI, Barthel Index score, grip strength, and calf circumference. Notably, calf circumference plays an important role in assessing the risk of SO in older adults. The area under the curve (AUC) values on the test set for the random forest (RF), naive Bayes (NB), Light Gradient Boosting Machine (LightGBM), k-nearest neighbor (KNN), and eXtreme Gradient Boosting (XGBoost) models were 0.839, 0.815, 0.808, 0.794, and 0.798, respectively. Among these models, the RF model exhibited the best average performance in the training set, with an AUC value of 0.839.
Conclusion
We constructed a predictive model based on the results of the RF model, combining four clinical predictors (BMI, Barthel Index score, grip strength, and calf circumference) to reliably predict SO.
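The RF workflow above (five-fold cross-validation plus variable importance) can be sketched with scikit-learn; the four feature columns are hypothetical stand-ins for BMI, Barthel Index score, grip strength, and calf circumference, with synthetic values rather than study data, and `feature_importances_` substitutes for the SHAP analysis:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
# Synthetic stand-ins for: BMI, Barthel Index, grip strength, calf circumference
X = rng.normal(size=(300, 4))
# Synthetic SO label driven by two of the features
y = (X[:, 3] - X[:, 2] + rng.normal(scale=0.8, size=300) > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=300, random_state=0)
# Internal five-fold cross-validated AUC, as in the study's model selection
auc = cross_val_score(rf, X, y, cv=5, scoring="roc_auc").mean()
# Impurity-based variable importance (a rough proxy for the SHAP diagram)
importances = rf.fit(X, y).feature_importances_
```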