100+ datasets found

Prediction data from: Machine learning predicts which rivers, streams, and...
datadryad.org
dataone.org
zip
Updated Dec 10, 2023
Cite
Simon Greenhill; Hannah Druckenmiller; Sherrie Wang; David Keiser; Manuela Girotto; Jason Moore; Nobuhiro Yamaguchi; Alberto Todeschini; Joseph Shapiro (2023). Prediction data from: Machine learning predicts which rivers, streams, and wetlands the Clean Water Act regulates [Dataset]. http://doi.org/10.5061/dryad.z34tmpgm7
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.z34tmpgm7
Dataset updated
Dec 10, 2023
Dataset provided by
Dryad
Authors
Simon Greenhill; Hannah Druckenmiller; Sherrie Wang; David Keiser; Manuela Girotto; Jason Moore; Nobuhiro Yamaguchi; Alberto Todeschini; Joseph Shapiro
Time period covered
Sep 27, 2023
Description
This dataset contains model outputs that were analyzed to produce the main results of the paper.
Data from: Machine-learning model predictions and groundwater-quality...
catalog.data.gov
data.usgs.gov
Updated Nov 26, 2025
Cite
U.S. Geological Survey (2025). Machine-learning model predictions and groundwater-quality rasters of specific conductance, total dissolved solids, and chloride in aquifers of the Mississippi Embayment [Dataset]. https://catalog.data.gov/dataset/machine-learning-model-predictions-and-groundwater-quality-rasters-of-specific-conductance
Dataset updated
Nov 26, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
Groundwater is a vital resource in the Mississippi embayment of the central United States. An innovative approach using machine learning (ML) was employed to predict groundwater salinity, including specific conductance (SC), total dissolved solids (TDS), and chloride (Cl) concentrations, across three drinking-water aquifers of the Mississippi embayment. An ML approach was used because it accommodates a large and diverse set of explanatory variables, does not assume monotonic relations between predictors and response data, and results can be extrapolated to areas of the aquifer not sampled. These aspects of ML allowed potential drivers and sources of high salinity water that have been hypothesized in other studies to be included as explanatory variables. The ML approach integrated output from a groundwater-flow model and water-quality data to predict salinity, and the approach can be applied to other aquifers to provide context for the long-term availability of groundwater resources. The Mississippi embayment includes two principal regional aquifer systems: the surficial aquifer system, dominated by the Quaternary Mississippi River Valley Alluvial aquifer (MRVA), and the Mississippi embayment aquifer system, which includes deeper Tertiary aquifers and confining units. Based on the distribution of groundwater use for drinking water, the modeling focused on the MRVA, middle Claiborne aquifer (MCAQ), and lower Claiborne aquifer (LCAQ). Boosted regression tree (BRT) models (Elith and others, 2008; Kuhn and Johnson, 2013) were developed to predict SC and Cl to 1-kilometer (km) raster grid cells of the National Hydrologic Grid (Clark and others, 2018) for 7 aquifer layers (1 MRVA, 4 MCAQ, 2 LCAQ) following the hydrogeologic framework of Hart and others (2008). TDS maps were created using the correlation between SC and TDS.
Explanatory variables for the BRT models included attributes associated with well location and construction, surficial variables (such as soils and land use), and variables extracted from a MODFLOW groundwater flow model for the Mississippi embayment (Haugh and others, 2020a; Haugh and others, 2020b). Prediction intervals were calculated for SC and Cl by bootstrapping raster-cell predictions following methods from Ransom and others (2017). For a full description of modeling workflow and final model selection see Knierim and others (2020).
Electroencephalogram Database: Prediction of Epileptic Seizures
neuinfo.org
dknet.org
Updated May 10, 2005
Cite
(2005). Electroencephalogram Database: Prediction of Epileptic Seizures [Dataset]. http://identifiers.org/RRID:SCR_008032
Unique identifier
https://identifiers.org/RRID:SCR_008032
Dataset updated
May 10, 2005
Description
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on April 29, 2025. Electroencephalogram (EEG) data recorded from invasive and scalp electrodes. The EEG database contains invasive EEG recordings of 21 patients suffering from medically intractable focal epilepsy. The data were recorded during an invasive pre-surgical epilepsy monitoring at the Epilepsy Center of the University Hospital of Freiburg, Germany. In eleven patients, the epileptic focus was located in neocortical brain structures, in eight patients in the hippocampus, and in two patients in both. In order to obtain a high signal-to-noise ratio, fewer artifacts, and to record directly from focal areas, intracranial grid-, strip-, and depth-electrodes were utilized. The EEG data were acquired using a Neurofile NT digital video EEG system with 128 channels, 256 Hz sampling rate, and a 16 bit analogue-to-digital converter. Notch or band pass filters have not been applied. For each of the patients, there are datasets called ictal and interictal, the former containing files with epileptic seizures and at least 50 min pre-ictal data, the latter containing approximately 24 hours of EEG recordings without seizure activity. At least 24 h of continuous interictal recordings are available for 13 patients. For the remaining patients, interictal invasive EEG data consisting of less than 24 h were joined together, to end up with at least 24 h per patient. An interdisciplinary project between the Epilepsy Center, University Hospital Freiburg; the Bernstein Center for Computational Neuroscience (BCCN), Freiburg; and the Freiburg Center for Data Analysis and Modeling (FDM).
stems-predict-data
huggingface.co
Updated Aug 30, 2024
Cite
Jack Forlines (2024). stems-predict-data [Dataset]. https://huggingface.co/datasets/jfo150/stems-predict-data
Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
Aug 30, 2024
Authors
Jack Forlines
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
jfo150/stems-predict-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Process-based water temperature predictions in the Midwest US: 5 Model...
data.usgs.gov
gimi9.com
Cite
Jordan Read; Jacob Zwart; Holly Kundel; Hayley Corson-Dosch; Gretchen Hansen; Kelsey Vitense; Alison Appling; Samantha Oliver; Lindsay Platt, Process-based water temperature predictions in the Midwest US: 5 Model prediction data [Dataset]. http://doi.org/10.5066/P9CA6XP8
Unique identifier
https://doi.org/10.5066/P9CA6XP8
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Jordan Read; Jacob Zwart; Holly Kundel; Hayley Corson-Dosch; Gretchen Hansen; Kelsey Vitense; Alison Appling; Samantha Oliver; Lindsay Platt
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Time period covered
Jan 1, 1980 - Dec 31, 2019
Area covered
Midwestern United States, United States
Description
Multiple modeling frameworks were used to predict daily temperatures at 0.5m depth intervals for a set of diverse lakes in the U.S. states of Minnesota and Wisconsin. General Lake Model version 2 process-based (PB) models were configured and calibrated with training data to reduce root-mean squared error for 449 lakes (PBALL). Uncalibrated models used default configurations (PB0; see Winslow et al. 2016 for details) and no parameters were adjusted according to model fit with observations for 7,150 lakes.
predict Price Prediction Data
coinbase.com
Updated Nov 5, 2025
Cite
(2025). predict Price Prediction Data [Dataset]. https://www.coinbase.com/price-prediction/predict
Dataset updated
Nov 5, 2025
Variables measured
Growth Rate, Predicted Price
Measurement technique
User-defined projections based on compound growth. This is not a formal financial forecast.
Description
This dataset contains the predicted prices of the asset predict over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
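The compound-growth projection described above can be sketched as follows. This is only an illustration: the starting price of 100.0 and the function name are assumptions, since the listing gives neither the asset's current price nor the page's actual implementation. The 16-year horizon, the 5 percent default, and the -100 to 100 percent bounds come from the description.

```python
def projected_prices(start_price, annual_rate=0.05, years=16):
    """Project a price forward with compound annual growth.

    start_price is a hypothetical input; annual_rate is a fraction
    (0.05 means 5 percent) and must stay within -100% .. +100%.
    """
    if not -1.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be between -1.0 and 1.0")
    # Price after n years is start_price * (1 + rate) ** n.
    return [start_price * (1.0 + annual_rate) ** n for n in range(1, years + 1)]


if __name__ == "__main__":
    for year, price in enumerate(projected_prices(100.0), start=1):
        print(f"year {year:2d}: {price:.2f}")
```

Because the rate is user-defined, the output is a what-if projection, which matches the listing's own note that this is not a formal financial forecast.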
PREDiCT
dknet.org
scicrunch.org
Updated Sep 16, 2025
Cite
(2025). PREDiCT [Dataset]. http://identifiers.org/RRID:SCR_015517
Unique identifier
https://identifiers.org/RRID:SCR_015517
Dataset updated
Sep 16, 2025
Description
Patient database that contains EEG data sets, executable tasks, and computational tools. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
Predict Crypto Price Prediction Data
coinbase.com
Updated Nov 26, 2025
Cite
(2025). Predict Crypto Price Prediction Data [Dataset]. https://www.coinbase.com/price-prediction/predict-crypto
Dataset updated
Nov 26, 2025
Variables measured
Growth Rate, Predicted Price
Measurement technique
User-defined projections based on compound growth. This is not a formal financial forecast.
Description
This dataset contains the predicted prices of the asset Predict Crypto over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
data-temperature
kaggle.com
zip
Updated Aug 18, 2020
Cite
Luigi Perotti Souza (2020). data-temperature [Dataset]. https://www.kaggle.com/luigiperottisouza/datatemperature
Available download formats: zip (26987 bytes)
Dataset updated
Aug 18, 2020
Authors
Luigi Perotti Souza
Description
This dataset was created by Luigi Perotti Souza.
Process-guided deep learning water temperature predictions: 5a Lake Mendota...
catalog.data.gov
data.usgs.gov
Updated Nov 27, 2025
Cite
U.S. Geological Survey (2025). Process-guided deep learning water temperature predictions: 5a Lake Mendota detailed prediction data [Dataset]. https://catalog.data.gov/dataset/process-guided-deep-learning-water-temperature-predictions-5a-lake-mendota-detailed-predic
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Lake Mendota
Description
Multiple modeling frameworks were used to predict daily temperatures at 0.5m depth intervals for a set of diverse lakes in the U.S. states of Minnesota and Wisconsin. Process-Based (PB) models were configured and calibrated with training data to reduce root-mean squared error. Uncalibrated models used default configurations (PB0; see Winslow et al. 2016 for details) and no parameters were adjusted according to model fit with observations. Deep Learning (DL) models were Long Short-Term Memory artificial recurrent neural network models which used training data to adjust model structure and weights for temperature predictions (Jia et al. 2019). Process-Guided Deep Learning (PGDL) models were DL models with an added physical constraint for energy conservation as a loss term. These models were pre-trained with uncalibrated Process-Based model outputs (PB0) before training on actual temperature observations.
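The idea of adding an energy-conservation constraint as a loss term can be illustrated with a small NumPy sketch. It is a simplified stand-in, not the formulation used in the data release or in Jia et al. (2019): the function name, the `weight` and `tolerance` parameters, and the way the energy imbalance is passed in are all assumptions made for illustration.

```python
import numpy as np


def pgdl_style_loss(pred_temp, obs_temp, energy_change, net_flux,
                    weight=0.01, tolerance=0.0):
    """Data-fit loss plus a weighted penalty for energy imbalance.

    pred_temp, obs_temp: predicted and observed temperatures (NaN = no observation).
    energy_change: change in lake energy implied by the predictions, per time step.
    net_flux: net energy flux into the lake over the same time steps.
    All inputs and both hyperparameters are hypothetical placeholders.
    """
    pred_temp = np.asarray(pred_temp, dtype=float)
    obs_temp = np.asarray(obs_temp, dtype=float)
    observed = ~np.isnan(obs_temp)
    # Supervised term: mean squared error on time steps that have observations.
    data_loss = np.mean((pred_temp[observed] - obs_temp[observed]) ** 2)
    # Physics term: penalize imbalance beyond the tolerance.
    imbalance = np.abs(np.asarray(energy_change, dtype=float)
                       - np.asarray(net_flux, dtype=float))
    physics_penalty = np.mean(np.maximum(imbalance - tolerance, 0.0))
    return data_loss + weight * physics_penalty
```

The physics term needs no temperature observations, so a loss of this shape can also be evaluated on time steps where only the model's inputs and predictions are available.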

Data-Driven Drought Prediction Project Model Outputs: Daily Streamflow and...

catalog.data.gov
data.usgs.gov

Updated Nov 19, 2025

Cite

U.S. Geological Survey (2025). Data-Driven Drought Prediction Project Model Outputs: Daily Streamflow and Streamflow Percentile Predictions for the Colorado River Basin Region [Dataset]. https://catalog.data.gov/dataset/data-driven-drought-prediction-project-model-outputs-daily-streamflow-and-streamflow-perce

Dataset updated

Nov 19, 2025

Dataset provided by

U.S. Geological Survey

Area covered

Colorado River

Description

This metadata record describes outputs from 12 configurations of long short-term memory (LSTM) models which were used to predict streamflow drought occurrence at 384 stream gage locations in the Colorado River Basin region. The models were trained on data from 01-Oct-1981 to 31-Mar-2005 and validated over the period of record spanning 01-Apr-2005 to 31-Mar-2014. The models use explanatory variable inputs described in Wieczorek (2023) (doi.org/10.5066/P98IG8LO) to predict daily streamflow and streamflow percentiles as described in Simeone (2022) (doi.org/10.5066/P92FAASD). Separate models were trained to predict daily streamflow and streamflow percentiles. Two types of percentiles were modeled: (1) fixed-threshold percentiles that are based on comparing all streamflow throughout the year, and (2) variable-threshold percentiles that compare streamflow separately for each day of the year (using a moving 30-day window). Separate models were trained for predicting at lead times of 0, 7 and 14 days ahead. Details on methods and model configurations can be found in Hamshaw and others (2023). The comma-separated files are grouped by target variables and lead times as listed in the table below and include model output for the validation period (01-Apr-2005 to 31-Mar-2014). This metadata record also includes model code (see Readme.txt within the CRB_NN_model_archive.zip for more details) and a model performance metrics file (model_validation_performance_metrics_by_gage.csv).
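The difference between the two percentile types can be sketched with pandas. This is a rough illustration under stated assumptions, not the procedure from Simeone (2022): the function names are hypothetical, the 30-day window is taken as plus or minus 15 days around each day of year, and a percentile is computed as the share of pooled flows less than or equal to the day's flow.

```python
import numpy as np
import pandas as pd


def fixed_percentile(flow: pd.Series) -> pd.Series:
    """Percentile of each day's flow relative to all flows in the record."""
    return flow.rank(pct=True, method="max") * 100.0


def variable_percentile(flow: pd.Series, half_window: int = 15) -> pd.Series:
    """Percentile of each day's flow relative to flows on nearby days of year.

    Assumes a DatetimeIndex; half_window=15 approximates a 30-day window.
    """
    doy = flow.index.dayofyear.to_numpy()
    values = flow.to_numpy(dtype=float)
    out = np.empty(len(values))
    for i in range(len(values)):
        # Circular distance in days of year, so late December pools with early January.
        dist = np.abs(doy - doy[i])
        dist = np.minimum(dist, 366 - dist)
        pool = values[dist <= half_window]
        out[i] = np.mean(pool <= values[i]) * 100.0
    return pd.Series(out, index=flow.index)
```

A flow that is low compared with the whole year can still rank high within its own season, which is why the two percentile types can disagree for the same day.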

Model configurations included in the data release. PUB refers to "Predictions in Ungaged Basins" model configuration and Q refers to streamflow.
Data File	Prediction target variable	Forecast lead time	Model Configurations
streamflow_model_predictions_0day_ahead.csv	Daily Streamflow (mm/day)	0 days	Streamflow-0d, PUB-Streamflow-0d
streamflow_model_predictions_7day_ahead.csv	Daily Streamflow (mm/day)	7 days	Streamflow-7d
streamflow_model_predictions_14day_ahead.csv	Daily Streamflow (mm/day)	14 days	Streamflow-14d
percentile_fixed_model_predictions_0day_ahead.csv	Fixed Percentile	0 days	Fixed-0d, PUB-Fixed-0d, Q-to-Fixed-0d
percentile_fixed_model_predictions_7day_ahead.csv	Fixed Percentile	7 days	Fixed-7d
percentile_fixed_model_predictions_14day_ahead.csv	Fixed Percentile	14 days	Fixed-14d
percentile_variable_model_predictions_0day_ahead.csv	Variable Percentile	0 days	Variable-0d, PUB-Variable-0d, Q-to-Variable-0d
percentile_variable_model_predictions_7day_ahead.csv	Variable Percentile	7 days	Variable-7d
percentile_variable_model_predictions_14day_ahead.csv	Variable Percentile	14 days	Variable-14d

Data from: Artificial Intelligence Prediction Across 12,000 Samples Shows...
researchdata.ntu.edu.sg
csv, tsv, txt, zip
Updated Jan 20, 2025
Cite
Fuying Dao (2025). Artificial Intelligence Prediction Across 12,000 Samples Shows Widespread Increased Gene-Gene Chromatin Interactions in Cancers that Constitute Therapeutic Vulnerabilities [Dataset]. http://doi.org/10.21979/N9/ORBU74
Available download formats: csv, tsv, txt, zip
Unique identifier
https://doi.org/10.21979/N9/ORBU74
Dataset updated
Jan 20, 2025
Dataset provided by
DR-NTU (Data)
Authors
Fuying Dao
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Dataset funded by
Ministry of Education (MOE)
National Research Foundation (NRF)
Description
Gene-gene chromatin interactions (GGIs) bring distal genes into close spatial proximity to permit strong co-expression, which could potentially contribute to cancer progression. High-throughput methods like Hi-C are impractical for very large cohort analyses, thus we developed AI4Loop, an Artificial Intelligence (AI) Deep Learning-based tool to predict GGIs using RNA-Seq data. Applying AI4Loop to 12,000 patient samples from the TCGA database across 32 cancer types revealed that GGIs show increased cancer sub-type predictivity compared to RNA-Seq data and demonstrated oncogenic gains of GGIs in almost all cancers examined. To target the therapeutic vulnerability of gain of GGIs in cancers, using low-information RNA expression datasets from the CLUE database, we also constructed a drug-perturbation GGI atlas from 50,000 drug-treated samples to identify and repurpose compounds that disrupt oncogenic GGIs. Notably, we found that the antibiotics eperezolid and radezolid reduced cancer-acquired GGIs, which we confirmed with Hi-C experiments. This work showcases AI-directed research in epigenetics, enhances cancer biology predictivity and can promote wide-range drug repurposing in the future.
Data and code underlying the paper: "Can we predict the Most Replayed data...
data.4tu.nl
zip
Updated Sep 14, 2023
Cite
Alessandro Duico; Ombretta Strafforello; Jan van Gemert (2023). Data and code underlying the paper: "Can we predict the Most Replayed data of video streaming platforms?" [Dataset]. http://doi.org/10.4121/0ca18691-3fef-4c9c-9080-12b20daae62a.v1
Available download formats: zip
Unique identifier
https://doi.org/10.4121/0ca18691-3fef-4c9c-9080-12b20daae62a.v1
Dataset updated
Sep 14, 2023
Dataset provided by
4TU.ResearchData
Authors
Alessandro Duico; Ombretta Strafforello; Jan van Gemert
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Predicting which specific parts of a video users will replay is important for several applications, including targeted advertisement placement on video platforms and assisting video creators. In this work, we explore whether it is possible to predict the Most Replayed (MR) data from YouTube videos. To this end, we curate a large video benchmark, the YTMR500 dataset, which comprises 500 YouTube videos with MR data annotations. We evaluate Deep Learning (DL) models of varying complexity on our dataset and perform an extensive ablation study. In addition, we conduct a user study to estimate the human performance on MR data prediction. Our results show that, although by a narrow margin, all the evaluated DL models outperform random predictions. Additionally, they exceed human-level accuracy. This suggests that predicting the MR data is a difficult task that can be enhanced through the assistance of DL. In this repository, we provide our code and dataset. The code includes our trained and tested models, our user studies and results analysis. The YTMR500 dataset is provided through an H5 file.
Data from: Using Geospatial Data and Random Forest To Predict PFAS...
catalog.data.gov
s.cnmilf.com
Updated Nov 11, 2023
Cite
U.S. EPA Office of Research and Development (ORD) (2023). Using Geospatial Data and Random Forest To Predict PFAS Contamination in Fish Tissue in the Columbia River Basin, United States [Dataset]. https://catalog.data.gov/dataset/using-geospatial-data-and-random-forest-to-predict-pfas-contamination-in-fish-tissue-in-th
Dataset updated
Nov 11, 2023
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Area covered
Columbia River, Columbia River drainage basin, United States
Description
Publicly available data about potential PFAS sources and PFAS measurements in fish tissue. This dataset is associated with the following publication: DeLuca, N., A. Mullikin, P. Brumm, A. Rappold, and E. Hubal. Using Geospatial Data and Random Forest To Predict PFAS Contamination in Fish Tissue in the Columbia River Basin, United States. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 57: 14024-14035, (2023).
Gold Data to Predict the Stock Market
kaggle.com
zip
Updated Apr 17, 2025
Cite
Angel Varela (2025). Gold Data to Predict the Stock Market [Dataset]. https://www.kaggle.com/datasets/angelvarela/gold-data-to-predict-the-stock-market
Available download formats: zip (49607779 bytes)
Dataset updated
Apr 17, 2025
Authors
Angel Varela
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset can be used to predict the stock market. The data is extracted from the MT5 terminal integrated in Python.

The datasets include the minute-by-minute fluctuations of Gold and Silver prices from 1 January 2023 to 17 April 2025. The data can be used to train models for seasonality or a minute-by-minute approach.

The data has 7 columns:

Time: The minute in which the event or movement occurred (can be converted to a datetime type with pd.to_datetime)

Open Price

High Price

Low Price

Close Price (The Feature to Predict if desired)

Tick Volume: The volume at the minute in which the event or movement occurred

EMA: Exponential Moving Average (Technical Estimator for Risk management in the predictions and overall better performance)

OBV: On-Balance Volume (Technical Estimator for Risk management in the predictions and overall better performance)
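As a rough illustration of the columns listed above, the sketch below parses the Time column with pd.to_datetime and derives EMA and OBV from Close and Tick Volume using their standard definitions. The exact column names, the EMA span of 20, and the function name are assumptions; the dataset ships its own EMA and OBV columns, and the parameters behind them are not stated in the listing.

```python
import numpy as np
import pandas as pd


def add_indicators(df: pd.DataFrame, span: int = 20) -> pd.DataFrame:
    """Parse timestamps and compute EMA and OBV columns.

    Expects columns named "Time", "Close" and "Tick Volume"
    (assumed names; adjust them to the actual file header).
    """
    out = df.copy()
    out["Time"] = pd.to_datetime(out["Time"])
    out = out.sort_values("Time").reset_index(drop=True)
    # Exponential moving average of the close price.
    out["EMA"] = out["Close"].ewm(span=span, adjust=False).mean()
    # On-balance volume: add volume on up moves, subtract it on down moves.
    direction = np.sign(out["Close"].diff().fillna(0.0))
    out["OBV"] = (direction * out["Tick Volume"]).cumsum()
    return out
```

Recomputing the indicators this way can serve as a sanity check against the provided columns, bearing in mind that a different span or starting point would give different values.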

Two datasets are used:

Achilles Data Gold-Silver: 1,416,340 rows, to predict Gold, Silver and other Metals.

Achilles Data Gold: 708,264 rows, to predict Gold, Silver and other Metals.

You may find the paper of our implementation here: https://doi.org/10.48550/arXiv.2410.21291
Vibe Predict Price Prediction Data
coinbase.com
Updated Oct 30, 2025
Cite
(2025). Vibe Predict Price Prediction Data [Dataset]. https://www.coinbase.com/en-de/price-prediction/base-vibe-predict-ff3f
Dataset updated
Oct 30, 2025
Variables measured
Growth Rate, Predicted Price
Measurement technique
User-defined projections based on compound growth. This is not a formal financial forecast.
Description
This dataset contains the predicted prices of the asset Vibe Predict over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
Process-guided deep learning water temperature predictions: 5c All lakes...
data.usgs.gov
datasets.ai
Updated Feb 24, 2024
Cite
Jordan Read; Xiaowei Jia; Jared Willard; Alison Appling; Jacob Zwart; Samantha Oliver; Anuj Karpatne; Gretchen Hansen; Paul Hanson; William Watkins; Michael Steinbach; Kumar Vipin (2024). Process-guided deep learning water temperature predictions: 5c All lakes historical prediction data [Dataset]. http://doi.org/10.5066/P9AQPIVD
Unique identifier
https://doi.org/10.5066/P9AQPIVD
Dataset updated
Feb 24, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Jordan Read; Xiaowei Jia; Jared Willard; Alison Appling; Jacob Zwart; Samantha Oliver; Anuj Karpatne; Gretchen Hansen; Paul Hanson; William Watkins; Michael Steinbach; Kumar Vipin
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Time period covered
Apr 1, 1980 - Dec 31, 2018
Description
Multiple modeling frameworks were used to predict daily temperatures at 0.5m depth intervals for a set of diverse lakes in the U.S. states of Minnesota and Wisconsin. Process-Based (PB) models were configured and calibrated with training data to reduce root-mean squared error. Uncalibrated models used default configurations (PB0; see Winslow et al. 2016 for details) and no parameters were adjusted according to model fit with observations. Deep Learning (DL) models were Long Short-Term Memory artificial recurrent neural network models which used training data to adjust model structure and weights for temperature predictions (Jia et al. 2019). Process-Guided Deep Learning (PGDL) models were DL models with an added physical constraint for energy conservation as a loss term. These models were pre-trained with uncalibrated Process-Based model outputs (PB0) before training on actual temperature observations. Zip files for each lake contain four files, one for each of PB, PB0, DL, and PG ...
Data from: Simple attributes predict the value of plants as hosts to fungal...
data-staging.niaid.nih.gov
data.niaid.nih.gov
zip
Updated Feb 8, 2022
Cite
Hans Henrik Bruun; Ane Kirstine Brunbjerg; Camilla Fløjgaard; Lars Dalby; Tobias Guldberg Frøslev; Simon Haarder; Jacob Heilmann-Clausen; Toke Hoye; Thomas Læssøe; Rasmus Ejrnæs (2022). Simple attributes predict the value of plants as hosts to fungal and arthropod communities [Dataset]. http://doi.org/10.5061/dryad.hx3ffbgg8
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.hx3ffbgg8
Dataset updated
Feb 8, 2022
Dataset provided by
Independent researcher
University of Copenhagen
Aarhus University
Authors
Hans Henrik Bruun; Ane Kirstine Brunbjerg; Camilla Fløjgaard; Lars Dalby; Tobias Guldberg Frøslev; Simon Haarder; Jacob Heilmann-Clausen; Toke Hoye; Thomas Læssøe; Rasmus Ejrnæs
License
https://spdx.org/licenses/CC0-1.0.html
Description
Fungal and arthropod consumers constitute the vast majority of global terrestrial biodiversity. Yet, the link from richness and composition of producer (plant) communities to the richness of consumer communities is poorly understood. Fungal and arthropod species richness could be a simple function of producer species richness at a site. Alternatively, it could be a complex function of chemical and structural properties of the producer species making up communities. We used databases on plant-fungus and plant-arthropod trophic links to derive the richness of consumer biota per associated plant species (coined link score). We assessed how well link scores could be predicted by simple attributes of plant species. Next, we used a multi-taxon inventory of 130 sites, representing all major habitat types in a country (Denmark), to investigate whether link scores summed over plant species in communities (coined link sum) could outperform simple plant species richness as predictor of fungal and arthropod richness at the sites. We found plant species’ link scores for both fungi and arthropods to be positively related to plant size, regional occupancy, nativeness and ectomycorrhizal status. Link-based indices generally improved the prediction of richness of fungal and arthropod communities. For fungal communities, both observed link sum (from databases) and predicted link sum (from plant attributes) had high predictive power, while plant richness alone had none. For arthropod communities, predictive performance varied between functional groups. For both fungi and arthropods, richness predictions were further improved by considering abiotic habitat conditions. Our results underline the importance of plants as niche space for the megadiverse groups of arthropods and fungi. The plant-attribute approach holds promise for predicting local and regional consumer richness in areas of the world lacking detailed plant-consumer databases. 
Methods
Data on plant-fungus and plant-arthropod interaction links for the 549 plant species found across the 130 BioWide sites in Denmark. Detailed descriptions of field data collection protocols are found in Brunbjerg AK, Bruun HH, Brøndum L, Classen AT, Dalby L, Fog K, et al. (2019) A systematic survey of regional multi-taxon biodiversity: evaluating strategies and coverage. BMC Ecology 19(1):43. doi: 10.1186/s12898-019-0260-x. Raw data on known interaction links between all relevant plant taxa (1349 taxa on the species or genus levels) and associated arthropod species were retrieved from the BRC database (https://www.brc.ac.uk/dbif/) and similar data regarding associated fungal species from the Danish Fungal Database (https://svampe.databasen.org/). The raw data were processed to obtain an observed arthropod link score and an observed fungal link score per plant species. The calculation of link scores from raw data is detailed in the associated manuscript. Attributes of the 549 plant species used to model their predicted link score (ectomycorrhizal status, native area, occupancy in Denmark, phylogenetic grouping, lifespan, life form and size) were compiled from sources detailed in the associated manuscript.
Data from: Global and Episode-Specific Prediction of Recurrent Events Using...
tandf.figshare.com
pdf
Updated Jul 3, 2025
Yifei Sun; Sy Han Chiou; Chiung-Yu Huang (2025). Global and Episode-Specific Prediction of Recurrent Events Using Longitudinal Health Informatics Data [Dataset]. http://doi.org/10.6084/m9.figshare.28946244.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.28946244.v1
Dataset updated
Jul 3, 2025
Dataset provided by
Taylor & Francis (https://taylorandfrancis.com/)
Authors
Yifei Sun; Sy Han Chiou; Chiung-Yu Huang
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Accurate prediction of recurrent clinical events is crucial for effective management of chronic conditions such as cancer and cardiovascular disease. In recent years, longitudinal health informatics databases, which routinely collect data on repeated clinical events, have been increasingly used to construct risk prediction models. We introduce a novel nonparametric framework to predict recurrent events on a gap time scale using survival tree ensembles. Our framework incorporates two predictive modeling strategies: episode-specific model and global model. These models avoid strong assumptions on how future event risk depends on previous event history and other predictors, making them a promising alternative to Cox-type models. Additional complexities in tree-based prediction for recurrent events include induced informative censoring of gap times and inter-event correlations. We develop algorithms to address these issues through the use of inverse probability of censoring weighting and modified resampling procedures. Applied to SEER-Medicare data to predict repeated hospitalizations for breast cancer patients, our models showed superior performance. In particular, borrowing information across events via global models substantially improved prediction accuracy for later hospitalizations. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
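The gap time scale mentioned in the description can be made concrete with a small sketch. It only shows how one patient's event times could be converted into gap times, with the final gap marked as censored; it does not implement the authors' survival tree ensembles, weighting, or resampling procedures. The function name and the example values are hypothetical.

```python
def gap_times(event_times, censor_time):
    """Split one patient's follow-up into gap times.

    Returns a list of (gap, observed) pairs: `observed` is True when the
    gap ends in an event and False when it is cut off by censoring.
    """
    gaps = []
    previous = 0.0
    for t in sorted(event_times):
        if t > censor_time:
            break  # events after the end of follow-up are not observed
        gaps.append((t - previous, True))
        previous = t
    if censor_time > previous:
        # The time from the last event to the end of follow-up is censored.
        gaps.append((censor_time - previous, False))
    return gaps


# Hypothetical patient: events at months 3, 7 and 12, followed for 15 months.
print(gap_times([3, 7, 12], censor_time=15))
```

Because the last gap of each patient is cut off by the end of follow-up, later gaps are censored in a way that depends on earlier ones, which is the kind of complication the description attributes to gap-time analysis.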
Predict students' dropout and academic success
kaggle.com
zip
Updated Jan 3, 2023
The Devastator (2023). Predict students' dropout and academic success [Dataset]. https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention
Available download formats: zip (89332 bytes)
Dataset updated
Jan 3, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Predict students' dropout and academic success

Investigating the Impact of Social and Economic Factors

About this dataset

This dataset provides a comprehensive view of students enrolled in various undergraduate degrees offered at a higher education institution. It includes demographic data, socioeconomic factors and academic performance information that can be used to analyze the possible predictors of student dropout and academic success. The dataset combines multiple disjoint databases containing information available at the time of enrollment, such as application mode, marital status and course chosen. Additionally, the data can be used to estimate overall student performance at the end of each semester by assessing curricular units credited, enrolled, evaluated and approved, as well as their respective grades. Finally, it includes the regional unemployment rate, inflation rate and GDP, which can help clarify how economic factors play into student dropout or academic success. The data can provide insight into what leads students to stay in school or abandon their studies across a wide range of disciplines, such as agronomy, design, education, nursing, journalism, management, social service and technologies.

How to use the dataset

This dataset can be used to understand and predict student dropouts and academic outcomes. The data includes a variety of demographic, social-economic and academic performance factors related to the students enrolled in higher education institutions. The dataset provides valuable insights into the factors that affect student success and could be used to guide interventions and policies related to student retention.

Using this dataset, researchers can investigate two key questions: which specific factors are linked with student dropout or completion, and how do different features interact with each other? For example, researchers could explore whether demographic characteristics (e.g., gender, age at enrollment) or contextual conditions (e.g., the regional unemployment rate) are associated with higher student success rates, and what implications poverty has for educational outcomes. Answering these questions can give administrators critical information for formulating strategies that promote successful degree completion among students from diverse backgrounds in their institutions.

To use this dataset effectively, researchers should familiarize themselves with all of its variables, including categorical (qualitative) variables such as gender or application mode; numerical variables such as the number of curricular units at the beginning of each semester or age at enrollment; ordinal variables such as marital status; trends over time such as inflation rate or GDP; and frequency measures such as the percentage of scholarship holders. Researchers should also make sure they are aware of any potential bias in the data before running analyses, for example whether one population is underrepresented compared with another, as this could lead to unexpected results if not taken into account. Finally, practitioners should note that this Kaggle dataset contains only one semester's worth of information for each admission intake; studies covering a longer period, across several admission years, might provide more accurate results.
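A check for underrepresented groups, as suggested above, could be sketched as follows. This is a minimal illustration using only the Python standard library; the column name `group`, the 10% threshold and the toy rows are assumptions for the example and do not reflect the actual columns or contents of the Kaggle file.

```python
from collections import Counter


def group_shares(rows, column):
    """Return the share of rows falling into each value of `column`."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def underrepresented(shares, threshold=0.1):
    """List the values whose share falls below `threshold` (assumed cutoff)."""
    return sorted(value for value, share in shares.items() if share < threshold)


# Toy rows (hypothetical): 19 students in group "A" and 1 in group "B".
rows = [{"group": "A"}] * 19 + [{"group": "B"}]
shares = group_shares(rows, "group")

print(shares)
print(underrepresented(shares))
```

What counts as underrepresented depends on the research question, so the threshold here is a placeholder to adjust rather than a recommended value.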

Research Ideas

Prediction of Student Retention: This dataset can be used to develop predictive models that identify risk factors for student dropout, enabling early interventions to improve retention rates.

Improved Academic Performance: By using this data, higher education institutions could better understand their students' academic progress and identify areas of improvement from both an individual and institutional perspective. This will enable them to develop targeted courses, activities, or initiatives that enhance academic performance more effectively and efficiently.

Accessibility Assistance: Using the demographic information included in the dataset, institutions could develop s...


Prediction data from: Machine learning predicts which rivers, streams, and wetlands the Clean Water Act regulates

2 scholarly articles cite this dataset (View in Google Scholar)


