100+ datasets found
  1. Prediction data from: Machine learning predicts which rivers, streams, and...

    • datadryad.org
    • dataone.org
    • +1more
    zip
    Updated Dec 10, 2023
    Cite
    Simon Greenhill; Hannah Druckenmiller; Sherrie Wang; David Keiser; Manuela Girotto; Jason Moore; Nobuhiro Yamaguchi; Alberto Todeschini; Joseph Shapiro (2023). Prediction data from: Machine learning predicts which rivers, streams, and wetlands the Clean Water Act regulates [Dataset]. http://doi.org/10.5061/dryad.z34tmpgm7
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Dryad
    Authors
    Simon Greenhill; Hannah Druckenmiller; Sherrie Wang; David Keiser; Manuela Girotto; Jason Moore; Nobuhiro Yamaguchi; Alberto Todeschini; Joseph Shapiro
    Time period covered
    Sep 27, 2023
    Description

    This dataset contains model outputs that were analyzed to produce the main results of the paper.

  2. Data from: Machine-learning model predictions and groundwater-quality...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 26, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Machine-learning model predictions and groundwater-quality rasters of specific conductance, total dissolved solids, and chloride in aquifers of the Mississippi Embayment [Dataset]. https://catalog.data.gov/dataset/machine-learning-model-predictions-and-groundwater-quality-rasters-of-specific-conductance
    Explore at:
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    Groundwater is a vital resource in the Mississippi embayment of the central United States. An innovative approach using machine learning (ML) was employed to predict groundwater salinity, including specific conductance (SC), total dissolved solids (TDS), and chloride (Cl) concentrations, across three drinking-water aquifers of the Mississippi embayment. An ML approach was used because it accommodates a large and diverse set of explanatory variables, does not assume monotonic relations between predictors and response data, and yields results that can be extrapolated to areas of the aquifer not sampled. These aspects of ML allowed potential drivers and sources of high-salinity water that have been hypothesized in other studies to be included as explanatory variables. The ML approach integrated output from a groundwater-flow model and water-quality data to predict salinity, and the approach can be applied to other aquifers to provide context for the long-term availability of groundwater resources. The Mississippi embayment includes two principal regional aquifer systems: the surficial aquifer system, dominated by the Quaternary Mississippi River Valley Alluvial aquifer (MRVA), and the Mississippi embayment aquifer system, which includes deeper Tertiary aquifers and confining units. Based on the distribution of groundwater use for drinking water, the modeling focused on the MRVA, middle Claiborne aquifer (MCAQ), and lower Claiborne aquifer (LCAQ). Boosted regression tree (BRT) models (Elith and others, 2008; Kuhn and Johnson, 2013) were developed to predict SC and Cl to 1-kilometer (km) raster grid cells of the National Hydrologic Grid (Clark and others, 2018) for 7 aquifer layers (1 MRVA, 4 MCAQ, 2 LCAQ) following the hydrogeologic framework of Hart and others (2008). TDS maps were created using the correlation between SC and TDS.
    Explanatory variables for the BRT models included attributes associated with well location and construction, surficial variables (such as soils and land use), and variables extracted from a MODFLOW groundwater flow model for the Mississippi embayment (Haugh and others, 2020a; Haugh and others, 2020b). Prediction intervals were calculated for SC and Cl by bootstrapping raster-cell predictions following methods from Ransom and others (2017). For a full description of modeling workflow and final model selection see Knierim and others (2020).

  3. Electroencephalogram Database: Prediction of Epileptic Seizures

    • neuinfo.org
    • dknet.org
    • +2more
    Updated May 10, 2005
    Cite
    (2005). Electroencephalogram Database: Prediction of Epileptic Seizures [Dataset]. http://identifiers.org/RRID:SCR_008032
    Explore at:
    Dataset updated
    May 10, 2005
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE. Documented on April 29, 2025. Electroencephalogram (EEG) data recorded from invasive and scalp electrodes. The EEG database contains invasive EEG recordings of 21 patients suffering from medically intractable focal epilepsy. The data were recorded during invasive pre-surgical epilepsy monitoring at the Epilepsy Center of the University Hospital of Freiburg, Germany. In eleven patients, the epileptic focus was located in neocortical brain structures, in eight patients in the hippocampus, and in two patients in both. In order to obtain a high signal-to-noise ratio, reduce artifacts, and record directly from focal areas, intracranial grid, strip, and depth electrodes were used. The EEG data were acquired using a Neurofile NT digital video EEG system with 128 channels, a 256 Hz sampling rate, and a 16-bit analogue-to-digital converter. Notch or band-pass filters have not been applied. For each patient, there are datasets called ictal and interictal, the former containing files with epileptic seizures and at least 50 min of pre-ictal data, the latter containing approximately 24 hours of EEG recordings without seizure activity. At least 24 h of continuous interictal recordings are available for 13 patients. For the remaining patients, interictal invasive EEG recordings of less than 24 h were joined together to reach at least 24 h per patient. An interdisciplinary project between: * Epilepsy Center, University Hospital Freiburg * Bernstein Center for Computational Neuroscience (BCCN), Freiburg * Freiburg Center for Data Analysis and Modeling (FDM).

  4. stems-predict-data

    • huggingface.co
    Updated Aug 30, 2024
    + more versions
    Cite
    Jack Forlines (2024). stems-predict-data [Dataset]. https://huggingface.co/datasets/jfo150/stems-predict-data
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 30, 2024
    Authors
    Jack Forlines
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    jfo150/stems-predict-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Process-based water temperature predictions in the Midwest US: 5 Model...

    • data.usgs.gov
    • gimi9.com
    • +1more
    Cite
    Jordan Read; Jacob Zwart; Holly Kundel; Hayley Corson-Dosch; Gretchen Hansen; Kelsey Vitense; Alison Appling; Samantha Oliver; Lindsay Platt, Process-based water temperature predictions in the Midwest US: 5 Model prediction data [Dataset]. http://doi.org/10.5066/P9CA6XP8
    Explore at:
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Jordan Read; Jacob Zwart; Holly Kundel; Hayley Corson-Dosch; Gretchen Hansen; Kelsey Vitense; Alison Appling; Samantha Oliver; Lindsay Platt
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Time period covered
    Jan 1, 1980 - Dec 31, 2019
    Area covered
    Midwestern United States, United States
    Description

    Multiple modeling frameworks were used to predict daily temperatures at 0.5 m depth intervals for a set of diverse lakes in the U.S. states of Minnesota and Wisconsin. General Lake Model version 2 Process-Based (PB) models were configured and calibrated with training data to reduce root-mean-squared error for 449 lakes (PBALL). Uncalibrated models used default configurations (PB0; see Winslow et al. 2016 for details), and no parameters were adjusted according to model fit with observations for 7,150 lakes.

  6. predict Price Prediction Data

    • coinbase.com
    Updated Nov 5, 2025
    + more versions
    Cite
    (2025). predict Price Prediction Data [Dataset]. https://www.coinbase.com/price-prediction/predict
    Explore at:
    Dataset updated
    Nov 5, 2025
    Variables measured
    Growth Rate, Predicted Price
    Measurement technique
    User-defined projections based on compound growth. This is not a formal financial forecast.
    Description

    This dataset contains the predicted prices of the asset predict over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
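The compound-growth projection described above can be sketched as follows. This is a minimal illustration, not the provider's actual calculation: the function name `project_prices`, its parameters, and the starting price are assumptions, and only the 5 percent default, the 16-year horizon, and the -100 to 100 percent range come from the description.

```python
def project_prices(current_price, annual_growth_pct=5.0, years=16):
    """Project a price forward with compound annual growth.

    Hypothetical helper: the listing only states a default 5 percent
    rate, a 16-year horizon, and a -100..100 percent adjustable range.
    """
    if not -100.0 <= annual_growth_pct <= 100.0:
        raise ValueError("growth rate must be between -100 and 100 percent")
    rate = annual_growth_pct / 100.0
    # Price after n years = current price * (1 + rate) ** n
    return [current_price * (1.0 + rate) ** n for n in range(1, years + 1)]


if __name__ == "__main__":
    # The starting price of 100.0 is an arbitrary illustrative value.
    for year, price in enumerate(project_prices(100.0), start=1):
        print(f"year {year:2d}: {price:.2f}")
```

Because the growth rate is supplied by the user, the output is a what-if projection rather than a forecast, which matches the stated measurement technique.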

  7. PREDiCT

    • dknet.org
    • scicrunch.org
    • +2more
    Updated Sep 16, 2025
    Cite
    (2025). PREDiCT [Dataset]. http://identifiers.org/RRID:SCR_015517
    Explore at:
    Dataset updated
    Sep 16, 2025
    Description

    Patient database that contains EEG data sets, executable tasks, and computational tools. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.

  8. Predict Crypto Price Prediction Data

    • coinbase.com
    Updated Nov 26, 2025
    + more versions
    Cite
    (2025). Predict Crypto Price Prediction Data [Dataset]. https://www.coinbase.com/price-prediction/predict-crypto
    Explore at:
    Dataset updated
    Nov 26, 2025
    Variables measured
    Growth Rate, Predicted Price
    Measurement technique
    User-defined projections based on compound growth. This is not a formal financial forecast.
    Description

    This dataset contains the predicted prices of the asset Predict Crypto over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.

  9. data-temperature

    • kaggle.com
    zip
    Updated Aug 18, 2020
    Cite
    Luigi Perotti Souza (2020). data-temperature [Dataset]. https://www.kaggle.com/luigiperottisouza/datatemperature
    Explore at:
    Available download formats: zip (26987 bytes)
    Dataset updated
    Aug 18, 2020
    Authors
    Luigi Perotti Souza
    Description

    Dataset

    This dataset was created by Luigi Perotti Souza

    Contents

  10. Process-guided deep learning water temperature predictions: 5a Lake Mendota...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 27, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Process-guided deep learning water temperature predictions: 5a Lake Mendota detailed prediction data [Dataset]. https://catalog.data.gov/dataset/process-guided-deep-learning-water-temperature-predictions-5a-lake-mendota-detailed-predic
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Lake Mendota
    Description

    Multiple modeling frameworks were used to predict daily temperatures at 0.5m depth intervals for a set of diverse lakes in the U.S. states of Minnesota and Wisconsin. Process-Based (PB) models were configured and calibrated with training data to reduce root-mean squared error. Uncalibrated models used default configurations (PB0; see Winslow et al. 2016 for details) and no parameters were adjusted according to model fit with observations. Deep Learning (DL) models were Long Short-Term Memory artificial recurrent neural network models which used training data to adjust model structure and weights for temperature predictions (Jia et al. 2019). Process-Guided Deep Learning (PGDL) models were DL models with an added physical constraint for energy conservation as a loss term. These models were pre-trained with uncalibrated Process-Based model outputs (PB0) before training on actual temperature observations.
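The idea of adding an energy-conservation constraint as a loss term can be sketched as below. This is only an illustration under stated assumptions: the description does not give the functional form used by the authors (see Jia et al. 2019 for that), so the penalty shape, the names `pgdl_loss`, `ec_weight`, and `tolerance`, and all input values are hypothetical.

```python
def pgdl_loss(predicted, observed, energy_change, net_energy_flux,
              ec_weight=0.01, tolerance=0.0):
    """Supervised error plus a weighted energy-conservation penalty.

    Assumed form (not taken from the source):
        loss = MSE(predicted, observed) + ec_weight * penalty
    where the penalty is the mean amount by which the modeled change in
    lake energy disagrees with the net energy flux, beyond a tolerance.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("predicted and observed must be equal-length and non-empty")
    if len(energy_change) != len(net_energy_flux) or not energy_change:
        raise ValueError("energy series must be equal-length and non-empty")

    # Standard supervised term: mean squared temperature error.
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)

    # Physical term: only imbalances larger than the tolerance are penalized.
    imbalances = [abs(du - flux) for du, flux in zip(energy_change, net_energy_flux)]
    penalty = sum(max(0.0, r - tolerance) for r in imbalances) / len(imbalances)

    return mse + ec_weight * penalty
```

A design note: because the physical term does not need temperature observations, a loss of this shape can in principle be evaluated on time steps without measurements, which is one motivation for process-guided training.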

  11. Data-Driven Drought Prediction Project Model Outputs: Daily Streamflow and...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 19, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Data-Driven Drought Prediction Project Model Outputs: Daily Streamflow and Streamflow Percentile Predictions for the Colorado River Basin Region [Dataset]. https://catalog.data.gov/dataset/data-driven-drought-prediction-project-model-outputs-daily-streamflow-and-streamflow-perce
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Colorado River
    Description

    This metadata record describes outputs from 12 configurations of long short-term memory (LSTM) models which were used to predict streamflow drought occurrence at 384 stream gage locations in the Colorado River Basin region. The models were trained on data from 01-Oct-1981 to 31-Mar-2005 and validated over the period of record spanning 01-Apr-2005 to 31-Mar-2014. The models use explanatory variable inputs described in Wieczorek (2023) (doi.org/10.5066/P98IG8LO) to predict daily streamflow and streamflow percentiles as described in Simeone (2022) (doi.org/10.5066/P92FAASD). Separate models were trained to predict daily streamflow and streamflow percentiles. Two types of percentiles were modeled: (1) fixed-threshold percentiles that are based on comparing all streamflow throughout the year, and (2) variable-threshold percentiles that compare streamflow separately for each day of the year (using a moving 30-day window). Separate models were trained for predicting at lead times of 0, 7, and 14 days ahead. Details on methods and model configurations can be found in Hamshaw and others (2023). The comma-separated files are grouped by target variables and lead times as listed in the table below and include model output for the validation period (01-Apr-2005 to 31-Mar-2014). This metadata record also includes model code (see Readme.txt within the CRB_NN_model_archive.zip for more details) and a model performance metrics file (model_validation_performance_metrics_by_gage.csv).

    Model configurations included in the data release. PUB refers to "Predictions in Ungaged Basins" model configuration and Q refers to streamflow.
    Data File | Prediction target variable | Forecast lead time | Model Configurations
    streamflow_model_predictions_0day_ahead.csv | Daily Streamflow (mm/day) | 0 days | Streamflow-0d, PUB-Streamflow-0d
    streamflow_model_predictions_7day_ahead.csv | Daily Streamflow (mm/day) | 7 days | Streamflow-7d
    streamflow_model_predictions_14day_ahead.csv | Daily Streamflow (mm/day) | 14 days | Streamflow-14d
    percentile_fixed_model_predictions_0day_ahead.csv | Fixed Percentile | 0 days | Fixed-0d, PUB-Fixed-0d, Q-to-Fixed-0d
    percentile_fixed_model_predictions_7day_ahead.csv | Fixed Percentile | 7 days | Fixed-7d
    percentile_fixed_model_predictions_14day_ahead.csv | Fixed Percentile | 14 days | Fixed-14d
    percentile_variable_model_predictions_0day_ahead.csv | Variable Percentile | 0 days | Variable-0d, PUB-Variable-0d, Q-to-Variable-0d
    percentile_variable_model_predictions_7day_ahead.csv | Variable Percentile | 7 days | Variable-7d
    percentile_variable_model_predictions_14day_ahead.csv | Variable Percentile | 14 days | Variable-14d
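The difference between the fixed-threshold and variable-threshold percentiles described in this record can be illustrated with a small sketch. It is a simplified reading of the description, not the method of Simeone (2022): the percentile-rank formula, the function names, the `(day_of_year, flow)` record layout, and the half-window of 15 days (a 31-day span standing in for the stated 30-day moving window) are all assumptions.

```python
def percentile_rank(value, reference):
    """Percent of reference values that are less than or equal to value."""
    if not reference:
        raise ValueError("reference sample is empty")
    return 100.0 * sum(1 for r in reference if r <= value) / len(reference)


def fixed_percentile(flow, all_flows):
    """Fixed threshold: compare against flows from every day of the year."""
    return percentile_rank(flow, all_flows)


def variable_percentile(flow, day_of_year, records, half_window=15):
    """Variable threshold: compare only against flows from nearby calendar days.

    records is a list of (day_of_year, flow) pairs pooled across years.
    Day numbers wrap around the year end; leap days are treated loosely.
    """
    def circular_distance(a, b):
        d = abs(a - b) % 365
        return min(d, 365 - d)

    reference = [q for doy, q in records
                 if circular_distance(doy, day_of_year) <= half_window]
    return percentile_rank(flow, reference)
```

Under this reading, a flow that is low relative to the whole year (for example, a typical late-summer value) can still rank as normal or high once it is compared only with flows from the same season.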

  12. Data from: Artificial Intelligence Prediction Across 12,000 Samples Shows...

    • researchdata.ntu.edu.sg
    csv, tsv, txt, zip
    Updated Jan 20, 2025
    Cite
    Fuying Dao; Fuying Dao (2025). Artificial Intelligence Prediction Across 12,000 Samples Shows Widespread Increased Gene-Gene Chromatin Interactions in Cancers that Constitute Therapeutic Vulnerabilities [Dataset]. http://doi.org/10.21979/N9/ORBU74
    Explore at:
    Available download formats: csv, tsv, txt, zip
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    DR-NTU (Data)
    Authors
    Fuying Dao; Fuying Dao
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Dataset funded by
    Ministry of Education (MOE)
    National Research Foundation (NRF)
    Description

    Gene-gene chromatin interactions (GGIs) bring distal genes into close spatial proximity to permit strong co-expression, which could potentially contribute to cancer progression. High-throughput methods like Hi-C are impractical for very large cohort analyses, so we developed AI4Loop, an Artificial Intelligence (AI) Deep Learning-based tool to predict GGIs using RNA-Seq data. Applying AI4Loop to 12,000 patient samples from the TCGA database across 32 cancer types revealed that GGIs show increased cancer sub-type predictivity compared to RNA-Seq data and demonstrated oncogenic gains of GGIs in almost all cancers examined. To target the therapeutic vulnerability of gain of GGIs in cancers, using low-information RNA expression datasets from the CLUE database, we also constructed a drug-perturbation GGI atlas from 50,000 drug-treated samples to identify and repurpose compounds that disrupt oncogenic GGIs. Notably, we found that the antibiotics eperezolid and radezolid reduced cancer-acquired GGIs, which we confirmed with Hi-C experiments. This work showcases AI-directed research in epigenetics, enhances cancer biology predictivity, and can promote wide-ranging drug repurposing in the future.

  13. Data and code underlying the paper: "Can we predict the Most Replayed data...

    • data.4tu.nl
    zip
    Updated Sep 14, 2023
    Cite
    Alessandro Duico; Ombretta Strafforello; Jan van Gemert (2023). Data and code underlying the paper: "Can we predict the Most Replayed data of video streaming platforms?" [Dataset]. http://doi.org/10.4121/0ca18691-3fef-4c9c-9080-12b20daae62a.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 14, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Alessandro Duico; Ombretta Strafforello; Jan van Gemert
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Predicting which specific parts of a video users will replay is important for several applications, including targeted advertisement placement on video platforms and assisting video creators. In this work, we explore whether it is possible to predict the Most Replayed (MR) data from YouTube videos. To this end, we curate a large video benchmark, the YTMR500 dataset, which comprises 500 YouTube videos with MR data annotations. We evaluate Deep Learning (DL) models of varying complexity on our dataset and perform an extensive ablation study. In addition, we conduct a user study to estimate the human performance on MR data prediction. Our results show that, although by a narrow margin, all the evaluated DL models outperform random predictions. Additionally, they exceed human-level accuracy. This suggests that predicting the MR data is a difficult task that can be enhanced through the assistance of DL. In this repository, we provide our code and dataset. The code includes our trained and tested models, our user studies and results analysis. The YTMR500 dataset is provided through an H5 file.

  14. Data from: Using Geospatial Data and Random Forest To Predict PFAS...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 11, 2023
    Cite
    U.S. EPA Office of Research and Development (ORD) (2023). Using Geospatial Data and Random Forest To Predict PFAS Contamination in Fish Tissue in the Columbia River Basin, United States [Dataset]. https://catalog.data.gov/dataset/using-geospatial-data-and-random-forest-to-predict-pfas-contamination-in-fish-tissue-in-th
    Explore at:
    Dataset updated
    Nov 11, 2023
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    Columbia River, Columbia River drainage basin, United States
    Description

    Publicly available data about potential PFAS sources and PFAS measurements in fish tissue. This dataset is associated with the following publication: DeLuca, N., A. Mullikin, P. Brumm, A. Rappold, and E. Hubal. Using Geospatial Data and Random Forest To Predict PFAS Contamination in Fish Tissue in the Columbia River Basin, United States. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 57: 14024-14035, (2023).

  15. Gold Data to Predict the Stock Market

    • kaggle.com
    zip
    Updated Apr 17, 2025
    Cite
    Angel Varela (2025). Gold Data to Predict the Stock Market [Dataset]. https://www.kaggle.com/datasets/angelvarela/gold-data-to-predict-the-stock-market
    Explore at:
    Available download formats: zip (49607779 bytes)
    Dataset updated
    Apr 17, 2025
    Authors
    Angel Varela
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset can be used to predict the stock market. The data were extracted from the MT5 terminal through its Python integration.

    The datasets include the minute-by-minute fluctuations of Gold and Silver prices from 1 January 2023 to 17 April 2025. The data can be used to train models for seasonality or a minute-by-minute approach.

    The data has 7 columns:

    • Time: The minute in which the event or movement occurred (can be converted to a datetime type with pd.to_datetime)
    • Open Price
    • High Price
    • Low Price
    • Close Price (The Feature to Predict if desired)
    • Tick Volume: The volume in the minute in which the event or movement occurred
    • EMA: Exponential Moving Average (Technical Estimator for Risk management in the predictions and overall better performance)
    • OBV: On-Balance Volume (Technical Estimator for Risk management in the predictions and overall better performance)
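    As a rough illustration of the two technical columns, the sketch below computes an exponential moving average and on-balance volume from close prices and tick volumes using their textbook definitions. The dataset does not state the EMA span, the seeding rule, or the timestamp format, so `span`, the first-value seed, and the `"%Y-%m-%d %H:%M"` format are assumptions, and the helper names are hypothetical.

```python
from datetime import datetime


def parse_minute(text, fmt="%Y-%m-%d %H:%M"):
    """Parse a Time value; the format string is an assumed example."""
    return datetime.strptime(text, fmt)


def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1).

    The series is seeded with its first value (one common convention).
    """
    alpha = 2.0 / (span + 1)
    out = []
    for v in values:
        out.append(v if not out else alpha * v + (1.0 - alpha) * out[-1])
    return out


def obv(closes, volumes):
    """On-balance volume: add volume on an up close, subtract on a down close."""
    out = []
    total = 0
    for i, (close, volume) in enumerate(zip(closes, volumes)):
        if i > 0:
            if close > closes[i - 1]:
                total += volume
            elif close < closes[i - 1]:
                total -= volume
        out.append(total)
    return out
```

    Values computed this way may differ from the EMA and OBV columns shipped with the dataset if the authors used a different span or starting value.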

    Two datasets are used:

    • Achilles Data Gold-Silver: with 1,416,340 rows to predict Gold, Silver and other Metals.
    • Achilles Data Gold: with 708,264 rows to predict Gold, Silver and other Metals.

    You may find the paper of our implementation here: https://doi.org/10.48550/arXiv.2410.21291

  16. Vibe Predict Price Prediction Data

    • coinbase.com
    Updated Oct 30, 2025
    + more versions
    Cite
    (2025). Vibe Predict Price Prediction Data [Dataset]. https://www.coinbase.com/en-de/price-prediction/base-vibe-predict-ff3f
    Explore at:
    Dataset updated
    Oct 30, 2025
    Variables measured
    Growth Rate, Predicted Price
    Measurement technique
    User-defined projections based on compound growth. This is not a formal financial forecast.
    Description

    This dataset contains the predicted prices of the asset Vibe Predict over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.

  17. Process-guided deep learning water temperature predictions: 5c All lakes...

    • data.usgs.gov
    • datasets.ai
    • +3more
    Updated Feb 24, 2024
    + more versions
    Cite
    Jordan Read; Xiaowei Jia; Jared Willard; Alison Appling; Jacob Zwart; Samantha Oliver; Anuj Karpatne; Gretchen Hansen; Paul Hanson; William Watkins; Michael Steinbach; Kumar Vipin (2024). Process-guided deep learning water temperature predictions: 5c All lakes historical prediction data [Dataset]. http://doi.org/10.5066/P9AQPIVD
    Explore at:
    Dataset updated
    Feb 24, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Jordan Read; Xiaowei Jia; Jared Willard; Alison Appling; Jacob Zwart; Samantha Oliver; Anuj Karpatne; Gretchen Hansen; Paul Hanson; William Watkins; Michael Steinbach; Kumar Vipin
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Time period covered
    Apr 1, 1980 - Dec 31, 2018
    Description

    Multiple modeling frameworks were used to predict daily temperatures at 0.5m depth intervals for a set of diverse lakes in the U.S. states of Minnesota and Wisconsin. Process-Based (PB) models were configured and calibrated with training data to reduce root-mean squared error. Uncalibrated models used default configurations (PB0; see Winslow et al. 2016 for details) and no parameters were adjusted according to model fit with observations. Deep Learning (DL) models were Long Short-Term Memory artificial recurrent neural network models which used training data to adjust model structure and weights for temperature predictions (Jia et al. 2019). Process-Guided Deep Learning (PGDL) models were DL models with an added physical constraint for energy conservation as a loss term. These models were pre-trained with uncalibrated Process-Based model outputs (PB0) before training on actual temperature observations. Zip files for each lake contain four files, one for each of PB, PB0, DL, and PG ...

  18. Data from: Simple attributes predict the value of plants as hosts to fungal...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +2more
    zip
    Updated Feb 8, 2022
    Cite
    Hans Henrik Bruun; Ane Kirstine Brunbjerg; Camilla Fløjgaard; Lars Dalby; Tobias Guldberg Frøslev; Simon Haarder; Jacob Heilmann-Clausen; Toke Hoye; Thomas Læssøe; Rasmus Ejrnæs (2022). Simple attributes predict the value of plants as hosts to fungal and arthropod communities [Dataset]. http://doi.org/10.5061/dryad.hx3ffbgg8
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 8, 2022
    Dataset provided by
    Independent researcher
    University of Copenhagen
    Aarhus University
    Authors
    Hans Henrik Bruun; Ane Kirstine Brunbjerg; Camilla Fløjgaard; Lars Dalby; Tobias Guldberg Frøslev; Simon Haarder; Jacob Heilmann-Clausen; Toke Hoye; Thomas Læssøe; Rasmus Ejrnæs
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Fungal and arthropod consumers constitute the vast majority of global terrestrial biodiversity. Yet, the link from richness and composition of producer (plant) communities to the richness of consumer communities is poorly understood. Fungal and arthropod species richness could be a simple function of producer species richness at a site. Alternatively, it could be a complex function of chemical and structural properties of the producer species making up communities. We used databases on plant-fungus and plant-arthropod trophic links to derive the richness of consumer biota per associated plant species (coined link score). We assessed how well link scores could be predicted by simple attributes of plant species. Next, we used a multi-taxon inventory of 130 sites, representing all major habitat types in a country (Denmark), to investigate whether link scores summed over plant species in communities (coined link sum) could outperform simple plant species richness as predictor of fungal and arthropod richness at the sites. We found plant species’ link scores for both fungi and arthropods to be positively related to plant size, regional occupancy, nativeness and ectomycorrhizal status. Link-based indices generally improved the prediction of richness of fungal and arthropod communities. For fungal communities, both observed link sum (from databases) and predicted link sum (from plant attributes) had high predictive power, while plant richness alone had none. For arthropod communities, predictive performance varied between functional groups. For both fungi and arthropods, richness predictions were further improved by considering abiotic habitat conditions. Our results underline the importance of plants as niche space for the megadiverse groups of arthropods and fungi. The plant-attribute approach holds promise for predicting local and regional consumer richness in areas of the world lacking detailed plant-consumer databases. 
    Methods: Data on plant-fungus and plant-arthropod interaction links for the 549 plant species found across the 130 BioWide sites in Denmark. Detailed descriptions of field data collection protocols are found in Brunbjerg AK, Bruun HH, Brøndum L, Classen AT, Dalby L, Fog K, et al. (2019) A systematic survey of regional multi-taxon biodiversity: evaluating strategies and coverage. BMC Ecology 19(1):43. doi: 10.1186/s12898-019-0260-x. Raw data on known interaction links between all relevant plant taxa (1349 taxa on the species or genus levels) and associated arthropod species were retrieved from the BRC database (https://www.brc.ac.uk/dbif/) and similar data regarding associated fungal species from the Danish Fungal Database (https://svampe.databasen.org/). The raw data were processed to obtain an observed arthropod link score and an observed fungal link score per plant species. The calculation of link scores from raw data is detailed in the associated manuscript. Attributes of the 549 plant species used to model their predicted link score (ectomycorrhizal status, native area, occupancy in Denmark, phylogenetic grouping, lifespan, life form and size) were compiled from sources detailed in the associated manuscript.

  19. Data from: Global and Episode-Specific Prediction of Recurrent Events Using...

    • tandf.figshare.com
    pdf
    Updated Jul 3, 2025
    Cite
    Yifei Sun; Sy Han Chiou; Chiung-Yu Huang (2025). Global and Episode-Specific Prediction of Recurrent Events Using Longitudinal Health Informatics Data [Dataset]. http://doi.org/10.6084/m9.figshare.28946244.v1
    Available download formats: pdf
    Dataset updated
    Jul 3, 2025
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Yifei Sun; Sy Han Chiou; Chiung-Yu Huang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accurate prediction of recurrent clinical events is crucial for effective management of chronic conditions such as cancer and cardiovascular disease. In recent years, longitudinal health informatics databases, which routinely collect data on repeated clinical events, have been increasingly used to construct risk prediction models. We introduce a novel nonparametric framework to predict recurrent events on a gap time scale using survival tree ensembles. Our framework incorporates two predictive modeling strategies: episode-specific model and global model. These models avoid strong assumptions on how future event risk depends on previous event history and other predictors, making them a promising alternative to Cox-type models. Additional complexities in tree-based prediction for recurrent events include induced informative censoring of gap times and inter-event correlations. We develop algorithms to address these issues through the use of inverse probability of censoring weighting and modified resampling procedures. Applied to SEER-Medicare data to predict repeated hospitalizations for breast cancer patients, our models showed superior performance. In particular, borrowing information across events via global models substantially improved prediction accuracy for later hospitalizations. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.

  20. Predict students' dropout and academic success

    • kaggle.com
    zip
    Updated Jan 3, 2023
    Cite
    The Devastator (2023). Predict students' dropout and academic success [Dataset]. https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention
    Available download formats: zip (89332 bytes)
    Dataset updated
    Jan 3, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Predict students' dropout and academic success

    Investigating the Impact of Social and Economic Factors

    By [source]

    About this dataset

    This dataset provides a comprehensive view of students enrolled in various undergraduate degrees offered at a higher education institution. It includes demographic data, socio-economic factors and academic performance information that can be used to analyze the possible predictors of student dropout and academic success. The dataset contains multiple disjoint databases consisting of relevant information available at the time of enrollment, such as application mode, marital status, course chosen and more. Additionally, the data can be used to estimate overall student performance at the end of each semester by assessing curricular units credited/enrolled/evaluated/approved as well as their respective grades. Finally, it includes the unemployment rate, inflation rate and GDP of the region, which can help to show how economic factors play into student dropout rates or academic success outcomes. The dataset can provide insight into what motivates students to stay in school or abandon their studies across a wide range of disciplines such as agronomy, design, education, nursing, journalism, management, social service or technologies.

    How to use the dataset

    This dataset can be used to understand and predict student dropouts and academic outcomes. The data includes a variety of demographic, social-economic and academic performance factors related to the students enrolled in higher education institutions. The dataset provides valuable insights into the factors that affect student success and could be used to guide interventions and policies related to student retention.

    Using this dataset, researchers can investigate two key questions: which specific predictive factors are linked with student dropout or completion, and how do different features interact with each other? For example, researchers could explore whether any demographic characteristics (e.g., gender, age at enrollment) or contextual conditions (e.g., unemployment rate in the region) are associated with higher student success rates, and what implications poverty has for educational outcomes. Answering these questions can give administrators critical information for formulating strategies that promote successful degree completion among students from diverse backgrounds in their institutions.

    To use this dataset effectively, scientists should familiarize themselves with all of the variables provided, including categorical (qualitative) variables such as gender or application mode; numerical variables such as the number of curricular units at the beginning of each semester or age at enrollment; ordinal variables such as marital status; economic indicators tracked over time such as inflation rate or GDP; and frequency measures such as the percentage of scholarship holders. Scientists should also make sure they are aware of any potential bias in the data before running an analysis, for example whether one population is underrepresented compared with another, as this could lead to unexpected results if not taken into consideration. Finally, practitioners should note that this Kaggle dataset contains only one semester's worth of information on each admission intake; studies conducted over a longer period, spanning several admission years, might provide more accurate results on retention and achievement.

    Research Ideas

    • Prediction of Student Retention: This dataset can be used to develop predictive models that identify student risk factors for dropout and support early interventions to improve the student retention rate.
    • Improved Academic Performance: By using this data, higher education institutions could better understand their students' academic progress and identify areas of improvement from both an individual and institutional perspective. This will enable them to develop targeted courses, activities, or initiatives that enhance academic performance more effectively and efficiently.
    • Accessibility Assistance: Using the demographic information included in the dataset, institutions could develop s...
Cite
Simon Greenhill; Hannah Druckenmiller; Sherrie Wang; David Keiser; Manuela Girotto; Jason Moore; Nobuhiro Yamaguchi; Alberto Todeschini; Joseph Shapiro (2023). Prediction data from: Machine learning predicts which rivers, streams, and wetlands the Clean Water Act regulates [Dataset]. http://doi.org/10.5061/dryad.z34tmpgm7

Prediction data from: Machine learning predicts which rivers, streams, and wetlands the Clean Water Act regulates

2 scholarly articles cite this dataset
Available download formats: zip
Dataset updated
Dec 10, 2023
Dataset provided by
Dryad
Authors
Simon Greenhill; Hannah Druckenmiller; Sherrie Wang; David Keiser; Manuela Girotto; Jason Moore; Nobuhiro Yamaguchi; Alberto Todeschini; Joseph Shapiro
Time period covered
Sep 27, 2023
Description

This dataset contains model outputs that were analyzed to produce the main results of the paper.
