61 datasets found
  1. Resident Assessment Instrument/Minimum Data Set (RAI/MDS)

    • catalog.data.gov
    • datahub.va.gov
    • +2 more
    Updated Aug 2, 2025
    Cite
    Department of Veterans Affairs (2025). Resident Assessment Instrument/Minimum Data Set (RAI/MDS) [Dataset]. https://catalog.data.gov/dataset/resident-assessment-instrument-minimum-data-set-rai-mds
    Dataset updated
    Aug 2, 2025
    Dataset provided by
    United States Department of Veterans Affairs (http://va.gov/)
    Description

    The Resident Assessment Instrument/Minimum Data Set (RAI/MDS) is a comprehensive assessment and care planning process used by the nursing home industry since 1990 as a requirement for nursing home participation in the Medicare and Medicaid programs. The RAI/MDS provides data for monitoring changes in resident status that are consistent and reliable over time. The VA commitment to quality propelled the implementation of the RAI/MDS in its nursing homes, now known as VA Community Living Centers (CLCs). In addition to providing consistent clinical information, the RAI/MDS can be used as a measure of both quality and resource utilization, thereby serving as a benchmark for quality and cost data within the VA as well as with community-based nursing facilities. Workload based on RAI/MDS can be calculated electronically by the interactions of the elements of the MDS data and grouped into 53 categories referred to as Resource Utilization Groups (RUG-IV). Residents are assessed quarterly. The data is grouped for analysis at the Austin Information Technology Center (AITC). Conversion to electronic data entry and transmission to the AITC was completed system-wide by year-end 2000. In 2010, the Centers for Medicare and Medicaid Services released a significantly upgraded version, MDS 3.0, to be implemented in VHA CLCs beginning October 1, 2011. Training is currently underway. MDS 3.0 will generate a new set of Quality Indicators and Quality Monitors, and the number of RUGs will increase from the current 53 to 64.

  2. Minimal dataset for multimodal deep learning

    • kaggle.com
    zip
    Updated Mar 26, 2024
    + more versions
    Cite
    Jianbin Yao (2024). Minimal dataset for multimodal deep learning [Dataset]. https://www.kaggle.com/datasets/jianbinyao/minimum-dataset
    Available download formats: zip (696159196 bytes)
    Dataset updated
    Mar 26, 2024
    Authors
    Jianbin Yao
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The minimum data set used in a drought monitoring study of key growth stages of winter wheat based on multimodal deep learning. It includes wheat drought stress images and soil and meteorological data.

  3. Mental Health & Learning Disabilities Dataset v 1 (Sensitive) Records

    • healthdatagateway.org
    • find.data.gov.scot
    • +1 more
    unknown
    Updated Oct 8, 2024
    Cite
    (2024). Mental Health & Learning Disabilities Dataset v 1 (Sensitive) Records [Dataset]. https://healthdatagateway.org/en/dataset/853
    Available download formats: unknown
    Dataset updated
    Oct 8, 2024
    License

    https://digital.nhs.uk/binaries/content/assets/website-assets/services/dars/nhs_digital_approved_edition_2_dsa_demo.pdf

    Description

    The Mental Health and Learning Disabilities Data Set version 1 (Record Level - sensitive data inclusion). The Mental Health Minimum Data Set was superseded by the Mental Health and Learning Disabilities Data Set, which in turn was superseded by the Mental Health Services Data Set. The Mental Health and Learning Disabilities Data Set collected data from the health records of individual children, young people and adults who were in contact with mental health services.

  4. Minimum dataset

    • figshare.com
    application/x-rar
    Updated Mar 12, 2025
    Cite
    明阳 李 (2025). Minimum dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28580879.v1
    Available download formats: application/x-rar
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    figshare
    Authors
    明阳 李
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Open the data in MATLAB and read the readme file for details.

  5. Mental Health & Learning Disabilities Dataset v 1 (Non-Sensitive) Episodes

    • healthdatagateway.org
    • find.data.gov.scot
    • +1 more
    unknown
    Updated Oct 8, 2024
    + more versions
    Cite
    (2024). Mental Health & Learning Disabilities Dataset v 1 (Non-Sensitive) Episodes [Dataset]. https://healthdatagateway.org/dataset/849
    Available download formats: unknown
    Dataset updated
    Oct 8, 2024
    License

    https://digital.nhs.uk/binaries/content/assets/website-assets/services/dars/nhs_digital_approved_edition_2_dsa_demo.pdf

    Description

    The Mental Health and Learning Disabilities Data Set version 1 (Episode Level - sensitive data exclusion). The Mental Health Minimum Data Set was superseded by the Mental Health and Learning Disabilities Data Set, which in turn was superseded by the Mental Health Services Data Set. The Mental Health and Learning Disabilities Data Set collected data from the health records of individual children, young people and adults who were in contact with mental health services.

  6. Mental Health & Learning Disabilities Dataset v 1 (Non-Sensitive) Records

    • find.data.gov.scot
    • dtechtive.com
    • +1 more
    Updated May 31, 2023
    + more versions
    Cite
    NHS ENGLAND (2023). Mental Health & Learning Disabilities Dataset v 1 (Non-Sensitive) Records [Dataset]. https://find.data.gov.scot/datasets/25923
    Dataset updated
    May 31, 2023
    Dataset provided by
    National Health Service (https://www.nhs.uk/)
    Area covered
    England, United Kingdom
    Description

    The Mental Health and Learning Disabilities Data Set version 1 (Record Level - sensitive data exclusion). The Mental Health Minimum Data Set was superseded by the Mental Health and Learning Disabilities Data Set, which in turn was superseded by the Mental Health Services Data Set. The Mental Health and Learning Disabilities Data Set collected data from the health records of individual children, young people and adults who were in contact with mental health services.

  7. Mental Health and Learning Disabilities Statistics; Currency and Payment...

    • ckan.publishing.service.gov.uk
    Updated Jan 29, 2015
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2015). Mental Health and Learning Disabilities Statistics; Currency and Payment Data - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/mental-health-and-learning-disabilities-statistics-currency-and-payment-data
    Dataset updated
    Jan 29, 2015
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    The Mental Health Minimum Data Set (MHMDS) was renamed Mental Health and Learning Disabilities Data Set (MHLDDS) following an expansion in scope (from September 2014) to include people in contact with learning disability services for the first time. This monthly statistical release makes available the most recent Mental Health Minimum Dataset (MHMDS) data from April 2013 onwards. Further analysis to support currencies and payment in adult and older people's mental health services was added to the publication of April 2014 final data which can be found in the related links below. These changes are described in the Methodological Change paper referenced below. As well as providing timely data, it presents a wide range of information about care given to users of NHS-funded, secondary mental health services for adults and older people ('secondary mental health services') in England. This information will be of particular interest to organisations involved in giving secondary mental health care to adults and older people, as it presents timely information to support discussions between providers and commissioners of services. The MHMDS Monthly Report now includes the ten nationally recommended quality and outcome indicators to support the implementation of currencies and payment in mental health. For patients, researchers, agencies and the wider public it aims to provide up to date information about the numbers of people using services, spending time in psychiatric hospitals and subject to the Mental Health Act (MHA). Some of these measures are currently experimental analysis.

  8. Minimal ECG Dataset (5 Classes)

    • kaggle.com
    zip
    Updated Sep 26, 2025
    Cite
    sidali Khelil cherfi (2025). Minimal ECG Dataset (5 Classes) [Dataset]. https://www.kaggle.com/datasets/sidalikhelilcherfi/minimal-ecg-dataset
    Available download formats: zip (695199602 bytes)
    Dataset updated
    Sep 26, 2025
    Authors
    sidali Khelil cherfi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is a minimal, balanced subset of ECG images derived from the original PTB-XL-based dataset. It contains 400 training images and 100 testing images per class across 5 categories: CD, HYP, MI, NORM, STTC. In total, it provides 2,500 ECG images (2,000 for training and 500 for testing).
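
    The totals quoted in this description follow directly from the per-class counts. A minimal sketch of that arithmetic, using only the numbers given above (the function and constant names are illustrative and not part of the dataset):

```python
# Class labels and per-class counts as listed in the description.
CLASSES = ["CD", "HYP", "MI", "NORM", "STTC"]
TRAIN_PER_CLASS = 400
TEST_PER_CLASS = 100


def split_totals(classes, train_per_class, test_per_class):
    """Return (train_total, test_total, grand_total) for a balanced split."""
    train_total = len(classes) * train_per_class
    test_total = len(classes) * test_per_class
    return train_total, test_total, train_total + test_total


train_total, test_total, grand_total = split_totals(
    CLASSES, TRAIN_PER_CLASS, TEST_PER_CLASS
)
print(train_total, test_total, grand_total)  # 2000 500 2500
```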

  9. Mental Health and Learning Disabilities Statistics Data | gimi9.com

    • gimi9.com
    Updated Apr 20, 2016
    + more versions
    Cite
    (2016). Mental Health and Learning Disabilities Statistics Data | gimi9.com [Dataset]. https://gimi9.com/dataset/uk_monthly-mental-health-minimum-dataset-reports/
    Dataset updated
    Apr 20, 2016
    Description

    This dataset has been discontinued and replaced with the Mental Health Services Monthly Statistics dataset, available at https://data.gov.uk/dataset/mental-health-services-monthly-statistics. The Mental Health Minimum Data Set (MHMDS) was renamed Mental Health and Learning Disabilities Data Set (MHLDDS) following an expansion in scope (from September 2014) to include people in contact with learning disability services for the first time. This monthly statistical release makes available the most recent Mental Health Minimum Dataset (MHMDS) data from April 2013 onwards. Further analysis to support currencies and payment in adult and older people's mental health services was added to the publication of April 2014 final data which can be found in the related links below. These changes are described in the Methodological Change paper referenced below. As well as providing timely data, it presents a wide range of information about care given to users of NHS-funded, secondary mental health services for adults and older people ('secondary mental health services') in England. This information will be of particular interest to organisations involved in giving secondary mental health care to adults and older people, as it presents timely information to support discussions between providers and commissioners of services. The MHMDS Monthly Report now includes the ten nationally recommended quality and outcome indicators to support the implementation of currencies and payment in mental health. For patients, researchers, agencies and the wider public it aims to provide up to date information about the numbers of people using services, spending time in psychiatric hospitals and subject to the Mental Health Act (MHA). Some of these measures are currently experimental analysis.

  10. Replication Data for: Generating Minimal Training Sets for Machine Learned...

    • darus.uni-stuttgart.de
    Updated Apr 11, 2024
    Cite
    Jan Finkbeiner; Samuel Tovey; Christian Holm (2024). Replication Data for: Generating Minimal Training Sets for Machine Learned Potentials [Dataset]. http://doi.org/10.18419/DARUS-4099
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 11, 2024
    Dataset provided by
    DaRUS
    Authors
    Jan Finkbeiner; Samuel Tovey; Christian Holm
    License

    https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4099

    Dataset funded by
    DFG
    Description

    Data and scripts for replicating results and the investigation presented in the paper. This includes the DFT parameters for generating training data, all training and data selection scripts for the neural networks, and scripts for running and analysing the production simulations with the trained potentials.

  11. Decision Tree

    • kaggle.com
    zip
    Updated Apr 11, 2021
    + more versions
    Cite
    Abhishek Verma (2021). Decision Tree [Dataset]. https://www.kaggle.com/abhishekvermasg1/decision-tree
    Available download formats: zip (1706 bytes)
    Dataset updated
    Apr 11, 2021
    Authors
    Abhishek Verma
    Description

    Dataset

    This dataset was created by Abhishek Verma

    Contents

  12. Mental Health and Learning Disabilities Statistics

    • digital.nhs.uk
    csv, pdf, xls
    Updated Dec 22, 2015
    + more versions
    Cite
    (2015). Mental Health and Learning Disabilities Statistics [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-and-learning-disabilities-statistics
    Available download formats: csv (13.2 kB), xls (485.4 kB), pdf (179.7 kB), pdf (578.3 kB), csv (7.2 MB), csv (2.4 MB), pdf (98.5 kB), xls (494.6 kB)
    Dataset updated
    Dec 22, 2015
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Time period covered
    Sep 1, 2015 - Oct 31, 2015
    Area covered
    England
    Description

    This statistical release makes available the most recent Mental Health and Learning Disabilities Dataset (MHLDDS) final monthly data (September 2015). This publication presents a wide range of information about care delivered to users of NHS funded secondary mental health and learning disability services in England. The scope of the Mental Health Minimum Dataset (MHMDS) was extended to cover Learning Disability services from September 2014. Many people who have a learning disability use mental health services and people in learning disability services may have a mental health problem. This means that activity included in the new MHLDDS dataset cannot be distinctly divided into mental health or learning disability spells of care - a single spell of care may include inputs from either or both types of service. The Currencies and Payment file that forms part of this release is specifically limited to services in scope for currencies and payment in mental health services and remains unchanged. This information will be of particular interest to organisations involved in delivering secondary mental health and learning disability care to adults and older people, as it presents timely information to support discussions between providers and commissioners of services. The MHLDS Monthly Report also includes reporting by local authority for the first time. For patients, researchers, agencies, and the wider public it aims to provide up to date information about the numbers of people using services, spending time in hospital and subject to the Mental Health Act (MHA). Some of these measures are currently experimental analysis. The Currency and Payment (CaP) measures can be found in a separate machine-readable data file and may also be accessed via an on-line interactive visualisation tool that supports benchmarking. This can be accessed through the related links at the bottom of the page.

    This release also includes a note about the new experimental data file and the issuing of the ISN for the Mental Health Services Dataset (MHSDS). During summer 2015 we undertook a consultation on Adult Mental Health Statistics, seeking users' views on the existing reports and what might usefully be added to our reports when the new version of the dataset (MHSDS) is implemented in 2016. A report on this consultation can be found below. Please note: The Monthly MHLDS Report published in February will cover November final data and December provisional data and will be the last publication from MHLDDS. Data for January 2016 will be published under the new name of Mental Health Services Monthly Statistics, with a first release of provisional data planned for March 2016. A Methodological Change paper describing changes to these monthly reports will be issued in the New Year.

  13. Data from: Delaware River Basin Stream Salinity Machine Learning Models and...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 26, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Delaware River Basin Stream Salinity Machine Learning Models and Data [Dataset]. https://catalog.data.gov/dataset/delaware-river-basin-stream-salinity-machine-learning-models-and-data
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This model archive contains the input data, model code, and model outputs for machine learning models that predict daily non-tidal stream salinity (specific conductance) for a network of 459 modeled stream segments across the Delaware River Basin (DRB) from 1984-09-30 to 2021-12-31. There are a total of twelve models from combinations of two machine learning models (Random Forest and Recurrent Graph Convolution Neural Networks), two training/testing partitions (spatial and temporal), and three input attribute sets (dynamic attributes, dynamic and static attributes, and dynamic attributes and a minimum set of static attributes). In addition to the inputs and outputs for non-tidal predictions provided on the landing page, we also provide example predictions for models trained with additional tidal stream segments within the model archive (TidalExample folder), but we do not recommend our models for this use case. Model outputs contained within the model archive include performance metrics, plots of spatial and temporal errors, and Shapley (SHAP) explainable artificial intelligence plots for the best models. The results of these models provide insights into DRB stream segments with elevated salinity, and processes that drive stream salinization across the DRB, which may be used to inform salinity management. This data compilation was funded by the USGS.

  14. Data from: Applying Active Learning toward Building a Generalizable Model...

    • acs.figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1 more
    xlsx
    Updated May 22, 2025
    Cite
    Lucas W. Souza; Nathan D. Ricke; Braden C. Chaffin; Mike E. Fortunato; Shutian Jiang; Cihan Soylu; Thomas C. Caya; Sii Hong Lau; Katherine A. Wieser; Abigail G. Doyle; Kian L. Tan (2025). Applying Active Learning toward Building a Generalizable Model for Ni-Photoredox Cross-Electrophile Coupling of Aryl and Alkyl Bromides [Dataset]. http://doi.org/10.1021/jacs.5c02218.s001
    Available download formats: xlsx
    Dataset updated
    May 22, 2025
    Dataset provided by
    ACS Publications
    Authors
    Lucas W. Souza; Nathan D. Ricke; Braden C. Chaffin; Mike E. Fortunato; Shutian Jiang; Cihan Soylu; Thomas C. Caya; Sii Hong Lau; Katherine A. Wieser; Abigail G. Doyle; Kian L. Tan
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    When developing machine learning models for yield prediction, the two main challenges are effectively exploring condition space and substrate space. In this article, we disclose an approach for mapping the substrate space for Ni/photoredox-catalyzed cross-electrophile coupling of alkyl bromides and aryl bromides in a high-throughput experimentation (HTE) context. This model employs active learning (in particular, uncertainty querying) as a strategy to rapidly construct a yield model. Given the vastness of substrate space, we focused on an approach that builds an initial model and then uses a minimal data set to expand into new chemical spaces. In particular, we built a model for a virtual space of 22,240 compounds using less than 400 data points. We demonstrated that the model can be expanded to 33,312 compounds by adding information around 24 building blocks (

  15. Simple Arrow Dataset

    • kaggle.com
    zip
    Updated Aug 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Coder_Anand (2025). Simple Arrow Dataset [Dataset]. https://www.kaggle.com/datasets/coderanand/simple-arrow-dataset
    Available download formats: zip (633 bytes)
    Dataset updated
    Aug 7, 2025
    Authors
    Coder_Anand
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains 13×13 binary pixel images representing four arrow directions: left, right, up, and down. Each image is a simple black-and-white (0 and 1) arrow centered in the frame, with consistent shape and thickness across all directions.

    The dataset includes:
    - 4 total samples (1 per direction)
    - 169 binary features (pixel0 to pixel168)
    - A label column indicating the direction

    This minimal dataset is ideal for:
    - Testing image classification pipelines
    - Teaching basic computer vision and ML concepts
    - Experiments with low-resolution symbolic images
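
    The layout described above (169 pixel columns plus a label) can be turned back into a 13×13 image with a few lines of code. The sketch below is illustrative only: it assumes the pixels are stored in row-major order, which the description does not state, and the sample row and its label value are made up rather than taken from the dataset.

```python
SIDE = 13  # images are 13x13, giving 169 pixel columns (pixel0 to pixel168)


def row_to_grid(row):
    """Turn a mapping with keys pixel0..pixel168 into a 13x13 list of lists.

    Row-major pixel order is an assumption, not stated in the description.
    """
    flat = [int(row[f"pixel{i}"]) for i in range(SIDE * SIDE)]
    return [flat[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]


def render(grid):
    """Render a binary grid as text, '#' for 1 and '.' for 0."""
    return "\n".join("".join("#" if v else "." for v in line) for line in grid)


# Hypothetical sample: a horizontal bar on the middle row (not real data).
sample = {f"pixel{i}": 0 for i in range(SIDE * SIDE)}
for col in range(2, 11):
    sample[f"pixel{6 * SIDE + col}"] = 1
sample["label"] = "right"  # illustrative label value

grid = row_to_grid(sample)
print(render(grid))
```

    With the real file, each CSV row read through `csv.DictReader` could be passed to `row_to_grid` in the same way, provided the column names match those listed above.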

  16. Phishing URL Content Dataset

    • kaggle.com
    zip
    Updated Nov 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aaditey Pillai (2024). Phishing URL Content Dataset [Dataset]. https://www.kaggle.com/datasets/aaditeypillai/phishing-website-content-dataset
    Available download formats: zip (62701 bytes)
    Dataset updated
    Nov 25, 2024
    Authors
    Aaditey Pillai
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Phishing URL Content Dataset

    Executive Summary

    Motivation:
    Phishing attacks are one of the most significant cyber threats in today’s digital era, tricking users into divulging sensitive information like passwords, credit card numbers, and personal details. This dataset aims to support research and development of machine learning models that can classify URLs as phishing or benign.

    Applications:
    - Building robust phishing detection systems.
    - Enhancing security measures in email filtering and web browsing.
    - Training cybersecurity practitioners in identifying malicious URLs.

    The dataset contains diverse features extracted from URL structures, HTML content, and website metadata, enabling deep insights into phishing behavior patterns.

    Description of Data

    This dataset comprises two types of URLs:
    1. Phishing URLs: Malicious URLs designed to deceive users.
    2. Benign URLs: Legitimate URLs posing no harm to users.

    Key Features:
    - URL-based features: Domain, protocol type (HTTP/HTTPS), and IP-based links.
    - Content-based features: Link density, iframe presence, external/internal links, and metadata.
    - Certificate-based features: SSL/TLS details like validity period and organization.
    - WHOIS data: Registration details like creation and expiration dates.

    Statistics:
    - Total Samples: 800 (400 phishing, 400 benign).
    - Features: 22 including URL, domain, link density, and SSL attributes.

    Power Analysis

    To ensure statistical reliability, a power analysis was conducted to determine the minimum sample size required for binary classification with 22 features. Using a medium effect size (0.15), alpha = 0.05, and power = 0.80, the analysis indicated a minimum sample size of ~325 per class. Our dataset exceeds this requirement with 400 examples per class, ensuring robust model training.

    Exploratory Data Analysis (EDA)

    Insights from EDA:
    - Distribution Plots: Histograms and density plots for numerical features like link density, URL length, and iframe counts.
    - Bar Plots: Class distribution and protocol usage trends.
    - Correlation Heatmap: Highlights relationships between numerical features to identify multicollinearity or strong patterns.
    - Box Plots: For SSL certificate validity and URL lengths, comparing phishing versus benign URLs.

    EDA visualizations are provided in the repository.

    Link to Publicly Available Data and Code

    The repository contains the Python code used to extract features, conduct EDA, and build the dataset.

    Ethics Statement

    Phishing detection datasets must balance the need for security research with the risk of misuse. This dataset:
    1. Protects User Privacy: No personally identifiable information is included.
    2. Promotes Ethical Use: Intended solely for academic and research purposes.
    3. Avoids Reinforcement of Bias: Balanced class distribution ensures fairness in training models.

    Risks:
    - Misuse of the dataset for creating more deceptive phishing attacks.
    - Over-reliance on outdated features as phishing tactics evolve.

    Researchers are encouraged to pair this dataset with continuous updates and contextual studies of real-world phishing.

    Open Source License

    This dataset is shared under the MIT License, allowing free use, modification, and distribution for academic and non-commercial purposes. License details can be found here.

  17. Data from: Dataset, Code, and Models for Training Deep Learning Potentials...

    • osti.gov
    Updated Sep 10, 2025
    Cite
    Draney, Jack S.; Graves, David; Panagiotopoulos, Athanassios (2025). Dataset, Code, and Models for Training Deep Learning Potentials for Low Temperature Plasma-Surface Interactions [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/2589045
    Dataset updated
    Sep 10, 2025
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    Princeton Plasma Physics Laboratory (http://www.pppl.gov/)
    Authors
    Draney, Jack S.; Graves, David; Panagiotopoulos, Athanassios
    Description

    This repository contains the datasets, training scripts, finished models, and test simulations used in the development of DeepREBO, a machine-learned interatomic potential trained to emulate the REBO2 empirical potential. The data was generated to study deep potential development for simulations of plasma-surface interactions. It uses an active learning framework, starting from a minimal dataset and iteratively expanding it. Included are those generated datasets, the trained models, and simulations used to evaluate the performance of the training process. This resource supports reproducibility and provides a reference framework for training deep potentials in plasma-surface interaction studies.

  18. River Water Segmentation Dataset (RIWA)

    • kaggle.com
    zip
    Updated Jan 26, 2023
    Cite
    Franz Wagner (2023). River Water Segmentation Dataset (RIWA) [Dataset]. https://www.kaggle.com/datasets/franzwagner/river-water-segmentation-dataset/code
    Available download formats: zip (704549806 bytes)
    Dataset updated
    Jan 26, 2023
    Authors
    Franz Wagner
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    River Water Segmentation Dataset (RIWA)

    New Version 2: It is the largest high quality (min size of 400x400) dataset as far as we know (01/2023).

    The RIWA dataset provides pixel-wise binary river water segmentation. It consists of manually labelled smartphone, drone and DSLR images of rivers, as well as suitable images from the Water Segmentation Dataset and high-quality ADE20K images. The COCO dataset images were withdrawn because their segmentation quality is extremely poor.

    Version history:

    Version 2 (declared as Version 4 by Kaggle):
    - Contains 1142 training, 167 validation and 323 test images.
    - Min size: 400 x 400 (h x w)
    - High quality segmentations. If you find an error, please message us.

    Version 1:
    - Contains 789 training, 228 validation and 111 test images.
    - Min size: 174 x 200 (h x w)
    - Some segmentations are not perfect.
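
    For a quick comparison of the two versions, the split sizes listed above can be summed and expressed as shares of each version's total. This is a small sketch using only the counts from the version history; the dictionary layout and function name are illustrative.

```python
# Image counts per split, copied from the version history above.
SPLITS = {
    "Version 2": {"train": 1142, "val": 167, "test": 323},
    "Version 1": {"train": 789, "val": 228, "test": 111},
}


def split_fractions(counts):
    """Return (total, {split: fraction of total}) for one dataset version."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


for version, counts in SPLITS.items():
    total, fractions = split_fractions(counts)
    shares = ", ".join(f"{name} {frac:.1%}" for name, frac in fractions.items())
    print(f"{version}: {total} images ({shares})")
```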

    Citation

    If you use this dataset, please cite as:

     @misc{RIWA_Dataset,
      title={River Water Segmentation Dataset (RIWA)},
      url={https://www.kaggle.com/dsv/4901781},
      DOI={10.34740/KAGGLE/DSV/4901781},
      publisher={Kaggle},
      author={Xabier Blanch and Franz Wagner and Anette Eltner},
      year={2023}
    }
    

    Contact:
    - Xabier Blanch, TU Dresden see at SCIENTIFIC STAFF
    - Franz Wagner, TU Dresden
    - Anette Eltner, TU Dresden

    CNN comparison

    In 2023, we carried out a comparison to find the best CNN on this domain. If you are interested, please see our paper: River water segmentation in surveillance camera images: A comparative study of offline and online augmentation using 32 CNNs.

    We conducted the tests using the AiSeg GitLab repository. It can interactively train 2D and 3D CNNs, augment data with offline and online augmentation, analyze single networks, compare multiple networks, and apply trained CNNs to new data. The RIWA dataset can be used directly.

    Background:

    Handling natural disasters, especially heavy rainfall and the resulting floods, places special demands on emergency services. A quick, efficient, real-time estimate of the water level is critical for monitoring a flood event. Obtaining it is challenging and usually requires specially prepared river sections. In addition, some classical observation methods may be compromised during heavy flood events.

    With the technological advances derived from image-based observation methods and segmentation algorithms based on neural networks (NN), it is possible to generate real-time, low-cost monitoring systems. This new approach makes it possible to densify the observation network, improving flood warning and management. In addition, images can be obtained by remotely positioned cameras, preventing data loss during a major event.

    The workflow we have developed for real-time monitoring consists of the integration of 3 different techniques. The first step consists of a topographic survey using Structure from Motion (SfM) strategies. In this stage, images of the area of interest are obtained using both terrestrial cameras and UAV images. The survey is completed by obtaining ground control point coordinates with multi-band GNSS equipment. The result is a 3D SfM model georeferenced to centimetre accuracy that allows us to reconstruct not only the river environment but also the riverbed.

    The second step consists of segmenting the images obtained with a surveillance camera installed ad hoc to monitor the river. This segmentation is achieved with convolutional neural networks (CNNs). The aim is to automatically segment the time-lapse images obtained every 15 minutes. We have carried out this research by testing different CNNs to choose the most suitable architecture for river segmentation, adapted to each study area and to each time of day (day and night).

    The third step is based on the integration between the automatically segmented images and the 3D model acquired. The CNN-segmented river boundary is projected into the 3D SfM model to obtain a metric result of the water level based on the point of the 3D model closest to the image ray.
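    The "point of the 3D model closest to the image ray" idea can be illustrated with a small geometric sketch. This is not the authors' implementation: the function name `closest_point_to_ray`, the toy point cloud, and the camera values are all hypothetical, and the conversion from a segmented boundary pixel to a ray (camera calibration and pose) is assumed to have been done already.

```python
import numpy as np


def closest_point_to_ray(points, origin, direction):
    """Return the point in `points` (N x 3) nearest to the ray origin + t * direction, t >= 0."""
    d = direction / np.linalg.norm(direction)      # unit ray direction
    rel = points - origin                          # vectors from camera centre to each point
    t = np.clip(rel @ d, 0.0, None)                # position along the ray, clamped in front of the camera
    foot = origin + np.outer(t, d)                 # closest location on the ray for each point
    dist = np.linalg.norm(points - foot, axis=1)   # perpendicular distance of each point to the ray
    return points[np.argmin(dist)]


# Hypothetical georeferenced model points (x, y, height) and one image ray.
points = np.array([
    [0.0, 5.0, 101.2],
    [1.0, 10.0, 100.4],
    [4.0, 10.0, 103.0],
])
origin = np.array([0.0, 0.0, 102.0])        # assumed camera centre
direction = np.array([0.1, 1.0, -0.16])     # assumed ray through a water-boundary pixel

nearest = closest_point_to_ray(points, origin, direction)
water_level = nearest[2]                    # height of the nearest model point
```

    In practice one such ray would be cast for every pixel on the segmented water boundary, and the resulting heights would be combined into a single water-level estimate. How they are combined is not stated in the description above.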

    The possibility of automating the segmentation and reprojection in the 3D model will allow the generation of a robust centimetre-accurate workflow, capable of estimating the water level in near real time both day and night. This strategy represents the basis for a better understanding of river flo...

  19. Data from: PROBABILITY CALIBRATION BY THE MINIMUM AND MAXIMUM PROBABILITY...

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Apr 11, 2025
    + more versions
    Cite
    Dashlink (2025). PROBABILITY CALIBRATION BY THE MINIMUM AND MAXIMUM PROBABILITY SCORES IN ONE-CLASS BAYES LEARNING FOR ANOMALY DETECTION [Dataset]. https://catalog.data.gov/dataset/probability-calibration-by-the-minimum-and-maximum-probability-scores-in-one-class-bayes-l
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    PROBABILITY CALIBRATION BY THE MINIMUM AND MAXIMUM PROBABILITY SCORES IN ONE-CLASS BAYES LEARNING FOR ANOMALY DETECTION

    GUICHONG LI, NATHALIE JAPKOWICZ, IAN HOFFMAN, R. KURT UNGAR

    ABSTRACT. One-class Bayes learning, such as one-class Naïve Bayes and one-class Bayesian Network, employs Bayes learning to build a classifier on the positive class only, in order to discriminate the positive class from the negative class. It has been applied to anomaly detection to identify abnormal behaviors that deviate from normal behaviors. Because one-class Bayes classifiers can produce a probability score, which can be used to define an anomaly score for anomaly detection, they are preferable to other one-class learning techniques in many practical applications. However, previously proposed one-class Bayes classifiers might suffer from poor probability estimation when negative training examples are unavailable. In this paper, we propose a new method to improve the probability estimation. According to our empirical results, the improved one-class Bayes classifiers exhibit high performance compared with previously proposed one-class Bayes classifiers.

  20. Data from: Large Landing Trajectory Data Set for Go-Around Analysis

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    application/gzip, bin +1
    Updated Dec 16, 2022
    Cite
    Raphael Monstein; Benoit Figuet; Timothé Krauth; Manuel Waltert; Marcel Dettling (2022). Large Landing Trajectory Data Set for Go-Around Analysis [Dataset]. http://doi.org/10.5281/zenodo.7148117
    Explore at:
    Available download formats: application/gzip, bin, zip
    Dataset updated
    Dec 16, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Raphael Monstein; Benoit Figuet; Timothé Krauth; Manuel Waltert; Marcel Dettling
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Large go-around, also referred to as missed approach, data set. The data set is in support of the paper presented at the OpenSky Symposium on November the 10th.

    If you use this data for a scientific publication, please consider citing our paper.

    The data set contains landings from 176 (mostly) large airports from 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33000 GAs. The data was collected from OpenSky Network's historical data base for the year 2019. The published data set contains multiple files:

    go_arounds_minimal.csv.gz

    Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:

    Column name | Type | Description
    time | date time | UTC time of landing or first GA attempt
    icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
    callsign | string | Aircraft identifier in air-ground communications
    airport | string | ICAO airport code where the aircraft is landing
    runway | string | Runway designator on which the aircraft landed
    has_ga | string | "True" if at least one GA was performed, otherwise "False"
    n_approaches | integer | Number of approaches identified for this flight
    n_rwy_approached | integer | Number of unique runways approached by this flight

    The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These flights usually have a large number of approaches, so an easy way to exclude them is to drop the rows where n_approaches > 2.
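    The filter described above can be sketched with pandas as follows. The three sample rows are made up so that the snippet runs without the real file; with the actual data, the `io.StringIO` object would be replaced by the path to go_arounds_minimal.csv.gz. The threshold name `MAX_APPROACHES` is an illustrative choice, not part of the data set.

```python
import io

import pandas as pd

# Tiny made-up stand-in for go_arounds_minimal.csv.gz (same column names).
sample_csv = io.StringIO(
    "time,icao24,callsign,airport,runway,has_ga,n_approaches,n_rwy_approached\n"
    "2019-01-04 10:00:00,aaaaaa,TEST1,EGLC,09,False,1,1\n"
    "2019-01-04 11:00:00,bbbbbb,TEST2,EGLC,09,True,2,1\n"
    "2019-01-04 12:00:00,cccccc,TEST3,EGLC,27,True,7,2\n"
)

# Read has_ga and runway as strings, as documented in the table above.
df = pd.read_csv(sample_csv, dtype={"has_ga": str, "runway": str})
df["time"] = pd.to_datetime(df["time"])

# Turn the "True"/"False" strings into real booleans.
df["has_ga"] = df["has_ga"].eq("True")

# Keep only flights with at most two approaches; flights with more are
# assumed to be training or calibration flights.
MAX_APPROACHES = 2
regular = df[df["n_approaches"] <= MAX_APPROACHES]
```

    Here `regular` keeps the first two sample flights and drops the one with seven approaches.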

    go_arounds_augmented.csv.gz

    Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:

    Column name | Type | Description
    time | date time | UTC time of landing or first GA attempt
    icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
    callsign | string | Aircraft identifier in air-ground communications
    airport | string | ICAO airport code where the aircraft is landing
    runway | string | Runway designator on which the aircraft landed
    has_ga | string | "True" if at least one GA was performed, otherwise "False"
    n_approaches | integer | Number of approaches identified for this flight
    n_rwy_approached | integer | Number of unique runways approached by this flight
    registration | string | Aircraft registration
    typecode | string | Aircraft ICAO typecode
    icaoaircrafttype | string | ICAO aircraft type
    wtc | string | ICAO wake turbulence category
    glide_slope_angle | float | Angle of the ILS glide slope in degrees
    has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false
    rwy_length | float | Length of the runway in kilometres
    airport_country | string | ISO Alpha-3 country code of the airport
    airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
    operator_country | string | ISO Alpha-3 country code of the operator
    operator_region | string | Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania)
    wind_speed_knts | integer | METAR, surface wind speed in knots
    wind_dir_deg | integer | METAR, surface wind direction in degrees
    wind_gust_knts | integer | METAR, surface wind gust speed in knots
    visibility_m | float | METAR, visibility in m
    temperature_deg | integer | METAR, temperature in degrees Celsius
    press_sea_level_p | float | METAR, sea level pressure in hPa
    press_p | float | METAR, QNH in hPa
    weather_intensity | list | METAR, list of present weather codes: qualifier - intensity
    weather_precipitation | list | METAR, list of present weather codes: weather phenomena - precipitation
    weather_desc | list | METAR, list of present weather codes: qualifier - descriptor
    weather_obscuration | list | METAR, list of present weather codes: weather phenomena - obscuration
    weather_other | list | METAR, list of present weather codes: weather phenomena - other

    This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft data base, the METAR information is from Iowa State University, and the rest is mostly scraped from different web sites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.

    go_arounds_agg.csv.gz

    Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:

    Column name | Type | Description
    airport | string | ICAO airport code where the aircraft is landing
    runway | string | Runway designator on which the aircraft landed
    n_landings | integer | Total number of landings observed on this runway in 2019
    ga_rate | float | Go-around rate, per 1000 landings
    glide_slope_angle | float | Angle of the ILS glide slope in degrees
    has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false
    rwy_length | float | Length of the runway in kilometres
    airport_country | string | ISO Alpha-3 country code of the airport
    airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)

    This aggregated data set is used in the paper for the generalized linear regression model.
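    As a rough illustration of how the n_landings and ga_rate columns relate to the per-landing files, the sketch below aggregates a made-up landing table by airport and runway. It assumes that ga_rate is the number of landings with has_ga set to true per 1000 landings; the authors' exact definition may differ, and the intermediate column `n_ga` is not part of the published file.

```python
import pandas as pd

# Made-up per-landing rows; has_ga is already converted to booleans here.
landings = pd.DataFrame(
    {
        "airport": ["EGLC", "EGLC", "EGLC", "EGLC", "LSZH", "LSZH"],
        "runway": ["09", "09", "09", "27", "14", "14"],
        "has_ga": [True, False, False, False, False, False],
    }
)

# One row per airport-runway pair: landing count and number of go-around landings.
agg = landings.groupby(["airport", "runway"], as_index=False).agg(
    n_landings=("has_ga", "size"),
    n_ga=("has_ga", "sum"),
)

# Go-around rate per 1000 landings (assumed definition, see the note above).
agg["ga_rate"] = 1000 * agg["n_ga"] / agg["n_landings"]
```

    With real data, runways with very few landings would give noisy rates, so a minimum n_landings threshold may be worth applying before any modelling.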

    Downloading the trajectories

    Users of this data set with access to OpenSky Network's Impala shell can download the historical trajectories from the historical data base with a few lines of Python code. For example, suppose you want to get all the go-arounds of the 4th of January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:

    import datetime
    from tqdm.auto import tqdm
    import pandas as pd
    from traffic.data import opensky
    from traffic.core import Traffic

    # load minimum data set
    df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
    df["time"] = pd.to_datetime(df["time"])

    # select London City Airport, go-arounds, and 2019-01-04
    airport = "EGLC"
    start = datetime.datetime(year=2019, month=1, day=4).replace(
        tzinfo=datetime.timezone.utc
    )
    stop = datetime.datetime(year=2019, month=1, day=5).replace(
        tzinfo=datetime.timezone.utc
    )

    df_selection = df.query("airport==@airport & has_ga
