61 datasets found

Resident Assessment Instrument/Minimum Data Set (RAI/MDS)
catalog.data.gov
datahub.va.gov
Updated Aug 2, 2025
Cite
Department of Veterans Affairs (2025). Resident Assessment Instrument/Minimum Data Set (RAI/MDS) [Dataset]. https://catalog.data.gov/dataset/resident-assessment-instrument-minimum-data-set-rai-mds
Dataset updated
Aug 2, 2025
Dataset provided by
United States Department of Veterans Affairs (http://va.gov/)
Description
The Resident Assessment Instrument/Minimum Data Set (RAI/MDS) is a comprehensive assessment and care planning process that the nursing home industry has used since 1990 as a requirement for nursing home participation in the Medicare and Medicaid programs. The RAI/MDS provides data for monitoring changes in resident status that are consistent and reliable over time. The VA's commitment to quality propelled the implementation of the RAI/MDS in its nursing homes, now known as VA Community Living Centers (CLCs). In addition to providing consistent clinical information, the RAI/MDS can be used as a measure of both quality and resource utilization, serving as a benchmark for quality and cost data within the VA as well as against community-based nursing facilities. Workload based on the RAI/MDS can be calculated electronically from the interactions of the MDS data elements and grouped into 53 categories referred to as Resource Utilization Groups (RUG-IV). Residents are assessed quarterly. The data are grouped for analysis at the Austin Information Technology Center (AITC). Conversion to electronic data entry and transmission to the AITC was completed system-wide by year-end 2000. In 2010, the Centers for Medicare & Medicaid Services released a significantly upgraded version, MDS 3.0, with implementation in VHA CLCs to begin on October 1, 2011. Training is currently underway. MDS 3.0 will generate a new set of Quality Indicators and Quality Monitors, and the number of RUGs will increase from the current 53 to 64.
Minimal dataset for multimodal deep learning
kaggle.com
zip
Updated Mar 26, 2024
Cite
Jianbin Yao (2024). Minimal dataset for multimodal deep learning [Dataset]. https://www.kaggle.com/datasets/jianbinyao/minimum-dataset
Available download formats: zip (696159196 bytes)
Dataset updated
Mar 26, 2024
Authors
Jianbin Yao
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The minimum data set from a study of drought monitoring at key growth stages of winter wheat based on multimodal deep learning, including wheat drought-stress images and soil and meteorological data.
Mental Health & Learning Disabilities Dataset v 1 (Sensitive) Records
healthdatagateway.org
find.data.gov.scot
unknown
Updated Oct 8, 2024
Cite
(2024). Mental Health & Learning Disabilities Dataset v 1 (Sensitive) Records [Dataset]. https://healthdatagateway.org/en/dataset/853
Available download formats: unknown
Dataset updated
Oct 8, 2024
License
https://digital.nhs.uk/binaries/content/assets/website-assets/services/dars/nhs_digital_approved_edition_2_dsa_demo.pdf
Description
The Mental Health and Learning Disabilities Data Set version 1 (Record Level - sensitive data inclusion). The Mental Health Minimum Data Set was superseded by the Mental Health and Learning Disabilities Data Set, which in turn was superseded by the Mental Health Services Data Set. The Mental Health and Learning Disabilities Data Set collected data from the health records of individual children, young people and adults who were in contact with mental health services.
Minimum dataset
figshare.com
application/x-rar
Updated Mar 12, 2025
Cite
明阳李 (2025). Minimum dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28580879.v1
Available download formats: application/x-rar
Unique identifier
https://doi.org/10.6084/m9.figshare.28580879.v1
Dataset updated
Mar 12, 2025
Dataset provided by
figshare
Authors
明阳李
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Open the files in MATLAB; see the readme file for instructions.
Mental Health & Learning Disabilities Dataset v 1 (Non-Sensitive) Episodes
healthdatagateway.org
find.data.gov.scot
unknown
Updated Oct 8, 2024
Cite
(2024). Mental Health & Learning Disabilities Dataset v 1 (Non-Sensitive) Episodes [Dataset]. https://healthdatagateway.org/dataset/849
Available download formats: unknown
Dataset updated
Oct 8, 2024
License
https://digital.nhs.uk/binaries/content/assets/website-assets/services/dars/nhs_digital_approved_edition_2_dsa_demo.pdf
Description
The Mental Health and Learning Disabilities Data Set version 1 (Episode Level - sensitive data exclusion). The Mental Health Minimum Data Set was superseded by the Mental Health and Learning Disabilities Data Set, which in turn was superseded by the Mental Health Services Data Set. The Mental Health and Learning Disabilities Data Set collected data from the health records of individual children, young people and adults who were in contact with mental health services.
Mental Health & Learning Disabilities Dataset v 1 (Non-Sensitive) Records
find.data.gov.scot
dtechtive.com
Updated May 31, 2023
Cite
NHS ENGLAND (2023). Mental Health & Learning Disabilities Dataset v 1 (Non-Sensitive) Records [Dataset]. https://find.data.gov.scot/datasets/25923
Dataset updated
May 31, 2023
Dataset provided by
National Health Service (https://www.nhs.uk/)
Area covered
England, United Kingdom
Description
The Mental Health and Learning Disabilities Data Set version 1 (Record Level - sensitive data exclusion). The Mental Health Minimum Data Set was superseded by the Mental Health and Learning Disabilities Data Set, which in turn was superseded by the Mental Health Services Data Set. The Mental Health and Learning Disabilities Data Set collected data from the health records of individual children, young people and adults who were in contact with mental health services.
Mental Health and Learning Disabilities Statistics; Currency and Payment...
ckan.publishing.service.gov.uk
Updated Jan 29, 2015
Cite
ckan.publishing.service.gov.uk (2015). Mental Health and Learning Disabilities Statistics; Currency and Payment Data - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/mental-health-and-learning-disabilities-statistics-currency-and-payment-data
Dataset updated
Jan 29, 2015
Dataset provided by
CKAN (https://ckan.org/)
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
The Mental Health Minimum Data Set (MHMDS) was renamed Mental Health and Learning Disabilities Data Set (MHLDDS) following an expansion in scope (from September 2014) to include people in contact with learning disability services for the first time. This monthly statistical release makes available the most recent Mental Health Minimum Dataset (MHMDS) data from April 2013 onwards. Further analysis to support currencies and payment in adult and older people's mental health services was added to the publication of April 2014 final data, which can be found in the related links below. These changes are described in the Methodological Change paper referenced below. As well as providing timely data, it presents a wide range of information about care given to users of NHS-funded, secondary mental health services for adults and older people ('secondary mental health services') in England. This information will be of particular interest to organisations involved in giving secondary mental health care to adults and older people, as it presents timely information to support discussions between providers and commissioners of services. The MHMDS Monthly Report now includes the ten nationally recommended quality and outcome indicators to support the implementation of currencies and payment in mental health. For patients, researchers, agencies and the wider public, it aims to provide up-to-date information about the numbers of people using services, spending time in psychiatric hospitals and subject to the Mental Health Act (MHA). Some of these measures are currently experimental.
Minimal ECG Dataset (5 Classes)
kaggle.com
zip
Updated Sep 26, 2025
Cite
sidali Khelil cherfi (2025). Minimal ECG Dataset (5 Classes) [Dataset]. https://www.kaggle.com/datasets/sidalikhelilcherfi/minimal-ecg-dataset
Available download formats: zip (695199602 bytes)
Dataset updated
Sep 26, 2025
Authors
sidali Khelil cherfi
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is a minimal, balanced subset of ECG images derived from the original PTB-XL-based dataset. It contains 400 training images and 100 testing images per class across 5 categories: CD, HYP, MI, NORM, STTC. In total, it provides 2,500 ECG images (2,000 for training and 500 for testing).
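A balanced subset like this can be drawn by grouping the source images by class and taking a fixed number per class for training and testing. The sketch below is a hypothetical illustration only (the author's actual sampling procedure is not described); the helper name `balanced_split`, the `(path, label)` input format and the placeholder file names are assumptions.

```python
import random
from collections import defaultdict

def balanced_split(items, n_train=400, n_test=100, seed=0):
    """Draw n_train + n_test items per class from (path, label) pairs.

    Hypothetical helper: every class must have at least n_train + n_test items.
    """
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed gives a reproducible subset
    train, test = [], []
    for label in sorted(by_label):
        paths = by_label[label][:]
        if len(paths) < n_train + n_test:
            raise ValueError(f"class {label!r} has only {len(paths)} items")
        rng.shuffle(paths)
        train += [(p, label) for p in paths[:n_train]]
        test += [(p, label) for p in paths[n_train:n_train + n_test]]
    return train, test

# Placeholder pool for the five classes (not the real PTB-XL image files).
classes = ["CD", "HYP", "MI", "NORM", "STTC"]
pool = [(f"{c}_{i}.png", c) for c in classes for i in range(600)]
train, test = balanced_split(pool)
print(len(train), len(test), len(train) + len(test))  # 2000 500 2500
```

Taking the train and test items from disjoint slices of each shuffled class list keeps the two splits from overlapping.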
Mental Health and Learning Disabilities Statistics Data | gimi9.com
gimi9.com
Updated Apr 20, 2016
Cite
(2016). Mental Health and Learning Disabilities Statistics Data | gimi9.com [Dataset]. https://gimi9.com/dataset/uk_monthly-mental-health-minimum-dataset-reports/
Dataset updated
Apr 20, 2016
Description
This dataset has been discontinued and replaced with the Mental Health Services Monthly Statistics dataset, available at https://data.gov.uk/dataset/mental-health-services-monthly-statistics. The Mental Health Minimum Data Set (MHMDS) was renamed Mental Health and Learning Disabilities Data Set (MHLDDS) following an expansion in scope (from September 2014) to include people in contact with learning disability services for the first time. This monthly statistical release makes available the most recent Mental Health Minimum Dataset (MHMDS) data from April 2013 onwards. Further analysis to support currencies and payment in adult and older people's mental health services was added to the publication of April 2014 final data, which can be found in the related links below. These changes are described in the Methodological Change paper referenced below. As well as providing timely data, it presents a wide range of information about care given to users of NHS-funded, secondary mental health services for adults and older people ('secondary mental health services') in England. This information will be of particular interest to organisations involved in giving secondary mental health care to adults and older people, as it presents timely information to support discussions between providers and commissioners of services. The MHMDS Monthly Report now includes the ten nationally recommended quality and outcome indicators to support the implementation of currencies and payment in mental health. For patients, researchers, agencies and the wider public, it aims to provide up-to-date information about the numbers of people using services, spending time in psychiatric hospitals and subject to the Mental Health Act (MHA). Some of these measures are currently experimental.
Replication Data for: Generating Minimal Training Sets for Machine Learned...
darus.uni-stuttgart.de
Updated Apr 11, 2024
Cite
Jan Finkbeiner; Samuel Tovey; Christian Holm (2024). Replication Data for: Generating Minimal Training Sets for Machine Learned Potentials [Dataset]. http://doi.org/10.18419/DARUS-4099
Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.18419/DARUS-4099
Dataset updated
Apr 11, 2024
Dataset provided by
DaRUS
Authors
Jan Finkbeiner; Samuel Tovey; Christian Holm
License
https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4099
Dataset funded by
DFG
Description
Data and scripts for replicating the results and investigation presented in the paper. This includes the DFT parameters for generating training data, all training and data selection scripts for the neural networks, and scripts for running and analysing the production simulations with the trained potentials.
Decision Tree
kaggle.com
zip
Updated Apr 11, 2021
Cite
Abhishek Verma (2021). Decision Tree [Dataset]. https://www.kaggle.com/abhishekvermasg1/decision-tree
Available download formats: zip (1706 bytes)
Dataset updated
Apr 11, 2021
Authors
Abhishek Verma
Description
Dataset

This dataset was created by Abhishek Verma

Mental Health and Learning Disabilities Statistics
digital.nhs.uk
csv, pdf, xls
Updated Dec 22, 2015
Cite
(2015). Mental Health and Learning Disabilities Statistics [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-and-learning-disabilities-statistics
Available download formats: csv (13.2 kB), xls (485.4 kB), pdf (179.7 kB), pdf (578.3 kB), csv (7.2 MB), csv (2.4 MB), pdf (98.5 kB), xls (494.6 kB)
Dataset updated
Dec 22, 2015
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Time period covered
Sep 1, 2015 - Oct 31, 2015
Area covered
England
Description
This statistical release makes available the most recent Mental Health and Learning Disabilities Dataset (MHLDDS) final monthly data (September 2015). This publication presents a wide range of information about care delivered to users of NHS-funded secondary mental health and learning disability services in England. The scope of the Mental Health Minimum Dataset (MHMDS) was extended to cover Learning Disability services from September 2014. Many people who have a learning disability use mental health services, and people in learning disability services may have a mental health problem. This means that activity included in the new MHLDDS dataset cannot be distinctly divided into mental health or learning disability spells of care; a single spell of care may include inputs from either or both types of service. The Currencies and Payment file that forms part of this release is specifically limited to services in scope for currencies and payment in mental health services and remains unchanged. This information will be of particular interest to organisations involved in delivering secondary mental health and learning disability care to adults and older people, as it presents timely information to support discussions between providers and commissioners of services. The MHLDS Monthly Report also includes reporting by local authority for the first time. For patients, researchers, agencies, and the wider public, it aims to provide up-to-date information about the numbers of people using services, spending time in hospital and subject to the Mental Health Act (MHA). Some of these measures are currently experimental. The Currency and Payment (CaP) measures can be found in a separate machine-readable data file and may also be accessed via an online interactive visualisation tool that supports benchmarking. This can be accessed through the related links at the bottom of the page.
This release also includes a note about the new experimental data file and the issuing of the ISN for the Mental Health Services Dataset (MHSDS). During summer 2015 we undertook a consultation on Adult Mental Health Statistics, seeking users' views on the existing reports and on what might usefully be added to our reports when the new version of the dataset (MHSDS) is implemented in 2016. A report on this consultation can be found below. Please note: the Monthly MHLDS Report published in February will cover November final data and December provisional data and will be the last publication from MHLDDS. Data for January 2016 will be published under the new name of Mental Health Services Monthly Statistics, with a first release of provisional data planned for March 2016. A Methodological Change paper describing changes to these monthly reports will be issued in the New Year.
Data from: Delaware River Basin Stream Salinity Machine Learning Models and...
catalog.data.gov
data.usgs.gov
Updated Nov 26, 2025
Cite
U.S. Geological Survey (2025). Delaware River Basin Stream Salinity Machine Learning Models and Data [Dataset]. https://catalog.data.gov/dataset/delaware-river-basin-stream-salinity-machine-learning-models-and-data
Dataset updated
Nov 26, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
This model archive contains the input data, model code, and model outputs for machine learning models that predict daily non-tidal stream salinity (specific conductance) for a network of 459 modeled stream segments across the Delaware River Basin (DRB) from 1984-09-30 to 2021-12-31. There are a total of twelve models from combinations of two machine learning models (Random Forest and Recurrent Graph Convolution Neural Networks), two training/testing partitions (spatial and temporal), and three input attribute sets (dynamic attributes, dynamic and static attributes, and dynamic attributes and a minimum set of static attributes). In addition to the inputs and outputs for non-tidal predictions provided on the landing page, we also provide example predictions for models trained with additional tidal stream segments within the model archive (TidalExample folder), but we do not recommend our models for this use case. Model outputs contained within the model archive include performance metrics, plots of spatial and temporal errors, and Shapley (SHAP) explainable artificial intelligence plots for the best models. The results of these models provide insights into DRB stream segments with elevated salinity, and processes that drive stream salinization across the DRB, which may be used to inform salinity management. This data compilation was funded by the USGS.
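The twelve models are the Cartesian product of the three design choices listed above (2 model types × 2 partitions × 3 attribute sets). The sketch below simply enumerates that grid; the short labels are shorthand chosen for illustration, not identifiers taken from the model archive.

```python
from itertools import product

# Design choices from the description (labels are illustrative shorthand).
model_types = ["RandomForest", "RGCN"]      # Random Forest; Recurrent Graph Convolution NN
partitions = ["spatial", "temporal"]        # training/testing partitions
attribute_sets = ["dynamic", "dynamic+static", "dynamic+min_static"]

# Every combination of model type, partition and attribute set.
configs = list(product(model_types, partitions, attribute_sets))
print(len(configs))  # 12
for model, split, attrs in configs:
    print(model, split, attrs)
```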
Data from: Applying Active Learning toward Building a Generalizable Model...
acs.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated May 22, 2025
Cite
Lucas W. Souza; Nathan D. Ricke; Braden C. Chaffin; Mike E. Fortunato; Shutian Jiang; Cihan Soylu; Thomas C. Caya; Sii Hong Lau; Katherine A. Wieser; Abigail G. Doyle; Kian L. Tan (2025). Applying Active Learning toward Building a Generalizable Model for Ni-Photoredox Cross-Electrophile Coupling of Aryl and Alkyl Bromides [Dataset]. http://doi.org/10.1021/jacs.5c02218.s001
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/jacs.5c02218.s001
Dataset updated
May 22, 2025
Dataset provided by
ACS Publications
Authors
Lucas W. Souza; Nathan D. Ricke; Braden C. Chaffin; Mike E. Fortunato; Shutian Jiang; Cihan Soylu; Thomas C. Caya; Sii Hong Lau; Katherine A. Wieser; Abigail G. Doyle; Kian L. Tan
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
When developing machine learning models for yield prediction, the two main challenges are effectively exploring condition space and substrate space. In this article, we disclose an approach for mapping the substrate space for Ni/photoredox-catalyzed cross-electrophile coupling of alkyl bromides and aryl bromides in a high-throughput experimentation (HTE) context. This model employs active learning (in particular, uncertainty querying) as a strategy to rapidly construct a yield model. Given the vastness of substrate space, we focused on an approach that builds an initial model and then uses a minimal data set to expand into new chemical spaces. In particular, we built a model for a virtual space of 22,240 compounds using fewer than 400 data points. We demonstrated that the model can be expanded to 33,312 compounds by adding information around 24 building blocks (
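Uncertainty querying can be sketched generically: score each untested candidate by how much an ensemble of yield models disagrees about it, then send the most uncertain candidates to the next experimental round. This is a hypothetical, simplified illustration, not the article's method: the acquisition function (ensemble standard deviation), the toy linear models and the 2-feature descriptors are all assumptions.

```python
import statistics
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]

def query_most_uncertain(candidates, ensemble: Sequence[Model], batch_size: int):
    """Return the batch_size candidates with the largest ensemble disagreement.

    Uncertainty = population standard deviation of the predicted yields
    (an assumed acquisition function for illustration).
    """
    def uncertainty(x):
        preds = [model(x) for model in ensemble]
        return statistics.pstdev(preds)

    ranked = sorted(candidates, key=uncertainty, reverse=True)
    return ranked[:batch_size]

# Toy ensemble: three linear "yield models" over a 2-feature descriptor.
ensemble = [
    lambda x: 0.5 * x[0] + 0.1 * x[1],
    lambda x: 0.4 * x[0] + 0.3 * x[1],
    lambda x: 0.6 * x[0] - 0.1 * x[1],
]
pool = [(1.0, 0.0), (0.0, 1.0), (3.0, 3.0), (0.5, 0.5)]
print(query_most_uncertain(pool, ensemble, batch_size=2))  # [(3.0, 3.0), (0.0, 1.0)]
```

In a full loop, the queried candidates would be run, their measured yields added to the training set, and the ensemble refit before the next query.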
Simple Arrow Dataset
kaggle.com
zip
Updated Aug 7, 2025
Cite
Coder_Anand (2025). Simple Arrow Dataset [Dataset]. https://www.kaggle.com/datasets/coderanand/simple-arrow-dataset
Available download formats: zip (633 bytes)
Dataset updated
Aug 7, 2025
Authors
Coder_Anand
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset contains 13×13 binary pixel images representing four arrow directions: left, right, up, and down. Each image is a simple black-and-white (0 and 1) arrow centered in the frame, with consistent shape and thickness across all directions.

The dataset includes:

4 total samples (1 per direction)

169 binary features (pixel0 to pixel168)

A label column indicating the direction

This minimal dataset is ideal for:

Testing image classification pipelines

Teaching basic computer vision and ML concepts

Experiments with low-resolution symbolic images
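The column names pixel0 to pixel168 suggest that each 13×13 image is flattened into one row of 169 values plus a label. The sketch below assumes row-major flattening and a column named `label`, and draws a hypothetical right-pointing arrow; the exact arrow shapes in the dataset are not reproduced here.

```python
SIZE = 13  # images are 13x13 binary pixels

def right_arrow(size: int = SIZE) -> list[list[int]]:
    """Draw a simple right-pointing arrow (illustrative shape, not the dataset's)."""
    img = [[0] * size for _ in range(size)]
    mid = size // 2
    for c in range(2, size - 2):      # horizontal shaft along the centre row
        img[mid][c] = 1
    for k in range(1, 4):             # two diagonal strokes forming the head
        img[mid - k][size - 3 - k] = 1
        img[mid + k][size - 3 - k] = 1
    return img

def to_row(img: list[list[int]], label: str) -> dict:
    """Flatten row-major into pixel0..pixel168 and attach the label column."""
    flat = [px for row in img for px in row]
    row = {f"pixel{i}": px for i, px in enumerate(flat)}
    row["label"] = label
    return row

row = to_row(right_arrow(), "right")
print(len(row) - 1)  # 169 pixel features
```

Under row-major ordering, the pixel at row r and column c lands in column pixel{13*r + c}.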
Phishing URL Content Dataset
kaggle.com
zip
Updated Nov 25, 2024
Cite
Aaditey Pillai (2024). Phishing URL Content Dataset [Dataset]. https://www.kaggle.com/datasets/aaditeypillai/phishing-website-content-dataset
Available download formats: zip (62701 bytes)
Dataset updated
Nov 25, 2024
Authors
Aaditey Pillai
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Phishing URL Content Dataset

Executive Summary

Motivation:
Phishing attacks are one of the most significant cyber threats in today’s digital era, tricking users into divulging sensitive information like passwords, credit card numbers, and personal details. This dataset aims to support research and development of machine learning models that can classify URLs as phishing or benign.

Applications:
- Building robust phishing detection systems.
- Enhancing security measures in email filtering and web browsing.
- Training cybersecurity practitioners in identifying malicious URLs.

The dataset contains diverse features extracted from URL structures, HTML content, and website metadata, enabling deep insights into phishing behavior patterns.

Description of Data

This dataset comprises two types of URLs:
1. Phishing URLs: Malicious URLs designed to deceive users.
2. Benign URLs: Legitimate URLs posing no harm to users.

Key Features:
- URL-based features: Domain, protocol type (HTTP/HTTPS), and IP-based links.
- Content-based features: Link density, iframe presence, external/internal links, and metadata.
- Certificate-based features: SSL/TLS details like validity period and organization.
- WHOIS data: Registration details like creation and expiration dates.
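The URL-based features (domain, protocol type, IP-based links) can be computed with the Python standard library alone. The sketch below is an assumption about how such features might be extracted; the feature names are hypothetical and are not the dataset's actual column names.

```python
import ipaddress
from urllib.parse import urlparse

def url_features(url: str) -> dict:
    """Extract a few URL-based features (hypothetical names, stdlib only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # lower-cased, with port and credentials stripped
    try:
        ipaddress.ip_address(host)  # succeeds only for a literal IPv4/IPv6 host
        uses_ip = True
    except ValueError:
        uses_ip = False
    return {
        "domain": host,
        "protocol": parsed.scheme,            # e.g. "http" or "https"
        "is_https": parsed.scheme == "https",
        "uses_ip_address": uses_ip,
        "url_length": len(url),
    }

print(url_features("http://192.0.2.10/secure/login.php"))
print(url_features("https://www.example.com/account"))
```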

Statistics:
- Total Samples: 800 (400 phishing, 400 benign).
- Features: 22, including URL, domain, link density, and SSL attributes.

Power Analysis

To ensure statistical reliability, a power analysis was conducted to determine the minimum sample size required for binary classification with 22 features. Using a medium effect size (0.15), alpha = 0.05, and power = 0.80, the analysis indicated a minimum sample size of ~325 per class. Our dataset exceeds this requirement with 400 examples per class, ensuring robust model training.
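The description does not say which test or effect-size metric the power analysis used. For reference, 0.15 is Cohen's conventional "medium" value for f² (regression-type tests), whereas the conventional medium value for Cohen's d is 0.5. The sketch below is therefore only a generic illustration of how a per-class minimum sample size is computed, not the authors' procedure, and it is not expected to reproduce the ~325 figure. It assumes a two-sided, two-sample comparison of means using the normal approximation; `n_per_group` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-sided two-sample test (normal approximation).

    n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2, rounded up.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)

print(n_per_group(0.5))  # 63 per group for a medium Cohen's d
print(n_per_group(0.2))  # 393 per group for a small Cohen's d
```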

Exploratory Data Analysis (EDA)

Insights from EDA:
- Distribution Plots: Histograms and density plots for numerical features like link density, URL length, and iframe counts.
- Bar Plots: Class distribution and protocol usage trends.
- Correlation Heatmap: Highlights relationships between numerical features to identify multicollinearity or strong patterns.
- Box Plots: For SSL certificate validity and URL lengths, comparing phishing versus benign URLs.

EDA visualizations are provided in the repository.

Link to Publicly Available Data and Code

Dataset: Phishing URL Dataset

Code Repository: GitHub - Phishing Detection

The repository contains the Python code used to extract features, conduct EDA, and build the dataset.

Ethics Statement

Phishing detection datasets must balance the need for security research with the risk of misuse. This dataset:
1. Protects User Privacy: No personally identifiable information is included.
2. Promotes Ethical Use: Intended solely for academic and research purposes.
3. Avoids Reinforcement of Bias: Balanced class distribution ensures fairness in training models.

Risks:
- Misuse of the dataset for creating more deceptive phishing attacks.
- Over-reliance on outdated features as phishing tactics evolve.

Researchers are encouraged to pair this dataset with continuous updates and contextual studies of real-world phishing.

Open Source License

This dataset is shared under the MIT License, which permits free use, modification, and distribution; the authors intend it for academic and non-commercial purposes. License details can be found here.
Data from: Dataset, Code, and Models for Training Deep Learning Potentials...
osti.gov
Updated Sep 10, 2025
Cite
Draney, Jack S.; Graves, David; Panagiotopoulos, Athanassios (2025). Dataset, Code, and Models for Training Deep Learning Potentials for Low Temperature Plasma-Surface Interactions [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/2589045
Dataset updated
Sep 10, 2025
Dataset provided by
United States Department of Energy (http://energy.gov/)
Princeton Plasma Physics Laboratory (http://www.pppl.gov/)
Authors
Draney, Jack S.; Graves, David; Panagiotopoulos, Athanassios
Description
This repository contains the datasets, training scripts, finished models, and test simulations used in the development of DeepREBO, a machine-learned interatomic potential trained to emulate the REBO2 empirical potential. The data were generated to study deep potential development for simulations of plasma-surface interactions. The work uses an active learning framework, starting from a minimal dataset and iteratively expanding it. Included are the generated datasets, the trained models, and the simulations used to evaluate the performance of the training process. This resource supports reproducibility and provides a reference framework for training deep potentials in plasma-surface interaction studies.
River Water Segmentation Dataset (RIWA)
kaggle.com
zip
Updated Jan 26, 2023
Cite
Franz Wagner (2023). River Water Segmentation Dataset (RIWA) [Dataset]. https://www.kaggle.com/datasets/franzwagner/river-water-segmentation-dataset/code
Available download formats: zip (704549806 bytes)
Dataset updated
Jan 26, 2023
Authors
Franz Wagner
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
River Water Segmentation Dataset (RIWA)

New in Version 2: to our knowledge (as of 01/2023), this is the largest high-quality dataset of its kind (minimum image size 400 × 400).

RIWA is a pixel-wise binary river water segmentation dataset. It consists of manually labelled smartphone, drone and DSLR images of rivers, as well as suitable images from the Water Segmentation Dataset and high-quality ADE20K images. Images from the COCO dataset were removed because their segmentation quality is extremely poor.

Version history:

Version 2 (declared as Version 4 by Kaggle):
- Contains 1142 training, 167 validation and 323 test images.
- Minimum size: 400 × 400 (h × w).
- High-quality segmentations. If you find an error, please message us.

Version 1:
- Contains 789 training, 228 validation and 111 test images.
- Minimum size: 174 × 200 (h × w).
- Some segmentations are not perfect.

Citation

If you use this dataset, please cite as:

@misc{RIWA_Dataset, title={River Water Segmentation Dataset (RIWA)}, url={https://www.kaggle.com/dsv/4901781}, DOI={10.34740/KAGGLE/DSV/4901781}, publisher={Kaggle}, author={Xabier Blanch and Franz Wagner and Anette Eltner}, year={2023} }

Contact:
- Xabier Blanch, TU Dresden
- Franz Wagner, TU Dresden
- Anette Eltner, TU Dresden

CNN comparison

In 2023, we carried out a comparison to find the best CNN for this domain. If you are interested, please see our paper: River water segmentation in surveillance camera images: A comparative study of offline and online augmentation using 32 CNNs.

We conducted the tests using the AiSeg GitLab repository. It can interactively train 2D and 3D CNNs, augment data with offline and online augmentation, analyze single networks, compare multiple networks, and apply trained CNNs to new data. The RIWA dataset can be used with it directly.

Background:

The handling of natural disasters, especially heavy rainfall and the resulting floods, places special demands on emergency services. A quick, efficient, real-time estimate of the water level is critical for monitoring a flood event. This is a challenging task that usually requires specially prepared river sections. In addition, during heavy flood events, some classical observation methods may be compromised.

With the technological advances derived from image-based observation methods and segmentation algorithms based on neural networks (NN), it is possible to generate real-time, low-cost monitoring systems. This new approach makes it possible to densify the observation network, improving flood warning and management. In addition, images can be obtained by remotely positioned cameras, preventing data loss during a major event.

The workflow we have developed for real-time monitoring integrates three different techniques. The first step is a topographic survey using Structure from Motion (SfM). In this stage, images of the area of interest are acquired with both terrestrial cameras and UAVs. The survey is completed by measuring ground control point coordinates with multi-band GNSS equipment. The result is a 3D SfM model, georeferenced to centimetre accuracy, that reconstructs not only the river environment but also the riverbed.

The second step is segmenting the images captured by a surveillance camera installed specifically to monitor the river. Segmentation is performed with convolutional neural networks (CNNs). The aim is to automatically segment the time-lapse images captured every 15 minutes. We tested different CNNs to choose the most suitable architecture for river segmentation, adapted to each study area and to each time of day (day and night).

The third step is based on the integration between the automatically segmented images and the 3D model acquired. The CNN-segmented river boundary is projected into the 3D SfM model to obtain a metric result of the water level based on the point of the 3D model closest to the image ray.
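The search for "the point of the 3D model closest to the image ray" can be sketched as a point-to-ray distance query. This is a simplified, hypothetical illustration, not the authors' implementation: it assumes the camera centre and the ray direction through a boundary pixel are already expressed in the model's coordinate frame, treats the SfM model as a plain list of 3D points, and reads the water level as the z value of the closest point. The coordinates in the usage example are made up.

```python
import math

Vec = tuple[float, float, float]

def closest_point_to_ray(origin: Vec, direction: Vec, points: list[Vec]) -> Vec:
    """Return the point with the smallest distance to a ray.

    Points behind the camera (negative projection onto the ray) are measured
    to the ray origin instead, so they are not favoured.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    u = tuple(d / norm for d in direction)  # unit ray direction

    def dist_to_ray(p: Vec) -> float:
        v = tuple(pi - oi for pi, oi in zip(p, origin))
        t = max(0.0, sum(vi * ui for vi, ui in zip(v, u)))  # clamped projection length
        foot = tuple(oi + t * ui for oi, ui in zip(origin, u))
        return math.dist(p, foot)

    return min(points, key=dist_to_ray)

# Toy example: camera 10 m above the bank, ray through one river-boundary pixel.
camera = (0.0, 0.0, 10.0)
ray = (1.0, 0.0, -1.0)
cloud = [(5.0, 0.0, 5.2), (8.0, 1.0, 2.0), (10.0, 0.0, 0.1)]
p = closest_point_to_ray(camera, ray, cloud)
print("water level z =", p[2])  # water level z = 0.1
```

A real SfM model has millions of points, so a spatial index (for example a k-d tree) would normally replace the linear scan used here.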

Automating the segmentation and the reprojection into the 3D model will make possible a robust, centimetre-accurate workflow capable of estimating the water level in near real time, both day and night. This strategy represents the basis for a better understanding of river flo...
Data from: PROBABILITY CALIBRATION BY THE MINIMUM AND MAXIMUM PROBABILITY...
catalog.data.gov
datasets.ai
Updated Apr 11, 2025
Dashlink (2025). PROBABILITY CALIBRATION BY THE MINIMUM AND MAXIMUM PROBABILITY SCORES IN ONE-CLASS BAYES LEARNING FOR ANOMALY DETECTION [Dataset]. https://catalog.data.gov/dataset/probability-calibration-by-the-minimum-and-maximum-probability-scores-in-one-class-bayes-l
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
PROBABILITY CALIBRATION BY THE MINIMUM AND MAXIMUM PROBABILITY SCORES IN ONE-CLASS BAYES LEARNING FOR ANOMALY DETECTION
Guichong Li, Nathalie Japkowicz, Ian Hoffman, R. Kurt Ungar
Abstract. One-class Bayes learning, such as one-class Naïve Bayes and one-class Bayesian Networks, employs Bayes learning to build a classifier on the positive class only, in order to discriminate between the positive and the negative class. It has been applied to anomaly detection to identify abnormal behaviors that deviate from normal behaviors. Because one-class Bayes classifiers produce probability scores, which can be used to define an anomaly score, they are preferable to other one-class learning techniques in many practical applications. However, previously proposed one-class Bayes classifiers may suffer from poor probability estimation when negative training examples are unavailable. In this paper, we propose a new method to improve the probability estimation. According to our empirical results, the improved one-class Bayes classifiers exhibit high performance compared with previously proposed one-class Bayes classifiers.
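
The sketch below makes the idea of a one-class Bayes anomaly score concrete. It fits a Gaussian Naïve Bayes density to positive (normal) examples only. It then rescales the log-likelihood to [0, 1] using the minimum and maximum scores seen on the training data. This is a generic illustration that assumes continuous, conditionally independent features. It is not the calibration method proposed in the paper, whose details the abstract does not give. The class name is hypothetical.

```python
import numpy as np


class OneClassGaussianNB:
    """Gaussian Naive Bayes density fitted on positive (normal) examples only."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.var_ = X.var(axis=0) + 1e-9  # variance floor avoids division by zero
        # Range of training log-likelihoods, used for min-max rescaling
        train_scores = self.log_likelihood(X)
        self.min_, self.max_ = train_scores.min(), train_scores.max()
        return self

    def log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        # Naive Bayes: features treated as independent, so per-feature log densities add up
        log_pdf = -0.5 * (np.log(2 * np.pi * self.var_) + (X - self.mean_) ** 2 / self.var_)
        return log_pdf.sum(axis=1)

    def anomaly_score(self, X):
        """0 = at least as likely as the most typical training example,
        1 = no more likely than the least typical one."""
        span = max(self.max_ - self.min_, 1e-12)
        scaled = (self.log_likelihood(X) - self.min_) / span
        return 1.0 - np.clip(scaled, 0.0, 1.0)
```

Clipping keeps scores bounded, but it also saturates every point beyond the training range at 1. That saturation is one reason dedicated calibration methods exist.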

Data from: Large Landing Trajectory Data Set for Go-Around Analysis

zenodo.org
data.niaid.nih.gov


Updated Dec 16, 2022


Raphael Monstein; Benoit Figuet; Timothé Krauth; Manuel Waltert; Marcel Dettling (2022). Large Landing Trajectory Data Set for Go-Around Analysis [Dataset]. http://doi.org/10.5281/zenodo.7148117


Available download formats: application/gzip, bin, zip

Unique identifier

https://doi.org/10.5281/zenodo.7148117

Dataset updated

Dec 16, 2022

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Raphael Monstein; Benoit Figuet; Timothé Krauth; Manuel Waltert; Marcel Dettling

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

A large data set of go-arounds, also referred to as missed approaches. The data set supports the paper presented at the OpenSky Symposium on 10 November.

If you use this data for a scientific publication, please consider citing our paper.

The data set contains landings at 176 (mostly) large airports in 44 countries. Each landing is labelled according to whether a go-around (GA) was performed. In total, the data set contains almost 9 million landings with more than 33,000 GAs. The data was collected from the OpenSky Network's historical database for the year 2019. The published data set contains multiple files:

go_arounds_minimal.csv.gz

Compressed CSV containing the minimal data set. It contains one row per landing with a minimal amount of information about the landing, including whether it was a GA. The data is structured as follows:


Column name	Type	Description
time	date time	UTC time of landing or first GA attempt
icao24	string	Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
callsign	string	Aircraft identifier in air-ground communications
airport	string	ICAO airport code where the aircraft is landing
runway	string	Runway designator on which the aircraft landed
has_ga	string	"True" if at least one GA was performed, otherwise "False"
n_approaches	integer	Number of approaches identified for this flight
n_rwy_approached	integer	Number of unique runways approached by this flight

The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These flights usually have a large number of approaches, so an easy way to exclude them is to drop flights with n_approaches > 2.
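
Here is a minimal pandas sketch that loads the minimal file and applies the suggested filter, assuming the column layout in the table above. Reading icao24, callsign and runway as strings keeps all-digit hexadecimal addresses and designators such as "09" intact. has_ga is normalised to a real boolean because the table lists it as a string. The helper name load_minimal is hypothetical.

```python
import pandas as pd


def load_minimal(path_or_buffer) -> pd.DataFrame:
    """Load go_arounds_minimal.csv.gz and drop likely training/calibration flights."""
    df = pd.read_csv(
        path_or_buffer,
        dtype={"icao24": str, "callsign": str, "runway": str},
        low_memory=False,
    )
    # Timestamps are UTC; make them timezone-aware
    df["time"] = pd.to_datetime(df["time"], utc=True)
    # has_ga holds "True"/"False"; normalise to bool whatever pandas inferred
    df["has_ga"] = df["has_ga"].astype(str).str.strip().str.lower() == "true"
    # Training and calibration flights tend to fly many approaches
    return df[df["n_approaches"] <= 2].reset_index(drop=True)
```

pandas infers gzip compression from the .gz extension, so `load_minimal("go_arounds_minimal.csv.gz")` should read the published file directly.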

go_arounds_augmented.csv.gz

Compressed CSV containing the augmented data set. It contains one row per landing with additional information about the landing, including whether it was a GA. The data is structured as follows:

Column name	Type	Description
time	date time	UTC time of landing or first GA attempt
icao24	string	Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
callsign	string	Aircraft identifier in air-ground communications
airport	string	ICAO airport code where the aircraft is landing
runway	string	Runway designator on which the aircraft landed
has_ga	string	"True" if at least one GA was performed, otherwise "False"
n_approaches	integer	Number of approaches identified for this flight
n_rwy_approached	integer	Number of unique runways approached by this flight
registration	string	Aircraft registration
typecode	string	Aircraft ICAO typecode
icaoaircrafttype	string	ICAO aircraft type
wtc	string	ICAO wake turbulence category
glide_slope_angle	float	Angle of the ILS glide slope in degrees
has_intersection	string	Boolean that is true if another runway intersects this runway, otherwise false
rwy_length	float	Length of the runway in kilometres
airport_country	string	ISO Alpha-3 country code of the airport
airport_region	string	Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
operator_country	string	ISO Alpha-3 country code of the operator
operator_region	string	Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania)
wind_speed_knts	integer	METAR, surface wind speed in knots
wind_dir_deg	integer	METAR, surface wind direction in degrees
wind_gust_knts	integer	METAR, surface wind gust speed in knots
visibility_m	float	METAR, visibility in m
temperature_deg	integer	METAR, temperature in degrees Celsius
press_sea_level_p	float	METAR, sea level pressure in hPa
press_p	float	METAR, QNH in hPa
weather_intensity	list	METAR, list of present weather codes: qualifier - intensity
weather_precipitation	list	METAR, list of present weather codes: weather phenomena - precipitation
weather_desc	list	METAR, list of present weather codes: qualifier - descriptor
weather_obscuration	list	METAR, list of present weather codes: weather phenomena - obscuration
weather_other	list	METAR, list of present weather codes: weather phenomena - other

This data set is augmented with data from various public sources. Aircraft-related data comes mostly from the OpenSky Network's aircraft database, the METAR information comes from Iowa State University, and the rest is mostly scraped from various websites. For help with the METAR information, consult the WMO's Aerodrome Reports and Forecasts handbook.
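
The METAR wind columns are often more informative when expressed relative to the runway. The sketch below splits wind_speed_knts into headwind and crosswind components for a given runway designator. It approximates the runway heading as the designator number times 10 degrees. That approximation ignores the up-to-5-degree rounding of designators. It also ignores the difference between magnetic runway headings and METAR wind directions, which are referenced to true north. Calm or variable winds (missing wind_dir_deg) are not handled. The function names are hypothetical.

```python
import math


def runway_heading_deg(runway: str) -> float:
    """Approximate heading from a designator such as '27L' (-> 270.0) or '36R' (-> 0.0)."""
    digits = "".join(ch for ch in runway if ch.isdigit())
    return float(int(digits) * 10 % 360)


def wind_components(wind_dir_deg: float, wind_speed_knts: float, runway: str):
    """Return (headwind, crosswind) in knots for a landing on `runway`.

    wind_dir_deg is the direction the wind blows FROM (METAR convention).
    Negative headwind = tailwind; positive crosswind = wind from the right.
    """
    angle = math.radians(wind_dir_deg - runway_heading_deg(runway))
    return wind_speed_knts * math.cos(angle), wind_speed_knts * math.sin(angle)
```

For example, `wind_components(300, 20, "27L")` gives a headwind of about 17.3 kt and a crosswind of about 10 kt from the right.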

go_arounds_agg.csv.gz

Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:

Column name	Type	Description
airport	string	ICAO airport code where the aircraft is landing
runway	string	Runway designator on which the aircraft landed
n_landings	integer	Total number of landings observed on this runway in 2019
ga_rate	float	Go-around rate, per 1000 landings
glide_slope_angle	float	Angle of the ILS glide slope in degrees
has_intersection	string	Boolean that is true if another runway intersects this runway, otherwise false
rwy_length	float	Length of the runway in kilometres
airport_country	string	ISO Alpha-3 country code of the airport
airport_region	string	Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)

This aggregated data set is used in the paper for the generalized linear regression model.
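
The aggregated columns can be approximated from the per-landing files. This minimal sketch assumes one row per landing, with has_ga already converted to a boolean. It also assumes ga_rate means go-arounds per 1000 landings, as the table states. The published go_arounds_agg.csv.gz may apply additional filtering, so values need not match exactly. As a rough check on scale, the totals quoted above (more than 33000 GAs in almost 9 million landings) correspond to roughly 3.7 go-arounds per 1000 landings. The function name is hypothetical.

```python
import pandas as pd


def aggregate_ga_rate(df: pd.DataFrame) -> pd.DataFrame:
    """Per airport-runway landing count and go-around rate per 1000 landings.
    Expects columns airport, runway and a boolean has_ga (one row per landing)."""
    agg = df.groupby(["airport", "runway"], as_index=False).agg(
        n_landings=("has_ga", "count"),  # number of landings observed
        n_ga=("has_ga", "sum"),          # True counts as 1
    )
    agg["ga_rate"] = 1000 * agg["n_ga"] / agg["n_landings"]
    return agg
```

The runway-level attributes from the augmented file (glide_slope_angle, has_intersection, rwy_length, and so on) could then be merged onto this table by airport and runway before fitting a regression model.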

Downloading the trajectories

Users of this data set with access to the OpenSky Network's Impala shell can download the historical trajectories from the historical database with a few lines of Python code. For example, suppose you want to retrieve all go-arounds at London City Airport (EGLC) on 4 January 2019. The Traffic library provides easy access to the database:

```python
import datetime

import pandas as pd
from tqdm.auto import tqdm  # progress bar for the download loop
from traffic.core import Traffic  # collection of downloaded flights
from traffic.data import opensky  # access to the OpenSky historical database

# load minimum data set
df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
# parse as timezone-aware UTC so the comparison with start/stop below works
df["time"] = pd.to_datetime(df["time"], utc=True)

# select London City Airport, go-arounds, and 2019-01-04
airport = "EGLC"
start = datetime.datetime(year=2019, month=1, day=4).replace(
    tzinfo=datetime.timezone.utc
)
stop = datetime.datetime(year=2019, month=1, day=5).replace(
    tzinfo=datetime.timezone.utc
)

df_selection = df.query("airport == @airport & has_ga & (@start <= time <= @stop)")
```


