9 datasets found

Preprocessing steps.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 28, 2024
Cite
Hyeong Jun Ahn; Kyle Ishikawa; Min-Hee Kim (2024). Preprocessing steps. [Dataset]. http://doi.org/10.1371/journal.pone.0304785.t001
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0304785.t001
Dataset updated
Jun 28, 2024
Dataset provided by
PLOS ONE
Authors
Hyeong Jun Ahn; Kyle Ishikawa; Min-Hee Kim
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this study, we employed various machine learning models to predict metabolic phenotypes, focusing on thyroid function, using a dataset from the National Health and Nutrition Examination Survey (NHANES) from 2007 to 2012. Our analysis utilized laboratory parameters relevant to thyroid function or metabolic dysregulation in addition to demographic features, aiming to uncover potential associations between thyroid function and metabolic phenotypes by various machine learning methods. Multinomial Logistic Regression performed best to identify the relationship between thyroid function and metabolic phenotypes, achieving an area under receiver operating characteristic curve (AUROC) of 0.818, followed closely by Neural Network (AUROC: 0.814). Following the above, the performance of Random Forest, Boosted Trees, and K Nearest Neighbors was inferior to the first two methods (AUROC 0.811, 0.811, and 0.786, respectively). In Random Forest, homeostatic model assessment for insulin resistance, serum uric acid, serum albumin, gamma glutamyl transferase, and triiodothyronine/thyroxine ratio were positioned in the upper ranks of variable importance. These results highlight the potential of machine learning in understanding complex relationships in health data. However, it’s important to note that model performance may vary depending on data characteristics and specific requirements. Furthermore, we emphasize the significance of accounting for sampling weights in complex survey data analysis and the potential benefits of incorporating additional variables to enhance model accuracy and insights. Future research can explore advanced methodologies combining machine learning, sample weights, and expanded variable sets to further advance survey data analysis.
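The AUROC values quoted above describe a multi-class outcome. As a minimal sketch of what such a figure measures, the following pure-Python code computes a binary AUROC from its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one) and averages it one-vs-rest across classes. The function names, the toy scores, and the macro one-vs-rest averaging are assumptions for illustration; the description does not say which averaging scheme the authors used.

```python
from typing import Dict, List, Sequence


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score_pos > score_neg), counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auroc(labels: Sequence[str],
                    class_scores: Dict[str, List[float]]) -> float:
    """Average the one-vs-rest AUROC over all classes (macro average)."""
    aurocs = []
    for cls, scores in class_scores.items():
        one_vs_rest = [1 if y == cls else 0 for y in labels]
        aurocs.append(binary_auroc(one_vs_rest, scores))
    return sum(aurocs) / len(aurocs)


# Toy example with three hypothetical phenotype classes (not study data).
labels = ["A", "B", "C", "A", "B", "C"]
class_scores = {
    "A": [0.7, 0.2, 0.1, 0.6, 0.3, 0.2],
    "B": [0.2, 0.6, 0.3, 0.3, 0.3, 0.3],
    "C": [0.1, 0.2, 0.6, 0.1, 0.4, 0.5],
}
print(round(macro_ovr_auroc(labels, class_scores), 3))
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch; a sort-based rank computation would be the usual choice for a survey-sized dataset.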
Data Sheet 7_Prediction of outpatient rehabilitation patient preferences and...
frontiersin.figshare.com
docx
Updated Jan 15, 2025
Cite
Xuehui Fan; Ruixue Ye; Yan Gao; Kaiwen Xue; Zeyu Zhang; Jing Xu; Jingpu Zhao; Jun Feng; Yulong Wang (2025). Data Sheet 7_Prediction of outpatient rehabilitation patient preferences and optimization of graded diagnosis and treatment based on XGBoost machine learning algorithm.docx [Dataset]. http://doi.org/10.3389/frai.2024.1473837.s008
Available download formats
docx
Unique identifier
https://doi.org/10.3389/frai.2024.1473837.s008
Dataset updated
Jan 15, 2025
Dataset provided by
Frontiers
Authors
Xuehui Fan; Ruixue Ye; Yan Gao; Kaiwen Xue; Zeyu Zhang; Jing Xu; Jingpu Zhao; Jun Feng; Yulong Wang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: The Department of Rehabilitation Medicine is key to improving patients’ quality of life. Driven by chronic diseases and an aging population, there is a need to enhance the efficiency and resource allocation of outpatient facilities. This study aims to analyze the treatment preferences of outpatient rehabilitation patients by using data and a grading tool to establish predictive models. The goal is to improve patient visit efficiency and optimize resource allocation through these predictive models.
Methods: Data were collected from 38 Chinese institutions, including 4,244 patients visiting outpatient rehabilitation clinics. Data processing was conducted using Python software. The pandas library was used for data cleaning and preprocessing, involving 68 categorical and 12 continuous variables. The steps included handling missing values, data normalization, and encoding conversion. The data were divided into 80% training and 20% test sets using the Scikit-learn library to ensure model independence and prevent overfitting. Performance comparisons among XGBoost, random forest, and logistic regression were conducted using metrics including accuracy and receiver operating characteristic (ROC) curves. The imbalanced-learn library’s SMOTE technique was used to address the sample imbalance during model training. The model was optimized using a confusion matrix and feature importance analysis, and partial dependence plots (PDP) were used to analyze the key influencing factors.
Results: XGBoost achieved the highest overall accuracy of 80.21% with high precision and recall in Category 1. Random forest showed a similar overall accuracy. Logistic regression had a significantly lower accuracy, indicating difficulties with nonlinear data. The key influencing factors identified include distance to medical institutions, arrival time, length of hospital stay, and specific diseases, such as cardiovascular, pulmonary, oncological, and orthopedic conditions. The tiered diagnosis and treatment tool effectively helped doctors assess patients’ conditions and recommend suitable medical institutions based on rehabilitation grading.
Conclusion: This study confirmed that ensemble learning methods, particularly XGBoost, outperform single models in classification tasks involving complex datasets. Addressing class imbalance and enhancing feature engineering can further improve model performance. Understanding patient preferences and the factors influencing medical institution selection can guide healthcare policies to optimize resource allocation, improve service quality, and enhance patient satisfaction. Tiered diagnosis and treatment tools play a crucial role in helping doctors evaluate patient conditions and make informed recommendations for appropriate medical care.
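The description says SMOTE was applied during model training to offset class imbalance. The sketch below illustrates the core idea only: each synthetic row is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. It is a simplified pure-Python illustration, not the library implementation the study used; the function name, the toy rows, and the choice of k = 3 are all assumptions.

```python
import math
import random
from typing import List

Point = List[float]


def smote_like_oversample(minority: List[Point], n_new: int, k: int = 3,
                          seed: int = 0) -> List[Point]:
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical training split: 8 majority rows and 3 minority rows.
train_majority = [[float(i), float(i % 3)] for i in range(8)]
train_minority = [[1.0, 5.0], [2.0, 6.0], [3.0, 5.5]]

# Oversample the minority class until both classes have the same size.
new_rows = smote_like_oversample(
    train_minority, n_new=len(train_majority) - len(train_minority))
balanced_minority = train_minority + new_rows
print(len(train_majority), len(balanced_minority))
```

Synthetic rows are generated from the training split only, so the held-out 20% still contains real patients exclusively.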
BCI competition III dataset 4a classification accuracy (%) with different...
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Sep 8, 2023
Cite
Rabia Avais Khan; Nasir Rashid; Muhammad Shahzaib; Umar Farooq Malik; Arshia Arif; Javaid Iqbal; Mubasher Saleem; Umar Shahbaz Khan; Mohsin Tiwana (2023). BCI competition III dataset 4a classification accuracy (%) with different classifiers. [Dataset]. http://doi.org/10.1371/journal.pone.0276133.t003
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0276133.t003
Dataset updated
Sep 8, 2023
Dataset provided by
PLOS ONE
Authors
Rabia Avais Khan; Nasir Rashid; Muhammad Shahzaib; Umar Farooq Malik; Arshia Arif; Javaid Iqbal; Mubasher Saleem; Umar Shahbaz Khan; Mohsin Tiwana
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
BCI competition III dataset 4a classification accuracy (%) with different classifiers.
Optimized hyper-parameters from subject ‘a’ of dataset 1.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Sep 8, 2023
Cite
Rabia Avais Khan; Nasir Rashid; Muhammad Shahzaib; Umar Farooq Malik; Arshia Arif; Javaid Iqbal; Mubasher Saleem; Umar Shahbaz Khan; Mohsin Tiwana (2023). Optimized hyper-parameters from subject ‘a’ of dataset 1. [Dataset]. http://doi.org/10.1371/journal.pone.0276133.t001
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0276133.t001
Dataset updated
Sep 8, 2023
Dataset provided by
PLOS ONE
Authors
Rabia Avais Khan; Nasir Rashid; Muhammad Shahzaib; Umar Farooq Malik; Arshia Arif; Javaid Iqbal; Mubasher Saleem; Umar Shahbaz Khan; Mohsin Tiwana
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Optimized hyper-parameters from subject ‘a’ of dataset 1.
Table_1_Machine Learning in Modeling of Mouse Behavior.PDF
frontiersin.figshare.com
pdf
Updated Jun 8, 2023
Cite
Marjan Gharagozloo; Abdelaziz Amrani; Kevin Wittingstall; Andrew Hamilton-Wright; Denis Gris (2023). Table_1_Machine Learning in Modeling of Mouse Behavior.PDF [Dataset]. http://doi.org/10.3389/fnins.2021.700253.s002
Available download formats
pdf
Unique identifier
https://doi.org/10.3389/fnins.2021.700253.s002
Dataset updated
Jun 8, 2023
Dataset provided by
Frontiers
Authors
Marjan Gharagozloo; Abdelaziz Amrani; Kevin Wittingstall; Andrew Hamilton-Wright; Denis Gris
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Mouse behavior is a primary outcome in evaluations of therapeutic efficacy. Exhaustive, continuous, multiparametric behavioral phenotyping is a valuable tool for understanding the pathophysiological status of mouse brain diseases. Automated home cage behavior analysis produces highly granulated data both in terms of number of features and sampling frequency. Previously, we demonstrated several ways to reduce feature dimensionality. In this study, we propose novel approaches for analyzing 33-Hz data generated by CleverSys software. We hypothesized that behavioral patterns within short time windows are reflective of physiological state, and that computer modeling of mouse behavioral routines can serve as a predictive tool in classification tasks. To remove bias due to researcher decisions, our data flow is indifferent to the quality, value, and importance of any given feature in isolation. To classify day and night behavior, as an example application, we developed a data preprocessing flow and utilized logistic regression (LG), support vector machines (SVM), random forest (RF), and one-dimensional convolutional neural networks paired with long short-term memory deep neural networks (1DConvBiLSTM). We determined that a 5-min video clip is sufficient to classify mouse behavior with high accuracy. LG, SVM, and RF performed similarly, predicting mouse behavior with 85% accuracy, and combining the three algorithms in an ensemble procedure increased accuracy to 90%. The best performance was achieved by combining the 1DConv and BiLSTM algorithms yielding 96% accuracy. Our findings demonstrate that computer modeling of the home-cage ethome can clearly define mouse physiological state. Furthermore, we showed that continuous behavioral data can be analyzed using approaches similar to natural language processing. These data provide proof of concept for future research in diagnostics of complex pathophysiological changes that are accompanied by changes in behavioral profile.
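To make the window size concrete: at 33 Hz, a 5-min clip corresponds to 33 × 60 × 5 = 9,900 frames. The sketch below cuts a recording into consecutive windows of that length. The function name, the non-overlapping layout, and the dropped trailing partial window are assumptions for illustration; the description does not state how the authors segmented their recordings.

```python
from typing import List, Sequence

SAMPLING_HZ = 33          # frame rate stated in the description
WINDOW_MINUTES = 5        # clip length reported as sufficient
FRAMES_PER_WINDOW = SAMPLING_HZ * 60 * WINDOW_MINUTES  # 9,900 frames


def split_into_windows(frames: Sequence,
                       size: int = FRAMES_PER_WINDOW) -> List[Sequence]:
    """Cut a recording into consecutive, non-overlapping windows.

    A trailing partial window is dropped.
    """
    n_windows = len(frames) // size
    return [frames[i * size:(i + 1) * size] for i in range(n_windows)]


# Hypothetical one-hour recording; an integer stands in for each frame's
# feature vector.
one_hour = list(range(SAMPLING_HZ * 60 * 60))
windows = split_into_windows(one_hour)
print(FRAMES_PER_WINDOW, len(windows))
```

Each window would then be labelled (day or night, in the example application) and passed to a classifier as one sample.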
Data_Sheet_1_Machine Learning Prediction Models for Mechanically Ventilated...
frontiersin.figshare.com
datasetcatalog.nlm.nih.gov
pdf
Updated Jun 10, 2023
Cite
Yibing Zhu; Jin Zhang; Guowei Wang; Renqi Yao; Chao Ren; Ge Chen; Xin Jin; Junyang Guo; Shi Liu; Hua Zheng; Yan Chen; Qianqian Guo; Lin Li; Bin Du; Xiuming Xi; Wei Li; Huibin Huang; Yang Li; Qian Yu (2023). Data_Sheet_1_Machine Learning Prediction Models for Mechanically Ventilated Patients: Analyses of the MIMIC-III Database.pdf [Dataset]. http://doi.org/10.3389/fmed.2021.662340.s001
Available download formats
pdf
Unique identifier
https://doi.org/10.3389/fmed.2021.662340.s001
Dataset updated
Jun 10, 2023
Dataset provided by
Frontiers
Authors
Yibing Zhu; Jin Zhang; Guowei Wang; Renqi Yao; Chao Ren; Ge Chen; Xin Jin; Junyang Guo; Shi Liu; Hua Zheng; Yan Chen; Qianqian Guo; Lin Li; Bin Du; Xiuming Xi; Wei Li; Huibin Huang; Yang Li; Qian Yu
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Mechanically ventilated patients in the intensive care unit (ICU) have high mortality rates. There are multiple prediction scores, such as the Simplified Acute Physiology Score II (SAPS II), Oxford Acute Severity of Illness Score (OASIS), and Sequential Organ Failure Assessment (SOFA), widely used in the general ICU population. We aimed to establish prediction scores on mechanically ventilated patients with the combination of these disease severity scores and other features available on the first day of admission.
Methods: A retrospective administrative database study from the Medical Information Mart for Intensive Care (MIMIC-III) database was conducted. The exposures of interest consisted of the demographics, pre-ICU comorbidity, ICU diagnosis, disease severity scores, vital signs, and laboratory test results on the first day of ICU admission. Hospital mortality was used as the outcome. We used the machine learning methods of k-nearest neighbors (KNN), logistic regression, bagging, decision tree, random forest, Extreme Gradient Boosting (XGBoost), and neural network for model establishment. A sample of 70% of the cohort was used for the training set; the remaining 30% was used for testing. Areas under the receiver operating characteristic curves (AUCs) and calibration plots were constructed for the evaluation and comparison of the models' performance. The significance of the risk factors was identified through the models, and the top factors were reported.
Results: A total of 28,530 subjects were enrolled through the screening of the MIMIC-III database. After data preprocessing, 25,659 adult patients with 66 predictors were included in the model analyses. The KNN, logistic regression, decision tree, random forest, neural network, bagging, and XGBoost models were established with the training set and obtained AUCs of 0.806, 0.818, 0.743, 0.819, 0.780, 0.803, and 0.821, respectively, on the testing set. The calibration curves of all the models, except for the neural network, performed well. The XGBoost model performed best among the seven models. The top five predictors were age, respiratory dysfunction, SAPS II score, maximum hemoglobin, and minimum lactate.
Conclusion: The current study indicates that models based on risk factors from the first day could be successfully established for predicting mortality in ventilated patients. The XGBoost model performs best among the seven machine learning models.
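Alongside AUC, the abstract relies on calibration plots, which compare predicted risk with the observed event rate. A minimal sketch of the underlying computation follows: predictions are grouped into equal-width probability bins and each bin's mean predicted risk is set against the fraction of patients who actually had the event. The function name, the five-bin layout, and the toy risks and outcomes are assumptions; the description does not say how the authors built their plots.

```python
from typing import List, Sequence, Tuple


def calibration_bins(y_true: Sequence[int], y_prob: Sequence[float],
                     n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed event rate, count)
    for each non-empty equal-width probability bin."""
    sums = [[0.0, 0, 0] for _ in range(n_bins)]  # prob sum, events, count
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[idx][0] += p
        sums[idx][1] += y
        sums[idx][2] += 1
    return [(s / n, e / n, n) for s, e, n in sums if n > 0]


# Hypothetical predicted mortality risks and observed outcomes (1 = died).
y_prob = [0.05, 0.10, 0.15, 0.30, 0.35, 0.55, 0.65, 0.85, 0.90, 0.95]
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]

for mean_pred, observed, count in calibration_bins(y_true, y_prob):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

A well-calibrated model yields bins whose observed rate tracks the mean prediction; a model can rank patients well (high AUC) and still be poorly calibrated, which is why the two are reported separately.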
BCI competition IV dataset 1 classification accuracy (%) of dataset 1 with...
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Sep 8, 2023
Cite
Rabia Avais Khan; Nasir Rashid; Muhammad Shahzaib; Umar Farooq Malik; Arshia Arif; Javaid Iqbal; Mubasher Saleem; Umar Shahbaz Khan; Mohsin Tiwana (2023). BCI competition IV dataset 1 classification accuracy (%) of dataset 1 with proposed method compared with other methodologies. [Dataset]. http://doi.org/10.1371/journal.pone.0276133.t004
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0276133.t004
Dataset updated
Sep 8, 2023
Dataset provided by
PLOS ONE
Authors
Rabia Avais Khan; Nasir Rashid; Muhammad Shahzaib; Umar Farooq Malik; Arshia Arif; Javaid Iqbal; Mubasher Saleem; Umar Shahbaz Khan; Mohsin Tiwana
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
BCI competition IV dataset 1 classification accuracy (%) of dataset 1 with proposed method compared with other methodologies.
This is the code file which can be utilized to generate results shown in the...
plos.figshare.com
datasetcatalog.nlm.nih.gov
application/x-rar
Updated Sep 8, 2023
Cite
Rabia Avais Khan; Nasir Rashid; Muhammad Shahzaib; Umar Farooq Malik; Arshia Arif; Javaid Iqbal; Mubasher Saleem; Umar Shahbaz Khan; Mohsin Tiwana (2023). This is the code file which can be utilized to generate results shown in the research paper. [Dataset]. http://doi.org/10.1371/journal.pone.0276133.s001
Available download formats
application/x-rar
Unique identifier
https://doi.org/10.1371/journal.pone.0276133.s001
Dataset updated
Sep 8, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Rabia Avais Khan; Nasir Rashid; Muhammad Shahzaib; Umar Farooq Malik; Arshia Arif; Javaid Iqbal; Mubasher Saleem; Umar Shahbaz Khan; Mohsin Tiwana
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the code file which can be utilized to generate results shown in the research paper.
Statistical report containing details of all data pre-processing steps to...
plos.figshare.com
html
Updated May 15, 2024
Cite
Richard Barrett-Jolley; Alexander J. German (2024). Statistical report containing details of all data pre-processing steps to create the dataset for all owners. [Dataset]. http://doi.org/10.1371/journal.pone.0280173.s016
Available download formats
html
Unique identifier
https://doi.org/10.1371/journal.pone.0280173.s016
Dataset updated
May 15, 2024
Dataset provided by
PLOS ONE
Authors
Richard Barrett-Jolley; Alexander J. German
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistical report containing details of all data pre-processing steps to create the dataset for all owners.