9 datasets found
  1. Table_1_Machine learning-based warning model for chronic kidney disease in...

    • frontiersin.figshare.com
    docx
    Updated Jun 18, 2023
    Cite
    Wenzhu Song; Yanfeng Liu; Lixia Qiu; Jianbo Qing; Aizhong Li; Yan Zhao; Yafeng Li; Rongshan Li; Xiaoshuang Zhou (2023). Table_1_Machine learning-based warning model for chronic kidney disease in individuals over 40 years old in underprivileged areas, Shanxi Province.DOCX [Dataset]. http://doi.org/10.3389/fmed.2022.930541.s001
    Available download formats: docx
    Dataset updated
    Jun 18, 2023
    Dataset provided by
    Frontiers
    Authors
    Wenzhu Song; Yanfeng Liu; Lixia Qiu; Jianbo Qing; Aizhong Li; Yan Zhao; Yafeng Li; Rongshan Li; Xiaoshuang Zhou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Chronic kidney disease (CKD) is a progressive disease with a high incidence but imperceptible early symptoms. Because rural areas of China lack adequate medical check-ups and rely on single-disease screening programmes, CKD there can easily progress to end-stage renal failure. This study aimed to construct an early warning model for CKD tailored to impoverished areas by applying machine learning (ML) algorithms to easily accessible parameters from ten rural areas in Shanxi Province, thereby bringing treatment forward and improving patients' quality of life.

    Methods: From April to November 2019, opportunistic CKD screening was carried out in 10 rural areas in Shanxi Province. General information, physical examination data, and blood and urine specimens were collected from 13,550 subjects. Explanatory variables were then selected with LASSO regression, and the two target datasets, defined by the albumin-to-creatinine ratio (ACR) and the α1-microglobulin-to-creatinine ratio (MCR), were balanced with the SMOTE (synthetic minority over-sampling technique) algorithm. Next, Bagging, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) were used to classify ACR outcomes and MCR outcomes.

    Results: A total of 12,330 rural residents were included in this study, with 20 explanatory variables. Increased ACR and increased MCR were found in 1,587 (12.8%) and 1,456 (11.8%) cases, respectively. After LASSO, 14 and 15 explanatory variables remained in the two datasets, respectively. Bagging, RF and XGBoost performed well in classification, with AUCs of 0.74, 0.87, 0.87 and 0.89 for ACR outcomes and 0.75, 0.88, 0.89 and 0.90 for MCR outcomes. The five variables contributing most to the classification were SBP, TG, TC, Hcy and DBP for ACR outcomes, and age, TG, SBP, Hcy and FPG for MCR outcomes. Overall, these machine learning algorithms could serve as a warning model for CKD.

    Conclusion: ML algorithms combined with indicators that are readily accessible in rural areas perform well in classification, which allows for an early warning model for CKD. This model could help achieve large-scale population screening for CKD in poverty-stricken areas and should be promoted to improve quality of life and reduce mortality.
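    SMOTE, used above to balance the ACR and MCR datasets, creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a minimal standard-library illustration of that idea, not the authors' pipeline (the abstract does not say which implementation was used); the function name `smote`, the Euclidean distance and the default `k=5` are assumptions.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class feature vectors.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest minority neighbours (by index) of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        base, other = minority[i], minority[j]
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Toy minority class with two features (illustrative values only).
    minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.4], [1.2, 2.6]]
    new_points = smote(minority, n_new=3, k=2)
    print(len(new_points))  # 3
```

    Because every synthetic point is a convex combination of two real minority samples, SMOTE never leaves the region spanned by the minority class; it only densifies it.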

  2. Systematic analysis and modeling of the FLASH sparing effect as a function...

    • scidb.cn
    Updated Jun 29, 2024
    Cite
    Qibin FU; Tuchen HUANG (2024). Systematic analysis and modeling of the FLASH sparing effect as a function of dose rate and dose [Dataset]. http://doi.org/10.57760/sciencedb.j00186.00150
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 29, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Qibin FU; Tuchen HUANG
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Online searches of Web of Science and PubMed were conducted on 15 September 2023 for articles published after 1950, using the following terms: TS = (ultra high dose rate OR ultra-high dose rate OR ultrahigh dose rate) AND TS = (in vivo OR animal model OR mice OR preclinical). The queries returned 980 results in total, of which 564 remained after duplicate entries were removed. Two authors reviewed the titles and abstracts manually, and the full texts of suitable manuscripts were further screened on factors such as topic, experimental conditions and methods, research objects, and endpoints. The record identification and screening flow, based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), is summarized in Figure 1. Finally, 40 articles were included in our analysis.

    The FLASH effect was considered confirmed if the experimental observations and data differed significantly between the two radiation conditions. Within a single article, research items with different endpoints but otherwise identical conditions were counted as one item. As summarized in Table 1, a total of 131 items were extracted from the 40 included articles. For each item, the FLASH effect (1 = significant sparing effect, 0 = no sparing effect) and detailed parameters were recorded, including radiation type and energy, dose, dose rate, experimental object, and pulse characteristics (if provided).

    As in the Quantitative Analyses of Normal Tissue Effects in the Clinic (QUANTEC), the probability of triggering the FLASH effect as a function of mean dose rate or dose was analyzed with a binary logistic regression model, using the SPSS software. The extracted items were strongly imbalanced between entries with and without a FLASH effect, because research with positive results is more likely to be reported. A more balanced dataset was therefore obtained by oversampling with the K-Means SMOTE algorithm (Figure S1), implemented in Python with the imblearn library. The receiver operating characteristic (ROC) curve was plotted as the true positive rate (TPR) against the false positive rate (FPR) at different threshold values, and the classification model was validated using the area under the ROC curve (AUC), which is threshold- and scale-invariant.
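    The analysis above combines two standard pieces: a binary logistic model, P(FLASH) = 1 / (1 + exp(-(b0 + b1·x))), where x could be, for example, the log10 of the mean dose rate, and the AUC, which equals the probability that a randomly chosen positive item scores higher than a randomly chosen negative one. The authors used SPSS and imblearn; the following is only a standard-library Python sketch of both ideas, fitted by plain gradient descent. The log10 transform, learning rate, iteration count and toy data are assumptions for illustration, not values from the study.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on the
    mean log-loss; returns (b0, b1)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of the log-loss w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def roc_auc(scores, labels):
    """AUC via the rank (Mann-Whitney) formulation: the probability that a random
    positive scores higher than a random negative, counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy items, NOT data from the study:
    # x = log10(mean dose rate in Gy/s), y = 1 if a sparing effect was reported.
    xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
    ys = [0, 0, 0, 1, 0, 1, 1, 1]
    b0, b1 = fit_logistic(xs, ys)
    scores = [sigmoid(b0 + b1 * x) for x in xs]
    print(f"b0={b0:.2f}, b1={b1:.2f}, AUC={roc_auc(scores, ys):.4f}")
```

    The pairwise AUC is O(P·N) and is meant for clarity; it gives the same value as integrating the TPR-versus-FPR curve, which is why the AUC does not depend on any single threshold or on monotone rescaling of the scores.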

  3. Data from: Dataset description.

    • plos.figshare.com
    xls
    Updated Apr 25, 2025
    + more versions
    Cite
    Dongge Niu; Renxin Ru; Jiasheng Zhang; Yibo Zhang; Cheng Ding; Yao Lan (2025). Dataset description. [Dataset]. http://doi.org/10.1371/journal.pone.0320299.t012
    Available download formats: xls
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Dongge Niu; Renxin Ru; Jiasheng Zhang; Yibo Zhang; Cheng Ding; Yao Lan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Anesthesia plays a pivotal role in modern surgery by inducing controlled states of unconsciousness. Precise control is crucial for safe and pain-free surgery, and accurate monitoring of anesthesia depth is essential to guide anesthesiologists, optimize drug usage, and mitigate postoperative complications. This study aims to improve the classification of anesthesia-induced transitions between wakefulness and deep sleep into eight classes by leveraging an advanced graph neural network (GNN). The research combines seven datasets into a single dataset of 290 samples and investigates key brain regions to develop a robust classification framework. The dataset is first augmented with the Synthetic Minority Over-sampling Technique (SMOTE), expanding the sample size to 1,197. A graph-based approach is then used to capture the intricate relationships between features, producing a graph dataset with 1,197 nodes and 714,610 edges, where nodes represent data samples and edges represent the connections between them. Edge weights are computed from the Spearman correlation coefficient matrix. An optimized GNN model, developed through an ablation study of eight hyperparameters, achieves an accuracy of 92.8%. Compared with a one-dimensional (1D) CNN and six machine learning models, it shows superior classification performance on small, imbalanced datasets. We also evaluated the proposed model on six different anesthesia datasets and observed no decline in performance. This work advances the understanding and classification of anesthesia states and provides a valuable tool for improved anesthesia management.
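    The graph construction described above treats each sample as a node and weights each edge with a Spearman correlation. A minimal standard-library sketch, assuming that the correlation is computed between the feature vectors of two samples and that edges with |rho| below a hypothetical `threshold` are dropped (the abstract does not state how the 714,610 edges were selected):

```python
import itertools
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    if sa == 0 or sb == 0:
        return 0.0  # undefined for constant vectors; treated as no correlation here
    return cov / (sa * sb)


def build_sample_graph(samples, threshold=0.0):
    """Return weighted edges (i, j, rho) between samples whose |rho| >= threshold."""
    edges = []
    for i, j in itertools.combinations(range(len(samples)), 2):
        rho = spearman(samples[i], samples[j])
        if abs(rho) >= threshold:
            edges.append((i, j, rho))
    return edges


if __name__ == "__main__":
    # Three hypothetical samples with three features each.
    samples = [[0.2, 0.5, 0.9], [0.1, 0.6, 0.8], [0.9, 0.4, 0.1]]
    for edge in build_sample_graph(samples):
        print(edge)
```

    With `threshold=0.0` the result is a complete graph (n·(n-1)/2 edges); raising the threshold is one common way to sparsify it before feeding it to a GNN.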

  4. Data from: Phenotype Driven Data Augmentation Methods for Transcriptomic...

    • zenodo.org
    zip
    Updated Jun 11, 2025
    + more versions
    Cite
    Nikita Janakarajan; Mara Graziani; María Rodríguez Martínez (2025). Phenotype Driven Data Augmentation Methods for Transcriptomic Data [Dataset]. http://doi.org/10.5281/zenodo.14983178
    Available download formats: zip
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nikita Janakarajan; Mara Graziani; María Rodríguez Martínez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the data and associated results of all experiments conducted in our work "Phenotype Driven Data Augmentation Methods for Transcriptomic Data". In this work, we introduce two classes of phenotype driven data augmentation approaches – signature-dependent and signature-independent. The signature-dependent methods assume the existence of distinct gene signatures describing some phenotype and are simple, non-parametric, and novel data augmentation methods. The signature-independent methods are a modification of the established Gamma-Poisson and Poisson sampling methods for gene expression data. We benchmark our proposed methods against random oversampling, SMOTE, unmodified versions of Gamma-Poisson and Poisson sampling, and unaugmented data.
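    For reference, the unmodified Poisson and Gamma-Poisson baselines mentioned above can be sketched as follows: each synthetic count for a gene is drawn from a Poisson distribution whose mean is that gene's mean count in a reference set of same-phenotype samples, and the Gamma-Poisson variant first draws the Poisson rate from a Gamma distribution to add overdispersion. This standard-library illustration rests on those assumptions; the dispersion parameter `phi` and the function names are hypothetical, and it does not reproduce the authors' modified methods.

```python
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) count by counting rate-1 exponential arrivals in [0, lam]."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def augment_counts(reference, n_new, phi=None, seed=0):
    """Generate n_new synthetic count vectors from a reference set of samples
    (rows = samples, columns = genes) that share one phenotype.

    phi=None -> Poisson sampling around each gene's mean count.
    phi > 0  -> Gamma-Poisson sampling: rate ~ Gamma(shape=1/phi, scale=mean*phi),
                so counts have mean `mean` and variance mean + phi * mean**2.
    """
    rng = random.Random(seed)
    n_genes = len(reference[0])
    means = [sum(row[g] for row in reference) / len(reference) for g in range(n_genes)]
    synthetic = []
    for _ in range(n_new):
        row = []
        for mu in means:
            lam = mu
            if phi is not None and mu > 0:
                # random.gammavariate(alpha, beta): alpha = shape, beta = scale.
                lam = rng.gammavariate(1.0 / phi, mu * phi)
            row.append(poisson(lam, rng))
        synthetic.append(row)
    return synthetic


if __name__ == "__main__":
    # Hypothetical raw counts for three genes in four samples of one phenotype.
    reference = [[0, 12, 150], [0, 9, 170], [0, 15, 160], [0, 10, 140]]
    print(augment_counts(reference, n_new=2))
    print(augment_counts(reference, n_new=2, phi=0.2))
```

    The arrival-counting Poisson sampler avoids the underflow of exp(-lam) for large means, at the cost of O(lam) work per draw, which is acceptable for a sketch.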

    This repository contains the data used for all our experiments: the original data on which augmentation was performed, the cross-validation split indices as a JSON file, the training and validation data augmented by each of the augmentation methods in our study, a test set (containing only real samples), and an external test set, standardised with respect to each augmentation method and the training data of each CV split.

    The compressed files 5x5stratified_{x}percent.zip contain data that were augmented using x% of the available real data. brca_public.zip contains the data used for the breast cancer experiments. distribution_size_effect.zip contains the data used to tune the reference set size hyperparameter for the modified Poisson and Gamma-Poisson methods.

    The compressed file results.zip contains the results of all experiments, including the parameter files used to train the various models, the computed metrics (balanced accuracy and AUC-ROC) with their p-values, and the latent spaces of the training, validation and test sets (for the (N)VAE) for all 25 (5x5) CV splits.
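    The 25 (5x5) cross-validation splits correspond to five repetitions of stratified five-fold cross-validation. A minimal sketch of how such split indices could be generated and serialised as JSON, assuming round-robin assignment of shuffled per-class indices to folds; the function name and the "train"/"val" keys are hypothetical and need not match the repository's JSON file.

```python
import json
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=5, seed=0):
    """Return n_repeats * n_splits dicts with 'train' and 'val' index lists.

    Within each repeat, every class is shuffled and dealt round-robin into the
    folds, so each fold keeps roughly the overall class proportions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    splits = []
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)
            for pos, idx in enumerate(members):
                folds[(pos + offset) % n_splits].append(idx)
            offset += len(members)  # continue dealing where the previous class stopped
        for k in range(n_splits):
            val = sorted(folds[k])
            train = sorted(i for f, fold in enumerate(folds) if f != k for i in fold)
            splits.append({"train": train, "val": val})
    return splits


if __name__ == "__main__":
    labels = ["LumA"] * 12 + ["Basal"] * 8  # hypothetical phenotype labels
    splits = repeated_stratified_kfold(labels)
    print(len(splits))  # 25
    print(json.dumps(splits[0]))
```

    Storing indices rather than copies of the data keeps the split file small and lets every augmentation method be trained and evaluated on exactly the same partitions.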

    PLEASE NOTE: If any part of this repository is used in your work, please cite the following, in addition to the original data sources (TCGA, CPTAC, GSE20713 and METABRIC):

    @article{janakarajan2025phenotype,
    title={Phenotype driven data augmentation methods for transcriptomic data},
    author={Janakarajan, Nikita and Graziani, Mara and Rodr{\'\i}guez Mart{\'\i}nez, Mar{\'\i}a},
    journal={Bioinformatics Advances},
    volume={5},
    number={1},
    pages={vbaf124},
    year={2025},
    publisher={Oxford University Press}
    }

  5. Comparison table with 1D CNN models.

    • plos.figshare.com
    xls
    Updated Apr 25, 2025
    Cite
    Dongge Niu; Renxin Ru; Jiasheng Zhang; Yibo Zhang; Cheng Ding; Yao Lan (2025). Comparison table with 1D CNN models. [Dataset]. http://doi.org/10.1371/journal.pone.0320299.t011
    Available download formats: xls
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Dongge Niu; Renxin Ru; Jiasheng Zhang; Yibo Zhang; Cheng Ding; Yao Lan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description: same as item 3 (abstract of the same PLOS ONE article).

  6. Number of samples after dataset optimization.

    • figshare.com
    xls
    Updated May 21, 2025
    + more versions
    Cite
    Ming-zhou Lv; Kun-lun Li; Jia-zeng Cai; Jun Mao; Jia-jun Gao; Hui Xu (2025). Number of samples after dataset optimization. [Dataset]. http://doi.org/10.1371/journal.pone.0323487.t002
    Available download formats: xls
    Dataset updated
    May 21, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ming-zhou Lv; Kun-lun Li; Jia-zeng Cai; Jun Mao; Jia-jun Gao; Hui Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Landslides are frequent and hazardous geological disasters that pose significant risks to human safety and infrastructure. Accurate assessment of landslide susceptibility is crucial for risk management and mitigation. However, geological surveys of landslide areas are typically conducted at the township level, have low sample sizes, and rely on experience. This study proposes a framework for assessing landslide susceptibility in Taiping Township, Zhejiang Province, China, using data balancing, machine learning, and data from 1,325 slope units with nine slope characteristics. The dataset was balanced with the Synthetic Minority Oversampling Technique combined with Tomek link undersampling (SMOTE-Tomek). Six machine learning models were compared, and the SHapley Additive exPlanations (SHAP) method was used to assess the influencing factors. The machine learning algorithms achieved high accuracy, with the random forest (RF) algorithm performing best (accuracy 0.791, F1 = 0.723). The very low, low, medium, and high susceptibility zones account for 92.27%, 5.12%, 1.78%, and 0.83% of the area, respectively. The height of cut slopes has the greatest impact on landslide susceptibility, whereas altitude has only a minor impact. The proposed model accurately assesses landslide susceptibility at the township scale, providing valuable insights for risk management and mitigation.
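    SMOTE-Tomek first oversamples the minority class with SMOTE and then removes Tomek links: pairs of samples from opposite classes that are each other's nearest neighbour, which typically sit on a noisy class boundary. The sketch below shows only the Tomek-link cleaning step with the standard library, assuming Euclidean distance and removal of both members of each link (implementations differ on whether one or both members are dropped); it is not the study's code.

```python
import math


def nearest_neighbour(points, i):
    """Index of the nearest other point to points[i] (Euclidean); None if alone."""
    best, best_d = None, math.inf
    for j, p in enumerate(points):
        if j != i:
            d = math.dist(points[i], p)
            if d < best_d:
                best, best_d = j, d
    return best


def tomek_links(points, labels):
    """Return (i, j) pairs with i < j that are mutual nearest neighbours of different classes."""
    nn = [nearest_neighbour(points, i) for i in range(len(points))]
    links = []
    for i, j in enumerate(nn):
        if j is not None and i < j and nn[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links


def remove_tomek_links(points, labels):
    """Drop both members of every Tomek link; return cleaned (points, labels)."""
    drop = {idx for pair in tomek_links(points, labels) for idx in pair}
    kept = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in kept], [labels[k] for k in kept]


if __name__ == "__main__":
    # Hypothetical 1-D slope feature with labels 0 = stable, 1 = landslide.
    points = [[0.0], [0.1], [5.0], [5.2], [10.0]]
    labels = [0, 1, 0, 0, 1]
    print(tomek_links(points, labels))
    print(remove_tomek_links(points, labels))
```

    In a full SMOTE-Tomek pipeline this cleaning would run on the already oversampled training set, so synthetic points that landed inside the other class are removed as well.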

  7. Data from: Using Fishery-related Data, Scientific Expertise and Machine...

    • figshare.com
    zip
    Updated Jan 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Loukas Katikas; Sofia Reizopoulou; Paraskevi Drakopoulou; Celia Vassilopoulou (2025). Using Fishery-related Data, Scientific Expertise and Machine Learning to Improve Marine Habitat Mapping in Northeastern Mediterranean Waters [Dataset]. http://doi.org/10.6084/m9.figshare.28264625.v1
    Available download formats: zip
    Dataset updated
    Jan 28, 2025
    Dataset provided by
    figshare
    Authors
    Loukas Katikas; Sofia Reizopoulou; Paraskevi Drakopoulou; Celia Vassilopoulou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Mediterranean Sea
    Description

    Marine habitat mapping is an essential tool for better planning of conservation efforts and sustainable management of marine activities. High spatial resolution in marine habitat maps is very important because it can capture more detail in the imagery and potentially reveal important biotopes. This level of detail helps direct monitoring and analysis efforts toward the effective implementation of EU environmental policies and supports more relevant advice for robust decision-making under both sectoral policies (e.g. the Common Fisheries Policy) and more integrated ones (e.g. Marine Spatial Planning). In this study, sea-bottom type data recorded during the national monitoring of commercial fishing vessel operations and during fishery surveys in the Greek Seas were used. These data were then assigned to EU EMODnet seabed habitats using local ecological knowledge. Two machine learning (ML) algorithms, the Random Forest Classifier (RFC) and the Gradient Boosting Classifier (GBC), were trained on the entire national-scale dataset and then applied to predict habitat types in the Saronikos Gulf (regional scale), using various environmental factors as predictors, in order to assess their performance. The Borderline Synthetic Minority Oversampling Technique (B-SMOTE) was applied to handle the inherent class imbalance in the data. A validation dataset and georeferenced data from previous studies were used to compare the models' accuracy and predictive performance. With this approach, five more habitat types were mapped in the Saronikos Gulf than are shown in the EMODnet portal, and habitat gaps were filled in areas where no data existed. Results from the RFC-BS model (62% accuracy, kappa score of 0.51) were then used to address conservation planning commitments recently made by the Greek government: the vast majority of priority marine seabed habitats in the study area appear to lie outside the current Natura 2000 sites, which served as the baseline for the trawl bans declared in Greek waters under the provisions of the EU Marine Action Plan.
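    Borderline-SMOTE (B-SMOTE) differs from plain SMOTE mainly in which minority samples it oversamples: for each minority sample it looks at its m nearest neighbours in the whole training set and keeps it as a seed (the "danger" set) only if at least half, but not all, of those neighbours belong to other classes; minority samples whose neighbours are all from other classes are treated as noise. A standard-library sketch of that selection step, assuming Euclidean distance and a hypothetical default of m=5; the interpolation step that follows is the same as in SMOTE.

```python
import math


def danger_indices(points, labels, minority_label, m=5):
    """Indices of minority samples on the class border (the Borderline-SMOTE 'danger' set)."""
    danger = []
    for i, (x, y) in enumerate(zip(points, labels)):
        if y != minority_label:
            continue
        # m nearest neighbours among ALL other samples, regardless of class.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(x, points[j]),
        )[:m]
        k = len(others)
        n_other = sum(1 for j in others if labels[j] != minority_label)
        # Safe: fewer than half are other-class; noise: all are other-class.
        if k / 2 <= n_other < k:
            danger.append(i)
    return danger


if __name__ == "__main__":
    # Hypothetical 1-D habitat feature; label 1 is the rare habitat class.
    points = [[0.0], [0.1], [0.2], [0.3], [4.8], [4.9], [5.0], [5.1], [9.9], [10.0], [10.1], [10.2]]
    labels = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
    print(danger_indices(points, labels, minority_label=1, m=3))
```

    Concentrating synthetic samples near the boundary is intended to sharpen the decision surface for rare habitat classes, whereas plain SMOTE spends much of its budget on minority samples that are already easy to classify.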

  8. Landslide evaluation factors and value range.

    • plos.figshare.com
    xls
    Updated May 21, 2025
    Cite
    Ming-zhou Lv; Kun-lun Li; Jia-zeng Cai; Jun Mao; Jia-jun Gao; Hui Xu (2025). Landslide evaluation factors and value range. [Dataset]. http://doi.org/10.1371/journal.pone.0323487.t001
    Available download formats: xls
    Dataset updated
    May 21, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ming-zhou Lv; Kun-lun Li; Jia-zeng Cai; Jun Mao; Jia-jun Gao; Hui Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description: same as item 6 (abstract of the same PLOS ONE article).

  9. Results of Kruskal-Wallis test.

    • figshare.com
    xls
    Updated May 21, 2025
    Cite
    Ming-zhou Lv; Kun-lun Li; Jia-zeng Cai; Jun Mao; Jia-jun Gao; Hui Xu (2025). Results of Kruskal-Wallis test. [Dataset]. http://doi.org/10.1371/journal.pone.0323487.t003
    Available download formats: xls
    Dataset updated
    May 21, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ming-zhou Lv; Kun-lun Li; Jia-zeng Cai; Jun Mao; Jia-jun Gao; Hui Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description: same as item 6 (abstract of the same PLOS ONE article).
