10 datasets found
  1. Results of the ML models using KNN imputer.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jan 3, 2024
    Aljrees, Turki (2024). Results of the ML models using KNN imputer. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001270765
    Dataset updated
    Jan 3, 2024
    Authors
    Aljrees, Turki
    Description

    Cervical cancer is a leading cause of women’s mortality, emphasizing the need for early diagnosis and effective treatment. In line with the imperative of early intervention, the automated identification of cervical cancer has emerged as a promising avenue, leveraging machine learning techniques to enhance both the speed and accuracy of diagnosis. However, an inherent challenge in the development of these automated systems is the presence of missing values in the datasets commonly used for cervical cancer detection. Missing data can significantly impact the performance of machine learning models, potentially leading to inaccurate or unreliable results. This study addresses a critical challenge in automated cervical cancer identification: handling missing data in datasets. The study presents a novel approach that combines three machine learning models into a stacked ensemble voting classifier, complemented by the use of a KNN Imputer to manage missing values. The proposed model achieves remarkable results with an accuracy of 0.9941, precision of 0.98, recall of 0.96, and an F1 score of 0.97. This study examines three distinct scenarios: one involving the deletion of missing values, another utilizing KNN imputation, and a third employing PCA for imputing missing values. This research has significant implications for the medical field, offering medical experts a powerful tool for more accurate cervical cancer therapy and enhancing the overall effectiveness of testing procedures. By addressing missing data challenges and achieving high accuracy, this work represents a valuable contribution to cervical cancer detection, ultimately aiming to reduce the impact of this disease on women’s health and healthcare systems.
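
The description above pairs a KNN imputer with an ensemble of three classifiers and reports accuracy, precision, recall, and F1. A minimal sketch of that general pattern with scikit-learn follows. The three base learners, the soft-voting setting, the neighbor count, and the synthetic data are all assumptions made for illustration; the listing does not say which models or settings the study used, and this sketch does not reproduce its results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): not the cervical cancer dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan  # blank out roughly 10% of the entries

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical choice of three base learners combined by soft voting.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="soft",
)

# Keeping the imputer inside the pipeline means it is fitted on training data only.
model = make_pipeline(KNNImputer(n_neighbors=5), ensemble)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(metrics)
```

Placing the imputer in the same pipeline as the classifier is one way to keep test rows from influencing the imputed training values.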

  2. Description of the dataset used in this study.

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jan 3, 2024
    Turki Aljrees (2024). Description of the dataset used in this study. [Dataset]. http://doi.org/10.1371/journal.pone.0295632.t001
    Available download formats: xls
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Turki Aljrees
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically


  3. ICR - Identifying Age Related Conditions-Filtered

    • kaggle.com
    zip
    Updated May 22, 2023
    Onkur7 (2023). ICR - Identifying Age Related Conditions-Filtered [Dataset]. https://www.kaggle.com/datasets/onkur7/icr-identifying-age-related-conditions-filtered
    Available download formats: zip (1372977 bytes)
    Dataset updated
    May 22, 2023
    Authors
    Onkur7
    Description

    This dataset was created by imputing the missing values of the ICR - Identifying Age Related Conditions competition dataset. Several subversions were also created, depending on feature selection.

    - Version 1: Created by dropping all rows with missing values.
    - Version 2: Created by dropping the 'BQ' and 'EL' columns, which contain most of the missing values. Rows with remaining missing values are then deleted.
    - Version 3: Created by imputing missing values with the column average, where the median is used as the measure of average.
    - Version 4: Created by imputing the missing values of 'BQ' and 'EL' with linear regression models; the remaining missing values are imputed with the average value of the column in which they occur. 'AB', 'AF', 'AH', 'AM', 'CD', 'CF', 'DN', 'FL' and 'GL' are used to calculate the missing values of 'BQ'. 'CU', 'GE' and 'GL' are used to calculate the missing values of 'EL'. The models are found in version4/imputer. Two subversions are created by extracting only the important features of the dataset.
    - Version 5: Created by imputing missing values using KNNImputer. Two subversions are created by extracting only the important features.

    For the categorical feature 'EJ', 'A' is encoded as 0 and 'B' is encoded as 1. For more details on how the dataset was transformed, visit this notebook.
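
The versions described for this dataset correspond to common pandas and scikit-learn operations. The sketch below runs them on a small made-up frame that borrows a few of the column names. The values, the single-predictor regression for 'BQ', and the neighbor count are assumptions for illustration, not the author's actual code or settings. Version 2 is read here as dropping the 'BQ' and 'EL' columns.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.linear_model import LinearRegression

# Toy data (assumption): made-up values under a few of the real column names.
df = pd.DataFrame({
    "AB": [0.2, 0.5, 0.3, 0.9, 0.4, 0.7],
    "GL": [1.0, 2.0, 1.5, 3.0, np.nan, 2.5],
    "BQ": [10.0, np.nan, 14.0, np.nan, 18.0, 30.0],
    "EL": [5.0, 6.0, np.nan, 9.0, 7.0, 8.0],
    "EJ": ["A", "B", "A", "B", "B", "A"],
})
df["EJ"] = df["EJ"].map({"A": 0, "B": 1})  # encode the categorical feature

# Version 1: drop every row that has a missing value.
v1 = df.dropna()

# Version 2: drop the sparsest columns, then drop the remaining incomplete rows.
v2 = df.drop(columns=["BQ", "EL"]).dropna()

# Version 3: fill each column with its median.
v3 = df.fillna(df.median())

# Version 4: predict 'BQ' with a linear regression, then median-fill the rest.
# Only 'AB' is used as predictor here because it is complete in the toy data.
v4 = df.copy()
known = v4["BQ"].notna()
reg = LinearRegression().fit(v4.loc[known, ["AB"]], v4.loc[known, "BQ"])
v4.loc[~known, "BQ"] = reg.predict(v4.loc[~known, ["AB"]])
v4 = v4.fillna(v4.median())

# Version 5: KNN imputation over all columns (the encoded 'EJ' is treated as numeric).
v5 = pd.DataFrame(KNNImputer(n_neighbors=2).fit_transform(df), columns=df.columns)

print(len(v1), len(v2), int(v3.isna().sum().sum()), int(v5.isna().sum().sum()))
```

Versions 1 and 2 trade rows for completeness, while versions 3 to 5 keep every row and differ only in how the filled-in values are estimated.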

  4. CIBMRT-cleaned

    • kaggle.com
    zip
    Updated Jan 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tomasz Lasota (2025). CIBMRT-cleaned [Dataset]. https://www.kaggle.com/datasets/tomaszlasota/cibmrt-cleaned
    Available download formats: zip (1310899 bytes)
    Dataset updated
    Jan 1, 2025
    Authors
    Tomasz Lasota
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📊 Data Overview

    The original dataset can be found in the CIBMTR competition on Kaggle. Users are encouraged to refer to the original source for context and additional documentation.

    Primary Focus: The preprocessing focused exclusively on handling missing values without altering feature distributions or introducing additional transformations.

    🛠️ Data Cleaning Process

    Numerical Features:

    Missing values in numerical columns were imputed using the KNNImputer after applying MinMaxScaling to ensure consistent imputation across varying ranges.

    Categorical Features:

    Missing values in categorical columns were addressed using the SimpleImputer with the 'most_frequent' strategy to maintain logical consistency.

    Special Handling for Key Features:

    For the categorical features tce_imm_match, tce_div_match, and tce_match, a custom value mapping approach was applied using the following mappings:

    Mapping A:

    • 'P/P' → 'Permissive mismatched'
    • 'G/G' → 'GvH non-permissive'
    • 'H/H' → 'HvG non-permissive'

    Mapping B:

    • 'Permissive mismatched' → 'Permissive'
    • 'GvH non-permissive' → 'GvH non-permissive'
    • 'HvG non-permissive' → 'HvG non-permissive'
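
The two mappings above can be written as plain dictionaries. The sketch below assumes that Mapping A is applied first and Mapping B second, and that values found in neither mapping pass through unchanged. The description does not state the order or which mapping applies to which column, so the helper `normalize_match` is hypothetical.

```python
# Mapping A and Mapping B exactly as listed in the description.
MAPPING_A = {
    "P/P": "Permissive mismatched",
    "G/G": "GvH non-permissive",
    "H/H": "HvG non-permissive",
}
MAPPING_B = {
    "Permissive mismatched": "Permissive",
    "GvH non-permissive": "GvH non-permissive",
    "HvG non-permissive": "HvG non-permissive",
}


def normalize_match(value):
    """Apply Mapping A, then Mapping B; unknown values are returned unchanged."""
    if value is None:
        return None
    value = MAPPING_A.get(value, value)
    return MAPPING_B.get(value, value)


for raw in ["P/P", "G/G", "H/H", "Permissive mismatched", "Fully matched"]:
    print(raw, "->", normalize_match(raw))
```

With pandas, the same helper could be applied to a column via `Series.map(normalize_match)`.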

    🔍 Key Notes

    • Imputation Consistency: Numerical and categorical missing values were handled using appropriate techniques to reduce data leakage risks.
    • Scalability: Numerical scaling was performed prior to KNN imputation and can be inverted using the respective scaler for downstream tasks.
    • Traceability: Original mappings and imputation strategies are documented to maintain transparency in data processing steps.
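
The cleaning steps described for this dataset (min-max scaling before KNN imputation, an invertible scaler, and most-frequent imputation for categorical columns) can be sketched with scikit-learn as follows. The toy arrays and the neighbor count are assumptions for illustration; the dataset's actual columns and settings are not given in the description.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Toy inputs (assumption): two numeric columns on very different ranges,
# and one categorical column stored as Python objects.
num = np.array([
    [1.0, 100.0],
    [2.0, np.nan],
    [3.0, 300.0],
    [np.nan, 400.0],
])
cat = np.array([["A"], ["B"], [np.nan], ["A"]], dtype=object)

# Scale first so both columns contribute comparably to the KNN distances.
# MinMaxScaler ignores NaNs when fitting and keeps them when transforming.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(num)

# Impute in the scaled space, then map back to the original units.
imputed_scaled = KNNImputer(n_neighbors=2).fit_transform(scaled)
num_restored = scaler.inverse_transform(imputed_scaled)

# Categorical gaps are filled with the most frequent value of the column.
cat_imputed = SimpleImputer(strategy="most_frequent").fit_transform(cat)

print(num_restored)
print(cat_imputed.ravel())
```

Inverting the scaler afterwards leaves the observed values unchanged, so only the previously missing cells differ from the input.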

    📥 Recommendation

    For additional insights, refer to the original CIBMTR dataset from the competition. This preprocessed dataset serves as an optimized starting point for predictive modeling and exploratory data analysis.

  5. Accuracy comparison of the ML models.

    • plos.figshare.com
    xls
    Updated Jan 3, 2024
    Turki Aljrees (2024). Accuracy comparison of the ML models. [Dataset]. http://doi.org/10.1371/journal.pone.0295632.t007
    Available download formats: xls
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Turki Aljrees
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically


  6. Experimental setup for the proposed system.

    • plos.figshare.com
    xls
    Updated Jan 3, 2024
    Turki Aljrees (2024). Experimental setup for the proposed system. [Dataset]. http://doi.org/10.1371/journal.pone.0295632.t003
    Available download formats: xls
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Turki Aljrees
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically


  7. Machine learning models.

    • plos.figshare.com
    xls
    Updated Jan 3, 2024
    Turki Aljrees (2024). Machine learning models. [Dataset]. http://doi.org/10.1371/journal.pone.0295632.t002
    Available download formats: xls
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Turki Aljrees
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically


  8. Results of the ML models were obtained by deleting missing values from the...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jan 3, 2024
    Turki Aljrees (2024). Results of the ML models were obtained by deleting missing values from the dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0295632.t004
    Available download formats: xls
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Turki Aljrees
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Results of the ML models were obtained by deleting missing values from the dataset.

  9. 5-fold- cross-validation results for the proposed approach.

    • plos.figshare.com
    xls
    Updated Jan 3, 2024
    Turki Aljrees (2024). 5-fold- cross-validation results for the proposed approach. [Dataset]. http://doi.org/10.1371/journal.pone.0295632.t008
    Available download formats: xls
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Turki Aljrees
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    5-fold- cross-validation results for the proposed approach.

  10. Performance comparison with state-of-the-art studies.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jan 3, 2024
    Turki Aljrees (2024). Performance comparison with state-of-the-art studies. [Dataset]. http://doi.org/10.1371/journal.pone.0295632.t009
    Available download formats: xls
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Turki Aljrees
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance comparison with state-of-the-art studies.
