9 datasets found
  1. Data from: S1 Dataset -

    • plos.figshare.com
    application/x-rar
    Updated Nov 2, 2023
    Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam (2023). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0286791.s001
    Available download formats
    application/x-rar
    Dataset updated
    Nov 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Colon cancer is a significant global health problem, and early detection is critical for improving survival rates. Traditional detection methods, such as colonoscopies, can be invasive and uncomfortable for patients. Machine Learning (ML) algorithms have emerged as a promising approach for non-invasive colon cancer classification: ML can analyse genetic data, or patient demographics and medical history, to predict the likelihood of colon cancer. However, because of variable gene expression and the high dimensionality of cancer-related datasets, traditional transductive ML applications have limited accuracy and risk overfitting. In this paper, we propose a new hybrid feature selection model, HMLFSM (Hybrid Machine Learning Feature Selection Model), to improve colon cancer gene classification. We developed a multifilter hybrid model with a two-phase feature selection approach that combines Information Gain (IG) with Genetic Algorithms (GA), and minimum Redundancy Maximum Relevance (mRMR) with Particle Swarm Optimization (PSO). We tested our model on three colon cancer genetic datasets and found that the new framework outperformed other models, with significant accuracy improvements (95%, ~97%, and ~94% accuracy on datasets 1, 2, and 3, respectively). The results show that our approach improves the classification accuracy of colon cancer detection by highlighting important and relevant genes, eliminating irrelevant ones, and revealing the genes that directly influence the classification process. From our experiments and literature review, we found that, for colon cancer gene analysis, selective input feature extraction prior to feature selection is essential for improving predictive performance.
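
    The two filter criteria named above (IG and mRMR) can be illustrated with a minimal Python sketch. It is not the authors' HMLFSM code: it assumes expression values have already been discretized into levels, uses hypothetical toy genes, and omits the GA and PSO wrapper stages entirely. Information Gain is computed as the mutual information between a gene and the class label, and mRMR greedily adds the gene with the highest relevance minus its mean redundancy (mutual information) with the genes already chosen.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(Y) - H(Y|X) for two equal-length discrete sequences."""
    n = len(x)
    conditional = 0.0
    for value, count in Counter(x).items():
        subset = [yi for xi, yi in zip(x, y) if xi == value]
        conditional += (count / n) * entropy(subset)
    return entropy(y) - conditional


def information_gain_ranking(features, labels):
    """Feature indices sorted by Information Gain w.r.t. the labels, best first."""
    scores = [mutual_information(col, labels) for col in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


def mrmr(features, labels, k):
    """Greedy mRMR (difference form): maximise relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in features]
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < k:
        remaining = [i for i in range(len(features)) if i not in selected]

        def score(i):
            redundancy = sum(mutual_information(features[i], features[j]) for j in selected)
            return relevance[i] - redundancy / len(selected)

        selected.append(max(remaining, key=score))
    return selected


# Hypothetical discretized expression levels (one list per gene, one entry per sample).
labels = [0, 1, 1, 1, 0, 1, 1, 1]
gene_a = [0, 0, 1, 1, 0, 0, 1, 1]
gene_b = list(gene_a)               # exact copy of gene_a: fully redundant
gene_c = [0, 1, 0, 1, 0, 1, 0, 1]   # informative and independent of gene_a
gene_d = [0, 1, 0, 1, 1, 0, 1, 0]   # carries no information about the labels
genes = [gene_a, gene_b, gene_c, gene_d]

print(information_gain_ranking(genes, labels))  # [0, 1, 2, 3]
print(mrmr(genes, labels, k=2))                 # [0, 2]
```

    On this toy data IG alone scores gene_a, its exact copy gene_b, and gene_c identically, so a top-2 cut by IG keeps the redundant pair; the mRMR step instead picks gene_a and gene_c, which is why a redundancy penalty is useful before a wrapper search such as GA or PSO.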

  2. Phase 2 performance measures for Dataset 2.

    • plos.figshare.com
    xls
    Updated Nov 2, 2023
    Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam (2023). Phase 2 performance measures for Dataset 2. [Dataset]. http://doi.org/10.1371/journal.pone.0286791.t008
    Available download formats
    xls
    Dataset updated
    Nov 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

  3. AUC calculated by testing prediction models on CMU-SO.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jul 25, 2012
    Yue, Zhen-yu; Xu, Ying-ying; Tong, Lin-lin; Zhou, Xin; Xu, Hui-mian; Song, Yong-xi; Wang, Zhen-ning; Gao, Peng (2012). AUCa calculated by testing prediction models on CMU-SOb. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001154821
    Dataset updated
    Jul 25, 2012
    Authors
    Yue, Zhen-yu; Xu, Ying-ying; Tong, Lin-lin; Zhou, Xin; Xu, Hui-mian; Song, Yong-xi; Wang, Zhen-ning; Gao, Peng
    Description

    AUC: area under the receiver operating characteristic curve.
    CMU-SO: a dataset of clinical information collected by the Department of Surgical Oncology at the First Hospital of China Medical University.
    Variable selection: the variable selection method with the highest AUC.
    Global: without variable selection.
    GA: variable selection using genetic algorithms.
    BSFS: variable selection using backward stepwise feature selection.
    *: median AUC of 15 tests.
    **: comparison of the AUC of the prediction models with that of the TNM staging system.
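
    The notes above report AUC values and a median over 15 tests. As an illustration only, the sketch below computes a ROC AUC with the standard rank-based (Mann-Whitney) estimator, i.e. the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half, and then takes the median across repeated tests. The scores and per-test AUC values are hypothetical; the source does not describe the authors' own tooling.

```python
from statistics import median


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: P(positive score > negative score), ties = 1/2."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks and true outcomes (1 = event, 0 = no event).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
print(auc(scores, labels))  # 5 of 6 positive/negative pairs ranked correctly -> 0.833...

# Note "*": summarise repeated tests (e.g. resampled splits) by their median AUC.
per_test_aucs = [0.81, 0.78, 0.84, 0.80, 0.79]  # hypothetical values
print(median(per_test_aucs))  # 0.8
```

    The pairwise loop is O(positives x negatives), which is fine for clinical cohorts of this size; a rank-sum formulation gives the same value in O(n log n) for larger data.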

  4. Comparison of the proposed model, with others in the literature.

    • plos.figshare.com
    xls
    Updated Nov 2, 2023
    Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam (2023). Comparison of the proposed model, with others in the literature. [Dataset]. http://doi.org/10.1371/journal.pone.0286791.t010
    Available download formats
    xls
    Dataset updated
    Nov 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of the proposed model with others in the literature.

  5. Gene expression datasets used in the investigations.

    • plos.figshare.com
    xls
    Updated Nov 2, 2023
    Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam (2023). Gene expression datasets used in the investigations. [Dataset]. http://doi.org/10.1371/journal.pone.0286791.t002
    Available download formats
    xls
    Dataset updated
    Nov 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gene expression datasets used in the investigations.

  6. Table_1_Integrated Evolutionary Learning: An Artificial Intelligence...

    • frontiersin.figshare.com
    docx
    Updated Jun 3, 2023
    Nina de Lacy; Michael J. Ramshaw; J. Nathan Kutz (2023). Table_1_Integrated Evolutionary Learning: An Artificial Intelligence Approach to Joint Learning of Features and Hyperparameters for Optimized, Explainable Machine Learning.DOCX [Dataset]. http://doi.org/10.3389/frai.2022.832530.s001
    Available download formats
    docx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Nina de Lacy; Michael J. Ramshaw; J. Nathan Kutz
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Artificial intelligence and machine learning techniques have proved fertile methods for attacking difficult problems in medicine and public health. These techniques have garnered strong interest for the analysis of the large, multi-domain open science datasets that are increasingly available in health research. Discovery science in large datasets is challenging given the unconstrained nature of the learning environment, where there may be a large number of potential predictors and appropriate ranges for model hyperparameters are unknown. Explainability is also likely to be at a premium in order to support future hypothesis generation or analysis. Here, we present a novel method that addresses these challenges by exploiting evolutionary algorithms to optimize machine learning discovery science while exploring a large solution space and minimizing bias. We demonstrate that our approach, called integrated evolutionary learning (IEL), provides an automated, adaptive method for jointly learning features and hyperparameters while furnishing explainable models in which the original features used to make predictions can be recovered, even with artificial neural networks. In IEL, the machine learning algorithm of choice is nested inside an evolutionary algorithm, which selects features and hyperparameters over generations on the basis of an information function to converge on an optimal solution. We apply IEL to three gold-standard machine learning algorithms in challenging, heterogeneous biobehavioral data: deep learning with artificial neural networks, decision tree-based techniques, and baseline linear models. Using our novel IEL approach, artificial neural networks achieved ≥ 95% accuracy, sensitivity, and specificity, 45–73% R² in classification, and substantial gains over default settings.

    IEL may be applied to a wide range of less-constrained or unconstrained discovery science problems where the practitioner wishes to jointly learn features and hyperparameters in an adaptive, principled manner within the same algorithmic process. This approach offers significant flexibility, enlarges the solution space, mitigates bias that may arise from manual or semi-manual hyperparameter tuning and feature selection, and presents the opportunity to select the inner machine learning algorithm based on the results of optimized learning for the problem at hand.
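
    The nesting described above, a learner inside an evolutionary loop that jointly picks features and hyperparameters, can be sketched in a few dozen lines of Python. This is not the IEL implementation. As simplifying assumptions, the inner learner is a k-nearest-neighbour classifier (hyperparameter k), fitness is plain validation accuracy rather than the information function IEL uses, the data are synthetic, and all names (make_toy_data, knn_accuracy, evolve, K_CHOICES) are hypothetical. A "default settings" individual (all features, k = 1) is seeded into the first population, and elitism guarantees that the best individual found is never worse than it on the validation split.

```python
import random

K_CHOICES = (1, 3, 5, 7)  # hypothetical hyperparameter grid for the inner k-NN


def make_toy_data(n, n_noise, rng):
    """Synthetic data: features 0-1 carry the class signal, the rest are pure noise."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        signal = [y + rng.gauss(0, 0.4) for _ in range(2)]
        noise = [rng.gauss(0, 1.0) for _ in range(n_noise)]
        rows.append(signal + noise)
        labels.append(y)
    return rows, labels


def knn_accuracy(mask, k, train, val):
    """Inner learner: k-NN using only the features switched on in `mask`."""
    (x_tr, y_tr), (x_val, y_val) = train, val
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for x, y in zip(x_val, y_val):
        dists = sorted(
            (sum((x[i] - t[i]) ** 2 for i in idx), lab) for t, lab in zip(x_tr, y_tr)
        )
        votes = sum(lab for _, lab in dists[:k])
        correct += int((2 * votes > k) == (y == 1))  # majority vote; k is odd
    return correct / len(y_val)


def evolve(train, val, n_features, rng, pop_size=20, generations=15, p_mut=0.1):
    """Outer GA over individuals (feature_mask, k) with tournament selection and elitism."""
    cache = {}

    def fitness(ind):
        if ind not in cache:
            cache[ind] = knn_accuracy(ind[0], ind[1], train, val)
        return cache[ind]

    default = ((1,) * n_features, 1)  # "default settings": every feature, k = 1
    pop = [default] + [
        (tuple(rng.randint(0, 1) for _ in range(n_features)), rng.choice(K_CHOICES))
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = [ranked[0]]  # elitism: the best individual always survives
        while len(nxt) < pop_size:
            a, b = (max(rng.sample(ranked, 3), key=fitness) for _ in range(2))
            mask = tuple(rng.choice(pair) for pair in zip(a[0], b[0]))  # uniform crossover
            mask = tuple(1 - bit if rng.random() < p_mut else bit for bit in mask)  # bit flips
            k = rng.choice(K_CHOICES) if rng.random() < p_mut else rng.choice((a[1], b[1]))
            nxt.append((mask, k))
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best), fitness(default)


rng = random.Random(0)
train = make_toy_data(30, 4, rng)
val = make_toy_data(30, 4, rng)
(best_mask, best_k), best_fit, default_fit = evolve(train, val, n_features=6, rng=rng)
print("features:", best_mask, "k:", best_k, "val acc:", best_fit, "default:", default_fit)
```

    Because each individual is an explicit mask over the original features, the winning model can be read off directly; swapping knn_accuracy for another learner leaves the outer loop unchanged.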

  7. Comparison of the different parameters in previous work.

    • plos.figshare.com
    xls
    Updated Nov 2, 2023
    Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam (2023). Comparison of the different parameters in previous work. [Dataset]. http://doi.org/10.1371/journal.pone.0286791.t001
    Available download formats
    xls
    Dataset updated
    Nov 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of the different parameters in previous work.

  8. Table_1_Deep Feature Selection and Causal Analysis of Alzheimer’s...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 1, 2023
    Yuanyuan Liu; Zhouxuan Li; Qiyang Ge; Nan Lin; Momiao Xiong (2023). Table_1_Deep Feature Selection and Causal Analysis of Alzheimer’s Disease.XLSX [Dataset]. http://doi.org/10.3389/fnins.2019.01198.s001
    Available download formats
    xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Yuanyuan Liu; Zhouxuan Li; Qiyang Ge; Nan Lin; Momiao Xiong
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Deep convolutional neural networks (DCNNs) have achieved great success in image classification in medical research. Deep learning with brain imaging is the method of choice for the diagnosis and prediction of Alzheimer’s disease (AD). However, DCNNs are well known to be “black boxes” owing to their low interpretability to humans. This lack of transparency compromises the application of deep learning to the prediction of AD and the investigation of its mechanisms. To overcome this limitation, we develop a novel general framework that integrates deep learning, feature selection, causal inference, and genetic-imaging data analysis for predicting and understanding AD. The proposed algorithm not only improves prediction accuracy but also identifies the brain regions underlying the development of AD and causal paths from genetic variants to AD via image mediation. The algorithm is applied to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset with diffusion tensor imaging (DTI) in 151 subjects (51 AD and 100 non-AD), who were measured at four time points: baseline, 6 months, 12 months, and 24 months. The algorithm identified brain regions underlying AD consisting of the temporal lobes (including the hippocampus) and the ventricular system.

  9. Feature selection methodology (in descending order of correlation scores).

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Adam Li; Amber Mueller; Brad English; Anthony Arena; Daniel Vera; Alice E. Kane; David A. Sinclair (2023). Feature selection methodology (in descending order of correlation scores). [Dataset]. http://doi.org/10.1371/journal.pcbi.1009938.t001
    Available download formats
    xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Adam Li; Amber Mueller; Brad English; Anthony Arena; Daniel Vera; Alice E. Kane; David A. Sinclair
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The number of features selected by each method is given in parentheses in the first column.
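
    This entry lists features in descending order of correlation score, with the number selected by each method in parentheses. The source does not say which correlation measure or cutoff was used, so the minimal Python sketch below assumes Pearson's r against the target, ranks features by |r| from high to low, and keeps those above a hypothetical threshold. Feature names and values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_by_correlation(columns, target, threshold=0.0):
    """(name, r) pairs with |r| >= threshold, sorted by |r| in descending order."""
    scored = [(name, pearson(col, target)) for name, col in columns.items()]
    kept = [(name, r) for name, r in scored if abs(r) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical features measured on five samples, plus the target variable.
target = [1, 2, 3, 4, 5]
columns = {
    "feature_c": [1, 3, 2, 3, 1],   # r = 0.0
    "feature_b": [5, 3, 4, 2, 1],   # r ≈ -0.9
    "feature_a": [2, 4, 6, 8, 10],  # r ≈ 1.0
}
selected = rank_by_correlation(columns, target, threshold=0.5)
print(f"selected ({len(selected)}):", [name for name, _ in selected])
# selected (2): ['feature_a', 'feature_b']
```

    Ranking by |r| rather than r keeps strongly negative correlates such as feature_b; whether the original table does the same is not stated.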
