9 datasets found

Data from: S1 Dataset -
plos.figshare.com
application/x-rar
Updated Nov 2, 2023
Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam (2023). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0286791.s001
Available download formats: application/x-rar
Unique identifier
https://doi.org/10.1371/journal.pone.0286791.s001
Dataset updated
Nov 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Colon cancer is a significant global health problem, and early detection is critical for improving survival rates. Traditional detection methods, such as colonoscopies, can be invasive and uncomfortable for patients. Machine Learning (ML) algorithms have emerged as a promising approach for non-invasive colon cancer classification using genetic data or patient demographics and medical history. One approach is to use ML to analyse genetic data, or patient demographics and medical history, to predict the likelihood of colon cancer. However, due to the challenges imposed by variable gene expression and the high dimensionality of cancer-related datasets, traditional transductive ML applications have limited accuracy and risk overfitting. In this paper, we propose a new hybrid feature selection model called HMLFSM–Hybrid Machine Learning Feature Selection Model to improve colon cancer gene classification. We developed a multifilter hybrid model including a two-phase feature selection approach, combining Information Gain (IG) and Genetic Algorithms (GA), and minimum Redundancy Maximum Relevance (mRMR) coupling with Particle Swarm Optimization (PSO). We critically tested our model on three colon cancer genetic datasets and found that the new framework outperformed other models with significant accuracy improvements (95%, ~97%, and ~94% accuracies for datasets 1, 2, and 3 respectively). The results show that our approach improves the classification accuracy of colon cancer detection by highlighting important and relevant genes, eliminating irrelevant ones, and revealing the genes that have a direct influence on the classification process. For colon cancer gene analysis, and along with our experiments and literature review, we found that selective input feature extraction prior to feature selection is essential for improving predictive performance.
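As a rough illustration of the Information Gain filter named in the description above, the sketch below ranks features by information gain on a tiny, made-up table of already discretized expression values. It is not the authors' HMLFSM implementation: the function names (`entropy`, `information_gain`, `rank_by_information_gain`) and the toy data are assumptions, and the GA, mRMR, and PSO stages are not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy left after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_by_information_gain(features, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: three discretized "genes" measured on six samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
features = {
    "gene_a": ["high", "high", "high", "low", "low", "low"],     # separates the classes
    "gene_b": ["high", "low", "high", "low", "high", "low"],     # weakly related
    "gene_c": ["high", "high", "high", "high", "high", "high"],  # constant, uninformative
}

ranking = rank_by_information_gain(features, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

A filter like this would keep only the top-ranked features and pass them to a later search stage; the cut-off is a design choice that the description does not specify.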
Phase 2 performance measures for Dataset 2.
plos.figshare.com
xls
Updated Nov 2, 2023
Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam (2023). Phase 2 performance measures for Dataset 2. [Dataset]. http://doi.org/10.1371/journal.pone.0286791.t008
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0286791.t008
Dataset updated
Nov 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Colon cancer is a significant global health problem, and early detection is critical for improving survival rates. Traditional detection methods, such as colonoscopies, can be invasive and uncomfortable for patients. Machine Learning (ML) algorithms have emerged as a promising approach for non-invasive colon cancer classification using genetic data or patient demographics and medical history. One approach is to use ML to analyse genetic data, or patient demographics and medical history, to predict the likelihood of colon cancer. However, due to the challenges imposed by variable gene expression and the high dimensionality of cancer-related datasets, traditional transductive ML applications have limited accuracy and risk overfitting. In this paper, we propose a new hybrid feature selection model called HMLFSM–Hybrid Machine Learning Feature Selection Model to improve colon cancer gene classification. We developed a multifilter hybrid model including a two-phase feature selection approach, combining Information Gain (IG) and Genetic Algorithms (GA), and minimum Redundancy Maximum Relevance (mRMR) coupling with Particle Swarm Optimization (PSO). We critically tested our model on three colon cancer genetic datasets and found that the new framework outperformed other models with significant accuracy improvements (95%, ~97%, and ~94% accuracies for datasets 1, 2, and 3 respectively). The results show that our approach improves the classification accuracy of colon cancer detection by highlighting important and relevant genes, eliminating irrelevant ones, and revealing the genes that have a direct influence on the classification process. For colon cancer gene analysis, and along with our experiments and literature review, we found that selective input feature extraction prior to feature selection is essential for improving predictive performance.
AUCa calculated by testing prediction models on CMU-SOb.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jul 25, 2012
Yue, Zhen-yu; Xu, Ying-ying; Tong, Lin-lin; Zhou, Xin; Xu, Hui-mian; Song, Yong-xi; Wang, Zhen-ning; Gao, Peng (2012). AUCa calculated by testing prediction models on CMU-SOb. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001154821
Dataset updated
Jul 25, 2012
Authors
Yue, Zhen-yu; Xu, Ying-ying; Tong, Lin-lin; Zhou, Xin; Xu, Hui-mian; Song, Yong-xi; Wang, Zhen-ning; Gao, Peng
Description
AUC (a): area under the receiver operating characteristic curve.
CMU-SO (b): a dataset of clinical information collected from the Department of Surgical Oncology at the First Hospital of China Medical University.
Variable selection (c): the variable selection method with the highest AUC.
Global (d): without variable selection.
GA (e): variable selection using genetic algorithms.
BSFS (f): variable selection using backward stepwise feature selection.
*: median AUC of 15 tests.
**: comparison of the AUC of the prediction models with that of the TNM staging system.
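To make the AUC footnotes concrete, here is a minimal sketch of how an AUC can be computed from predicted scores and how a median over repeated tests is taken. The helper name `roc_auc` and the three toy test runs are assumptions for illustration only; the record reports a median over 15 tests on clinical data that is not reproduced here.

```python
from statistics import median


def roc_auc(scores, labels):
    """AUC as the chance that a random positive case outscores a random negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks and outcomes (1 = event, 0 = no event) for three test runs.
test_runs = [
    ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),
    ([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]),
    ([0.7, 0.7, 0.7, 0.1], [1, 0, 1, 0]),
]
aucs = [roc_auc(scores, labels) for scores, labels in test_runs]
median_auc = median(aucs)
print(aucs, median_auc)
```

The pairwise form is quadratic in the number of cases, which is fine for a sketch; a rank-based computation gives the same value more efficiently on larger test sets.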
Comparison of the proposed model, with others in the literature.
plos.figshare.com
xls
Updated Nov 2, 2023
Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam (2023). Comparison of the proposed model, with others in the literature. [Dataset]. http://doi.org/10.1371/journal.pone.0286791.t010
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0286791.t010
Dataset updated
Nov 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of the proposed model, with others in the literature.
Gene expression datasets used in the investigations.
plos.figshare.com
xls
Updated Nov 2, 2023
Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam (2023). Gene expression datasets used in the investigations. [Dataset]. http://doi.org/10.1371/journal.pone.0286791.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0286791.t002
Dataset updated
Nov 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Gene expression datasets used in the investigations.
Table_1_Integrated Evolutionary Learning: An Artificial Intelligence...
frontiersin.figshare.com
docx
Updated Jun 3, 2023
Nina de Lacy; Michael J. Ramshaw; J. Nathan Kutz (2023). Table_1_Integrated Evolutionary Learning: An Artificial Intelligence Approach to Joint Learning of Features and Hyperparameters for Optimized, Explainable Machine Learning.DOCX [Dataset]. http://doi.org/10.3389/frai.2022.832530.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/frai.2022.832530.s001
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Nina de Lacy; Michael J. Ramshaw; J. Nathan Kutz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Artificial intelligence and machine learning techniques have proved fertile methods for attacking difficult problems in medicine and public health. These techniques have garnered strong interest for the analysis of the large, multi-domain open science datasets that are increasingly available in health research. Discovery science in large datasets is challenging given the unconstrained nature of the learning environment where there may be a large number of potential predictors and appropriate ranges for model hyperparameters are unknown. As well, it is likely that explainability is at a premium in order to engage in future hypothesis generation or analysis. Here, we present a novel method that addresses these challenges by exploiting evolutionary algorithms to optimize machine learning discovery science while exploring a large solution space and minimizing bias. We demonstrate that our approach, called integrated evolutionary learning (IEL), provides an automated, adaptive method for jointly learning features and hyperparameters while furnishing explainable models where the original features used to make predictions may be obtained even with artificial neural networks. In IEL the machine learning algorithm of choice is nested inside an evolutionary algorithm which selects features and hyperparameters over generations on the basis of an information function to converge on an optimal solution. We apply IEL to three gold standard machine learning algorithms in challenging, heterogeneous biobehavioral data: deep learning with artificial neural networks, decision tree-based techniques and baseline linear models. Using our novel IEL approach, artificial neural networks achieved ≥ 95% accuracy, sensitivity and specificity and 45–73% R² in classification and substantial gains over default settings.
IEL may be applied to a wide range of less- or unconstrained discovery science problems where the practitioner wishes to jointly learn features and hyperparameters in an adaptive, principled manner within the same algorithmic process. This approach offers significant flexibility, enlarges the solution space and mitigates bias that may arise from manual or semi-manual hyperparameter tuning and feature selection and presents the opportunity to select the inner machine learning algorithm based on the results of optimized learning for the problem at hand.
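The nesting described above, with an inner learner wrapped in an evolutionary loop that evolves features and hyperparameters together, can be sketched in a few lines. This is only a toy under stated assumptions, not the authors' IEL: the inner "model" is a hypothetical threshold rule, fitness is plain accuracy rather than an information function, and the names `accuracy`, `evolve`, and the data `X`, `y` are invented for illustration.

```python
import random


def accuracy(mask, threshold, X, y):
    """Inner model: predict 1 when the mean of the selected features exceeds the threshold."""
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0
    correct = 0
    for row, target in zip(X, y):
        mean = sum(row[i] for i in selected) / len(selected)
        correct += int((mean > threshold) == bool(target))
    return correct / len(y)


def evolve(X, y, generations=20, population_size=12, seed=0):
    """Evolve (feature mask, threshold) pairs; the fitter half survives each generation."""
    rng = random.Random(seed)
    n_features = len(X[0])
    population = [
        ([rng.random() < 0.5 for _ in range(n_features)], rng.random())
        for _ in range(population_size)
    ]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda ind: accuracy(ind[0], ind[1], X, y), reverse=True)
        history.append(accuracy(population[0][0], population[0][1], X, y))
        survivors = population[: population_size // 2]
        children = []
        for mask, threshold in survivors:
            child_mask = list(mask)
            flip = rng.randrange(n_features)  # mutate one feature bit
            child_mask[flip] = not child_mask[flip]
            # Mutate the hyperparameter and clamp it to [0, 1].
            child_threshold = min(1.0, max(0.0, threshold + rng.gauss(0.0, 0.1)))
            children.append((child_mask, child_threshold))
        population = survivors + children
    best = max(population, key=lambda ind: accuracy(ind[0], ind[1], X, y))
    return best, history


# Hypothetical data: feature 0 tracks the label, features 1 and 2 are noise.
X = [
    [0.9, 0.2, 0.5], [0.8, 0.9, 0.1], [0.7, 0.4, 0.9],
    [0.2, 0.8, 0.6], [0.1, 0.3, 0.2], [0.3, 0.7, 0.8],
]
y = [1, 1, 1, 0, 0, 0]

best, history = evolve(X, y)
print(best, history[-1])
```

Because survivors are carried over unchanged, the best fitness never decreases from one generation to the next. A real application would score candidates on held-out data rather than on the training rows used here.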
Comparison of the different parameters in previous work.
plos.figshare.com
xls
Updated Nov 2, 2023
Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam (2023). Comparison of the different parameters in previous work. [Dataset]. http://doi.org/10.1371/journal.pone.0286791.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0286791.t001
Dataset updated
Nov 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Murad Al-Rajab; Joan Lu; Qiang Xu; Mohamed Kentour; Ahlam Sawsa; Emad Shuweikeh; Mike Joy; Ramesh Arasaradnam
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of the different parameters in previous work.
Table_1_Deep Feature Selection and Causal Analysis of Alzheimer’s...
frontiersin.figshare.com
xlsx
Updated Jun 1, 2023
Yuanyuan Liu; Zhouxuan Li; Qiyang Ge; Nan Lin; Momiao Xiong (2023). Table_1_Deep Feature Selection and Causal Analysis of Alzheimer’s Disease.XLSX [Dataset]. http://doi.org/10.3389/fnins.2019.01198.s001
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fnins.2019.01198.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Yuanyuan Liu; Zhouxuan Li; Qiyang Ge; Nan Lin; Momiao Xiong
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Deep convolutional neural networks (DCNNs) have achieved great success for image classification in medical research. Deep learning with brain imaging is the imaging method of choice for the diagnosis and prediction of Alzheimer’s disease (AD). However, it is also well known that DCNNs are “black boxes” owing to their low interpretability to humans. The lack of transparency of deep learning compromises its application to the prediction and mechanism investigation in AD. To overcome this limitation, we develop a novel general framework that integrates deep learning, feature selection, causal inference, and genetic-imaging data analysis for predicting and understanding AD. The proposed algorithm not only improves the prediction accuracy but also identifies the brain regions underlying the development of AD and causal paths from genetic variants to AD via image mediation. The proposed algorithm is applied to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset with diffusion tensor imaging (DTI) in 151 subjects (51 AD and 100 non-AD) who were measured at four time points of baseline, 6 months, 12 months, and 24 months. The algorithm identified brain regions underlying AD consisting of the temporal lobes (including the hippocampus) and the ventricular system.
Feature selection methodology (in descending order of correlation scores).
plos.figshare.com
xls
Updated Jun 16, 2023
Adam Li; Amber Mueller; Brad English; Anthony Arena; Daniel Vera; Alice E. Kane; David A. Sinclair (2023). Feature selection methodology (in descending order of correlation scores). [Dataset]. http://doi.org/10.1371/journal.pcbi.1009938.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1009938.t001
Dataset updated
Jun 16, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Adam Li; Amber Mueller; Brad English; Anthony Arena; Daniel Vera; Alice E. Kane; David A. Sinclair
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The number of features selected by each method is given in parentheses in the first column.