12 datasets found
  1. Online Feature Selection and Its Applications

    • researchdata.smu.edu.sg
    Updated May 31, 2023
    HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN (2023). Online Feature Selection and Its Applications [Dataset]. http://doi.org/10.25440/smu.12062733.v1
    Dataset updated
    May 31, 2023
    Dataset provided by
    SMU Research Data Repository (RDR)
    Authors
    HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN
    License

    GNU GPL 3.0: https://www.gnu.org/licenses/gpl-3.0.html

    Description

    Feature selection is an important data mining technique to apply before running a machine learning algorithm. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require access to all the attributes/features of training instances. This classical setting is not always appropriate for real-world applications in which data instances have high dimensionality or acquiring the full set of attributes/features is expensive. To address this limitation, we investigate the problem of Online Feature Selection (OFS), in which an online learner is only allowed to maintain a classifier involving a small and fixed number of features. The key challenge of OFS is how to make accurate predictions using a small and fixed number of active features, in contrast to the classical setup of online learning where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two tasks of online feature selection: (1) learning with full input, where the learner may access all the features to decide the subset of active features, and (2) learning with partial input, where the learner may access only a limited number of features for each instance. We present novel algorithms for each of the two problems and give their performance analysis. We evaluate the proposed algorithms for online feature selection on several public datasets, and demonstrate their applications to real-world problems including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.
    Related Publications:
    Hoi, S. C., Wang, J., Zhao, P., & Jin, R. (2012). Online feature selection for mining big data. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (pp. 93-100). ACM. http://dx.doi.org/10.1145/2351316.2351329 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2402/
    Wang, J., Zhao, P., Hoi, S. C., & Jin, R. (2014). Online feature selection and its applications. IEEE Transactions on Knowledge and Data Engineering, 26(3), 698-710. http://dx.doi.org/10.1109/TKDE.2013.32 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2277/
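    The truncation idea described above can be sketched in a few lines: after a mistake-driven update, only a fixed budget of the largest-magnitude weights is retained. This is an illustrative simplification, not the paper's exact OFS algorithm; the function name and parameters are hypothetical.

    ```python
    def ofs_truncate_update(w, x, y, eta=0.5, num_features=2):
        """One round of a sketch online-feature-selection update:
        a perceptron-style step followed by truncation to the
        num_features largest-magnitude weights."""
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin <= 0:  # prediction mistake: take a gradient step
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
        # truncation: keep only the top-B weights by magnitude, zero the rest
        keep = set(sorted(range(len(w)), key=lambda i: abs(w[i]),
                          reverse=True)[:num_features])
        return [wi if i in keep else 0.0 for i, wi in enumerate(w)]
    ```

    After every update the classifier involves at most `num_features` active features, which is the constraint the OFS setting imposes.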

  2. Long Covid Risk

    • figshare.com
    txt
    Updated Apr 13, 2024
    Ahmed Shaheen (2024). Long Covid Risk [Dataset]. http://doi.org/10.6084/m9.figshare.25599591.v1
    Available download formats: txt
    Dataset updated
    Apr 13, 2024
    Dataset provided by
    figshare
    Authors
    Ahmed Shaheen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Feature preparation. Preprocessing was applied to the data, such as creating dummy variables and performing transformations (centering, scaling, Yeo-Johnson) using the preProcess() function from the “caret” package in R. The correlation among the variables was examined and no serious multicollinearity problems were found. A stepwise variable selection was performed using a logistic regression model. The final set of variables included:
    Demographic: age, body mass index, sex, ethnicity, smoking.
    History of disease: heart disease, migraine, insomnia, gastrointestinal disease.
    COVID-19 history: COVID vaccination, rashes, conjunctivitis, shortness of breath, chest pain, cough, runny nose, dysgeusia, muscle and joint pain, fatigue, fever, COVID-19 reinfection, and ICU admission.
    These variables were used to train and test various machine-learning models.
    Model selection and training. The data was randomly split into 80% training and 20% testing subsets. The “h2o” package in R version 4.3.1 was employed to implement different algorithms. AutoML was first used, which automatically explored a range of models with different configurations. Gradient Boosting Machines (GBM), Random Forest (RF), and Regularized Generalized Linear Model (GLM) were identified as the best-performing models on our data, and their parameters were fine-tuned. An ensemble method that stacked different models together was also used, as it can sometimes improve accuracy. The models were evaluated using the area under the curve (AUC) and C-statistics as diagnostic measures. The model with the highest AUC was selected for further analysis using the confusion matrix, accuracy, sensitivity, specificity, and F1 and F2 scores. The optimal prediction threshold was determined by plotting the sensitivity, specificity, and accuracy and choosing the point of intersection, as it balanced the trade-off between the three metrics. The model’s predictions were also plotted, and quartile ranges were used to classify the predictions into very low, low, moderate, and high risk, using the 1st, 2nd, and 3rd quartiles as thresholds.
    Metric: Formula
    C-statistic: (TPR + TNR - 1) / 2
    Sensitivity/Recall: TP / (TP + FN)
    Specificity: TN / (TN + FP)
    Accuracy: (TP + TN) / (TP + TN + FP + FN)
    F1 score: 2 * (precision * recall) / (precision + recall)
    Model interpretation. We used the variable importance plot, which measures how much each variable contributes to the prediction power of a machine learning model. In the h2o package, variable importance for GBM and RF is calculated by measuring the decrease in the model's error when a variable is split on: the more a variable's split decreases the error, the more important that variable is considered to be. The error is calculated as SE = MSE * N = VAR * N and is then scaled between 0 and 1 and plotted. We also used the SHAP summary plot, a graphical tool to visualize the impact of input features on the prediction of a machine learning model. SHAP stands for SHapley Additive exPlanations, a method to calculate the contribution of each feature to the prediction by averaging over all possible subsets of features [28]. The SHAP summary plot shows the distribution of the SHAP values for each feature across the data instances. We use the h2o.shap_summary_plot() function in R to generate the SHAP summary plot for our GBM model, passing the model object and the test data as arguments and optionally specifying the columns (features) to include. The plot shows the SHAP values for each feature on the x-axis and the features on the y-axis; the color indicates whether the feature value is low (blue) or high (red). The plot also shows the distribution of the feature values as a density plot on the right.
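    The evaluation metrics listed in the description can be computed directly from confusion-matrix counts. A minimal sketch (the function name is hypothetical; the C-statistic follows the formula given in the table above):

    ```python
    def confusion_metrics(tp, fp, tn, fn):
        # Metrics as defined in the table; precision is needed for F1.
        sensitivity = tp / (tp + fn)          # TPR / recall
        specificity = tn / (tn + fp)          # TNR
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        precision = tp / (tp + fp)
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
        c_stat = (sensitivity + specificity - 1) / 2  # as given in the table
        return {"sensitivity": sensitivity, "specificity": specificity,
                "accuracy": accuracy, "f1": f1, "c_statistic": c_stat}
    ```

    Sweeping the decision threshold and re-evaluating these metrics at each point is what produces the sensitivity/specificity/accuracy curves whose intersection the authors use as the optimal threshold.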

  3. Reproducibility Package for "On the Anatomy of Real-World R Code for Static...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 26, 2024
    Florian Sihler; Florian Sihler (2024). Reproducibility Package for "On the Anatomy of Real-World R Code for Static Analysis" [Dataset]. http://doi.org/10.5281/zenodo.10569379
    Available download formats: zip
    Dataset updated
    Jan 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Florian Sihler; Florian Sihler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 24, 2024
    Description

    This is the reproducibility package for the paper "On the Anatomy of Real-World R Code for Static Analysis", accepted at MSR 2024.

  4. Data from: Accelerated Design of Flame Retardant Polymeric Nanocomposites...

    • acs.figshare.com
    xls
    Updated Jun 21, 2023
    Zhuoran Zhang; Zeren Jiao; Ruiqing Shen; Pingan Song; Qingsheng Wang (2023). Accelerated Design of Flame Retardant Polymeric Nanocomposites via Machine Learning Prediction [Dataset]. http://doi.org/10.1021/acsaenm.2c00145.s002
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    ACS Publications
    Authors
    Zhuoran Zhang; Zeren Jiao; Ruiqing Shen; Pingan Song; Qingsheng Wang
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Improving the flame retardancy of polymeric materials used in engineering applications is an increasingly important strategy for limiting fire hazards. However, the wide variety of flame retardant polymeric nanocomposite compositions prevents quick identification of the optimal design for a specific application. In this study, we built a flame retardancy database of more than 800 polymeric nanocomposites, including information from polymer flammability, thermal stability, and nanofiller properties. Then, we applied five machine learning algorithms to predict the flame retardancy index (FRI) for different types of flame retardant polymeric nanocomposites. Among them, extreme gradient boosting regression gives the best prediction with a coefficient of determination (R2) of 0.94 and a root-mean-square error of 0.17. In addition, we studied how the physical features of polymeric nanocomposites affected flame retardancy using the correlation matrix and feature importance plot, which in turn was used to guide the design of polymeric nanocomposites for flame retardant applications. Following the guidelines, a high-performance flame retardant polymeric nanocomposite was designed and synthesized, and the experimental FRI result was compared with the machine learning prediction (6% prediction error). This result demonstrates that the flame retardancy of a polymeric nanocomposite can be identified quickly without large-scale fire tests, which could accelerate the design of functional polymeric nanocomposites in the flame retardant field.
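    The two measures reported for the best model above, R2 and RMSE, can be computed from predictions as follows. This is a minimal pure-Python sketch with a hypothetical function name, not code from the study:

    ```python
    def r2_and_rmse(y_true, y_pred):
        # Coefficient of determination and root-mean-square error.
        n = len(y_true)
        mean = sum(y_true) / n
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean) ** 2 for t in y_true)
        return 1 - ss_res / ss_tot, (ss_res / n) ** 0.5
    ```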

  5. Data from: In Silico Study of Metal–Organic Frameworks for CO2/CO...

    • figshare.com
    xlsx
    Updated Jul 10, 2023
    I-Ting Sung; Li-Chiang Lin (2023). In Silico Study of Metal–Organic Frameworks for CO2/CO Separation: Molecular Simulations and Machine Learning [Dataset]. http://doi.org/10.1021/acs.jpcc.3c02452.s003
    Available download formats: xlsx
    Dataset updated
    Jul 10, 2023
    Dataset provided by
    ACS Publications
    Authors
    I-Ting Sung; Li-Chiang Lin
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Metal–organic frameworks (MOFs), an emerging class of nanoporous materials, have drawn considerable attention as promising adsorbents for gas separations. Among various separation applications, CO2/CO separation is of particular interest owing to its industrial relevance. While searching for promising MOFs from tens of thousands of candidates represents a great challenge, this study conducts large-scale molecular simulations to identify top-performing CO2 adsorbents, followed by investigating structure–property relationships for their design. Optimal MOFs are found to possess features such as metal nodes of greater metallic charges and dipole moments with a relatively confined pore structure. With the large-scale data at our disposal, machine learning models capable of predicting the CO2-to-CO selectivity and adsorption uptakes are also established. Specifically, three algorithms including support vector regression (SVR), extreme gradient boosting (XGBoost), and random forest (RF) models are employed. The results show that the RF algorithm demonstrates the best accuracy, and the r value for the predicted CO2-to-CO selectivity (S) can be as large as ∼0.88. The relative importance of the adopted features is also investigated, with results suggesting that CO2 adsorbs more preferentially than CO owing to the stronger van der Waals and electrostatic interactions between CO2 and the metal sites. Finally, a design rule is proposed for the optimal design of CO2-selective materials. Overall, this work demonstrates a successful hybrid approach combining molecular simulations and machine learning for screening highly CO2/CO selective MOFs and offering insights into the design of optimal adsorbents.
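    The r value reported for the RF predictions above is the Pearson correlation coefficient between predicted and simulated selectivities. A minimal sketch (hypothetical helper, not from the study):

    ```python
    def pearson_r(x, y):
        # Pearson correlation: covariance normalized by both standard deviations.
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)
    ```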

  6. Parameter setting table for training feature extraction network.

    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    Junpeng Wu; Shaobo Tang; Xianglei Li; Yibo Zhou (2023). Parameter setting table for training feature extraction network. [Dataset]. http://doi.org/10.1371/journal.pone.0266444.t003
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Junpeng Wu; Shaobo Tang; Xianglei Li; Yibo Zhou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Parameter setting table for training feature extraction network.

  7. Data from: Enhancing the Predictive Performance of Molecularly Imprinted...

    • acs.figshare.com
    zip
    Updated Apr 17, 2025
    Reza Mohammadi Dashtaki; Saeed Mohammadi Dashtaki; Esmaeil Heydari-Bafrooei; Md Jalil Piran (2025). Enhancing the Predictive Performance of Molecularly Imprinted Polymer-Based Electrochemical Sensors Using a Stacking Regressor Ensemble of Machine Learning Models [Dataset]. http://doi.org/10.1021/acssensors.5c00364.s001
    Available download formats: zip
    Dataset updated
    Apr 17, 2025
    Dataset provided by
    ACS Publications
    Authors
    Reza Mohammadi Dashtaki; Saeed Mohammadi Dashtaki; Esmaeil Heydari-Bafrooei; Md Jalil Piran
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The performance of electrochemical sensors is influenced by various factors. To enhance the effectiveness of these sensors, it is crucial to find the right balance among these factors. Researchers and engineers continually explore innovative approaches to enhance sensitivity, selectivity, and reliability. Machine learning (ML) techniques facilitate the analysis and predictive modeling of sensor performance by establishing quantitative relationships between parameters and their effects. This work presents a case study on developing a molecularly imprinted polymer (MIP)-based sensor for detecting doxorubicin (Dox), emphasizing the use of ML-based ensemble models to improve performance and reliability. Four ML models, including Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), Random Forest (RF), and K-Nearest Neighbors (KNN), are used to evaluate the effect of each parameter on prediction performance, using the SHapley Additive exPlanations (SHAP) method to determine feature importance. Based on the analysis, removing a less influential feature and introducing a new feature significantly improved the model’s predictive capabilities. Min–max scaling ensures that all features contribute proportionally to the model learning process. Additionally, multiple ML models (Linear Regression (LR), KNN, DT, RF, Adaptive Boosting (AdaBoost), Gradient Boosting (GB), Support Vector Regression (SVR), XGBoost, Bagging, Partial Least Squares (PLS), and Ridge Regression) are applied to the data set and their performance in predicting the sensor output current is compared. To further enhance prediction performance, a novel ensemble model is proposed that integrates DT, RF, GB, XGBoost, and Bagging regressors, leveraging their combined strengths to offset individual weaknesses.
    The main benefit of this work lies in its ability to enhance MIP-based sensor performance by developing a novel stacking regressor ensemble model, which improves prediction performance and reliability. This methodology is broadly applicable to the development of other sensors with different transducers and sensing elements. Through extensive simulation results, the proposed stacking regressor ensemble model demonstrated superior predictive performance compared to individual ML models. The model achieved an R-squared (R2) of 0.993, significantly reducing the root-mean-square error (RMSE) to 0.436 and the mean absolute error (MAE) to 0.244. These improvements enhanced the sensitivity and reliability of the MIP-based electrochemical sensor, demonstrating a substantial performance gain over individual ML models.
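    A stacking ensemble combines base models' predictions with a meta-learner trained on top of them. A minimal sketch for two base models, fitting the combination weights by least squares via the normal equations (the function name and two-model restriction are assumptions; the study stacks five regressors with a full meta-regressor):

    ```python
    def fit_stack_weights(pred_a, pred_b, y):
        """Least-squares meta-learner for a two-model stack: find (wa, wb)
        minimizing sum((wa*a + wb*b - y)^2) via the 2x2 normal equations."""
        saa = sum(a * a for a in pred_a)
        sbb = sum(b * b for b in pred_b)
        sab = sum(a * b for a, b in zip(pred_a, pred_b))
        say = sum(a * t for a, t in zip(pred_a, y))
        sby = sum(b * t for b, t in zip(pred_b, y))
        det = saa * sbb - sab * sab
        wa = (say * sbb - sby * sab) / det
        wb = (sby * saa - say * sab) / det
        return wa, wb
    ```

    In practice the base predictions fed to the meta-learner should come from held-out (out-of-fold) data, so the meta-learner does not reward base models that overfit the training set.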

  8. Data from: Prediction of Chemical Looping Hydrogen Production Using...

    • acs.figshare.com
    xlsx
    Updated Oct 2, 2024
    Jialei Cao; Liyan Sun; Fan Yin; Ran Zhang; Zixiang Gao; Rui Xiao (2024). Prediction of Chemical Looping Hydrogen Production Using Physics-Informed Machine Learning [Dataset]. http://doi.org/10.1021/acs.energyfuels.4c02988.s001
    Available download formats: xlsx
    Dataset updated
    Oct 2, 2024
    Dataset provided by
    ACS Publications
    Authors
    Jialei Cao; Liyan Sun; Fan Yin; Ran Zhang; Zixiang Gao; Rui Xiao
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Hydrogen energy holds promise for controlling emissions but is limited by the production cost and method. Chemical looping hydrogen production (CLHP) provides a more efficient and environmentally sustainable route to produce high-purity hydrogen compared with conventional methods. Yet, CLHP involves a series of operational variables, and the optimization of operating conditions is the critical issue for large-scale hydrogen production. In this study, support vector machine (SVM), decision tree (DT), random forest (RF), artificial neural network (ANN), and physics-informed neural network (PINN) models are developed to predict hydrogen production rates by analyzing multiple process variables. Through the analysis of the database and experiments, we integrated physical consistency as prior knowledge into the PINN to reduce its dependence on data. All models are tuned for optimal performance through hyperparameter optimization. The comparison of the five machine learning models reveals that the DT and RF models exhibit a characteristic step-like pattern in their predictions, while the SVM and ANN models produce outputs that often diverge from the expected trend. The PINN model exhibits good predictive performance, with R2, mean squared error, and mean absolute percentage error scores of 0.882, 1.228, and 18.1%, respectively. The results are highly interpretable owing to the model's physics-informed design. Then, the CLHP process is studied, and the relationships between hydrogen yield and operating temperature, gas flow rate, and mass fraction of iron oxide are established. This work shows the difference in the prediction curves between the different models. By training various general models and comparing their predictive performance on chemical looping data, we can gain valuable insights to guide subsequent predictions for CLHP. It will be beneficial for the design of oxygen carriers and the optimization of the CLHP process.
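    A physics-informed model typically trains against a composite loss: the usual data-fit error plus a penalty on violations of prior physical constraints. A minimal sketch, where the weighting `lam` and the residual form are illustrative assumptions (the study's actual physics terms are not specified in this description):

    ```python
    def physics_informed_loss(y_pred, y_true, physics_residuals, lam=0.5):
        # Data-fit MSE plus a weighted mean-square penalty on the
        # physics-constraint residuals (zero residual = constraint satisfied).
        data = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)
        phys = sum(r ** 2 for r in physics_residuals) / len(physics_residuals)
        return data + lam * phys
    ```

    Minimizing this loss pushes the network toward predictions that both fit the data and respect the imposed physical consistency, which is what reduces the dependence on training data.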

  9. Model Performance Metrics After Feature Fusion and Normalization.

    • plos.figshare.com
    xls
    Updated May 9, 2025
    Sana Yaqoob; Ayman Noor; Talal H. Noor; Mohammad Zubair Khan; Anmol Ejaz; Md Imran Alam; Nadim Rana; Khurram Ejaz (2025). Model Performance Metrics After Feature Fusion and Normalization. [Dataset]. http://doi.org/10.1371/journal.pone.0321108.t005
    Available download formats: xls
    Dataset updated
    May 9, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Sana Yaqoob; Ayman Noor; Talal H. Noor; Mohammad Zubair Khan; Anmol Ejaz; Md Imran Alam; Nadim Rana; Khurram Ejaz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Model Performance Metrics After Feature Fusion and Normalization.

  10. mAP comparison of three kinds of industrial equipment detection by ROMS...

    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    Junpeng Wu; Shaobo Tang; Xianglei Li; Yibo Zhou (2023). mAP comparison of three kinds of industrial equipment detection by ROMS R-CNN algorithm with different structures and functions. [Dataset]. http://doi.org/10.1371/journal.pone.0266444.t004
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Junpeng Wu; Shaobo Tang; Xianglei Li; Yibo Zhou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    mAP comparison of three kinds of industrial equipment detection by ROMS R-CNN algorithm with different structures and functions.

  11. Comparison of parameters and complexity of different network structures in...

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Junpeng Wu; Shaobo Tang; Xianglei Li; Yibo Zhou (2023). Comparison of parameters and complexity of different network structures in feature extraction. [Dataset]. http://doi.org/10.1371/journal.pone.0266444.t001
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Junpeng Wu; Shaobo Tang; Xianglei Li; Yibo Zhou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of parameters and complexity of different network structures in feature extraction.

  12. Performance Evaluation of Feature Fusion and normalization, AFR, and Hybrid...

    • plos.figshare.com
    xls
    Updated May 9, 2025
    Sana Yaqoob; Ayman Noor; Talal H. Noor; Mohammad Zubair Khan; Anmol Ejaz; Md Imran Alam; Nadim Rana; Khurram Ejaz (2025). Performance Evaluation of Feature Fusion and normalization, AFR, and Hybrid Methods. [Dataset]. http://doi.org/10.1371/journal.pone.0321108.t009
    Available download formats: xls
    Dataset updated
    May 9, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Sana Yaqoob; Ayman Noor; Talal H. Noor; Mohammad Zubair Khan; Anmol Ejaz; Md Imran Alam; Nadim Rana; Khurram Ejaz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance Evaluation of Feature Fusion and normalization, AFR, and Hybrid Methods.

