100+ datasets found
  1. Student Performance Prediction

    • kaggle.com
    zip
    Updated Mar 3, 2025
    Cite
    Amr Maree (2025). Student Performance Prediction [Dataset]. https://www.kaggle.com/datasets/amrmaree/student-performance-prediction
    Available download formats: zip (10,981 bytes)
    Authors
    Amr Maree
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Student Performance Prediction Dataset 🎓📊

    Overview

    This dataset contains information about students' academic performance, study habits, and external factors affecting their final exam scores. It is designed for predictive modeling, data visualization, and educational analytics.

    Dataset Purpose

    This dataset is useful for:
    - Predicting student final exam scores 📈
    - Identifying key factors that impact academic performance 🎯
    - Exploring feature importance in education-related datasets 📊
    - Building machine learning models for regression and classification 🤖

    Columns Description

    Column Name | Description
    Student_ID | Unique identifier for each student.
    Gender | Gender of the student (Male/Female).
    Study_Hours_per_Week | Average number of study hours per week.
    Attendance_Rate | Attendance percentage (50% - 100%).
    Past_Exam_Scores | Average score of previous exams (50 - 100).
    Parental_Education_Level | Education level of parents (High School, Bachelors, Masters, PhD).
    Internet_Access_at_Home | Whether the student has internet access at home (Yes/No).
    Extracurricular_Activities | Whether the student participates in extracurricular activities (Yes/No).
    Final_Exam_Score (Target) | The final exam score of the student (50 - 100, integer values).
    Pass_Fail (Target) | The student status (Pass/Fail).

    Ideas for Notebooks 📑

    1. Regression Analysis – Predict final exam scores using machine learning models (Linear Regression, Random Forest, XGBoost).
    2. Feature Importance – Analyze which factors contribute the most to student performance.
    3. Exploratory Data Analysis (EDA) – Visualize the impact of study hours, attendance, and other features.
    4. Classification – Convert scores into categories (Pass/Fail, A/B/C/D) and build classification models.
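    Idea 4 requires a rule for turning Final_Exam_Score into categories, and the dataset does not supply one. The minimal sketch below assumes hypothetical letter cutoffs (90/80/70) and an assumed pass mark of 60; the function names are illustrative, not part of the dataset.

```python
def score_to_letter(score: int) -> str:
    """Map a Final_Exam_Score (documented range 50-100) to a letter band.

    The cutoffs are hypothetical; the dataset does not define A/B/C/D bands.
    """
    if not 50 <= score <= 100:
        raise ValueError(f"score outside documented range 50-100: {score}")
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    return "D"


def score_to_pass_fail(score: int, threshold: int = 60) -> str:
    """Binary label; threshold=60 is an assumed pass mark, not taken from the dataset."""
    return "Pass" if score >= threshold else "Fail"


scores = [95, 83, 71, 64, 52]
print([score_to_letter(s) for s in scores])     # ['A', 'B', 'C', 'D', 'D']
print([score_to_pass_fail(s) for s in scores])  # ['Pass', 'Pass', 'Pass', 'Pass', 'Fail']
```

    Because the dataset already ships a Pass_Fail column, comparing it against a threshold rule like this is one way to probe how that label may have been derived.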

    License & Usage

    This dataset is open for public use. Feel free to use it for learning, research, and model-building! 🚀

  2. Student Performance Prediction

    • kaggle.com
    zip
    Updated Aug 16, 2024
    Cite
    Souradip Pal (2024). Student Performance Prediction [Dataset]. https://www.kaggle.com/datasets/souradippal/student-performance-prediction
    Available download formats: zip (389,868 bytes)
    Authors
    Souradip Pal
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset is designed for practicing classification tasks, specifically predicting whether a student will pass or fail a course based on various academic and demographic factors. It contains 40,000 records of students, with attributes including study habits, attendance rates, previous grades, and more. The dataset also introduces challenges such as missing values, incorrect data, and noise, making it ideal for practicing data cleaning, exploratory data analysis (EDA), and feature engineering.
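    The description mentions missing values and incorrect data but does not list the column names or valid ranges. As a minimal sketch, assuming a numeric column with known plausible bounds (the column and values below are hypothetical), one simple cleaning step treats missing and out-of-range entries alike and replaces them with the median of the valid values:

```python
from statistics import median


def clean_column(values, lo, hi):
    """Replace missing (None) or out-of-range entries with the median of the valid ones.

    lo and hi are assumed plausible bounds; the dataset's actual columns and
    ranges are not listed in this description.
    """
    def is_valid(v):
        return v is not None and lo <= v <= hi

    valid = [v for v in values if is_valid(v)]
    if not valid:
        raise ValueError("no valid values to compute a median from")
    fill = median(valid)
    return [v if is_valid(v) else fill for v in values]


# Hypothetical attendance-rate column (percent) with one gap and two impossible values.
attendance = [92.0, None, 78.5, 140.0, -5.0, 88.0]
print(clean_column(attendance, 0.0, 100.0))
# [92.0, 88.0, 78.5, 88.0, 88.0, 88.0]
```

    Median imputation is only one option; for a real analysis it is worth flagging imputed rows so their effect on the model can be checked.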

  3. Performance of models using CNN features.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Nov 8, 2023
    Cite
    Umer, Muhammad; Mohamed, Abdullah; Abuzinadah, Nihal; Ishaq, Abid; Eshmawi, Ala’ Abdulmajid; Alsubai, Shtwai; Ashraf, Imran; Al Hejaili, Abdullah (2023). Performance of models using CNN features. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000971153
    Authors
    Umer, Muhammad; Mohamed, Abdullah; Abuzinadah, Nihal; Ishaq, Abid; Eshmawi, Ala’ Abdulmajid; Alsubai, Shtwai; Ashraf, Imran; Al Hejaili, Abdullah
    Description

    Predicting student performance automatically is of utmost importance, given the substantial volume of data in educational databases. Educational data mining (EDM) devises techniques to uncover insights from data originating in educational settings. Artificial intelligence (AI) can mine educational data to predict student performance and provide measures that help students avoid failing and learn better. Learning platforms complement traditional learning settings by analyzing student performance, which can help reduce the chance of student failure. Existing methods for student performance prediction in educational data mining face challenges such as limited accuracy, imbalanced data, and difficulties in feature engineering, which hinder effective adaptability and generalization across diverse educational contexts. This study proposes a machine learning-based system with deep convoluted features for predicting students' academic performance. The proposed framework is applied to both imbalanced datasets and datasets balanced with the synthetic minority oversampling technique (SMOTE), and performance is evaluated using both the original and the deep convoluted features. Experimental results indicate that the deep convoluted features provide higher prediction accuracy than the original features; the extra tree classifier with convoluted features achieves the highest classification accuracy, 99.9%. Compared with state-of-the-art approaches, the proposed approach achieves higher performance. This research introduces an AI-driven system for student performance prediction that offers substantial gains in accuracy over existing approaches.
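    SMOTE is named in the description without detail. The minimal pure-Python sketch below shows the standard SMOTE interpolation step (a new point on the segment between a minority sample and one of its k nearest minority neighbours); it is not the authors' pipeline, and the function name and toy data are illustrative. In practice the `SMOTE` class in imbalanced-learn (`imblearn.over_sampling`) is the usual choice.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority-class samples by SMOTE-style interpolation.

    Each new sample is x + u * (x_nn - x), where x is a random minority sample,
    x_nn is one of its k nearest minority neighbours (Euclidean distance), and
    u is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Nearest neighbours are searched among the other minority samples only.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(x, p))[:k]
        x_nn = rng.choice(neighbours)
        u = rng.random()
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, x_nn)])
    return synthetic


# Toy minority class with two numeric features (values are illustrative only).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_rows = smote(minority, n_synthetic=6, k=2)
print(len(new_rows))  # 6
```

    Because every synthetic point is a convex combination of two real minority points, it always lies inside the region spanned by the minority class.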

  4. Student Performance Prediction

    • kaggle.com
    Updated Dec 27, 2023
    Cite
    Henry Shan (2023). Student Performance Prediction [Dataset]. https://www.kaggle.com/datasets/henryshan/student-performance-prediction
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Henry Shan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    These data describe student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, and were collected using school reports and questionnaires.

    Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat.csv) and Portuguese language (por.csv). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks.

    Dataset Description

    Attributes for both student-mat.csv (Math course) and student-por.csv (Portuguese language course) datasets:

    Variable | Description
    school | student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
    sex | student's sex (binary: 'F' - female or 'M' - male)
    age | student's age (numeric: from 15 to 22)
    address | student's home address type (binary: 'U' - urban or 'R' - rural)
    famsize | family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
    Pstatus | parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
    Medu | mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
    Fedu | father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
    Mjob | mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
    Fjob | father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
    reason | reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
    guardian | student's guardian (nominal: 'mother', 'father' or 'other')
    traveltime | home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
    studytime | weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
    failures | number of past class failures (numeric: n if 1<=n<3, else 4)
    schoolsup | extra educational support (binary: yes or no)
    famsup | family educational support (binary: yes or no)
    paid | extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
    activities | extra-curricular activities (binary: yes or no)
    nursery | attended nursery school (binary: yes or no)
    higher | wants to take higher education (binary: yes or no)
    internet | Internet access at home (binary: yes or no)
    romantic | with a romantic relationship (binary: yes or no)
    famrel | quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
    freetime | free time after school (numeric: from 1 - very low to 5 - very high)
    goout | going out with friends (numeric: from 1 - very low to 5 - very high)
    Dalc | workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
    Walc | weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
    health | current health status (numeric: from 1 - very bad to 5 - very good)
    absences | number of school absences (numeric: from 0 to 93)

    These grades relate to the course subject, Math or Portuguese:

    G1 | first period grade (numeric: from 0 to 20)
    G2 | second period grade (numeric: from 0 to 20)
    G3 | final grade (numeric: from 0 to 20, output target)
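    The description says Cortez and Silva (2008) modeled G3 as binary and five-level classification targets but does not give the cutoffs. The sketch below assumes a pass mark of 10 and the five bands I (16-20), II (14-15), III (12-13), IV (10-11), V (0-9), which is our understanding of that paper's scheme; confirm against the paper before reusing. The function names are illustrative.

```python
def g3_binary(g3: int) -> str:
    """Pass/fail label from the final grade G3 (0-20); pass mark of 10 is assumed."""
    if not 0 <= g3 <= 20:
        raise ValueError(f"G3 must be in 0..20, got {g3}")
    return "pass" if g3 >= 10 else "fail"


def g3_five_level(g3: int) -> str:
    """Five-level label from G3 using assumed bands I (16-20) down to V (0-9)."""
    if not 0 <= g3 <= 20:
        raise ValueError(f"G3 must be in 0..20, got {g3}")
    if g3 >= 16:
        return "I"
    if g3 >= 14:
        return "II"
    if g3 >= 12:
        return "III"
    if g3 >= 10:
        return "IV"
    return "V"


for g in (18, 15, 12, 10, 7):
    print(g, g3_binary(g), g3_five_level(g))
```

    For the regression task, G3 is used directly as the numeric target, so no mapping is needed.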

  5. Student Exam Performance Prediction

    • kaggle.com
    zip
    Updated Feb 14, 2024
    Cite
    MrSimple (2024). Student Exam Performance Prediction [Dataset]. https://www.kaggle.com/datasets/mrsimple07/student-exam-performance-prediction
    Available download formats: zip (19,784 bytes)
    Authors
    MrSimple
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The dataset is designed for predicting whether a student will pass or fail an exam based on the number of study hours and their score in the previous exam.

    Features:
    • Study Hours (numeric): the number of hours a student spent studying for the upcoming exam.
    • Previous Exam Score (numeric): the student's score in the previous exam.
    • Pass/Fail (binary): the target variable, where 1 represents a pass and 0 a fail in the current exam.

    Dataset Size: The dataset consists of data for 500 students, covering a range of study patterns and previous exam performances.
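    With two numeric features and a 0/1 target, logistic regression is a natural baseline. The minimal sketch below fits it by batch gradient descent in pure Python; the toy rows are invented for illustration (not taken from the dataset), and the learning rate and epoch count are assumptions. In practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss.

    Features are standardised first so one learning rate suits both scales
    (hours vs. exam points). Returns a function mapping a raw row to P(pass).
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]

    def standardise(row):
        return [(row[j] - means[j]) / stds[j] for j in range(d)]

    Z = [standardise(row) for row in X]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for z, target in zip(Z, y):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - target
            for j in range(d):
                grad_w[j] += err * z[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b

    def predict_proba(row):
        z = standardise(row)
        return sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b)

    return predict_proba


# Invented toy rows: [study_hours, previous_exam_score] -> pass (1) / fail (0).
X = [[2, 55], [3, 60], [4, 52], [5, 65], [6, 70],
     [7, 72], [8, 80], [9, 85], [10, 78], [1, 45]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
p_pass = fit_logistic(X, y)
print(round(p_pass([9, 88]), 3), round(p_pass([2, 50]), 3))
```

    Standardising the inputs matters here because study hours and exam scores differ in scale by roughly a factor of ten.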

  6. Student Performance Prediction Data set

    • dataverse.harvard.edu
    Updated May 31, 2020
    Cite
    Ephrem Admasu Yekun (2020). Student Performance Prediction Data set [Dataset]. http://doi.org/10.7910/DVN/WHBU4P
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Harvard Dataverse
    Authors
    Ephrem Admasu Yekun
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We use the data set for training, validation, and testing in high school student performance prediction.

  7. Iraqi Student Performance Prediction

    • data.mendeley.com
    Updated Dec 12, 2018
    Cite
    saja taha (2018). Iraqi Student Performance Prediction [Dataset]. http://doi.org/10.17632/smgx6s5pwr.1
    Authors
    saja taha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Iraq
    Description

    The Iraqi dataset was collected by administering a questionnaire in three Iraqi secondary schools, covering both the applicable and biology branches of the final stage, during the second semester of 2018. The questionnaire contained 56 questions on three A4 sheets and was answered by 250 students (samples). During pre-processing, 130 samples were discarded for lack of information, so as to retain the most complete student records. After removing inconsistent and incomplete entries, the study considers 120 sample instances with 55 features for its experiments. The features fall into five main categories: Demographic, Economic, Educational, Time, and Marks. Table (1) shows the dataset's attributes/features and their descriptions. As illustrated in this table, new features are introduced, such as holiday and worrying effects. The relationships between parents and schools, and the student's use of books and references, are also considered.

  8. Student Performance (Multiple Linear Regression) Dataset

    • cubig.ai
    zip
    Updated May 29, 2025
    Cite
    CUBIG (2025). Student Performance (Multiple Linear Regression) Dataset [Dataset]. https://cubig.ai/store/products/392/student-performance-multiple-linear-regression-dataset
    Available download formats: zip
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The Student Performance (Multiple Linear Regression) Dataset is designed to analyze the relationship between students' learning habits and academic performance. Each sample includes key indicators related to learning, such as study hours, sleep duration, previous test scores, and the number of practice exams completed.

    2) Data Utilization
    (1) Characteristics of the Student Performance (Multiple Linear Regression) Dataset:
    • The target variable, Hours Studied, quantitatively represents the amount of time a student has invested in studying. The dataset is structured to allow modeling and inference of learning behaviors based on correlations with other variables.
    (2) Applications of the Student Performance (Multiple Linear Regression) Dataset:
    • AI-Based Study Time Prediction Models: The dataset can be used to develop regression models that estimate a student's expected study time from inputs such as academic performance, sleep habits, and engagement patterns.
    • Behavioral Analysis and Personalized Learning Strategies: It can be applied to identify students with insufficient study time and to design personalized study interventions based on academic and lifestyle patterns.
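    Multiple linear regression fits y ≈ b0 + b1·x1 + ... + bp·xp by least squares. The minimal pure-Python sketch below solves the normal equations with Gauss-Jordan elimination; the feature names (sleep hours, practice exams) and the noise-free toy rows generated from y = 1 + 2·x1 + 3·x2 are invented for illustration, so the fit should recover those coefficients. For real data, a QR- or SVD-based solver (e.g. `numpy.linalg.lstsq`) is numerically safer.

```python
def ols_fit(X, y):
    """Ordinary least squares via the normal equations (A^T A) beta = A^T y.

    A is X with a leading column of 1s, so the result is
    [intercept, coef_1, ..., coef_p]. Solved by Gauss-Jordan elimination with
    partial pivoting; a singular A^T A may raise ZeroDivisionError or give
    unstable values.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    n, p = len(A), len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)]
         + [sum(A[k][i] * y[k] for k in range(n))]
         for i in range(p)]
    for col in range(p):
        # Swap in the row with the largest pivot, then clear the column elsewhere.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Hypothetical rows: [sleep_hours, practice_exams] -> hours studied.
X = [[7, 1], [6, 3], [8, 2], [5, 5], [9, 0]]
y = [18, 22, 23, 26, 19]  # generated from 1 + 2*x1 + 3*x2, no noise
beta = ols_fit(X, y)
print([round(b, 6) for b in beta])  # [1.0, 2.0, 3.0]
```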

  9. Attribute information for student performance data set.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Yinqiu Song; Xianqiu Meng; Jianhua Jiang (2023). Attribute information for student performance data set. [Dataset]. http://doi.org/10.1371/journal.pone.0276943.t001
    Available download formats: xls
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Yinqiu Song; Xianqiu Meng; Jianhua Jiang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Attribute information for student performance data set.

  10. Student Performance Dataset

    • cubig.ai
    zip
    Updated May 28, 2025
    Cite
    CUBIG (2025). Student Performance Dataset [Dataset]. https://cubig.ai/store/products/358/student-performance-dataset
    Available download formats: zip
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The Student Performance Dataset comes from a survey of secondary school mathematics students and contains tabular information on student demographics, family environment, parents' education and occupation, health, family relationships, and grades.

    2) Data Utilization
    (1) Characteristics of the Student Performance Dataset:
    • Each row contains a total of 33 characteristics, including school ID, gender, age, family size, parents' educational level and occupation, family relationship, health status, and grades.
    • It is suitable for a variety of data analysis and prediction exercises, including regression analysis and analysis of imbalance in categorical variables such as the target variable, Grade.
    (2) Uses of the Student Performance Dataset:
    • Academic achievement prediction and analysis of influencing factors: It can be used to analyze how factors such as a student's background, family environment, and parental characteristics affect grades, and to develop a grade prediction model.
    • Educational policy and customized support strategies: Based on student-specific characteristics and grade data, it can inform educational policies such as closing educational gaps, supporting vulnerable student groups, and providing customized learning guidance.

  11. Student performance dataset feature description.

    • plos.figshare.com
    xls
    Updated Jun 18, 2025
    Cite
    Jianwei Dong; Ruishuang Sun; Zhipeng Yan; Meilun Shi; Xinyu Bi (2025). Student performance dataset feature description. [Dataset]. http://doi.org/10.1371/journal.pone.0325713.t002
    Available download formats: xls
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Jianwei Dong; Ruishuang Sun; Zhipeng Yan; Meilun Shi; Xinyu Bi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Academic achievement is an important index of the quality of education and of students' learning outcomes. Reasonable and accurate prediction of academic achievement can help teachers improve their educational methods, and it also provides supporting data for the formulation of education policies. However, traditional methods for classifying academic performance have many problems, such as low accuracy, limited ability to handle nonlinear relationships, and poor handling of data sparsity. Our study therefore analyzes various characteristics of students, including personal information, academic performance, attendance rate, family background, and extracurricular activities, offering a comprehensive view of the factors affecting students' academic performance. To improve the accuracy and robustness of student performance classification, we adopted a Gaussian Distribution based Data Augmentation technique (GDO), combined with multiple Deep Learning (DL) and Machine Learning (ML) models. We explored the application of different ML and DL models to classifying student grades, and used different feature combinations and data augmentation techniques to evaluate the performance of multiple models on classification tasks. In addition, we checked the effectiveness of the synthetic data using variance homogeneity and P-values, and studied how the oversampling rate affects actual classification results. The results show that the RBFN model based on educational habit features performs best after GDO data augmentation, with an accuracy of 94.12% and an F1 score of 94.46%. These results provide valuable references for the classification of student grades and the development of intervention strategies. Our study proposes new methods and perspectives in the field of educational data analysis and, at the same time, promotes innovation in the intelligence of the education system.
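    The description names a Gaussian Distribution based Data Augmentation technique (GDO) and an oversampling rate without giving the steps. The sketch below is a generic per-class Gaussian oversampler written under our own assumptions (features treated as independent, each modelled by the class's mean and sample standard deviation); it is not necessarily the authors' GDO. The feature names, toy rows, and the `oversampling_rate` variable are hypothetical.

```python
import random
from statistics import mean, stdev


def gaussian_oversample(rows, n_new, seed=0):
    """Draw n_new synthetic rows for one class.

    Each feature is modelled as an independent normal distribution whose mean
    and sample standard deviation are estimated from that class's rows.
    """
    if len(rows) < 2:
        raise ValueError("need at least two rows to estimate a standard deviation")
    rng = random.Random(seed)
    params = [(mean(col), stdev(col)) for col in zip(*rows)]
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n_new)]


# Hypothetical minority-class rows: [attendance_rate, weekly_study_hours].
minority = [[0.82, 6.0], [0.75, 4.5], [0.90, 8.0], [0.70, 5.0]]
oversampling_rate = 1.0  # assumed: add 100% more rows for this class
synthetic = gaussian_oversample(minority, int(len(minority) * oversampling_rate))
print(len(synthetic))  # 4
```

    Treating features as independent ignores correlations between them; a multivariate normal with a full covariance matrix would be the natural next refinement.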

  12. Student Performance Data for Predictive Analysis

    • kaggle.com
    zip
    Updated Jul 14, 2024
    Cite
    usman141 (2024). Student Performance Data for Predictive Analysis [Dataset]. https://www.kaggle.com/datasets/usman141/student-performance-data-for-predictive-analysis
    Available download formats: zip (97,560 bytes)
    Authors
    usman141
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset contains detailed information about 70+ students, focusing on predicting their 5th semester performance based on their performances in previous semesters. It encompasses various aspects such as demographics, educational data, predicting variables, and environmental factors, making it ideal for educational research and predictive modeling.

    Content

    The dataset is divided into the following sections:

    • Demographics Data: Contains demographic information about the students, including age, gender, and other personal details.
    • Educational Data: Includes academic data from previous semesters, grades, attendance, and other relevant academic information.
    • Predicting Variables: Encompasses factors that might influence 5th semester performance, including prior academic performance and other predictive metrics.
    • Environmental Factors: Covers environmental influences, study habits, personal preferences, and other factors that could affect students' education and performance.

    Usage

    This dataset can be used for:

    • Predictive modeling of 5th semester student performance
    • Analyzing the impact of past academic performance on future outcomes
    • Educational research to identify key factors influencing academic success

    Acknowledgements

    Please credit the original source of the data if you use this dataset in your research or project.

    Inspiration

    Use this dataset to build models predicting student performance in the 5th semester based on their performance in previous semesters, explore correlations between various factors and academic success, or conduct comprehensive educational analysis.

  13. Data from: S1 Dataset -

    • plos.figshare.com
    txt
    Updated Jan 14, 2025
    Cite
    Jing Wang; Yun Yu (2025). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0299018.s001
    Available download formats: txt
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Jing Wang; Yun Yu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Student performance is crucial for addressing problems in the learning process and is also an important measure of learning outcomes. The ability to improve educational systems using knowledge derived from data has driven the development of educational data mining research. This paper proposes a machine learning method for predicting student performance in online learning. The key idea is as follows: eleven learning behavioral indicators are constructed from the online learning process; the correlation between each indicator and the scores students obtain in online learning is then analyzed, indicators weakly correlated with scores are filtered out, and those strongly correlated with scores are retained as the eigenvalue indicators. Finally, the eigenvalue indicators are used to train the proposed logistic regression model with Taylor expansion. Experimental results show that the proposed logistic regression model outperforms the comparison models in prediction ability. The results also indicate a significant dependency between students' initiative in learning and learning duration, and that learning duration has a significant effect on the prediction of student performance.
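    The indicator-filtering step (keep indicators strongly correlated with scores, drop weakly correlated ones) can be sketched with a Pearson correlation and a cutoff. The threshold of 0.3 and the indicator names and values below are assumptions for illustration; the description states neither. On Python 3.10+, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # ZeroDivisionError if either sequence is constant


def select_indicators(indicators, scores, threshold=0.3):
    """Keep indicators whose |r| with the scores is at least threshold.

    threshold=0.3 is an assumed cutoff; the description does not state one.
    """
    kept = {}
    for name, values in indicators.items():
        r = pearson(values, scores)
        if abs(r) >= threshold:
            kept[name] = r
    return kept


# Invented example with three hypothetical behavioural indicators.
scores = [60, 70, 80, 90, 100]
indicators = {
    "video_minutes": [10, 20, 30, 40, 50],
    "login_count": [3, 5, 4, 5, 3],
    "forum_posts": [1, 1, 2, 3, 5],
}
selected = select_indicators(indicators, scores)
print(sorted(selected))  # ['forum_posts', 'video_minutes']
```

    The retained indicators would then serve as inputs to the classifier; the Taylor-expansion variant of logistic regression described in the paper is not reproduced here.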

  14. Student Performance Prediction Dataset

    • kaggle.com
    zip
    Updated Apr 7, 2025
    Cite
    ARYAN23wer (2025). Student Performance Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/aryan23wer/student-performance-prediction-dataset/discussion
    Available download formats: zip (5,553 bytes)
    Authors
    ARYAN23wer
    Description

    Introduction: The "Student Performance Prediction Dataset" is a valuable resource for anyone interested in understanding and improving student academic performance. Its rich collection of features enables the development of predictive models and the exploration of factors that contribute to student success. By harnessing the insights from this dataset, educators and policymakers can work towards enhancing the educational experience and outcomes of students.

    It is a comprehensive collection of data designed to facilitate predictive analysis related to student academic performance. This dataset is invaluable for educators, researchers, and data scientists seeking to gain insights into factors that influence student success. It can be used to build predictive models and identify key indicators that affect student performance, ultimately helping institutions and individuals make informed decisions to improve educational outcomes.

    Description of Attributes in the "Student Performance Prediction Dataset":

    1. STUDENT ID: A unique identifier assigned to each student in the dataset, enabling individual tracking and analysis.

    2. Student_Age: The age of the student at the time of data collection, providing insight into the age distribution of the student population.

    3. Sex: The gender of the student, typically categorized as male or female, which can be used for gender-based analysis of academic performance.

    4. Graduated High-School Type: Describes the type of high school from which the student graduated, such as public, private, or specialized institutions. This attribute can help assess the influence of high-school background on student performance.

    5. Scholarship Type: Indicates whether the student received any scholarships or financial aid for their education, which can be relevant for assessing the impact of financial support on academic outcomes.

    6. Additional Work: Represents whether the student is involved in any additional work or part-time employment alongside their studies, which may impact study hours and overall performance.

    7. Sports Activity: Indicates whether the student participates in sports activities, which can provide insights into the relationship between physical activities and academic performance.

    8. Transportation: Describes the mode of transportation used by the student to commute to school, which may have implications for punctuality and attendance.

    9. Weekly Study Hours: The number of hours per week that the student dedicates to studying, reflecting their study habits and potential commitment to academics.

    10. Attendance: Reflects the student's attendance record, which can be an essential factor in determining their engagement and participation in classes.

    11. Reading: Represents the student's performance or scores in reading-related subjects or assessments.

    12. Notes: Reflects the student's performance or scores in note-taking-related subjects or assessments.

    13. Listening in Class: Represents the student's performance or scores in class listening-related subjects or assessments.

    14. Project Work: Indicates whether the student is involved in project work or assignments, which can be a significant component of their overall grade.

    15. Grade: The final grade or academic performance measure for each student, which serves as the predictive target for analysis. This is often the attribute of interest for building predictive models.

    These attributes collectively provide a holistic view of each student's academic and personal profile, making the dataset suitable for various analyses, including predictive modeling, identifying influential factors, and assessing the impact of different variables on student performance. Researchers and educators can leverage this dataset to gain insights into the complex interplay of factors that contribute to students' academic success or challenges.

  15. Performance of machine learning models using SMOTE-balanced dataset.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Nov 8, 2023
    Cite
    Nihal Abuzinadah; Muhammad Umer; Abid Ishaq; Abdullah Al Hejaili; Shtwai Alsubai; Ala’ Abdulmajid Eshmawi; Abdullah Mohamed; Imran Ashraf (2023). Performance of machine learning models using SMOTE-balanced dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0293061.t004
    Available download formats: xls
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nihal Abuzinadah; Muhammad Umer; Abid Ishaq; Abdullah Al Hejaili; Shtwai Alsubai; Ala’ Abdulmajid Eshmawi; Abdullah Mohamed; Imran Ashraf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of machine learning models using SMOTE-balanced dataset.

  16. Data Sheet 1_The application of machine learning in predicting student...

    • figshare.com
    docx
    Updated Sep 23, 2025
    Asset Turkmenbayev; Elmira Abdykerimova; Shynggys Nurgozhayev; Guldana Karabassova; Dametken Baigozhanova (2025). Data Sheet 1_The application of machine learning in predicting student performance in university engineering programs: a rapid review.docx [Dataset]. http://doi.org/10.3389/feduc.2025.1562586.s002
    Available download formats: docx
    Dataset updated
    Sep 23, 2025
    Dataset provided by
    Frontiers
    Authors
    Asset Turkmenbayev; Elmira Abdykerimova; Shynggys Nurgozhayev; Guldana Karabassova; Dametken Baigozhanova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: In recent years, the application of machine learning (ML) to predict student performance in engineering education has expanded significantly, yet questions remain about the consistency, reliability, and generalisability of these predictive models.

    Objective: This rapid review aimed to systematically examine peer-reviewed studies published between January 1, 2019, and December 31, 2024, that applied machine learning (ML), artificial intelligence (AI), or deep learning (DL) methods to predict or improve academic outcomes in university engineering programs.

    Methods: We searched IEEE Xplore, SpringerLink, and PubMed, identifying an initial pool of 2,933 records. After screening for eligibility based on pre-defined inclusion criteria, we selected 27 peer-reviewed studies for narrative synthesis and assessed their methodological quality using the PROBAST framework.

    Results: All 27 studies involved undergraduate engineering students and demonstrated the capability of diverse ML techniques to enhance various academic outcomes. Notably, one study found that a reinforcement learning-based intelligent tutoring system significantly improved learning efficiency in digital logic courses. Another study using AI-based real-time behavior analysis increased students’ exam scores by approximately 8.44 percentage points. An optimised support vector machine (SVM) model accurately predicted engineering students’ employability with 87.8% accuracy, outperforming traditional predictive approaches. Additionally, a longitudinally validated SVM model effectively identified at-risk students, achieving 83.9% accuracy on hold-out cohorts. Bayesian regression methods also improved early-term course grade prediction by 27% over baseline predictors. However, most studies relied on single-institution samples and lacked rigorous external validation, limiting the generalisability of their findings.

    Conclusion: The evidence confirms that ML methods, particularly reinforcement learning, deep learning, and optimised predictive algorithms, can substantially improve student performance and academic outcomes in engineering education. However, methodological shortcomings related to participant selection bias, sample sizes, validation practices, and transparency in reporting require further attention. Future research should prioritise multi-institutional studies, robust validation techniques, and enhanced methodological transparency to fully leverage ML’s potential in engineering education.

  17. Students’ performance research in recent years.

    • plos.figshare.com
    xls
    Updated Mar 11, 2024
    + more versions
    Huan Xu; Min Kim (2024). Students’ performance research in recent years. [Dataset]. http://doi.org/10.1371/journal.pone.0300010.t002
    Available download formats: xls
    Dataset updated
    Mar 11, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Huan Xu; Min Kim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Students’ performance is an important factor in evaluating teaching quality in colleges, and predicting and analyzing it can guide students’ learning in a timely way. To address the low accuracy of single models in student performance prediction, a combination prediction method based on the ant colony algorithm is put forward. First, considering the characteristics of students’ learning behavior and of the candidate models, decision tree (DT), support vector regression (SVR), and BP (back-propagation) neural network (BP) models are selected to build three prediction models. Then, an ant colony algorithm (ACO) is proposed to calculate the weight of each model within the combination prediction model. The combination prediction method was compared with the single machine learning (ML) models and with other methods in terms of accuracy and running time. The combination prediction model, with a mean squared error (MSE) of 0.0089, outperforms DT (MSE 0.0326), SVR (MSE 0.0229), and BP (MSE 0.0148). To further assess its efficacy, it was also compared with other prediction models: its MSE of 0.0089 is lower than those of GS-XGBoost (0.0131), PSO-SVR (0.0117), and IDA-SVR (0.0092). The combination prediction model also runs faster than these three methods.
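    The combination step described above amounts to a weighted blend of the three models’ predictions, with the weights chosen to minimize MSE. The sketch below illustrates that idea in plain Python under stated assumptions: the paper searches for the weights with an ant colony algorithm, whereas this sketch swaps in a simple exhaustive grid search; it also assumes the weights are non-negative and sum to 1, which the description does not state. The function names (`mse`, `combine`, `grid_search_weights`) and all prediction values are hypothetical toy data, not results from the study.

```python
from itertools import product


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def combine(preds, weights):
    """Element-wise weighted sum of several models' prediction lists."""
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(len(preds[0]))]


def grid_search_weights(y_true, preds, steps=20):
    """Try every (w1, w2, w3) on a grid with w_i >= 0 and w1 + w2 + w3 = 1.

    Hypothetical stand-in for the paper's ACO weight search; assumes exactly
    three models. Returns (best_weights, best_mse).
    """
    best_w, best_err = None, float("inf")
    for a, b in product(range(steps + 1), repeat=2):
        if a + b > steps:
            continue  # the third weight would be negative
        w = (a / steps, b / steps, (steps - a - b) / steps)
        err = mse(y_true, combine(preds, w))
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Toy example: hypothetical normalized scores and three models' predictions.
y = [0.60, 0.70, 0.80, 0.90]
dt_pred = [0.50, 0.78, 0.85, 0.95]
svr_pred = [0.66, 0.65, 0.76, 0.88]
bp_pred = [0.58, 0.73, 0.79, 0.93]
weights, err = grid_search_weights(y, [dt_pred, svr_pred, bp_pred])
print(weights, round(err, 5))
```

    Because the grid includes each single model (weight 1 on one model, 0 on the others), the blended MSE can never be worse than the best single model on the data used for fitting. That same fact makes the printed figure optimistic: in practice the weights would be tuned on a validation split and the combination scored on held-out data.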

  18. Detailed description of the dataset.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Nov 8, 2023
    + more versions
    Ishaq, Abid; Abuzinadah, Nihal; Mohamed, Abdullah; Umer, Muhammad; Eshmawi, Ala’ Abdulmajid; Ashraf, Imran; Alsubai, Shtwai; Al Hejaili, Abdullah (2023). Detailed description of the dataset. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000971130
    Dataset updated
    Nov 8, 2023
    Authors
    Ishaq, Abid; Abuzinadah, Nihal; Mohamed, Abdullah; Umer, Muhammad; Eshmawi, Ala’ Abdulmajid; Ashraf, Imran; Alsubai, Shtwai; Al Hejaili, Abdullah
    Description

    Predicting student performance automatically is of utmost importance given the substantial volume of data held in educational databases. Educational data mining (EDM) devises techniques to uncover insights from data originating in educational settings. Artificial intelligence (AI) can mine educational data to predict student performance and provide measures that help students avoid failing and learn better. Learning platforms complement traditional learning settings by analyzing student performance, which can help reduce the chance of student failure. Existing methods for student performance prediction in educational data mining face challenges such as limited accuracy, imbalanced data, and difficulties in feature engineering, which hinder effective adaptation and generalization across diverse educational contexts. This study proposes a machine learning-based system with deep convoluted features for predicting students’ academic performance. The proposed framework is evaluated on both the original imbalanced dataset and a version balanced with the synthetic minority oversampling technique (SMOTE). Performance is also compared using the original features and the deep convoluted features. Experimental results indicate that the deep convoluted features provide better prediction accuracy than the original features, and the extra trees classifier with convoluted features achieves the highest classification accuracy, 99.9%. Compared with state-of-the-art approaches, the proposed approach achieved higher performance. This research introduces a powerful AI-driven system for student performance prediction that offers substantial gains in accuracy over existing approaches.
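    SMOTE, as it is generally described, creates each synthetic minority sample by picking a minority-class point, picking one of its k nearest minority-class neighbours, and interpolating a random fraction of the way between the two. The plain-Python sketch below shows only that core step. The study’s actual settings (k, target class ratio, feature set) are not given here, so `k=5`, the toy points, and the helper name `smote` are assumptions, not the authors’ code; for real work, the `SMOTE` class in the imbalanced-learn library is a tested implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority-class points.

    minority: list of equal-length numeric feature vectors, all of one class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the other minority points, nearest first (Euclidean distance).
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # fraction of the way from x towards the neighbour, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic


# Toy minority class lying on the line y = 2x; interpolated points stay on that line.
fail_class = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
new_points = smote(fail_class, n_new=6, k=2)
print(len(new_points))  # 6
```

    SMOTE is normally applied to the training split only: oversampling before splitting lets synthetic points derived from test samples leak into training, which inflates reported accuracy.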

  19. Data from: Dataset of Student Level Prediction in UAE

    • data.mendeley.com
    Updated Dec 18, 2020
    shatha Ghareeb (2020). Dataset of Student Level Prediction in UAE [Dataset]. http://doi.org/10.17632/3g8dtwbjjy.1
    Dataset updated
    Dec 18, 2020
    Authors
    shatha Ghareeb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United Arab Emirates
    Description

    The dataset offers novel aspects, specifically in terms of student grading across the diverse educational cultures of multiple countries: researchers and other education sectors will be able to see the impact of having varied curricula within one country. The dataset compares different levelling cases when students transfer from one curriculum to another, and reflects the unreliable levelling criteria currently set by international schools. The collected data can be used with intelligent algorithms, specifically machine learning and pattern analysis methods, to develop an intelligent framework for multicultural educational systems that aids the smooth transition (hereafter, “levelling”) of students who relocate from one education curriculum to another, and that minimizes the impact of switching on the students’ educational performance. The preliminary variables taken into consideration when deciding which data to collect depended on the variables. The UAE is a multicultural country with many expats relocating from regions such as Asia, Europe, and America. To meet expats’ needs, the UAE has established many international private schools; it was therefore chosen as the study location, based on the many levelling cases and struggles reported by the Ministry of Education and schools. For the first time, we present this dataset of students’ records for two academic years, covering math, English, and science over 3 terms. The choice of subject areas and number of terms was informed by other researchers’ work on similar subjects.

  20. Data from: Student grade prediction dataset

    • data.mendeley.com
    Updated Jun 16, 2022
    + more versions
    Nonso Nnamoko (2022). Student grade prediction dataset [Dataset]. http://doi.org/10.17632/wf8568hxb7.1
    Dataset updated
    Jun 16, 2022
    Authors
    Nonso Nnamoko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset provides a collection of 160 instances belonging to two classes ('pass' = 136 and 'fail' = 24). The data is an anonymised, statistically sound and reliable representation of the original data collected from students studying computer science modules at a UK university. Each instance is made up of 19 features plus the class label. Eight of the features represent students' online behaviour, including bio information retrieved from the Virtual Learning Environment. Eleven of the features represent students' neighbourhood influence, retrieved from the Office for Students database. The data has been compiled and made available in de facto/de jure standard open formats (CSV and JSON).

    This data was collected and used in a research study undertaken by academics and researchers at the Computer Science Department, Edge Hill University, United Kingdom. To encourage reproducibility of the reported experiments and results, the data is provided in the exact training-validation-testing splits used in the experiments.
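    With 136 'pass' and 24 'fail' instances, the classes are imbalanced enough that plain accuracy can mislead: a model that always predicts 'pass' is already right 136/160 = 85% of the time. The sketch below makes that concrete by computing the majority-class baseline and its balanced accuracy (the mean of per-class recalls). The labels are rebuilt from the counts in the description rather than read from the published CSV/JSON files, whose file and column names are not given here, and the helper names are hypothetical.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


labels = ["pass"] * 136 + ["fail"] * 24
label, acc = majority_baseline(labels)
print(label, acc)  # pass 0.85
print(balanced_accuracy(labels, [label] * len(labels)))  # 0.5
```

    A model trained on this dataset would need to clear both figures, ideally on the provided test split, to show it has learned more than the class ratio.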

Amr Maree (2025). Student Performance Prediction [Dataset]. https://www.kaggle.com/datasets/amrmaree/student-performance-prediction

Student Performance Prediction

📚 Predict Student Exam Scores Based on Study Habits & Attendance

Available download formats: zip (10981 bytes)
Dataset updated
Mar 3, 2025
Authors
Amr Maree
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Student Performance Prediction Dataset 🎓📊

Overview

This dataset contains information about students' academic performance, study habits, and external factors affecting their final exam scores. It is designed for predictive modeling, data visualization, and educational analytics.

Dataset Purpose

This dataset is useful for:
- Predicting student final exam scores 📈
- Identifying key factors that impact academic performance 🎯
- Exploring feature importance in education-related datasets 📊
- Building machine learning models for regression and classification 🤖

Columns Description

- Student_ID: Unique identifier for each student.
- Gender: Gender of the student (Male/Female).
- Study_Hours_per_Week: Average number of study hours per week.
- Attendance_Rate: Attendance percentage (50% to 100%).
- Past_Exam_Scores: Average score of previous exams (50 to 100).
- Parental_Education_Level: Education level of parents (High School, Bachelors, Masters, PhD).
- Internet_Access_at_Home: Whether the student has internet access at home (Yes/No).
- Extracurricular_Activities: Whether the student participates in extracurricular activities (Yes/No).
- Final_Exam_Score (Target): The student's final exam score (integer values, 50 to 100).
- Pass_Fail (Target): The student's status (Pass/Fail).

Ideas for Notebooks 📑

  1. Regression Analysis – Predict final exam scores using machine learning models (Linear Regression, Random Forest, XGBoost).
  2. Feature Importance – Analyze which factors contribute the most to student performance.
  3. Exploratory Data Analysis (EDA) – Visualize the impact of study hours, attendance, and other features.
  4. Classification – Convert scores into categories (Pass/Fail, A/B/C/D) and build classification models.
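As a starting point for ideas 1 and 4, the sketch below fits a one-variable least-squares line of Final_Exam_Score on Study_Hours_per_Week and maps scores to letter grades. It is a minimal plain-Python illustration, not a recommended model: the toy rows are made up (in practice they would be read from the dataset's CSV, whose file name is not given here), and the A/B/C/D cutoffs in the hypothetical `to_letter` helper are assumptions, since the description does not define them.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def to_letter(score, cutoffs=((90, "A"), (80, "B"), (70, "C"))):
    """Hypothetical banding: first cutoff the score reaches wins, otherwise 'D'."""
    for threshold, letter in cutoffs:
        if score >= threshold:
            return letter
    return "D"


# Hypothetical toy rows: Study_Hours_per_Week -> Final_Exam_Score.
hours = [5, 10, 15, 20]
scores = [60, 70, 80, 90]
intercept, slope = fit_line(hours, scores)
print(intercept, slope)  # 50.0 2.0
print(to_letter(intercept + slope * 18))  # B
```

For the Linear Regression, Random Forest, or XGBoost models suggested above, `fit_line` would be replaced by a library estimator trained on all feature columns, with categorical columns such as Gender or Parental_Education_Level encoded first.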

License & Usage

This dataset is open for public use. Feel free to use it for learning, research, and model-building! 🚀
