Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains information about students' academic performance, study habits, and external factors affecting their final exam scores. It is designed for predictive modeling, data visualization, and educational analytics.
This dataset is useful for:
- Predicting student final exam scores 📈
- Identifying key factors that impact academic performance 🎯
- Exploring feature importance in education-related datasets 📊
- Building machine learning models for regression and classification 🤖
| Column Name | Description |
|---|---|
| Student_ID | Unique identifier for each student. |
| Gender | Gender of the student (Male/Female). |
| Study_Hours_per_Week | Average number of study hours per week. |
| Attendance_Rate | Attendance percentage (50% - 100%). |
| Past_Exam_Scores | Average score of previous exams (50 - 100). |
| Parental_Education_Level | Education level of parents (High School, Bachelors, Masters, PhD). |
| Internet_Access_at_Home | Whether the student has internet access at home (Yes/No). |
| Extracurricular_Activities | Whether the student participates in extracurricular activities (Yes/No). |
| Final_Exam_Score (Target) | The final exam score of the student (50 - 100, integer values). |
| Pass_Fail (Target) | The student status (Pass/Fail). |
This dataset is open for public use. Feel free to use it for learning, research, and model-building! 🚀
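The documented ranges and categories in the column table can be turned into a quick sanity check before modeling. The following is a minimal sketch, not part of the dataset itself: the function name `validate_row` and the sample values are hypothetical, and it assumes each record arrives as a dictionary keyed by the column names above.

```python
# Ranges and categories copied from the column table above.
NUMERIC_RANGES = {
    "Attendance_Rate": (50.0, 100.0),
    "Past_Exam_Scores": (50.0, 100.0),
    "Final_Exam_Score": (50.0, 100.0),
}
CATEGORIES = {
    "Gender": {"Male", "Female"},
    "Parental_Education_Level": {"High School", "Bachelors", "Masters", "PhD"},
    "Internet_Access_at_Home": {"Yes", "No"},
    "Extracurricular_Activities": {"Yes", "No"},
    "Pass_Fail": {"Pass", "Fail"},
}


def validate_row(row):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for column, (low, high) in NUMERIC_RANGES.items():
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{column}: missing or not numeric")
            continue
        if not low <= value <= high:
            problems.append(f"{column}: {value} outside [{low}, {high}]")
    for column, allowed in CATEGORIES.items():
        if row.get(column) not in allowed:
            problems.append(f"{column}: unexpected value {row.get(column)!r}")
    return problems


# Hypothetical record, for illustration only.
sample = {
    "Gender": "Female",
    "Attendance_Rate": "87.5",
    "Past_Exam_Scores": "72",
    "Parental_Education_Level": "Masters",
    "Internet_Access_at_Home": "Yes",
    "Extracurricular_Activities": "No",
    "Final_Exam_Score": "68",
    "Pass_Fail": "Pass",
}
print(validate_row(sample))
```

Rows read with Python's `csv.DictReader` arrive as dictionaries of strings, which is the shape `validate_row` expects.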
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is designed for practicing classification tasks, specifically predicting whether a student will pass or fail a course based on various academic and demographic factors. It contains 40,000 records of students, with attributes including study habits, attendance rates, previous grades, and more. The dataset also introduces challenges such as missing values, incorrect data, and noise, making it ideal for practicing data cleaning, exploratory data analysis (EDA), and feature engineering.
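A cleaning pass for the kind of missing, incorrect, and noisy values described above might look like the sketch below. It is one possible approach under stated assumptions, not the dataset author's procedure: the helper name `clean_column`, the valid range, and the choice of median imputation are all illustrative.

```python
from statistics import median


def clean_column(values, low, high):
    """Treat unparsable or out-of-range entries as missing, then fill with the median."""
    parsed = []
    for raw in values:
        try:
            number = float(raw)
        except (TypeError, ValueError):
            number = None  # empty strings, None, and text become missing
        if number is not None and not (low <= number <= high):
            number = None  # impossible values (and NaN) are also treated as missing
        parsed.append(number)

    observed = [v for v in parsed if v is not None]
    if not observed:
        raise ValueError("no valid values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in parsed]


# Hypothetical attendance percentages with a blank, a None, and an impossible value.
print(clean_column(["80", "", None, "250", "90", "70"], 0, 100))
```

Median imputation is used here because it is robust to the outliers such a dataset is said to contain; dropping rows or model-based imputation are equally reasonable choices depending on the exercise.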
Predicting student performance automatically is of utmost importance, due to the substantial volume of data within educational databases. Educational data mining (EDM) devises techniques to uncover insights from data originating in educational settings. Artificial intelligence (AI) can mine educational data to predict student performance and provide measures to help students avoid failing and learn better. Learning platforms complement traditional learning settings by analyzing student performance, which can help reduce the chance of student failure. Existing methods for student performance prediction in educational data mining have faced challenges such as limited accuracy, imbalanced data, and difficulties in feature engineering. These issues have hindered adaptability and generalization across diverse educational contexts. This study proposes a machine learning-based system with deep convoluted features for the prediction of students’ academic performance. The proposed framework is used to predict student academic performance on both imbalanced datasets and datasets balanced with the synthetic minority oversampling technique (SMOTE). Performance is also evaluated using both the original and the deep convoluted features. Experimental results indicate that deep convoluted features provide better prediction accuracy than the original features. The extra tree classifier with convoluted features achieves the highest classification accuracy, 99.9%, and the proposed approach outperforms the state-of-the-art approaches it is compared against.
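The SMOTE step mentioned in the abstract creates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below illustrates that idea in plain Python; it is not the authors' implementation, and the function name `smote`, the parameter defaults, and the toy points are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class: four points at the corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_new=5)
print(len(new_points))
```

In practice, oversampling should be applied to the training split only, so that synthetic points derived from test samples do not inflate the reported accuracy.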
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
These data address student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, and were collected using school reports and questionnaires.
Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat.csv) and Portuguese language (por.csv). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks.
| Variable | Description |
|---|---|
| school | student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira) |
| sex | student's sex (binary: 'F' - female or 'M' - male) |
| age | student's age (numeric: from 15 to 22) |
| address | student's home address type (binary: 'U' - urban or 'R' - rural) |
| famsize | family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3) |
| Pstatus | parent's cohabitation status (binary: 'T' - living together or 'A' - apart) |
| Medu | mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education) |
| Fedu | father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education) |
| Mjob | mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other') |
| Fjob | father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other') |
| reason | reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other') |
| guardian | student's guardian (nominal: 'mother', 'father' or 'other') |
| traveltime | home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour) |
| studytime | weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours) |
| failures | number of past class failures (numeric: n if 1<=n<3, else 4) |
| schoolsup | extra educational support (binary: yes or no) |
| famsup | family educational support (binary: yes or no) |
| paid | extra paid classes within the course subject (Math or Portuguese) (binary: yes or no) |
| activities | extra-curricular activities (binary: yes or no) |
| nursery | attended nursery school (binary: yes or no) |
| higher | wants to take higher education (binary: yes or no) |
| internet | Internet access at home (binary: yes or no) |
| romantic | with a romantic relationship (binary: yes or no) |
| famrel | quality of family relationships (numeric: from 1 - very bad to 5 - excellent) |
| freetime | free time after school (numeric: from 1 - very low to 5 - very high) |
| goout | going out with friends (numeric: from 1 - very low to 5 - very high) |
| Dalc | workday alcohol consumption (numeric: from 1 - very low to 5 - very high) |
| Walc | weekend alcohol consumption (numeric: from 1 - very low to 5 - very high) |
| health | current health status (numeric: from 1 - very bad to 5 - very good) |
| absences | number of school absences (numeric: from 0 to 93) |
| G1 | first period grade (numeric: from 0 to 20) |
| G2 | second period grade (numeric: from 0 to 20) |
| G3 | final grade (numeric: from 0 to 20, output target) |
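The binary and five-level classification tasks mentioned above are derived from the final grade G3. The sketch below shows one way to build those labels. The cut-offs are assumptions, not stated in this description: a pass mark of 10 out of 20 and five bands of 16-20, 14-15, 12-13, 10-11, and 0-9. Check them against [Cortez and Silva, 2008] before relying on them. The function names and letter labels are illustrative.

```python
def binary_label(g3):
    """Map a 0-20 final grade to pass/fail (assumed pass mark: 10)."""
    if not 0 <= g3 <= 20:
        raise ValueError(f"G3 must be between 0 and 20, got {g3}")
    return "pass" if g3 >= 10 else "fail"


def five_level_label(g3):
    """Map a 0-20 final grade to one of five bands (assumed cut-offs)."""
    if not 0 <= g3 <= 20:
        raise ValueError(f"G3 must be between 0 and 20, got {g3}")
    if g3 >= 16:
        return "A"
    if g3 >= 14:
        return "B"
    if g3 >= 12:
        return "C"
    if g3 >= 10:
        return "D"
    return "F"


# Hypothetical final grades, for illustration only.
for grade in (18, 13, 9):
    print(grade, binary_label(grade), five_level_label(grade))
```

Because G1 and G2 are earlier grades in the same course, models that include them usually find G3 much easier to predict than models restricted to the demographic and social features.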
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset is designed for predicting whether a student will pass or fail an exam based on the number of study hours and their scores in the previous exam.
**Description:**
Features:
- Study Hours (numeric): Represents the number of hours a student spent studying for the upcoming exam.
- Previous Exam Score (numeric): Indicates the student's score in the previous exam.
- Pass/Fail (binary): The target variable, where 1 represents a pass and 0 represents a fail in the current exam.
Dataset Size: The dataset consists of data for 500 students, ensuring a diverse range of study patterns and previous exam performances.
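With two numeric features and a binary target, logistic regression is a natural first model for this dataset. The following is a minimal from-scratch sketch using batch gradient descent on standardised features. The training rows are made-up values for illustration, not records from the dataset, and the names `fit_logistic` and `predict_proba` are hypothetical.

```python
import math


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on standardised features."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(d)
    ]
    scaled = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(scaled, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad_w[j] += (p - y) * x[j]
            grad_b += p - y
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return {"weights": weights, "bias": bias, "means": means, "stds": stds}


def predict_proba(model, row):
    """Probability of passing for one (study_hours, previous_score) pair."""
    z = model["bias"] + sum(
        w * (v - m) / s
        for w, v, m, s in zip(model["weights"], row, model["means"], model["stds"])
    )
    return 1.0 / (1.0 + math.exp(-z))


# Made-up (study hours, previous exam score) rows with pass=1 / fail=0 labels.
rows = [(1, 40), (2, 50), (3, 45), (4, 55), (6, 70), (7, 80), (8, 75), (9, 90)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_logistic(rows, labels)
print(round(predict_proba(model, (9, 90)), 3), round(predict_proba(model, (1, 40)), 3))
```

Standardising the features keeps the two inputs on comparable scales, which lets a single learning rate work for both.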
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We use the data set for training, validation, and testing of high school students' performance prediction.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Iraqi dataset was collected by administering a questionnaire in three Iraqi secondary schools, covering both the applied and biology branches of the final stage, during the second semester of 2018. The questionnaire initially contained 56 questions on three A4 sheets and was answered by 250 students (samples). Later, 130 samples were discarded during pre-processing because of missing information. After removing inconsistent and incomplete records, the study uses 120 samples with 55 features for its experiments. The features fall into five main categories: Demographic, Economic, Educational, Time, and Marks. Table (1) shows the dataset’s attributes/features and their descriptions. As that table illustrates, new features are introduced, such as holiday and worrying effects. The relationship between parents and schools and the student's use of books and references are also considered.
https://cubig.ai/store/terms-of-service
1) Data Introduction
- The Student Performance (Multiple Linear Regression) Dataset is designed to analyze the relationship between students’ learning habits and academic performance. Each sample includes key indicators related to learning, such as study hours, sleep duration, previous test scores, and the number of practice exams completed.
2) Data Utilization
(1) Characteristics of the Student Performance (Multiple Linear Regression) Dataset:
- The target variable, Hours Studied, quantitatively represents the amount of time a student has invested in studying. The dataset is structured to allow modeling and inference of learning behaviors based on correlations with other variables.
(2) Applications of the Student Performance (Multiple Linear Regression) Dataset:
- AI-Based Study Time Prediction Models: The dataset can be used to develop regression models that estimate a student’s expected study time based on inputs like academic performance, sleep habits, and engagement patterns.
- Behavioral Analysis and Personalized Learning Strategies: It can be applied to identify students with insufficient study time and design personalized study interventions based on academic and lifestyle patterns.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attribute information for student performance data set.
https://cubig.ai/store/terms-of-service
1) Data Introduction
- The Student Performance Dataset comes from a survey of secondary school mathematics students. It is a tabular dataset containing a variety of information, including student demographics, family environment, parents' education and occupation, health, family relationships, and grades.
2) Data Utilization
(1) Characteristics of the Student Performance Dataset:
- Each row contains a total of 33 different characteristics, including school ID, gender, age, family size, parents' educational level and occupation, family relationship, health status, and grades.
- It is suitable for a variety of data analysis and prediction exercises on the target variable Grade, including regression analysis and analysis of imbalance in categorical variables.
(2) Applications of the Student Performance Dataset:
- Analyzing academic achievement prediction and influencing factors: It can be used to analyze the impact of factors such as a student's background, family environment, and parental characteristics on grades, and to develop a grade prediction model.
- Establishing educational policies and customized support strategies: Based on student-specific characteristics and grade data, it can be applied to educational policies such as closing educational gaps, supporting vulnerable student groups, and providing customized learning guidance.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Academic achievement is an important index for measuring the quality of education and students’ learning outcomes. Reasonable and accurate prediction of academic achievement can help improve teachers’ educational methods and provides data support for the formulation of education policies. However, traditional methods for classifying academic performance have many problems, such as low accuracy, limited ability to handle nonlinear relationships, and poor handling of data sparsity. To address this, our study analyzes various characteristics of students, including personal information, academic performance, attendance rate, family background, and extracurricular activities, offering a comprehensive view of the factors affecting students’ academic performance. To improve the accuracy and robustness of student performance classification, we adopted a Gaussian Distribution based Data Augmentation technique (GDO), combined with multiple Deep Learning (DL) and Machine Learning (ML) models. We explored the application of different ML and DL models to classifying student grades, using different feature combinations and data augmentation techniques to evaluate the models' performance on classification tasks. In addition, we checked the effectiveness of the synthetic data using variance homogeneity and p-values, and studied how the oversampling rate affects actual classification results. The results show that the RBFN model based on educational habit features performs best after GDO data augmentation, with an accuracy of 94.12% and an F1 score of 94.46%. These results provide valuable references for the classification of student grades and the development of intervention strategies. Our study proposes new methods and perspectives for educational data analysis and promotes innovation and development in intelligent education systems.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains detailed information about 70+ students, focusing on predicting their 5th semester performance based on their performances in previous semesters. It encompasses various aspects such as demographics, educational data, predicting variables, and environmental factors, making it ideal for educational research and predictive modeling.
The dataset is divided into the following sections:
- Environmental Factors: Covers environmental influences, study habits, personal preferences, and other factors that could affect students' education and performance.
This dataset can be used for:
- Predictive modeling of 5th semester student performance
- Analyzing the impact of past academic performance on future outcomes
- Educational research to identify key factors influencing academic success
Please credit the original source of the data if you use this dataset in your research or project.
Use this dataset to build models predicting student performance in the 5th semester based on their performance in previous semesters, explore correlations between various factors and academic success, or conduct comprehensive educational analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Student performance is crucial for addressing problems in the learning process and is an important factor in measuring learning outcomes. The ability to improve educational systems using knowledge drawn from data has driven the development of educational data mining research. This paper proposes a machine learning method for predicting student performance in online learning. The key idea is to construct eleven learning behavioral indicators from the online learning process, analyze the correlation between each indicator and the scores students obtain in online learning, filter out the indicators that are weakly correlated with scores, and retain the strongly correlated ones as the eigenvalue indicators. The eigenvalue indicators are then used to train the proposed logistic regression model with Taylor expansion. Experimental results show that the proposed logistic regression model outperforms the comparison models in predictive ability. The results also indicate a significant dependency between students’ initiative in learning and learning duration, and that learning duration has a significant effect on the prediction of student performance.
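The correlation-based filtering step described in this abstract can be sketched as follows. This is an illustration only, not the paper's procedure: the indicator names, the toy values, and the 0.5 cut-off are all assumptions, since the abstract does not say which correlation measure or threshold was used. Pearson correlation is used here as a common default.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (spread_x * spread_y)


def select_indicators(indicators, scores, threshold=0.5):
    """Keep indicators whose absolute correlation with the scores meets the threshold."""
    kept = {}
    for name, values in indicators.items():
        r = pearson(values, scores)
        if abs(r) >= threshold:
            kept[name] = r
    return kept


# Hypothetical behavioral indicators for five students.
scores = [50, 60, 70, 80, 90]
indicators = {
    "video_minutes": [10, 20, 30, 40, 50],
    "random_clicks": [3, 1, 3, 1, 3],
}
selected = select_indicators(indicators, scores)
print(sorted(selected))
```

The retained indicators would then serve as the inputs to whatever model is trained next; in the paper that is a logistic regression model with Taylor expansion, which this sketch does not attempt to reproduce.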
Introduction: The "Student Performance Prediction Dataset" is a valuable resource for anyone interested in understanding and improving student academic performance. Its rich collection of features enables the development of predictive models and the exploration of factors that contribute to student success. By harnessing the insights from this dataset, educators and policymakers can work towards enhancing the educational experience and outcomes of students.
It is a comprehensive collection of data designed to facilitate predictive analysis related to student academic performance. This dataset is invaluable for educators, researchers, and data scientists seeking to gain insights into factors that influence student success. It can be used to build predictive models and identify key indicators that affect student performance, ultimately helping institutions and individuals make informed decisions to improve educational outcomes.
Description of Attributes in the "Student Performance Prediction Dataset":
STUDENT ID: A unique identifier assigned to each student in the dataset, enabling individual tracking and analysis.
Student_Age: The age of the student at the time of data collection, providing insight into the age distribution of the student population.
Sex: The gender of the student, typically categorized as male or female, which can be used for gender-based analysis of academic performance.
Graduated High-School Type: Describes the type of high school from which the student graduated, such as public, private, or specialized institutions. This attribute can help assess the influence of high-school background on student performance.
Scholarship Type: Indicates whether the student received any scholarships or financial aid for their education, which can be relevant for assessing the impact of financial support on academic outcomes.
Additional Work: Represents whether the student is involved in any additional work or part-time employment alongside their studies, which may impact study hours and overall performance.
Sports Activity: Indicates whether the student participates in sports activities, which can provide insights into the relationship between physical activities and academic performance.
Transportation: Describes the mode of transportation used by the student to commute to school, which may have implications for punctuality and attendance.
Weekly Study Hours: The number of hours per week that the student dedicates to studying, reflecting their study habits and potential commitment to academics.
Attendance: Reflects the student's attendance record, which can be an essential factor in determining their engagement and participation in classes.
Reading: Represents the student's performance or scores in reading-related subjects or assessments.
Notes: Reflects the student's performance or scores in note-taking-related subjects or assessments.
Listening in Class: Represents the student's performance or scores in class listening-related subjects or assessments.
Project Work: Indicates whether the student is involved in project work or assignments, which can be a significant component of their overall grade.
Grade: The final grade or academic performance measure for each student, which serves as the predictive target for analysis. This is often the attribute of interest for building predictive models.
These attributes collectively provide a holistic view of each student's academic and personal profile, making the dataset suitable for various analyses, including predictive modeling, identifying influential factors, and assessing the impact of different variables on student performance. Researchers and educators can leverage this dataset to gain insights into the complex interplay of factors that contribute to students' academic success or challenges.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance of machine learning models using SMOTE-balanced dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: In recent years, the application of machine learning (ML) to predict student performance in engineering education has expanded significantly, yet questions remain about the consistency, reliability, and generalisability of these predictive models.
Objective: This rapid review aimed to systematically examine peer-reviewed studies published between January 1, 2019, and December 31, 2024, that applied machine learning (ML), artificial intelligence (AI), or deep learning (DL) methods to predict or improve academic outcomes in university engineering programs.
Methods: We searched IEEE Xplore, SpringerLink, and PubMed, identifying an initial pool of 2,933 records. After screening for eligibility based on pre-defined inclusion criteria, we selected 27 peer-reviewed studies for narrative synthesis and assessed their methodological quality using the PROBAST framework.
Results: All 27 studies involved undergraduate engineering students and demonstrated the capability of diverse ML techniques to enhance various academic outcomes. Notably, one study found that a reinforcement learning-based intelligent tutoring system significantly improved learning efficiency in digital logic courses. Another study using AI-based real-time behavior analysis increased students’ exam scores by approximately 8.44 percentage points. An optimised support vector machine (SVM) model accurately predicted engineering students’ employability with 87.8% accuracy, outperforming traditional predictive approaches. Additionally, a longitudinally validated SVM model effectively identified at-risk students, achieving 83.9% accuracy on hold-out cohorts. Bayesian regression methods also improved early-term course grade prediction by 27% over baseline predictors. However, most studies relied on single-institution samples and lacked rigorous external validation, limiting the generalisability of their findings.
Conclusion: The evidence confirms that ML methods, particularly reinforcement learning, deep learning, and optimised predictive algorithms, can substantially improve student performance and academic outcomes in engineering education. However, methodological shortcomings related to participant selection bias, sample sizes, validation practices, and transparency in reporting require further attention. Future research should prioritise multi-institutional studies, robust validation techniques, and enhanced methodological transparency to fully leverage ML’s potential in engineering education.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Students’ performance is an important factor in evaluating teaching quality in colleges, and predicting and analyzing it can guide students’ learning in a timely way. To address the low accuracy of single models in predicting students’ performance, a combination prediction method based on the ant colony algorithm is put forward. First, considering the characteristics of students’ learning behavior and of the models, decision tree (DT), support vector regression (SVR) and BP neural network (BP) are selected to establish three prediction models. Then, an ant colony algorithm (ACO) is used to calculate the weight of each model in the combination prediction model. The combination prediction method was compared with the single machine learning (ML) models and other methods in terms of accuracy and running time. The combination prediction model, with a mean square error (MSE) of 0.0089, outperforms DT (MSE 0.0326), SVR (MSE 0.0229) and BP (MSE 0.0148). In a further comparative study against other prediction models, it also outperforms GS-XGBoost (MSE 0.0131), PSO-SVR (MSE 0.0117) and IDA-SVR (MSE 0.0092), and it runs faster than these three methods.
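The combination model described here is a weighted sum of the individual models' predictions, scored by MSE. The sketch below shows that structure only. It replaces the paper's ant colony optimisation with a plain exhaustive search over a coarse weight grid, and the prediction values are made up for illustration; none of the names or numbers come from the paper.

```python
from itertools import product


def mse(predicted, actual):
    """Mean square error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def combine(predictions, weights):
    """Weighted sum of several models' predictions, sample by sample."""
    return [sum(w * p for w, p in zip(weights, column)) for column in zip(*predictions)]


def best_weights(predictions, actual, steps=10):
    """Grid search over non-negative weights summing to 1 (stand-in for ACO)."""
    best = None
    for combo in product(range(steps + 1), repeat=len(predictions)):
        if sum(combo) != steps:
            continue
        weights = [c / steps for c in combo]
        error = mse(combine(predictions, weights), actual)
        if best is None or error < best[1]:
            best = (weights, error)
    return best


# Made-up normalised scores and three hypothetical base models.
actual = [0.2, 0.4, 0.6, 0.8]
model_a = [0.3, 0.5, 0.7, 0.9]   # consistently too high
model_b = [0.1, 0.3, 0.5, 0.7]   # consistently too low
model_c = [0.5, 0.5, 0.5, 0.5]   # predicts the mean
weights, error = best_weights([model_a, model_b, model_c], actual)
print(weights)
```

A grid search scales poorly as the number of models or the grid resolution grows, which is one reason a metaheuristic such as ACO may be preferred for finding the weights.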
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset's novelty lies in student grading across the diverse educational cultures of multiple countries; researchers and others in the education sector will be able to see the impact of having varied curricula within a country. The dataset compares different levelling cases that arise when students transfer from one curriculum to another, along with the unreliable levelling criteria currently set by schools in an international school setting. The collected data can be used with intelligent algorithms, specifically machine learning and pattern analysis methods, to develop an intelligent framework for multicultural educational systems that aids the smooth transition (hereafter, “levelling”) of students who relocate from one education curriculum to another and minimizes the impact of switching on their educational performance. The preliminary variables taken into consideration when deciding which data to collect depended on the variables. The UAE is a multicultural country with many expats relocating from regions such as Asia, Europe and America. To meet expats' needs, the UAE has established many international private schools; it was therefore chosen as the study location, based on the many levelling cases and struggles reported by the Ministry of Education and schools. For the first time, we present this dataset comprising students’ records for two academic years, covering math, English, and science over 3 terms. The selection of subject areas and number of terms was influenced by other researchers working on similar subject matter.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset provides a collection of 160 instances belonging to two classes ('pass' = 136 and 'fail' = 24). The data is an anonymised, statistically sound and reliable representation of the original data collected from students studying computer science modules at a UK university. Each instance is made up of 19 features plus the class label. Eight of the features represent students' online behaviour, including bio information retrieved from the Virtual Learning Environment. Eleven of the features represent students' neighbourhood influence, retrieved from the Office for Students database. The data has been compiled and made available in de-facto/de-jure standard open formats (CSV and JSON).
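With 136 'pass' and 24 'fail' instances, accuracy alone can be misleading on this dataset. The short sketch below works through the arithmetic: the accuracy of always predicting the majority class, and a common inverse-frequency class weighting (total / (number of classes × class count)). The function names are illustrative and are not part of the dataset or the study.

```python
def majority_baseline(counts):
    """Return the majority label and the accuracy of always predicting it."""
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (number of classes * class count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


class_counts = {"pass": 136, "fail": 24}  # figures from the description above
label, accuracy = majority_baseline(class_counts)
print(label, accuracy)
print(balanced_class_weights(class_counts))
```

A classifier therefore has to beat 85% accuracy to be better than guessing 'pass' every time, so per-class recall or F1 on the 'fail' class is usually a more informative metric here.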
This data was collected and used in a research study undertaken by academics and researchers at Computer Science Department, Edge Hill University, United Kingdom. To encourage reproducibility of the experiments and results reported, the data is provided in the exact training-validation-testing splits used in the experiments.