100+ datasets found

Student Performance Data Set
kaggle.com
Updated Mar 27, 2020
Cite
Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
Dataset updated
Mar 27, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Data-Science Sean
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
If this data set is useful, an upvote is appreciated. This data set addresses student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data were collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).
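The note about G1 and G2 can be made concrete with a small sketch. The column names G1, G2, and G3 come from the description above; the helper name, its flag, and the sample row are hypothetical and only illustrate how the harder (and more useful) setting leaves the earlier period grades out of the inputs.

```python
from typing import Dict, Tuple

# Column names taken from the dataset description.
EARLY_GRADES = ("G1", "G2")
TARGET = "G3"


def split_features_target(
    row: Dict[str, object], use_early_grades: bool = False
) -> Tuple[Dict[str, object], object]:
    """Separate the target G3 from the inputs.

    By default G1 and G2 are dropped as well, which corresponds to the
    harder setting mentioned in the description.
    """
    excluded = {TARGET}
    if not use_early_grades:
        excluded.update(EARLY_GRADES)
    features = {name: value for name, value in row.items() if name not in excluded}
    return features, row[TARGET]


# Made-up row for illustration only; it is not taken from the dataset.
sample_row = {"school": "GP", "age": 17, "studytime": 2, "G1": 11, "G2": 12, "G3": 12}
features, target = split_features_target(sample_row)
print(features, target)
```

Passing `use_early_grades=True` keeps G1 and G2 among the inputs, which is the easier setting.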
Performance of models using CNN features.
plos.figshare.com
xls
Updated Nov 8, 2023
Cite
Nihal Abuzinadah; Muhammad Umer; Abid Ishaq; Abdullah Al Hejaili; Shtwai Alsubai; Ala’ Abdulmajid Eshmawi; Abdullah Mohamed; Imran Ashraf (2023). Performance of models using CNN features. [Dataset]. http://doi.org/10.1371/journal.pone.0293061.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0293061.t005
Dataset updated
Nov 8, 2023
Dataset provided by
PLOS ONE
Authors
Nihal Abuzinadah; Muhammad Umer; Abid Ishaq; Abdullah Al Hejaili; Shtwai Alsubai; Ala’ Abdulmajid Eshmawi; Abdullah Mohamed; Imran Ashraf
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Predicting student performance automatically is of utmost importance, due to the substantial volume of data within educational databases. Educational data mining (EDM) devises techniques to uncover insights from data originating in educational settings. Artificial intelligence (AI) can mine educational data to predict student performance and provide measures to help students avoid failing and learn better. Learning platforms complement traditional learning settings by analyzing student performance, which can help reduce the chance of student failure. Existing methods for student performance prediction in educational data mining have faced challenges such as limited accuracy, imbalanced data, and difficulties in feature engineering. These issues have hindered adaptability and generalization across diverse educational contexts. This study proposes a machine learning-based system with deep convoluted features for the prediction of students’ academic performance. The proposed framework is used to predict student academic performance on both the imbalanced dataset and a dataset balanced with the synthetic minority oversampling technique (SMOTE). In addition, performance is evaluated using both the original and the deep convoluted features. Experimental results indicate that deep convoluted features provide improved prediction accuracy compared to the original features. Results obtained using the extra tree classifier with convoluted features show the highest classification accuracy of 99.9%. In comparison with state-of-the-art approaches, the proposed approach achieved higher performance. This research introduces an AI-driven system for student performance prediction that offers substantial gains in accuracy over existing approaches.
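For readers unfamiliar with SMOTE, the following is a minimal sketch of its core idea: a synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority neighbours. It is a simplified illustration using only the standard library, not the implementation used in the study; the function name, parameters, and toy points are assumptions.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_sketch(
    minority: List[Sequence[float]], n_new: int, k: int = 2, seed: int = 0
) -> List[List[float]]:
    """Create n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen sample, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class points, made up for illustration.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_sketch(toy_minority, n_new=5)
print(len(new_points))
```

Because every synthetic point is an interpolation, it always lies between two real minority samples; library implementations add further options that this sketch omits.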
Student Performance Prediction
kaggle.com
Updated Dec 7, 2021
Cite
Asiya Jan001 (2021). Student Performance Prediction [Dataset]. https://www.kaggle.com/datasets/asiyajan001/student-performance-prediction
Dataset updated
Dec 7, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Asiya Jan001
Description
Dataset

This dataset was created by Asiya Jan001

Released under Data files © Original Authors

Student Performance and Engagement Prediction in eLearning Environments...
osf.io
url
Updated Oct 10, 2024
Cite
Wei-Li Lin; Chang Liu; Qi Lin; ZhexuLi (2024). Student Performance and Engagement Prediction in eLearning Environments Using Machine Learning Methods [Dataset]. http://doi.org/10.17605/OSF.IO/WKUS3
Available download formats: url
Unique identifier
https://doi.org/10.17605/OSF.IO/WKUS3
Dataset updated
Oct 10, 2024
Dataset provided by
Center For Open Science
Authors
Wei-Li Lin; Chang Liu; Qi Lin; ZhexuLi
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
The dataset was collected from a second-year undergraduate Science course at a North American University, delivered in a blended format (combining face-to-face and online components). The raw dataset comprises an event log of 486 enrolled students, totaling 305,933 records from the university’s learning management system (LMS), OWL. Each record includes the following fields:

Event Date: Timestamp of the event.

Event Type: Action taken by the student.

Event Location: Directory where the action was taken.

Session Start: Timestamp marking the start of the online session.

Session End: Timestamp marking the end of the online session.

Student ID: Identifier to group the event log by student.

Resources and assignments were posted sequentially throughout the course, with an average duration of approximately two weeks between assignment posting and due date. The dataset was sorted first by “Student ID” and then by “Event Date” to maintain a chronological order of events for each student. Due to privacy concerns and the General Data Protection Regulation (GDPR), the raw dataset cannot be shared. Instead, it was transformed into a new dataset representing engagement metrics by calculating desired metrics from the event logs for each student. The engagement metrics were chosen to maximize the information extracted from the available data, considering the course structure, which included three assignments, one quiz, one midterm exam, and one final exam.
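The transformation from event log to per-student engagement metrics can be sketched as follows. The field names (Student ID, Event Date, Event Type, Event Location, Session Start, Session End) come from the description above; the three metrics computed here, the event-type values, and the sample records are assumptions for illustration, since the description does not list the metrics the authors actually derived.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def engagement_metrics(events):
    """Aggregate an event log into per-student counts and online time."""
    counts = defaultdict(int)
    sessions = defaultdict(set)
    for event in events:
        student = event["Student ID"]
        counts[student] += 1
        # A session is identified by its start and end timestamps.
        sessions[student].add((event["Session Start"], event["Session End"]))
    metrics = {}
    for student, n_events in counts.items():
        minutes = sum(
            (end - start).total_seconds() / 60 for start, end in sessions[student]
        )
        metrics[student] = {
            "events": n_events,
            "sessions": len(sessions[student]),
            "minutes_online": minutes,
        }
    return metrics


# Made-up records for illustration; they are not taken from the dataset.
t0 = datetime(2024, 1, 8, 9, 0)
sample_events = [
    {"Student ID": "s1", "Event Date": t0, "Event Type": "view",
     "Event Location": "/assignments", "Session Start": t0,
     "Session End": t0 + timedelta(minutes=30)},
    {"Student ID": "s1", "Event Date": t0 + timedelta(minutes=5),
     "Event Type": "submit", "Event Location": "/assignments",
     "Session Start": t0, "Session End": t0 + timedelta(minutes=30)},
    {"Student ID": "s2", "Event Date": t0, "Event Type": "view",
     "Event Location": "/resources", "Session Start": t0,
     "Session End": t0 + timedelta(minutes=10)},
]
metrics = engagement_metrics(sample_events)
print(metrics)
```

Only the aggregated output would be shareable in a setting like the one described, since the raw timestamps stay behind.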
Synthetic Student Performance Dataset
opendatabay.com
Updated May 6, 2025
Cite
Opendatabay Labs (2025). Synthetic Student Performance Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/09e2de7b-9830-4337-a801-f4b8ca312c53
Dataset updated
May 6, 2025
Dataset authored and provided by
Opendatabay Labs
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Education & Learning Analytics
Description
This synthetic Student Performance Dataset is designed as an educational resource for data science, machine learning, and education analytics applications. The dataset provides detailed information on various factors influencing students’ academic performance, including demographics, family background, extracurricular activities, and study habits. It aims to help users analyze relationships between these factors and students’ grades, providing insights into student success and well-being.

Dataset Features:

Gender: Gender of the student (e.g., "Male," "Female").

Age: Age of the student (in years).

Family Size: Size of the student’s family.

Parental Status (Together/Apart): Whether the parents are living together or apart.

Mother's Education Level: Education level of the student’s mother.

Father's Education Level: Education level of the student’s father.

Mother's Job: Occupation of the student’s mother.

Father's Job: Occupation of the student’s father.

Reason for Choosing School: Primary reason for selecting the school (e.g., proximity, reputation).

Legal Guardian: Legal guardian of the student (e.g., "Mother," "Father," "Other").

Travel Time to School (in hours): Daily travel time between home and school.

Weekly Study Time (in hours): Hours spent studying outside school per week.

Number of Past Failures: Number of previously failed subjects.

Extra Educational Support: Whether the student receives additional educational support (e.g., "Yes," "No").

Family Educational Support: Whether the family provides educational support (e.g., "Yes," "No").

Paid Extra Classes: Whether the student takes extra paid classes (e.g., "Yes," "No").

Extracurricular Activities: Participation in extracurricular activities (e.g., "Yes," "No").

Attended Nursery School: Whether the student attended nursery school (e.g., "Yes," "No").

Aspiration for Higher Education: Whether the student aspires to pursue higher education (e.g., "Yes," "No").

Internet Access at Home: Availability of internet access at home (e.g., "Yes," "No").

In a Romantic Relationship: Whether the student is in a romantic relationship (e.g., "Yes," "No").

Quality of Family Relationships: Rated quality of relationships within the family.

Free Time After School: Amount of free time available after school hours.

Going Out with Friends: Frequency of going out with friends.

Workday Alcohol Consumption: Level of alcohol consumption during workdays.

Weekend Alcohol Consumption: Level of alcohol consumption during weekends.

Current Health Status: Self-reported health status of the student.

Number of School Absences: Total number of school days missed.

First Period Grade: Grade received during the first grading period.

Second Period Grade: Grade received during the second grading period.

Final Grade: Final grade achieved by the student.

Distribution:

Image: Student Performance Dataset Distribution (https://storage.googleapis.com/opendatabay_public/images/image_725529a8-e4cb-4bee-bcca-a9adc2658dbd.png)

Image: Student Performance Data (https://storage.googleapis.com/opendatabay_public/images/image_55f1fa29-442d-49ea-89a1-e90b85d8c95f.png)

Usage:

This dataset is useful for a variety of applications, including:

Student Performance Analysis: To explore relationships between family background, study habits, and academic outcomes.

Educational Research: To identify key factors influencing student success and well-being.

Predictive Modeling: To build models that predict student grades or identify students at risk of underperforming.

Policy Making: To analyze how socioeconomic factors and family structure impact education outcomes.

Coverage:

This dataset is synthetic and anonymized, ensuring that it is safe for experimentation and learning without compromising any real student data.

License:

CC0 (Public Domain)

Who can use it:

Data science learners: For practising data manipulation, visualization, and predictive modelling.

Educators and researchers: For academic studies or teaching purposes in student analytics and education research.

Education professionals: For analyzing factors that influence student success and tailoring interventions to improve outcomes.
‘Predicting Student Performance’ analyzed by Analyst-2
analyst-2.ai
Updated Mar 2, 2015
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2015). ‘Predicting Student Performance’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-predicting-student-performance-ec1b/b7296868/?iid=058-803&v=presentation
Dataset updated
Mar 2, 2015
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Predicting Student Performance’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/student-performance on 28 January 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

This data set addresses student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data were collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).

How to use this dataset

Predict Student's future performance

Understand the root causes for low performance

Acknowledgements

If you use this dataset in your research, please credit ewenme

--- Original source retains full ownership of the source dataset ---
Student Performance Predictions
kaggle.com
Updated Aug 17, 2024
Cite
Haseeb_in_Data (2024). Student Performance Predictions [Dataset]. https://www.kaggle.com/datasets/haseebindata/student-performance-predictions/code
Dataset updated
Aug 17, 2024
Dataset provided by
Kaggle
Authors
Haseeb_in_Data
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The Student Performance Dataset is designed to evaluate and predict student outcomes based on various factors that can influence academic success. This synthetic dataset includes features that are commonly considered in educational research and real-world scenarios, such as attendance, study habits, previous academic performance, and participation in extracurricular activities. The goal is to understand how these factors correlate with the final grades of students and to build a predictive model that can forecast student performance.

Dataset Features:

StudentID: A unique identifier for each student.

Name: The name of the student.

Gender: The gender of the student (Male/Female).

AttendanceRate: The percentage of classes attended by the student.

StudyHoursPerWeek: The number of hours the student spends studying each week.

PreviousGrade: The grade the student achieved in the previous semester (out of 100).

ExtracurricularActivities: The number of extracurricular activities the student is involved in.

ParentalSupport: A qualitative assessment of the level of support provided by the student's parents (High/Medium/Low).

FinalGrade: The final grade of the student (out of 100), which serves as the target variable for prediction.

Use Cases:

Predicting Student Performance: The dataset can be used to build machine learning models that predict the final grade of students based on the other features. This can help educators identify students who may need additional support to improve their outcomes.

Exploratory Data Analysis: Researchers and data scientists can explore the relationships between different factors (like attendance or study habits) and student performance. For example, understanding whether higher attendance correlates with better grades.

Feature Importance Analysis: The dataset allows for the examination of which features are most predictive of student success, providing insights into key areas of focus for educational interventions.

Educational Interventions: By identifying patterns in the data, schools and educational institutions can implement targeted interventions to help students improve in specific areas, such as increasing study hours or encouraging participation in extracurricular activities.

Potential Insights:

Correlation Between Study Habits and Performance: The dataset can be used to determine how much study time contributes to academic success.

Impact of Attendance on Grades: Analysis can reveal the extent to which regular attendance influences final grades.

Role of Extracurricular Activities: The dataset can help assess whether participation in extracurricular activities positively or negatively impacts academic performance.

Influence of Parental Support: The data allows for the examination of how different levels of parental support affect student outcomes.

Conclusion: The Student Performance Dataset is a versatile tool for educators, data scientists, and researchers interested in understanding and predicting student success. By analyzing this data, stakeholders can gain valuable insights into the factors that contribute to academic performance and develop strategies to enhance educational outcomes.
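As a sketch of how the listed features might be prepared for a model, the snippet below turns one record into a numeric feature vector and a target. The field names and the High/Medium/Low and Male/Female values come from the description; the numeric encoding, the decision to drop StudentID and Name as identifiers, and the sample record are assumptions.

```python
# Ordinal encoding for the qualitative ParentalSupport levels (an assumption;
# the description only names the three levels).
PARENTAL_SUPPORT_LEVELS = {"Low": 0, "Medium": 1, "High": 2}


def encode_student(record):
    """Return (features, target) for one student record.

    StudentID and Name are left out because they identify a student
    rather than describe one.
    """
    features = [
        float(record["AttendanceRate"]),
        float(record["StudyHoursPerWeek"]),
        float(record["PreviousGrade"]),
        float(record["ExtracurricularActivities"]),
        float(PARENTAL_SUPPORT_LEVELS[record["ParentalSupport"]]),
        1.0 if record["Gender"] == "Female" else 0.0,
    ]
    return features, float(record["FinalGrade"])


# Made-up record for illustration; it is not a row from the dataset.
sample_record = {
    "StudentID": 1, "Name": "Example Student", "Gender": "Female",
    "AttendanceRate": 90, "StudyHoursPerWeek": 12, "PreviousGrade": 80,
    "ExtracurricularActivities": 2, "ParentalSupport": "High", "FinalGrade": 85,
}
x, y = encode_student(sample_record)
print(x, y)
```

An ordinal code keeps the Low < Medium < High ordering; one-hot encoding would be the alternative if that ordering should not be assumed.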
student-performance-data
kaggle.com
Updated Jun 14, 2025
Cite
Muhammad Azam (2025). student-performance-data [Dataset]. http://doi.org/10.34740/kaggle/dsv/12160820
Unique identifier
https://doi.org/10.34740/kaggle/dsv/12160820
Dataset updated
Jun 14, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Muhammad Azam
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Student Performance Data

This dataset provides insights into various factors influencing the academic performance of students. It is curated for use in educational research, data analytics projects, and predictive modeling. The data reflects a combination of personal, familial, and academic-related variables gathered through observation or survey.

The dataset includes a diverse range of students and captures key characteristics such as study habits, family background, school attendance, and overall performance. It is well-suited for exploring correlations, visualizing trends, and training machine learning models related to academic outcomes.

Highlights:

Clean, structured format suitable for immediate use

Designed for beginner to intermediate-level data analysis

Valuable for classification, regression, and data storytelling projects

File Format:

Type: CSV (Comma-Separated Values)

Encoding: UTF-8

Structure: Each row represents a student record

Applications

Student performance prediction

Educational policy planning

Identification of performance gaps and influencing factors

Exploratory data analysis and visualization
Student Performance Dataset
cubig.ai
Updated May 28, 2025
Cite
CUBIG (2025). Student Performance Dataset [Dataset]. https://cubig.ai/store/products/358/student-performance-dataset
Dataset updated
May 28, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction

• The Student Performance Dataset is a tabular survey dataset of secondary school mathematics students. It contains a variety of information, including student demographics, family environment, parents' education and occupation, health, family relationships, and grades.

2) Data Utilization

(1) Characteristics of the Student Performance Dataset:

• Each row contains a total of 33 different characteristics, including school ID, gender, age, family size, parents' educational level and occupation, family relationship, health status, and grades.

• It is suitable for a variety of data analysis and prediction exercises on the target variable Grade, including regression analysis and analysis of imbalance in categorical variables.

(2) The Student Performance Dataset can be used for:

• Analyzing academic achievement prediction and influencing factors: It can be used to analyze the impact of various factors such as a student's background, family environment, and parental characteristics on grades and to develop a grade prediction model.

• Establishing educational policies and customized support strategies: Based on student-specific characteristics and grade data, it can be applied to establishing educational policies such as closing educational gaps, supporting vulnerable student groups, and providing customized learning guidance.
Students performance prediction data set - traditional vs. online learning
figshare.com
txt
Updated Mar 28, 2021
Cite
Gabriela Czibula; Maier Mariana; Zsuzsanna Onet-Marian (2021). Students performance prediction data set - traditional vs. online learning [Dataset]. http://doi.org/10.6084/m9.figshare.14330447.v5
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.14330447.v5
Dataset updated
Mar 28, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Gabriela Czibula; Maier Mariana; Zsuzsanna Onet-Marian
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The six data sets were created for an undergraduate course at the Babes-Bolyai University, Faculty of Mathematics and Computer Science, held for second-year students in the autumn semester. The course is taught in both Romanian and English, with the same content and evaluation rules in both languages. The six data sets are the following:

- FirstCaseStudy_RO_traditional_2019-2020.txt - contains data about the grades from the 2019-2020 academic year (when the traditional face-to-face teaching method was used) for the Romanian language
- FirstCaseStudy_RO_online_2020-2021.txt - contains data about the grades from the 2020-2021 academic year (when online teaching was used) for the Romanian language
- SecondCaseStudy_EN_traditional_2019-2020.txt - contains data about the grades from the 2019-2020 academic year (when the traditional face-to-face teaching method was used) for the English language
- SecondCaseStudy_EN_online_2020-2021.txt - contains data about the grades from the 2020-2021 academic year (when online teaching was used) for the English language
- ThirdCaseStudy_Both_traditional_2019-2020.txt - the concatenation of the two data sets for the 2019-2020 academic year (so all instances from FirstCaseStudy_RO_traditional_2019-2020 and SecondCaseStudy_EN_traditional_2019-2020 together)
- ThirdCaseStudy_Both_online_2020-2021.txt - the concatenation of the two data sets for the 2020-2021 academic year (so all instances from FirstCaseStudy_RO_online_2020-2021 and SecondCaseStudy_EN_online_2020-2021 together)

Instances from the data sets for the 2019-2020 academic year contain 12 attributes (in this order):

- the grades received by the student for 7 laboratory assignments that were presented during the semester. For assignments that were not turned in, a grade of 0 was given. Possible values are between 0 and 10.
- the grades received by the student for 2 practical exams. If a student did not participate in a practical exam, the grade was 0. Possible values are between 0 and 10.
- the number of seminar activities that the student had. Possible values are between 0 and 7.
- the final grade the student received for the course. It is a value between 4 and 10.
- the category of the final grade:
  - E for grades 10 or 9
  - G for grades 8 or 7
  - S for grades 6 or 5
  - F for grade 4

Instances from the data sets for the 2020-2021 academic year contain 10 attributes (in this order):

- the grades received by the student for 7 laboratory assignments that were presented during the semester. For assignments that were not turned in, a grade of 0 was given. Possible values are between 0 and 10.
- a seminar bonus computed based on the number of seminar activities the student had during the semester, which was added to the final grade. Possible values are between 0 and 0.5.
- the final grade the student received for the course. It is a value between 4 and 10.
- the category of the final grade:
  - E for grades 10 or 9
  - G for grades 8 or 7
  - S for grades 6 or 5
  - F for grade 4
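The grade-to-category rule stated above (E for 10 or 9, G for 8 or 7, S for 6 or 5, F for 4) can be written as a small function. The function name is hypothetical, and treating the final grade as an integer between 4 and 10 is an assumption based on the values listed in the description.

```python
def grade_category(final_grade: int) -> str:
    """Map a final grade (4 to 10) to its category letter."""
    if not 4 <= final_grade <= 10:
        raise ValueError("final grade must be between 4 and 10")
    if final_grade >= 9:
        return "E"
    if final_grade >= 7:
        return "G"
    if final_grade >= 5:
        return "S"
    return "F"


categories = {grade: grade_category(grade) for grade in range(4, 11)}
print(categories)
```

A function like this is mainly useful as a consistency check, since the category is already stored as the last attribute of each instance.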
Student Performance Predictions
kaggle.com
Updated Sep 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Youssef Ayman (2024). Student Performance Predictions [Dataset]. https://www.kaggle.com/datasets/youssefayman22/student-performance-predictions/suggestions
Dataset updated
Sep 4, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Youssef Ayman
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Youssef Ayman

Released under Apache 2.0

Data from: Student grade Prediction
data.mendeley.com
Updated Mar 24, 2025
Cite
Neelamcadhab Padhy (2025). Student grade Prediction [Dataset]. http://doi.org/10.17632/6dgkv6kpr2.1
Unique identifier
https://doi.org/10.17632/6dgkv6kpr2.1
Dataset updated
Mar 24, 2025
Authors
Neelamcadhab Padhy
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains semester-wise academic performance data of BTech students from GIET University. It includes the grades of students from their 1st to 4th semesters, along with their corresponding 5th-semester grades. The dataset is intended for use in educational data mining and machine learning applications, specifically for predicting the 5th-semester grades of students based on their past performance.

The dataset consists of 379 student records, with each record containing the following attributes:

SEM 1: Grade obtained in the 1st semester.

SEM 2: Grade obtained in the 2nd semester.

SEM 3: Grade obtained in the 3rd semester.

SEM 4: Grade obtained in the 4th semester.

SEM 5: Grade obtained in the 5th semester (target variable for prediction).

The grades are represented on a scale of 0 to 10, where 10 is the highest achievable grade. This dataset can be used to develop predictive models for academic performance, identify trends in student performance, and support decision-making in educational institutions.

Keywords: Grade Prediction, Student Performance, Educational Data Mining, Academic Analytics, Machine Learning, GIET University

Potential Applications:

Predicting student performance in future semesters.

Identifying at-risk students for early intervention.

Analyzing trends in academic performance over time.
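Before fitting a model to this kind of data, a naive baseline is useful for comparison. The sketch below predicts the SEM 5 grade as the mean of SEM 1 to SEM 4 and scores it with mean absolute error. The baseline is a suggestion, not a method described by the dataset's author, and the two records are made up for illustration.

```python
from statistics import fmean


def predict_sem5(previous_grades):
    """Naive baseline: predict SEM 5 as the mean of SEM 1 to SEM 4."""
    return fmean(previous_grades)


def mean_absolute_error(records):
    """Average absolute gap between the baseline prediction and actual SEM 5."""
    return fmean(abs(predict_sem5(prev) - actual) for prev, actual in records)


# Made-up records: ((SEM 1, SEM 2, SEM 3, SEM 4), SEM 5), on the 0 to 10 scale.
sample_records = [
    ((7.0, 8.0, 9.0, 8.0), 8.5),
    ((6.0, 6.0, 6.0, 6.0), 6.0),
]
baseline_mae = mean_absolute_error(sample_records)
print(baseline_mae)
```

Any trained model would need to beat this error on held-out records to justify its added complexity.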
VN Student Performance Dataset
kaggle.com
Updated Apr 20, 2025
Cite
Hoàng Ngọc Tiến (2025). VN Student Performance Dataset [Dataset]. https://www.kaggle.com/datasets/hongngctin/vn-student-performance-dataset/discussion
Dataset updated
Apr 20, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Hoàng Ngọc Tiến
Description
This is a synthesized dataset based on real academic performance data of high school students in several schools in Vietnam. This data can be useful for analysis, training prediction models on academic performance, personalized study planning, and career counseling, among other applications.

The data used contains only anonymized and non-identifiable information collected from high school students, including demographic and academic performance attributes. No personally identifying information was collected or used. The data is used exclusively for academic research purposes under ethical guidelines, and no attempt is made to trace or analyze individual-level outcomes.
Student Performance
kaggle.com
Updated Aug 30, 2020
Cite
BARKHA VERMA (2020). Student Performance [Dataset]. https://www.kaggle.com/barkhaverma/student-performance/discussion
Dataset updated
Aug 30, 2020
Dataset provided by
Kaggle
Authors
BARKHA VERMA
Description
Context

This data set addresses student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data were collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).

Content

Attributes for both student-mat.csv (Math course) and student-por.csv (Portuguese language course) datasets:

1 school - student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
2 sex - student's sex (binary: 'F' - female or 'M' - male)
3 age - student's age (numeric: from 15 to 22)
4 address - student's home address type (binary: 'U' - urban or 'R' - rural)
5 famsize - family size (binary: 'LE3' - less than or equal to 3 or 'GT3' - greater than 3)
6 Pstatus - parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
7 Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
8 Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
9 Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
10 Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
11 reason - reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
12 guardian - student's guardian (nominal: 'mother', 'father' or 'other')
13 traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
14 studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
15 failures - number of past class failures (numeric: n if 1<=n<3, else 4)
16 schoolsup - extra educational support (binary: yes or no)
17 famsup - family educational support (binary: yes or no)
18 paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
19 activities - extra-curricular activities (binary: yes or no)
20 nursery - attended nursery school (binary: yes or no)
21 higher - wants to take higher education (binary: yes or no)
22 internet - Internet access at home (binary: yes or no)
23 romantic - with a romantic relationship (binary: yes or no)
24 famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
25 freetime - free time after school (numeric: from 1 - very low to 5 - very high)
26 goout - going out with friends (numeric: from 1 - very low to 5 - very high)
27 Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
28 Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
29 health - current health status (numeric: from 1 - very bad to 5 - very good)
30 absences - number of school absences (numeric: from 0 to 93)

These grades are related to the course subject, Math or Portuguese:

31 G1 - first period grade (numeric: from 0 to 20)
32 G2 - second period grade (numeric: from 0 to 20)
33 G3 - final grade (numeric: from 0 to 20, output target)
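
Several of the attributes listed above store small integer codes rather than readable values. The following is a minimal sketch of how such codes might be turned back into labels. The label strings are copied from the attribute list; the names `CODEBOOK` and `decode_record` are hypothetical helpers introduced here for illustration and are not part of the dataset or any library.

```python
from typing import Dict, Mapping

# Code-to-label tables transcribed from the attribute list above.
EDUCATION_LEVELS: Dict[int, str] = {
    0: "none",
    1: "primary education (4th grade)",
    2: "5th to 9th grade",
    3: "secondary education",
    4: "higher education",
}
TRAVEL_TIME: Dict[int, str] = {
    1: "<15 min",
    2: "15 to 30 min",
    3: "30 min to 1 hour",
    4: ">1 hour",
}
STUDY_TIME: Dict[int, str] = {
    1: "<2 hours",
    2: "2 to 5 hours",
    3: "5 to 10 hours",
    4: ">10 hours",
}

# Hypothetical lookup: column name -> code table for that column.
CODEBOOK: Dict[str, Dict[int, str]] = {
    "Medu": EDUCATION_LEVELS,
    "Fedu": EDUCATION_LEVELS,
    "traveltime": TRAVEL_TIME,
    "studytime": STUDY_TIME,
}


def decode_record(record: Mapping[str, object]) -> Dict[str, object]:
    """Return a copy of `record` with coded columns replaced by their labels.

    Columns that are not in CODEBOOK are passed through unchanged.
    """
    decoded = dict(record)
    for column, labels in CODEBOOK.items():
        if column not in decoded:
            continue
        code = int(decoded[column])  # CSV readers often yield strings
        if code not in labels:
            raise ValueError(f"unexpected code {code!r} in column {column!r}")
        decoded[column] = labels[code]
    return decoded


if __name__ == "__main__":
    # Illustrative row, not taken from the real files.
    row = {"school": "GP", "Medu": 4, "Fedu": "2", "traveltime": 1, "studytime": 3, "G3": 12}
    print(decode_record(row))
```

Keeping the tables in one dictionary means further coded columns can be added without changing the function.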

Source

Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez

Citation Request:

Please include this citation if you plan to use this database:

P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.
Student Performance (Multiple Linear Regression) Dataset
cubig.ai
Updated May 29, 2025
CUBIG (2025). Student Performance (Multiple Linear Regression) Dataset [Dataset]. https://cubig.ai/store/products/392/student-performance-multiple-linear-regression-dataset
Dataset updated
May 29, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
Description
1) Data Introduction
• The Student Performance (Multiple Linear Regression) Dataset is designed to analyze the relationship between students’ learning habits and academic performance. Each sample includes key indicators related to learning, such as study hours, sleep duration, previous test scores, and the number of practice exams completed.

2) Data Utilization
(1) Characteristics of the Student Performance (Multiple Linear Regression) Dataset:
• The target variable, Hours Studied, quantitatively represents the amount of time a student has invested in studying. The dataset is structured to allow modeling and inference of learning behaviors based on correlations with other variables.

(2) Applications of the Student Performance (Multiple Linear Regression) Dataset:
• AI-Based Study Time Prediction Models: The dataset can be used to develop regression models that estimate a student’s expected study time based on inputs like academic performance, sleep habits, and engagement patterns.
• Behavioral Analysis and Personalized Learning Strategies: It can be applied to identify students with insufficient study time and design personalized study interventions based on academic and lifestyle patterns.
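
To make the regression use case above concrete, here is a minimal sketch of multiple linear regression fitted by ordinary least squares using only the Python standard library. It solves the normal equations with Gaussian elimination. The feature order (previous score, sleep hours, practice exams), the toy rows, and the coefficients used to generate them are assumptions made up for illustration; they are not values from the CUBIG dataset.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(features: Sequence[Sequence[float]], target: Sequence[float]) -> List[float]:
    """Return [intercept, coef_1, ..., coef_k] from the normal equations."""
    rows = [[1.0, *(float(v) for v in f)] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs: Sequence[float], feature_row: Sequence[float]) -> float:
    """Apply fitted coefficients to one feature row."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], feature_row))


# Toy rows (previous_score, sleep_hours, practice_exams): invented for illustration.
TOY_FEATURES = [(60, 7, 2), (75, 6, 4), (90, 8, 5), (50, 5, 1), (82, 9, 3), (68, 6, 0)]
# Noise-free target generated from an assumed linear rule, so the fit can be checked.
TOY_HOURS = [1.0 + 0.05 * p - 0.2 * s + 0.5 * k for p, s, k in TOY_FEATURES]

if __name__ == "__main__":
    coefs = fit_ols(TOY_FEATURES, TOY_HOURS)
    print([round(c, 6) for c in coefs])
    print(round(predict(coefs, (70, 7, 3)), 6))
```

Because the toy target is generated without noise, the fit recovers the generating coefficients; on real data a library such as scikit-learn or statsmodels would normally be preferred for diagnostics and numerical robustness.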
Data from: Dataset of Student Level Prediction in UAE
data.mendeley.com
Updated Dec 18, 2020
shatha Ghareeb (2020). Dataset of Student Level Prediction in UAE [Dataset]. http://doi.org/10.17632/3g8dtwbjjy.1
Unique identifier
https://doi.org/10.17632/3g8dtwbjjy.1
Dataset updated
Dec 18, 2020
Authors
shatha Ghareeb
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United Arab Emirates
Description
The dataset offers a novel view of student grading across diverse educational cultures in multiple countries. Researchers and others in the education sector will be able to see the impact of having varied curricula in one country. The dataset compares different levelling cases that arise when students transfer from one curriculum to another, and the unreliable levelling criteria currently set by schools in an international school setting. The collected data can be used with intelligent algorithms, specifically machine learning and pattern analysis methods, to develop an intelligent framework applicable in multi-cultural educational systems. Such a framework would aid a smooth transition (“levelling”, hereafter) for students who relocate from one education curriculum to another, and minimize the impact of switching on the students’ educational performance. The preliminary variables taken into consideration when deciding which data to collect depended on the variables. The UAE is a multicultural country with many expats relocating from regions such as Asia, Europe and America. To meet expats’ needs, the UAE has established many international private schools; the UAE was therefore chosen as the location of the study, based on the many cases of and struggles with levelling declared by the Ministry of Education and schools. For the first time, we present this dataset comprising students’ records for two academic years, covering math, English, and science over 3 terms. The selection of subject areas and number of terms was influenced by other researchers working on similar subject matter.
Synthetic Student Profiles with Academic Outcomes Dataset
opendatabay.com
Updated Jun 17, 2025
Opendatabay Labs (2025). Synthetic Student Profiles with Academic Outcomes Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/41933042-6ec7-49c4-b151-508fc8f5592b
Dataset updated
Jun 17, 2025
Dataset authored and provided by
Opendatabay Labs
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Education & Learning Analytics
Description
The Synthetic Student Performance Dataset is designed to support research, analytics, and educational projects focused on academic performance, family background, and behavioral factors affecting students. It mirrors real-world educational data and offers diverse features to explore student success patterns.

Dataset Features

student_id: Unique identifier for each student.

school: Attended school (e.g., GP or MS).

sex: Gender of the student (F/M).

age: Student's age in years.

address_type: Urban or Rural home location.

family_size: Family size (Less than or equal to 3 / Greater than 3).

parent_status: Parental cohabitation status (Living together / Apart).

mother_education / father_education: Highest education level completed (e.g., Primary, Secondary, Higher).

mother_job / father_job: Occupation of the student's parents.

school_choice_reason: Reason for choosing the school (e.g., Reputation, Proximity).

guardian: Primary caregiver (e.g., Mother, Father, Other).

travel_time: Daily travel time to school.

study_time: Weekly study time outside school.

class_failures: Number of past class failures.

school_support / family_support: Extra academic support received at school and from family (Yes/No).

extra_paid_classes: Attending paid private tutoring (Yes/No).

activities: Participation in extracurricular activities (Yes/No).

nursery_school: Attended preschool (Yes/No).

higher_ed: Desire to pursue higher education (Yes/No).

internet_access: Access to the internet at home (Yes/No).

romantic_relationship: Currently in a romantic relationship (Yes/No).

family_relationship: Quality of family relationships (numeric scale).

free_time: Amount of free time after school (numeric scale).

social: Frequency of social activities with peers (numeric scale).

weekday_alcohol / weekend_alcohol: Alcohol consumption levels on weekdays and weekends.

health: Current health status (1–5 scale).

absences: Number of school absences.

grade_1 / grade_2 / final_grade: First and second period grades and final academic performance.

Distribution

[Figure: Synthetic student performance data visuals and distribution (https://storage.googleapis.com/opendatabay_public/41933042-6ec7-49c4-b151-508fc8f5592b/7537d999da0b_student_performance_visuals.png)]

Usage

This dataset is ideal for:

Academic Performance Prediction: Predict final grades based on behavioral and background features.

Feature Importance Analysis: Identify key influences on student success.

Sociological Insights: Understand the impact of family, relationship, and lifestyle factors on education.

Model Training: Suitable for classification, regression, and clustering tasks in educational data mining.

Coverage

Captures a comprehensive view of student life, including family background, academic history, health, and lifestyle. The dataset supports multi-disciplinary research across education, sociology, and data science.

License

CC0 (Public Domain)

Who Can Use It

Educational Researchers: For testing interventions and identifying risk factors.

Data Scientists and ML Practitioners: For building predictive models in education.

Instructors and Students: For coursework in data analysis, machine learning, and statistics.
Attribute information for student performance data set.
figshare.com
xls
Updated Jun 21, 2023
Yinqiu Song; Xianqiu Meng; Jianhua Jiang (2023). Attribute information for student performance data set. [Dataset]. http://doi.org/10.1371/journal.pone.0276943.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0276943.t001
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS ONE
Authors
Yinqiu Song; Xianqiu Meng; Jianhua Jiang
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Attribute information for student performance data set.
Student Exam Performance
test.dbrepo.tuwien.ac.at
Updated Jun 15, 2025
Azan (2025). Student Exam Performance [Dataset]. http://doi.org/10.82556/egd2-q295
Unique identifier
https://doi.org/10.82556/egd2-q295
Dataset updated
Jun 15, 2025
Authors
Azan
Time period covered
2025
Description
This dataset is part of a student exam performance prediction use case and contains structured data such as study hours, attendance percentage, previous exam scores, and final exam scores. The data has been split into subsets (training, validation, and test) for use in a machine learning workflow. Each subset is used for a specific phase of model development: training, tuning, and evaluation. The dataset supports a regression-based model and follows FAIR data principles.
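
The three-way split described above can be sketched as follows. This is a minimal illustration using only the standard library; the 70/15/15 proportions, the fixed seed, and the function name `split_dataset` are assumptions for the example, since the description does not state how the published subsets were actually produced.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    rows: Sequence[T],
    train_frac: float = 0.7,  # assumed proportion, not stated in the source
    val_frac: float = 0.15,   # assumed proportion, not stated in the source
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle `rows` reproducibly and cut them into train/validation/test lists."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains after the first two cuts
    return train, validation, test


if __name__ == "__main__":
    train, validation, test = split_dataset(list(range(100)), seed=42)
    print(len(train), len(validation), len(test))
```

The training part is used to fit the model, the validation part to tune it, and the test part only for the final evaluation, matching the three phases named in the description.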
College Placement Predictor Dataset
kaggle.com
Updated Dec 28, 2023
SameerProgrammer (2023). College Placement Predictor Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/7298157
Unique identifier
https://doi.org/10.34740/kaggle/dsv/7298157
Dataset updated
Dec 28, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
SameerProgrammer
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
1. About the Dataset:

Description: Dive into the world of college placements with this dataset designed to unravel the factors influencing student placement outcomes. The dataset comprises crucial parameters such as IQ scores, CGPA (Cumulative Grade Point Average), and placement status. Aspiring data scientists, researchers, and enthusiasts can leverage this dataset to uncover patterns and insights that contribute to a deeper understanding of successful college placements.

2. Project Ideas:

Project Idea 1: Predictive Modeling for College Placements
Utilize machine learning algorithms to build a predictive model that forecasts a student's likelihood of placement based on their IQ scores and CGPA. Evaluate and compare the effectiveness of different algorithms to enhance prediction accuracy.

Project Idea 2: Feature Importance Analysis
Conduct a feature importance analysis to identify the key factors that significantly influence placement outcomes. Gain insights into whether IQ, CGPA, or a combination of both plays a more dominant role in determining success.

Project Idea 3: Clustering Analysis of Placement Trends
Apply clustering techniques to group students based on their placement outcomes. Explore whether distinct clusters emerge, shedding light on common characteristics or trends among students who secure placements.

Project Idea 4: Correlation Analysis with External Factors
Investigate the correlation between the provided data (IQ, CGPA, placement) and external factors such as internship experience, extracurricular activities, or industry demand. Assess how these external factors may complement or influence placement success.

Project Idea 5: Visualization of Placement Dynamics Over Time
Create dynamic visualizations to illustrate how placement trends evolve over time. Analyze trends, patterns, and fluctuations in placement rates to identify potential cyclical or seasonal influences on student placements.
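
As one possible starting point for Project Idea 1, the sketch below fits a logistic regression on IQ and CGPA using plain batch gradient descent and the standard library only. The toy rows, the learning rate, the epoch count, and the helper names are assumptions chosen for illustration; they are not taken from the Kaggle dataset, and a real project would more likely use an established library and a held-out test set.

```python
import math
from typing import List, Sequence, Tuple

Stats = List[Tuple[float, float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def column_stats(rows: Sequence[Sequence[float]]) -> Stats:
    """Mean and standard deviation per column, used to standardize features."""
    n = len(rows)
    stats = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        stats.append((mean, std or 1.0))  # avoid dividing by zero
    return stats


def scale(row: Sequence[float], stats: Stats) -> List[float]:
    return [(v - m) / s for v, (m, s) in zip(row, stats)]


def fit_logistic(rows, labels, lr: float = 0.1, epochs: int = 2000):
    """Return (weights, stats); weights[0] is the intercept."""
    stats = column_stats(rows)
    xs = [scale(r, stats) for r in rows]
    weights = [0.0] * (len(stats) + 1)
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(xs, labels):
            z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
            err = sigmoid(z) - y  # gradient of the log loss w.r.t. z
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights, stats


def placement_probability(model, iq: float, cgpa: float) -> float:
    weights, stats = model
    x = scale((iq, cgpa), stats)
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


# Invented (IQ, CGPA) rows with placement labels; for illustration only.
TOY_ROWS = [(85, 2.1), (90, 2.4), (95, 2.0), (88, 2.6),
            (110, 3.4), (115, 3.7), (105, 3.2), (120, 3.9)]
TOY_LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    model = fit_logistic(TOY_ROWS, TOY_LABELS)
    print(placement_probability(model, 125, 3.9) > 0.5)
    print(placement_probability(model, 80, 2.0) > 0.5)
```

Standardizing the two features first matters here because IQ and CGPA sit on very different scales, and unscaled gradient descent would need a much smaller learning rate.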

3. Columns Explanation:

IQ:

Definition: Intelligence Quotient, a measure of a person's intellectual abilities.

Data Type: Numeric

Range: Typically, IQ scores range from 70 to 130, with 100 being the average.

CGPA:

Definition: Cumulative Grade Point Average, a measure of a student's overall academic performance.

Data Type: Numeric

Range: Typically, CGPA is on a scale of 0 to 4, with 4 being the highest possible score.

Placement:

Definition: Binary variable indicating whether a student secured a placement (1) or not (0).

Data Type: Categorical (Binary)

Values: 1 (Placement secured) or 0 (No placement).

These columns collectively provide a comprehensive snapshot of a student's intellectual abilities, academic performance, and their success in securing a placement. Analyzing this dataset can offer valuable insights into the dynamics of college placements and inform strategies for optimizing student outcomes.

