Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
If this Data Set is useful, and upvote is appreciated. This data approach student achievement in secondary education of two Portuguese schools. The data attributes include student grades, demographic, social and school related features) and it was collected by using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Student Performance Data was obtained in a survey of students' math course in secondary school. It consists of 33 Column Dataset Contains Features like - school ID - gender - age - size of family - Father education - Mother education - Occupation of Father and Mother - Family Relation - Health - Grades
This dataset can be used for Regression (as target variable Grade) as well as Analysis tasks. it might contain imbalanced category features.
Please do Upvote if this Dataset really helpful to you.
Facebook
Twitterhttps://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/
The dataset includes: Roll Number: Represent the roll number of the student.
Gender: Useful for analyzing performance differences between male and female students.
Race/Ethnicity: Allows analysis of academic performance trends across different racial or ethnic groups.
Parental Level of Education: Indicates the educational background of the student's family.
Lunch: Shows whether students receive a free or reduced lunch, which is often a socioeconomic indicator.
Test Preparation Course: This tells whether students completed a test prep course, which could impact their performance.
Math Score: Provides a measure of each student’s performance in math, used to calculate averages or trends across various demographics. Science Score: Evaluates students' Science knowledge, which can be analyzed to assess overall scentific knowledge of the student.
Reading Score: Measures performance in reading, allowing for insights into literacy and comprehension levels among students.
Writing Score: Evaluates students' writing skills, which can be analyzed to assess overall literacy and expression.
Total Score: Shows the total number achieved by the student out of 400.
Grade: Gade achieved by the student. "A" grade if Total marks >= 320, "B" grade if Total marks >= 250, "C" grade if Total marks >= 200, "D" grade if Total marks >= 150 and Fail if <150.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset used in this study integrates quantitative data on student learning behaviors, engagement patterns, demographics, and academic performance. It was compiled by merging two publicly available Kaggle datasets, resulting in a combined file (“merged_dataset.csv”) containing 14,003 student records with 16 attributes. All records are anonymized and contain no personally identifiable information.
The dataset covers the following categories of variables:
Resource access and learning environment: Resources, Internet, EduTech
Motivation and psychological factors: Motivation, StressLevel
Demographic information: Gender, Age (ranging from 18 to 30 years)
Learning preference classification: LearningStyle
Academic performance indicators: ExamScore, FinalGrade
In this study, “ExamScore” and “FinalGrade” served as the primary performance indicators. The remaining variables were used to derive behavioral and contextual profiles, which were clustered using unsupervised machine learning techniques.
The analysis and modeling were implemented in Python through a structured Jupyter Notebook (“Project.ipynb”), which included the following main steps:
Environment Setup – Import of essential libraries (NumPy, pandas, Matplotlib, Seaborn, SciPy, StatsModels, scikit-learn, imbalanced-learn) and visualization configuration.
Data Import and Integration – Loading the two source CSV files, harmonizing columns, removing irrelevant attributes, aligning formats, handling missing values, and merging them into a unified dataset (merged_dataset.csv).
Data Preprocessing –
Encoding categorical variables using LabelEncoder.
Scaling features using both z-score standardization (for statistical tests and PCA) and Min–Max normalization (for clustering).
Detecting and removing duplicates.
Clustering Analysis –
Applying K-Means clustering to segment learners into distinct profiles.
Determining the optimal number of clusters using the Elbow Method and Silhouette Score.
Evaluating cluster quality with internal metrics (Silhouette Score, Davies–Bouldin Index).
Dimensionality Reduction & Visualization – Using PCA for 2D/3D cluster visualization and feature importance exploration.
Mapping Clusters to Learning Styles – Associating each identified cluster with the most relevant learning style model based on feature patterns and alignment scores.
Statistical Analysis – Conducting ANOVA and regression to test for significant differences in performance between clusters.
Interpretation & Practical Recommendations – Analyzing cluster-specific characteristics and providing implications for adaptive and mobile learning integration.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a detailed, student-level view of academic performance, attendance, and intervention history, enabling educators to monitor progress, identify at-risk students, and tailor support strategies. It includes assessment scores, course and teacher information, intervention records, and attendance tracking, making it ideal for data-driven educational improvement and early warning systems.
Facebook
TwitterThis dataset is real data of 5,000 records collected from a private learning provider. The dataset includes key attributes necessary for exploring patterns, correlations, and insights related to academic performance.
Columns: 01. Student_ID: Unique identifier for each student. 02. First_Name: Student’s first name. 03. Last_Name: Student’s last name. 04. Email: Contact email (can be anonymized). 05. Gender: Male, Female, Other. 06. Age: The age of the student. 07. Department: Student's department (e.g., CS, Engineering, Business). 08. Attendance (%): Attendance percentage (0-100%). 09. Midterm_Score: Midterm exam score (out of 100). 10. Final_Score: Final exam score (out of 100). 11. Assignments_Avg: Average score of all assignments (out of 100). 12. Quizzes_Avg: Average quiz scores (out of 100). 13. Participation_Score: Score based on class participation (0-10). 14. Projects_Score: Project evaluation score (out of 100). 15. Total_Score: Weighted sum of all grades. 16. Grade: Letter grade (A, B, C, D, F). 17. Study_Hours_per_Week: Average study hours per week. 18. Extracurricular_Activities: Whether the student participates in extracurriculars (Yes/No). 19. Internet_Access_at_Home: Does the student have access to the internet at home? (Yes/No). 20. Parent_Education_Level: Highest education level of parents (None, High School, Bachelor's, Master's, PhD). 21. Family_Income_Level: Low, Medium, High. 22. Stress_Level (1-10): Self-reported stress level (1: Low, 10: High). 23. Sleep_Hours_per_Night: Average hours of sleep per night.
The Attendance is not part of the Total_Score or has very minimal weight.
Calculating the weighted sum: Total Score=a⋅Midterm+b⋅Final+c⋅Assignments+d⋅Quizzes+e⋅Participation+f⋅Projects
| Component | Weight (%) |
|---|---|
| Midterm | 15% |
| Final | 25% |
| Assignments Avg | 15% |
| Quizzes Avg | 10% |
| Participation | 5% |
| Projects Score | 30% |
| Total | 100% |
Dataset contains: - Missing values (nulls): in some records (e.g., Attendance, Assignments, or Parent Education Level). - Bias in some Datae (ex: grading e.g., students with high attendance get slightly better grades). - Imbalanced distributions (e.g., some departments having more students).
Note: - The dataset is real, but I included some bias to create a greater challenge for my students. - Some Columns have been masked as the Data owner requested. "Students_Grading_Dataset_Biased.csv" contains the biased Dataset "Students Performance Dataset" Contains the masked dataset
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains information about students' academic performance, study habits, and external factors affecting their final exam scores. It is designed for predictive modeling, data visualization, and educational analytics.
This dataset is useful for:
- Predicting student final exam scores 📈
- Identifying key factors that impact academic performance 🎯
- Exploring feature importance in education-related datasets 📊
- Building machine learning models for regression and classification 🤖
| Column Name | Description |
|---|---|
| Student_ID | Unique identifier for each student. |
| Gender | Gender of the student (Male/Female). |
| Study_Hours_per_Week | Average number of study hours per week. |
| Attendance_Rate | Attendance percentage (50% - 100%). |
| Past_Exam_Scores | Average score of previous exams (50 - 100). |
| Parental_Education_Level | Education level of parents (High School, Bachelors, Masters, PhD). |
| Internet_Access_at_Home | Whether the student has internet access at home (Yes/No). |
| Extracurricular_Activities | Whether the student participates in extracurricular activities (Yes/No). |
| Final_Exam_Score (Target) | The final exam score of the student (50 - 100, integer values). |
| Pass_Fail (Target) | The student status (Pass/Fail). |
This dataset is open for public use. Feel free to use it for learning, research, and model-building! 🚀
Facebook
Twitterhttps://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data Introduction • The Student Performance Dataset is a survey of secondary school mathematics students and is a dataset containing a variety of information in a table format, including student demographics, family environment, parents' education and occupation, health, family relationships, and grades.
2) Data Utilization (1) Student Performance Dataset has characteristics that: • Each row contains a total of 33 different characteristics, including school ID, gender, age, family size, parents' educational level and occupation, family relationship, health status, and grades. • It is suitable for a variety of data analysis and prediction exercises, including regression analysis and categorical variable imbalance analysis, including the target variable Grade. (2) Student Performance Dataset can be used to: • Analyzing academic achievement prediction and influencing factors: It can be used to analyze the impact of various factors such as student's background, family environment, and parental characteristics on grades and to develop a grade prediction model. • Establishing educational policies and customized support strategies: Based on student-specific characteristics and grade data, it can be applied to establishing educational policies such as closing educational gaps, supporting vulnerable student groups, and providing customized learning guidance.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Academic achievement is an important index to measure the quality of education and students’ learning outcomes. Reasonable and accurate prediction of academic achievement can help improve teachers’ educational methods. And it also provides corresponding data support for the formulation of education policies. However, traditional methods for classifying academic performance have many problems, such as low accuracy, limited ability to handle nonlinear relationships, and poor handling of data sparsity. Based on this, our study analyzes various characteristics of students, including personal information, academic performance, attendance rate, family background, extracurricular activities and etc. Our work offers a comprehensive view to understand the various factors affecting students’ academic performance. In order to improve the accuracy and robustness of student performance classification, we adopted Gaussian Distribution based Data Augmentation technique (GDO), combined with multiple Deep Learning (DL) and Machine Learning (ML) models. We explored the application of different Machine Learning and Deep Learning models in classifying student grades. And different feature combinations and data augmentation techniques were used to evaluate the performance of multiple models in classification tasks. In addition, we also checked the synthetic data’s effectiveness with variance homogeneity and P-values, and studied how the oversampling rate affects actual classification results. Research has shown that the RBFN model based on educational habit features performs the best after using GDO data augmentation. The accuracy rate is 94.12%, and the F1 score is 94.46%. These results provide valuable references for the classification of student grades and the development of intervention strategies. New methods and perspectives in the field of educational data analysis are proposed in our study. At the same time, it has also promoted innovation and development in the intelligence of the education system.
Facebook
Twitterhttps://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data Introduction • The Student Performance (Multiple Linear Regression) Dataset is designed to analyze the relationship between students’ learning habits and academic performance. Each sample includes key indicators related to learning, such as study hours, sleep duration, previous test scores, and the number of practice exams completed.
2) Data Utilization (1) Characteristics of the Student Performance (Multiple Linear Regression) Dataset: • The target variable, Hours Studied, quantitatively represents the amount of time a student has invested in studying. The dataset is structured to allow modeling and inference of learning behaviors based on correlations with other variables.
(2) Applications of the Student Performance (Multiple Linear Regression) Dataset: • AI-Based Study Time Prediction Models: The dataset can be used to develop regression models that estimate a student’s expected study time based on inputs like academic performance, sleep habits, and engagement patterns. • Behavioral Analysis and Personalized Learning Strategies: It can be applied to identify students with insufficient study time and design personalized study interventions based on academic and lifestyle patterns.
Facebook
TwitterBy UCI [source]
This dataset provides an intimate look into student performance and engagement. It grants researchers access to numerous salient metrics of academic performance which illuminate a broad spectrum of student behaviors: how students interact with online learning material; quantitative indicators reflecting their academic outcomes; as well as demographic data such as age group, gender, prior education level among others.
The main objective of this dataset is to enable analysts and educators alike with empirical insights underpinning individualized learning experiences - specifically in identifying cases when students may be 'at risk'. Given that preventive early interventions have been shown to significantly mitigate chances of course or program withdrawal among struggling students - having accurate predictive measures such as this can greatly steer pedagogical strategies towards being more success oriented.
One unique feature about this dataset is its intricate detailing. Not only does it provide overarching summaries on a per-student basis for each presented courses but it also furnishes data related to assessments (scores & submission dates) along with information on individuals' interactions within VLEs (virtual learning environments) - spanning different types like forums, content pages etc... Such comprehensive collation across multiple contextual layers helps paint an encompassing portrayal of student experience that can guide better instructional design.
Due credit must be given when utilizing this database for research purposes through citation. Specifically referencing (Kuzilek et al., 2015) OU Analyse: Analysing At-Risk Students at The Open University published in Learning Analytics Review is required due to its seminal work related groundings regarding analysis methodologies stem from there.
Immaterial aspects aside - it is important to note that protection of student privacy is paramount within this dataset's terms and conditions. Stringent anonymization techniques have been implemented across sensitive variables - while detailed, profiles can't be traced back to original respondents.
How To Use This Dataset:
Understanding Your Objectives: Ideal objectives for using this dataset could be to identify at-risk students before they drop out of a class or program, improving course design by analyzing how assignments contribute to final grades, or simply examining relationships between different variables and student performance.
Set up your Analytical Environment: Before starting any analysis make sure you have an analytical environment set up where you can load the CSV files included in this dataset. You can use Python notebooks (Jupyter), R Studio or Tableau based software in case you want visual representation as well.
Explore Data Individually: There are seven separate datasets available: Assessments; Courses; Student Assessment; Student Info; Vle (Virtual Learning Environment); Student Registeration and Student Vle. Load these CSVs separately into your environment and do an initial exploration of each one: find out what kind of data they contain (numerical/categorical), if they have missing values etc.
Merge Datasets As the core idea is to track a student’s journey through multiple courses over time, combining these datasets will provide insights from wider perspectives. One way could be merging them using common key columns such as 'code_module', 'code_presentation', & 'id_student'. But make sure that merge should depend on what question you're trying to answer.
Identify Key Metrics Your key metrics will depend on your objectives but might include: overall grade averages per course or assessment type/student/region/gender/age group etc., number of clicks in virtual learning environment, student registration status etc.
Run Your Analysis Now you can run queries to analyze the data relevant to your objectives. Try questions like: What factors most strongly predict whether a student will fail an assessment? or How does course difficulty or the number of allotments per week change students' scores?
Visualization: Visualizing your data can be crucial for understanding patterns and relationships between variables. Use graphs like bar plots, heatmaps, and histograms to represent different aspects of your analyses.
Actionable Insights: The final step is interpreting these results in ways that are meaningf...
Facebook
Twitterhttps://www.wiseguyreports.com/pages/privacy-policyhttps://www.wiseguyreports.com/pages/privacy-policy
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | 2.29(USD Billion) |
| MARKET SIZE 2025 | 2.49(USD Billion) |
| MARKET SIZE 2035 | 5.8(USD Billion) |
| SEGMENTS COVERED | Application, Deployment Model, End User, Functionality, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | growing data-driven decision-making, increasing demand for personalized learning, enhancement of student engagement, rising need for real-time analytics, integration of AI technologies |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | Tableau Software, MicroStrategy, Microsoft, Alteryx, Sisense, Oracle, Domo, Google, SAP, SAS Institute, Zoho, Qlik, Looker, TIBCO Software, IBM |
| MARKET FORECAST PERIOD | 2025 - 2035 |
| KEY MARKET OPPORTUNITIES | Increased adoption of online learning, Integration with AI technologies, Demand for personalized learning experiences, Growing need for data-driven decision making, Emergence of interactive educational tools |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 8.8% (2025 - 2035) |
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
The Student Performance Dataset originates from the UCI Machine Learning Repository and contains detailed information on secondary school students’ academic performance in Portugal. It includes 395 records with 33 attributes covering demographics, family background, studytime, absences, and grades in three periods (G1, G2, G3).
This dataset is widely used for educational research and predictive modeling. It allows analysts to explore the impact of factors such as study habits, parental education, family support, and absenteeism on students’ final outcomes.
Key Features:
Demographics: age, gender, school
Family background: parental education, family support, guardian type
Academic behavior: study time, absences, past grades
Performance target: final grade (G3, ranging from 0–20)
Facebook
Twitterhttps://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy
Discover the booming Visual Analytics in Education market! Learn about its $2.5 billion (2025) size, 15% CAGR, key drivers, and top players like Oracle & Tableau. Explore market trends and projections to 2033 for informed strategic decisions in EdTech.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides detailed engagement metrics for students enrolled in online courses, including time spent, module completion, quiz scores, discussion activity, and assignment submissions. It enables educators and edtech platforms to analyze learning behaviors, optimize course design, and deliver personalized recommendations to improve student outcomes.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a comprehensive record of student assessment scores across subjects, grade levels, and curriculum standards, enabling educators to analyze performance trends and identify areas for curriculum improvement. It includes detailed information on students, teachers, schools, and assessment types, supporting longitudinal tracking and actionable insights for educational stakeholders.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BackgroundIn recent years, the application of machine learning (ML) to predict student performance in engineering education has expanded significantly, yet questions remain about the consistency, reliability, and generalisability of these predictive models.ObjectiveThis rapid review aimed to systematically examine peer-reviewed studies published between January 1, 2019, and December 31, 2024, that applied machine learning (ML), artificial intelligence (AI), or deep learning (DL) methods to predict or improve academic outcomes in university engineering programs.MethodsWe searched IEEE Xplore, SpringerLink, and PubMed, identifying an initial pool of 2,933 records. After screening for eligibility based on pre-defined inclusion criteria, we selected 27 peer-reviewed studies for narrative synthesis and assessed their methodological quality using the PROBAST framework.ResultsAll 27 studies involved undergraduate engineering students and demonstrated the capability of diverse ML techniques to enhance various academic outcomes. Notably, one study found that a reinforcement learning-based intelligent tutoring system significantly improved learning efficiency in digital logic courses. Another study using AI-based real-time behavior analysis increased students’ exam scores by approximately 8.44 percentage points. An optimised support vector machine (SVM) model accurately predicted engineering students’ employability with 87.8% accuracy, outperforming traditional predictive approaches. Additionally, a longitudinally validated SVM model effectively identified at-risk students, achieving 83.9% accuracy on hold-out cohorts. Bayesian regression methods also improved early-term course grade prediction by 27% over baseline predictors. However, most studies relied on single-institution samples and lacked rigorous external validation, limiting the generalisability of their findings.ConclusionThe evidence confirms that ML methods—particularly reinforcement learning, deep learning, and optimised predictive algorithms—can substantially improve student performance and academic outcomes in engineering education. However, methodological shortcomings related to participant selection bias, sample sizes, validation practices, and transparency in reporting require further attention. Future research should prioritise multi-institutional studies, robust validation techniques, and enhanced methodological transparency to fully leverage ML’s potential in engineering education.
Facebook
TwitterAttribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
his dataset contains 478 records and 35 features, focusing on the relationship between students' psychological traits, learning behaviors, and academic achievements. It includes basic information such as Gender, Age, Academic Discipline, Year of Study, learning-related metrics like Study Hours, Learning Resources, and psychological-academic indicators including Conscientiousness, Openness, Intrinsic/Extrinsic Motivation, CGPA, Course Grades, Professional Skills. It supports research on educational psychology, student development, and academic performance analysis.
Facebook
Twitterhttps://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
Explore the dynamic Education and Learning Analytics Software and Services market forecast (2025-2033). Discover key insights, market drivers, and growth trends shaping the future of education technology.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains detailed logs of student quiz attempts in online learning environments, including scores, time spent, hint usage, device type, and adaptive assessment levels. It enables comprehensive analysis of student performance, engagement patterns, and the effectiveness of adaptive learning strategies for edtech platforms.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
If this Data Set is useful, and upvote is appreciated. This data approach student achievement in secondary education of two Portuguese schools. The data attributes include student grades, demographic, social and school related features) and it was collected by using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).