License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
If this dataset is useful, an upvote is appreciated. This data approaches student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data was collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such a prediction is much more useful (see paper source for more details).
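The two modeling setups mentioned above (with and without the earlier period grades) and the binary/five-level targets can be sketched as follows. This is a minimal illustration, not the paper's code: the pass threshold of 10 on the 0-20 scale and the five band edges are assumptions that should be checked against the paper, and the column list is only an illustrative subset.

```python
# Sketch: derive classification targets from G3 and build predictor lists
# with and without the earlier period grades G1 and G2.

def binary_target(g3: int) -> str:
    """Pass/fail label. The threshold of 10 (0-20 scale) is an assumption."""
    return "pass" if g3 >= 10 else "fail"


def five_level_target(g3: int) -> str:
    """Five-level label. The band edges are assumed; verify against the paper."""
    if g3 >= 16:
        return "I"
    if g3 >= 14:
        return "II"
    if g3 >= 12:
        return "III"
    if g3 >= 10:
        return "IV"
    return "V"


def feature_columns(all_columns, use_prior_grades):
    """Return predictor columns; the target G3 is never a predictor."""
    excluded = {"G3"} if use_prior_grades else {"G1", "G2", "G3"}
    return [c for c in all_columns if c not in excluded]


# Illustrative subset of column names, not the full attribute list.
columns = ["school", "sex", "age", "studytime", "absences", "G1", "G2", "G3"]

with_grades = feature_columns(columns, use_prior_grades=True)
without_grades = feature_columns(columns, use_prior_grades=False)

print(with_grades)
print(without_grades)
print(binary_target(9), five_level_target(15))
```

Comparing a model trained on `with_grades` against one trained on `without_grades` makes the trade-off in the note concrete: the first is easier, the second is the harder but more useful early prediction.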
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a detailed, student-level view of academic performance, attendance, and intervention history, enabling educators to monitor progress, identify at-risk students, and tailor support strategies. It includes assessment scores, course and teacher information, intervention records, and attendance tracking, making it ideal for data-driven educational improvement and early warning systems.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset has been collected to support research on predicting the academic performance of Secondary School Certificate (SSC) and Higher Secondary Certificate (HSC) students in Bangladesh. It comprises responses from many students across various institutions in the country.
The dataset includes a diverse set of features that are believed to influence academic outcomes. These features cover a wide range of domains such as:
Demographic Information: Age, gender, parental education, and occupation.
Academic History: Previous grades, subject preferences, study time, tutoring, etc.
Socioeconomic Factors: Family income, number of siblings, living location (urban/rural).
Institutional Factors: Type of school/college (public/private), distance from home, teacher-student ratio, etc.
Lifestyle and Behavioral Aspects: Sleep habits, screen time, daily routines, mental health indicators, and parental support.
The dataset is labeled with the actual academic performance (grades or GPA) of students in SSC and HSC examinations. The goal is to facilitate the development of predictive models and interpretability studies, with a focus on early intervention and academic counseling.
The dataset is anonymized and free from personally identifiable information. It is intended for academic research, education policy analysis, and machine learning experimentation.
If you use the dataset, please cite: A. A. Maruf, R. Ara Rumy, R. I. Sony and Z. Aung, "Predictive Analysis of Bangladeshi Students’ Academic Performances Using Ensemble Machine Learning with Explainable AI Techniques," 2024 27th International Conference on Computer and Information Technology (ICCIT), Cox's Bazar, Bangladesh, 2024, pp. 1200-1205, doi: 10.1109/ICCIT64611.2024.11021990.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Student Performance Data
This dataset provides insights into various factors influencing the academic performance of students. It is curated for use in educational research, data analytics projects, and predictive modeling. The data reflects a combination of personal, familial, and academic-related variables gathered through observation or survey.
The dataset includes a diverse range of students and captures key characteristics such as study habits, family background, school attendance, and overall performance. It is well-suited for exploring correlations, visualizing trends, and training machine learning models related to academic outcomes.
Highlights:
Clean, structured format suitable for immediate use
Designed for beginner to intermediate-level data analysis
Valuable for classification, regression, and data storytelling projects
File Format:
Type: CSV (Comma-Separated Values)
Encoding: UTF-8
Structure: Each row represents a student record
Applications:
Student performance prediction
Educational policy planning
Identification of performance gaps and influencing factors
Exploratory data analysis and visualization
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset used in this study integrates quantitative data on student learning behaviors, engagement patterns, demographics, and academic performance. It was compiled by merging two publicly available Kaggle datasets, resulting in a combined file (“merged_dataset.csv”) containing 14,003 student records with 16 attributes. All records are anonymized and contain no personally identifiable information.
The dataset covers the following categories of variables:
Resource access and learning environment: Resources, Internet, EduTech
Motivation and psychological factors: Motivation, StressLevel
Demographic information: Gender, Age (ranging from 18 to 30 years)
Learning preference classification: LearningStyle
Academic performance indicators: ExamScore, FinalGrade
In this study, “ExamScore” and “FinalGrade” served as the primary performance indicators. The remaining variables were used to derive behavioral and contextual profiles, which were clustered using unsupervised machine learning techniques.
The analysis and modeling were implemented in Python through a structured Jupyter Notebook (“Project.ipynb”), which included the following main steps:
Environment Setup – Import of essential libraries (NumPy, pandas, Matplotlib, Seaborn, SciPy, StatsModels, scikit-learn, imbalanced-learn) and visualization configuration.
Data Import and Integration – Loading the two source CSV files, harmonizing columns, removing irrelevant attributes, aligning formats, handling missing values, and merging them into a unified dataset (merged_dataset.csv).
Data Preprocessing –
Encoding categorical variables using LabelEncoder.
Scaling features using both z-score standardization (for statistical tests and PCA) and Min–Max normalization (for clustering).
Detecting and removing duplicates.
Clustering Analysis –
Applying K-Means clustering to segment learners into distinct profiles.
Determining the optimal number of clusters using the Elbow Method and Silhouette Score.
Evaluating cluster quality with internal metrics (Silhouette Score, Davies–Bouldin Index).
Dimensionality Reduction & Visualization – Using PCA for 2D/3D cluster visualization and feature importance exploration.
Mapping Clusters to Learning Styles – Associating each identified cluster with the most relevant learning style model based on feature patterns and alignment scores.
Statistical Analysis – Conducting ANOVA and regression to test for significant differences in performance between clusters.
Interpretation & Practical Recommendations – Analyzing cluster-specific characteristics and providing implications for adaptive and mobile learning integration.
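The encoding and the two scaling schemes named in the preprocessing step can be sketched in plain Python as below. This is a hedged illustration rather than the notebook's code: the notebook reportedly uses scikit-learn, whereas this version uses only the standard library, and the sample values and category names are made up.

```python
from statistics import mean, pstdev


def label_encode(values):
    """Map each category to an integer code, using sorted category order."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def z_score(values):
    """Standardize to mean 0 and standard deviation 1 (population std)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up sample values, for illustration only.
exam_scores = [40, 55, 70, 85, 100]
styles = ["Visual", "Auditory", "Kinesthetic", "Visual", "Auditory"]

encoded, mapping = label_encode(styles)
standardized = z_score(exam_scores)   # for statistical tests and PCA
normalized = min_max(exam_scores)     # for clustering

print(mapping)
print(encoded)
print(normalized)
```

Min-max scaling keeps every feature in the same bounded range, which matters for distance-based methods such as K-Means, while z-scores center the data as PCA expects.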
Description: This dataset provides a detailed snapshot of high school students' performance in exams, focusing on their scores in mathematics, reading, and writing. It includes essential demographic, social, and academic variables that are known to influence academic outcomes. The dataset consists of 1,000 observations, where each row represents a unique student, and includes various attributes such as gender, race/ethnicity, parental education levels, test preparation status, lunch type, and scores in three key academic subjects. This dataset can be leveraged to analyze trends, correlations, and disparities in academic performance based on socioeconomic and educational factors.
| Attribute | Description |
|---|---|
| Gender | This column categorizes students by their gender (Male, Female). Allows for the exploration of gender-based performance trends in math, reading, and writing scores. |
| Race/Ethnicity | Coded into five groups (Group A to Group E), this feature represents the racial or ethnic background of the student. Enables analysis of how ethnic backgrounds influence exam performance. |
| Parental Level of Education | Describes the highest educational attainment of the student’s parents (e.g., High School, Some College, Associate’s Degree, Bachelor’s Degree, Master’s Degree). This variable is useful in understanding the impact of parental education on students' academic achievements. |
| Lunch Type | Indicates whether the student receives a standard lunch or a free/reduced-price lunch. This feature can be used to study the relationship between socioeconomic status and academic performance. |
| Test Preparation Course | Describes whether the student completed a test preparation course (Completed or None). Examines the influence of structured test preparation on academic outcomes. |
| Math Score | This column records the student’s performance in mathematics (on a scale of 0-100). A key outcome variable for assessing performance in a core subject. |
| Reading Score | Similar to the math score, this feature captures the student’s performance in reading (on a scale of 0-100). Provides insight into students' literacy and comprehension abilities. |
| Writing Score | Represents the student’s performance in writing (on a scale of 0-100). Allows for analysis of written communication skills and overall language proficiency. |
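As one way to explore the attributes in the table, the sketch below averages the three scores within each category of a grouping column. The rows and the short key names (`test_prep`, `math`, and so on) are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; the real data has 1,000 students.
rows = [
    {"test_prep": "completed", "math": 72, "reading": 80, "writing": 78},
    {"test_prep": "none", "math": 60, "reading": 65, "writing": 61},
    {"test_prep": "completed", "math": 88, "reading": 90, "writing": 92},
    {"test_prep": "none", "math": 70, "reading": 75, "writing": 69},
]


def mean_scores_by(rows, group_key, score_keys):
    """Group rows by one categorical column and average each score column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        group: {key: mean(r[key] for r in members) for key in score_keys}
        for group, members in groups.items()
    }


summary = mean_scores_by(rows, "test_prep", ["math", "reading", "writing"])
for group, scores in summary.items():
    print(group, scores)
```

The same function works for any of the categorical attributes, for example lunch type or parental level of education.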
Terms of service: https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Student Performance (Multiple Linear Regression) Dataset is designed to analyze the relationship between students’ learning habits and academic performance. Each sample includes key indicators related to learning, such as study hours, sleep duration, previous test scores, and the number of practice exams completed.
2) Data Utilization
(1) Characteristics of the Student Performance (Multiple Linear Regression) Dataset:
• The target variable, Hours Studied, quantitatively represents the amount of time a student has invested in studying. The dataset is structured to allow modeling and inference of learning behaviors based on correlations with other variables.
(2) Applications of the Student Performance (Multiple Linear Regression) Dataset:
• AI-Based Study Time Prediction Models: The dataset can be used to develop regression models that estimate a student’s expected study time based on inputs like academic performance, sleep habits, and engagement patterns.
• Behavioral Analysis and Personalized Learning Strategies: It can be applied to identify students with insufficient study time and design personalized study interventions based on academic and lifestyle patterns.
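A study-time regression of the kind described above can be sketched with ordinary least squares. The example below is a minimal standard-library version that solves the normal equations directly; the feature rows and the coefficients used to generate the target are synthetic, chosen only so that the fit can be checked, and say nothing about the real data.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of feature rows; an intercept column is added here.
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    n = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; system is singular")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def predict(coefs, row):
    """Apply fitted coefficients to one feature row."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Synthetic rows: [previous score, sleep hours, practice exams completed].
X = [[60, 7, 2], [75, 6, 4], [90, 8, 5], [50, 5, 1], [80, 9, 3], [65, 6, 6]]
# Synthetic target generated from known coefficients (an assumption).
true_coefs = [1.0, 0.05, -0.2, 0.5]
y = [predict(true_coefs, row) for row in X]

coefs = fit_linear_regression(X, y)
print([round(c, 6) for c in coefs])
```

For real data, a tested library routine for least squares is the safer choice; solving the normal equations by hand is shown only to keep the sketch dependency-free.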
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains detailed information about 70+ students, focusing on predicting their 5th semester performance based on their performances in previous semesters. It encompasses various aspects such as demographics, educational data, predicting variables, and environmental factors, making it ideal for educational research and predictive modeling.
The dataset is divided into the following sections:
Environmental Factors: Covers environmental influences, study habits, personal preferences, and other factors that could affect students' education and performance.
This dataset can be used for:
Predictive modeling of 5th semester student performance
Analyzing the impact of past academic performance on future outcomes
Educational research to identify key factors influencing academic success
Please credit the original source of the data if you use this dataset in your research or project.
Use this dataset to build models predicting student performance in the 5th semester based on their performance in previous semesters, explore correlations between various factors and academic success, or conduct comprehensive educational analysis.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a comprehensive summary of student quiz performance across multiple subjects, capturing scores, completion status, timing, and indicators for targeted tutoring and curriculum improvement. It enables educators and administrators to identify struggling students, analyze subject-level trends, and make data-driven decisions to enhance learning outcomes.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Academic achievement is an important index for measuring the quality of education and students’ learning outcomes. Reasonable and accurate prediction of academic achievement can help improve teachers’ educational methods, and it also provides data support for the formulation of education policies. However, traditional methods for classifying academic performance have many problems, such as low accuracy, limited ability to handle nonlinear relationships, and poor handling of data sparsity. Our study therefore analyzes various characteristics of students, including personal information, academic performance, attendance rate, family background, and extracurricular activities, offering a comprehensive view of the factors affecting students’ academic performance. To improve the accuracy and robustness of student performance classification, we adopted a Gaussian Distribution based Data Augmentation technique (GDO), combined with multiple Deep Learning (DL) and Machine Learning (ML) models. We explored the application of different ML and DL models to classifying student grades, using different feature combinations and data augmentation techniques to evaluate model performance. In addition, we checked the effectiveness of the synthetic data with variance homogeneity and p-values, and studied how the oversampling rate affects actual classification results. Our results show that the RBFN model based on educational habit features performs best after GDO data augmentation, with an accuracy of 94.12% and an F1 score of 94.46%. These results provide valuable references for the classification of student grades and the development of intervention strategies. Our study proposes new methods and perspectives in the field of educational data analysis and promotes innovation and development in the intelligence of the education system.
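The abstract does not spell out how GDO works, so the sketch below shows only one simple reading of Gaussian-based augmentation and should not be taken as the authors' algorithm. It assumes each feature of an under-represented class is sampled independently from a normal distribution fitted to that feature; the function name and sample rows are hypothetical.

```python
import random
from statistics import mean, stdev


def gaussian_oversample(rows, n_new, seed=0):
    """Draw n_new synthetic rows for one class.

    Assumption: features are treated as independent, and each is sampled
    from a normal distribution with that feature's sample mean and std.
    """
    rng = random.Random(seed)
    columns = list(zip(*rows))
    params = [(mean(col), stdev(col)) for col in columns]
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n_new)]


# Made-up minority-class rows: [score, attendance rate].
minority = [[62.0, 0.80], [58.0, 0.75], [65.0, 0.90], [60.0, 0.85]]

synthetic = gaussian_oversample(minority, n_new=6, seed=42)
for row in synthetic:
    print([round(v, 3) for v in row])
```

Here `n_new` plays the role of the oversampling rate mentioned in the abstract; comparing the variance of `synthetic` with that of `minority` is one way to check that the synthetic rows resemble the originals.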
Terms of service: https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Student Performance Dataset is a survey of secondary school mathematics students, presented in table format. It contains a variety of information, including student demographics, family environment, parents' education and occupation, health, family relationships, and grades.
2) Data Utilization
(1) Characteristics of the Student Performance Dataset:
• Each row contains 33 features, including school ID, gender, age, family size, parents' educational level and occupation, family relationships, health status, and grades.
• It is suitable for a variety of data analysis and prediction exercises, including regression analysis and categorical variable imbalance analysis, with Grade as the target variable.
(2) Applications of the Student Performance Dataset:
• Academic achievement prediction and analysis of influencing factors: It can be used to analyze the impact of factors such as a student's background, family environment, and parental characteristics on grades, and to develop a grade prediction model.
• Educational policies and customized support strategies: Based on student-specific characteristics and grade data, it can be applied to educational policy goals such as closing educational gaps, supporting vulnerable student groups, and providing customized learning guidance.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The early identification of students facing learning difficulties is one of the most critical challenges in modern education. Intervening effectively requires leveraging data to understand the complex interplay between student demographics, engagement patterns, and academic performance.
This dataset was created to serve as a high-quality, pre-processed resource for building machine learning models to tackle this very problem. It is a unique hybrid dataset, meticulously crafted by unifying three distinct sources:
The Open University Learning Analytics Dataset (OULAD): A rich dataset detailing student interactions with a Virtual Learning Environment (VLE). We have aggregated the raw, granular data (over 10 million interaction logs) into powerful features, such as total clicks, average assessment scores, and distinct days of activity for each student registration.
The UCI Student Performance Dataset: A classic educational dataset containing demographic information and final grades in Portuguese and Math subjects from two Portuguese schools.
A Synthetic Data Component: A synthetically generated portion of the data, created to balance the dataset or represent specific student profiles.
A direct merge of these sources was not possible as the student identifiers were not shared. Instead, a strategy of intelligent concatenation was employed. The final dataset has undergone a rigorous pre-processing pipeline to make it immediately usable for machine learning tasks:
Advanced Imputation: Missing values were handled using a sophisticated iterative imputation method powered by Gaussian Mixture Models (GMM), ensuring the dataset's integrity.
One-Hot Encoding: All categorical features have been converted to a numerical format.
Feature Scaling: All numerical features have been standardized (using StandardScaler) to have a mean of 0 and a standard deviation of 1, preventing model bias from features with different scales.
The result is a clean, comprehensive dataset ready for modeling.
Each row represents a student profile, and the columns are the features and the target.
Features include aggregated online engagement metrics (e.g., clicks, distinct activities), academic performance (grades, scores), and student demographics (e.g., gender, age band). A key feature indicates the original data source (OULAD, UCI, Synthetic).
The dataset contains no Personally Identifiable Information (PII). Demographic information is presented in broad, anonymized categories.
Key Columns:
Target Variable:
had_difficulty: The primary target for classification. This binary variable has been engineered from the original final_result column of the OULAD dataset.
1: The student either failed (Fail) or withdrew (Withdrawn) from the course.
0: The student passed (Pass or Distinction).
Feature Groups:
OULAD Aggregated Features (e.g., oulad_total_cliques, oulad_media_notas): Quantitative metrics summarizing a student's engagement and performance within the VLE.
Academic Performance Features (e.g., nota_matematica_harmonizada): Harmonized grades from different data sources.
Demographic Features (e.g., gender_*, age_band_*): One-hot encoded columns representing student demographics.
Origin Features (e.g., origem_dado_OULAD, origem_dado_UCI): One-hot encoded columns indicating the original source of the data for each row. This allows for source-specific analysis.
(Note: All numerical feature names are post-scaling and may not directly reflect their original names. Please refer to the complete column list for details.)
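The mapping behind the had_difficulty target described above can be sketched as follows. The four final_result values and their 0/1 assignment come from the description; the function and constant names are hypothetical, and raising an error on unexpected values is a design choice of this sketch.

```python
# Mapping from the OULAD final_result value to the binary target.
DIFFICULTY_BY_RESULT = {
    "Fail": 1,
    "Withdrawn": 1,
    "Pass": 0,
    "Distinction": 0,
}


def had_difficulty(final_result: str) -> int:
    """Return 1 for Fail/Withdrawn and 0 for Pass/Distinction."""
    try:
        return DIFFICULTY_BY_RESULT[final_result]
    except KeyError:
        # Fail loudly rather than silently mislabel an unknown outcome.
        raise ValueError(f"unexpected final_result: {final_result!r}") from None


results = ["Pass", "Withdrawn", "Distinction", "Fail"]
labels = [had_difficulty(r) for r in results]
print(labels)
```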
This dataset would not be possible without the original data providers. Please acknowledge them in any work that uses this data:
OULAD Dataset: Kuzilek, J., Hlosta, M., and Zdrahal, Z. (2017). Open University Learning Analytics dataset. Scientific Data, 4. https://analyse.kmi.open.ac.uk/open_dataset
UCI Student Performance Dataset: P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS. https://archive.ics.uci.edu/ml/datasets/student+performance
This dataset is perfect for a variety of predictive modeling tasks. Here are a few ideas to get you started:
Can you build a classification model to predict had_difficulty with high recall? (Minimizing the number of at-risk students we fail to identify).
Which features are the most powerful predictors of student failure or withdrawal? (Feature Importance Analysis).
Can you build separate models for each data origin (origem_dado_*) and compare ...
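For the high-recall question above, the sketch below computes recall for the positive class and shows how lowering the decision threshold trades extra false alarms for fewer missed at-risk students. The labels and scores are made up, and the helper names are hypothetical.

```python
def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the predictions caught: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive examples in y_true")
    return tp / (tp + fn)


def predict_with_threshold(scores, threshold):
    """Label a student as at-risk (1) when the model score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Made-up had_difficulty labels and model scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.45, 0.7, 0.2, 0.55, 0.35, 0.1]

recall_default = recall(y_true, predict_with_threshold(scores, 0.5))
recall_lenient = recall(y_true, predict_with_threshold(scores, 0.3))
print(recall_default, recall_lenient)
```

With these made-up numbers the lower threshold catches every at-risk student but also flags more students who passed, so precision should be tracked alongside recall.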
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a comprehensive record of student assessment scores across subjects, grade levels, and curriculum standards, enabling educators to analyze performance trends and identify areas for curriculum improvement. It includes detailed information on students, teachers, schools, and assessment types, supporting longitudinal tracking and actionable insights for educational stakeholders.
Privacy policy: https://www.archivemarketresearch.com/privacy-policy
The global market for Visual Analytics in Education is experiencing robust growth, driven by the increasing adoption of data-driven decision-making in educational institutions and the need for effective data visualization tools to analyze complex student performance data. Precise market figures are unavailable; based on industry trends and the presence of major players like Oracle and Tableau, a 2025 market size of approximately $2.5 billion is assumed here. Considering a projected Compound Annual Growth Rate (CAGR) of 15% over the forecast period (2025-2033), the market is expected to reach approximately $8.5 billion by 2033. This significant growth is fueled by several factors, including the expanding availability of educational data, the rising demand for personalized learning experiences, and the increasing investment in educational technology infrastructure. Furthermore, the integration of visual analytics tools into learning management systems (LMS) and the development of user-friendly interfaces tailored for educators are significantly contributing to market expansion. The market is segmented by deployment model (cloud-based and on-premise), component (software, services), and application (student performance analysis, resource allocation, curriculum development). Key restraints include the high initial cost of implementation, the need for specialized training for educators, and concerns about data privacy and security. However, the long-term benefits of improved learning outcomes and efficient resource management are expected to outweigh these challenges. Leading vendors are continually innovating to offer robust, scalable, and affordable solutions, further driving market penetration. The competitive landscape features established players alongside emerging startups, resulting in a dynamic and evolving market with significant opportunities for both established companies and newcomers.
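The growth figures quoted above can be checked with the standard compound-growth formula. The sketch below assumes annual compounding from the 2025 base; under that assumption, eight years at 15% gives roughly $7.6 billion, and reaching $8.5 billion by 2033 would imply a rate of about 16.5%, so the quoted endpoint depends on the compounding convention used.

```python
def project(value, rate, periods):
    """Compound a starting value at a fixed annual rate."""
    return value * (1 + rate) ** periods


def implied_cagr(start, end, periods):
    """Annual growth rate that takes start to end over the given periods."""
    return (end / start) ** (1 / periods) - 1


base_2025 = 2.5          # USD billions, the assumed 2025 market size
years = 2033 - 2025      # eight annual compounding periods (an assumption)

size_2033 = project(base_2025, 0.15, years)
rate_needed = implied_cagr(base_2025, 8.5, years)

print(f"2033 size at 15% CAGR: {size_2033:.2f} billion")
print(f"CAGR implied by 8.5 billion: {rate_needed:.1%}")
```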
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In this research, we generated student retention alerts. The alerts are classified into two types: preventive and corrective. This classification varies according to the maturity of the data systematization process, so data mining techniques were applied to systematize the data. The experimental analytical method was used, with a population of 13,715 students and 62 sociological, academic, family, personal, economic, psychological, and institutional variables, along with factors such as academic follow-up and performance, financial situation, and personal information. In particular, information was collected on each problem, or combination of problems, that could affect dropout rates. Following the methodology, the information was generated through an abstract data model that reflects the profile of the dropout student. As an advance on previous research, this proposal creates preventive and corrective alternatives to avoid dropout from higher education. Also, in contrast to previous work, we generated corrective warnings by applying data mining techniques such as neural networks, reaching a precision of 97% and a loss of 0.1052. In conclusion, this study aims to analyze the behavior of students who drop out of university through the evaluation of predictive patterns. The overall objective is to predict the profile of student dropout, considering reasons such as admission to higher education and career changes. Consequently, using a data systematization process promotes the permanence of students in higher education. Once the dropout profile has been identified, student retention strategies are approached according to the time of its appearance and the point of view of the institution.
License: ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Detailed Dataset Description
This dataset is a comprehensive collection of [type of data, e.g., financial, medical, demographic, or environmental] information spanning the period from [start date] to [end date]. It is designed to provide insights into [main purpose of the dataset, e.g., market trends, patient behavior, climate patterns, consumer habits] and is suitable for advanced analysis, predictive modeling, and visualization.
Contents
Number of Records: [total rows]
Number of Features/Columns: [total columns]
Feature Types: Includes both numeric and categorical features, such as [examples of numeric features], [examples of categorical features], and [any special types, e.g., time series, text data].
Target Variable (if applicable): [Name of the target variable, e.g., “Gold Price”, “Insurance Charges”, “Customer Purchase Amount”]
Missing Values & Data Quality: [Brief note about missing values, anomalies, or cleaning required]
Context This dataset captures [real-world context], enabling users to explore patterns, correlations, and trends. Analysts and data scientists can use it for tasks such as:
Statistical analysis and reporting
Machine learning modeling and predictive analytics
Trend visualization and forecasting
Correlation and causal relationship studies
Additional Notes
The data is collected from [sources, e.g., public records, surveys, APIs, financial exchanges].
It has undergone [brief description of preprocessing, cleaning, or normalization if any].
The dataset is intended for [specific audience, e.g., researchers, data analysts, business strategists] and can be integrated into [specific tools or platforms if relevant].
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: In recent years, the application of machine learning (ML) to predict student performance in engineering education has expanded significantly, yet questions remain about the consistency, reliability, and generalisability of these predictive models.
Objective: This rapid review aimed to systematically examine peer-reviewed studies published between January 1, 2019, and December 31, 2024, that applied machine learning (ML), artificial intelligence (AI), or deep learning (DL) methods to predict or improve academic outcomes in university engineering programs.
Methods: We searched IEEE Xplore, SpringerLink, and PubMed, identifying an initial pool of 2,933 records. After screening for eligibility based on pre-defined inclusion criteria, we selected 27 peer-reviewed studies for narrative synthesis and assessed their methodological quality using the PROBAST framework.
Results: All 27 studies involved undergraduate engineering students and demonstrated the capability of diverse ML techniques to enhance various academic outcomes. Notably, one study found that a reinforcement learning-based intelligent tutoring system significantly improved learning efficiency in digital logic courses. Another study using AI-based real-time behavior analysis increased students’ exam scores by approximately 8.44 percentage points. An optimised support vector machine (SVM) model predicted engineering students’ employability with 87.8% accuracy, outperforming traditional predictive approaches. Additionally, a longitudinally validated SVM model effectively identified at-risk students, achieving 83.9% accuracy on hold-out cohorts. Bayesian regression methods also improved early-term course grade prediction by 27% over baseline predictors. However, most studies relied on single-institution samples and lacked rigorous external validation, limiting the generalisability of their findings.
Conclusion: The evidence confirms that ML methods, particularly reinforcement learning, deep learning, and optimised predictive algorithms, can substantially improve student performance and academic outcomes in engineering education. However, methodological shortcomings related to participant selection bias, sample sizes, validation practices, and transparency in reporting require further attention. Future research should prioritise multi-institutional studies, robust validation techniques, and enhanced methodological transparency to fully leverage ML’s potential in engineering education.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and do file
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains detailed logs of student quiz attempts in online learning environments, including scores, time spent, hint usage, device type, and adaptive assessment levels. It enables comprehensive analysis of student performance, engagement patterns, and the effectiveness of adaptive learning strategies for edtech platforms.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains student data from an educational research study focusing on student demographics, academic performance, and related factors. Here is a general description of what each column likely represents:
Sex: The gender of the student (e.g., Male, Female).
Age: The age of the student.
Name: The name of the student.
State: The state where the student resides or where the educational institution is located.
Address: Indicates whether the student lives in an urban or rural area.
Famsize: Family size category (e.g., LE3 for families with less than or equal to 3 members, GT3 for more than 3).
Pstatus: Parental cohabitation status (e.g., 'T' for living together, 'A' for living apart).
Medu: Mother's education level (e.g., Graduate, College).
Fedu: Father's education level (similar categories to Medu).
Mjob: Mother's job type.
Fjob: Father's job type.
Guardian: The primary guardian of the student.
Math_Score: Score obtained by the student in Mathematics.
Reading_Score: Score obtained by the student in Reading.
Writing_Score: Score obtained by the student in Writing.
Attendance_Rate: The percentage rate of the student’s attendance.
Suspensions: Number of times the student has been suspended.
Expulsions: Number of times the student has been expelled.
Teacher_Support: Level of support the student receives from teachers (e.g., Low, Medium, High).
Counseling: Indicates whether the student receives counseling services (Yes or No).
Social_Worker_Visits: Number of times a social worker has visited the student.
Parental_Involvement: The level of parental involvement in the student's academic life (e.g., Low, Medium, High).
GPA: The student’s Grade Point Average, a standard measure of academic achievement in schools.
This dataset provides a comprehensive look at various factors that might influence a student's educational outcomes, including demographic factors, academic performance metrics, and support structures both at home and within the educational system. It can be used for statistical analysis to understand and improve student success rates, or for targeted interventions based on specific identified needs.