100+ datasets found
  1. Student Performance Data Set

    • kaggle.com
    Updated Mar 27, 2020
    Cite
    Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 27, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Data-Science Sean
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If this data set is useful, an upvote is appreciated. This data set addresses student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data were collected using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).
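    The note about G1 and G2 can be illustrated with a minimal sketch that builds the feature set with and without the earlier period grades. Only the column names G1, G2, and G3 come from the description; the sample rows, the other column names, and the helper `split_features` are hypothetical.

```python
# Hypothetical records; only the names G1, G2, G3 come from the description.
rows = [
    {"age": 16, "absences": 4, "G1": 12, "G2": 13, "G3": 13},
    {"age": 17, "absences": 10, "G1": 8, "G2": 7, "G3": 8},
]

TARGET = "G3"
EARLIER_GRADES = ("G1", "G2")


def split_features(row, use_earlier_grades):
    """Return (features, target) for one student record.

    With use_earlier_grades=False, G1 and G2 are dropped, which gives the
    harder prediction setting mentioned in the description.
    """
    excluded = {TARGET}
    if not use_earlier_grades:
        excluded.update(EARLIER_GRADES)
    features = {k: v for k, v in row.items() if k not in excluded}
    return features, row[TARGET]


easy_setup = [split_features(r, use_earlier_grades=True) for r in rows]
hard_setup = [split_features(r, use_earlier_grades=False) for r in rows]
```

    Comparing a model trained on `easy_setup` with one trained on `hard_setup` shows how much of the predictive signal comes from the earlier grades alone.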

  2. Performance of models using CNN features.

    • plos.figshare.com
    xls
    Updated Nov 8, 2023
    Cite
    Nihal Abuzinadah; Muhammad Umer; Abid Ishaq; Abdullah Al Hejaili; Shtwai Alsubai; Ala’ Abdulmajid Eshmawi; Abdullah Mohamed; Imran Ashraf (2023). Performance of models using CNN features. [Dataset]. http://doi.org/10.1371/journal.pone.0293061.t005
    Available download formats: xls
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Nihal Abuzinadah; Muhammad Umer; Abid Ishaq; Abdullah Al Hejaili; Shtwai Alsubai; Ala’ Abdulmajid Eshmawi; Abdullah Mohamed; Imran Ashraf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Predicting student performance automatically is of utmost importance, due to the substantial volume of data within educational databases. Educational data mining (EDM) devises techniques to uncover insights from data originating in educational settings. Artificial intelligence (AI) can mine educational data to predict student performance and provide measures to help students avoid failing and learn better. Learning platforms complement traditional learning settings by analyzing student performance, which can help reduce the chance of student failure. Existing methods for student performance prediction in educational data mining faced challenges such as limited accuracy, imbalanced data, and difficulties in feature engineering. These issues hindered effective adaptability and generalization across diverse educational contexts. This study proposes a machine learning-based system with deep convoluted features for the prediction of students’ academic performance. The proposed framework is employed to predict student academic performance using balanced as well as imbalanced datasets using the synthetic minority oversampling technique (SMOTE). In addition, the performance is also evaluated using the original and deep convoluted features. Experimental results indicate that the use of deep convoluted features provides improved prediction accuracy compared to original features. Results obtained using the extra tree classifier with convoluted features show the highest classification accuracy of 99.9%. In comparison with the state-of-the-art approaches, the proposed approach achieved higher performance. This research introduces a powerful AI-driven system for student performance prediction, offering substantial advancements in accuracy compared to existing approaches.

  3. Student Performance Prediction

    • kaggle.com
    Updated Dec 7, 2021
    Cite
    Asiya Jan001 (2021). Student Performance Prediction [Dataset]. https://www.kaggle.com/datasets/asiyajan001/student-performance-prediction
    Dataset updated
    Dec 7, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Asiya Jan001
    Description

    Dataset

    This dataset was created by Asiya Jan001

    Released under Data files © Original Authors

    Contents

  4. Student Performance and Engagement Prediction in eLearning Environments...

    • osf.io
    url
    Updated Oct 10, 2024
    Cite
    Wei-Li Lin; Chang Liu; Qi Lin; ZhexuLi (2024). Student Performance and Engagement Prediction in eLearning Environments Using Machine Learning Methods [Dataset]. http://doi.org/10.17605/OSF.IO/WKUS3
    Available download formats: url
    Dataset updated
    Oct 10, 2024
    Dataset provided by
    Center For Open Science
    Authors
    Wei-Li Lin; Chang Liu; Qi Lin; ZhexuLi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset was collected from a second-year undergraduate Science course at a North American University, delivered in a blended format (combining face-to-face and online components). The raw dataset comprises an event log of 486 enrolled students, totaling 305,933 records from the university’s learning management system (LMS), OWL. Each record includes the following fields:

    • Event Date: Timestamp of the event.
    • Event Type: Action taken by the student.
    • Event Location: Directory where the action was taken.
    • Session Start: Timestamp marking the start of the online session.
    • Session End: Timestamp marking the end of the online session.
    • Student ID: Identifier to group the event log by student.

    Resources and assignments were posted sequentially throughout the course, with an average duration of approximately two weeks between assignment posting and due date. The dataset was sorted first by “Student ID” and then by “Event Date” to maintain a chronological order of events for each student. Due to privacy concerns and the General Data Protection Regulation (GDPR), the raw dataset cannot be shared. Instead, it was transformed into a new dataset representing engagement metrics by calculating desired metrics from the event logs for each student. The engagement metrics were chosen to maximize the information extracted from the available data, considering the course structure, which included three assignments, one quiz, one midterm exam, and one final exam.
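    The transformation from event log to per-student engagement metrics can be sketched roughly as follows. The field names mirror the list above, but the sample events and the two metrics shown (event count and number of active days) are assumptions chosen for illustration; the description does not say which metrics the authors actually computed.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event-log rows; field names follow the record layout above.
events = [
    {"student_id": "s2", "event_date": "2020-01-03T10:00:00", "event_type": "read"},
    {"student_id": "s1", "event_date": "2020-01-02T09:30:00", "event_type": "submit"},
    {"student_id": "s1", "event_date": "2020-01-01T08:00:00", "event_type": "read"},
]


def parse(timestamp):
    """Parse an ISO-8601 timestamp string into a datetime."""
    return datetime.fromisoformat(timestamp)


# Sort first by student ID, then by event date, as described in the text.
events_sorted = sorted(
    events, key=lambda e: (e["student_id"], parse(e["event_date"]))
)


def engagement_metrics(sorted_events):
    """Aggregate illustrative per-student metrics from a sorted event log."""
    collected = defaultdict(lambda: {"n_events": 0, "days": set()})
    for event in sorted_events:
        entry = collected[event["student_id"]]
        entry["n_events"] += 1
        entry["days"].add(parse(event["event_date"]).date())
    return {
        student: {"n_events": e["n_events"], "active_days": len(e["days"])}
        for student, e in collected.items()
    }


metrics = engagement_metrics(events_sorted)
```

    Because only the aggregated metrics leave this step, no individual event record needs to be shared, which matches the privacy motivation given above.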

  5. Synthetic Student Performance Dataset

    • opendatabay.com
    Updated May 6, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Student Performance Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/09e2de7b-9830-4337-a801-f4b8ca312c53
    Dataset updated
    May 6, 2025
    Dataset authored and provided by
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Education & Learning Analytics
    Description

    This synthetic Student Performance Dataset is designed as an educational resource for data science, machine learning, and education analytics applications. The dataset provides detailed information on various factors influencing students’ academic performance, including demographics, family background, extracurricular activities, and study habits. It aims to help users analyze relationships between these factors and students’ grades, providing insights into student success and well-being.

    Dataset Features:

    • Gender: Gender of the student (e.g., "Male," "Female").
    • Age: Age of the student (in years).
    • Family Size: Size of the student’s family.
    • Parental Status (Together/Apart): Whether the parents are living together or apart.
    • Mother's Education Level: Education level of the student’s mother.
    • Father's Education Level: Education level of the student’s father.
    • Mother's Job: Occupation of the student’s mother.
    • Father's Job: Occupation of the student’s father.
    • Reason for Choosing School: Primary reason for selecting the school (e.g., proximity, reputation).
    • Legal Guardian: Legal guardian of the student (e.g., "Mother," "Father," "Other").
    • Travel Time to School (in hours): Daily travel time between home and school.
    • Weekly Study Time (in hours): Hours spent studying outside school per week.
    • Number of Past Failures: Number of previously failed subjects.
    • Extra Educational Support: Whether the student receives additional educational support (e.g., "Yes," "No").
    • Family Educational Support: Whether the family provides educational support (e.g., "Yes," "No").
    • Paid Extra Classes: Whether the student takes extra paid classes (e.g., "Yes," "No").
    • Extracurricular Activities: Participation in extracurricular activities (e.g., "Yes," "No").
    • Attended Nursery School: Whether the student attended nursery school (e.g., "Yes," "No").
    • Aspiration for Higher Education: Whether the student aspires to pursue higher education (e.g., "Yes," "No").
    • Internet Access at Home: Availability of internet access at home (e.g., "Yes," "No").
    • In a Romantic Relationship: Whether the student is in a romantic relationship (e.g., "Yes," "No").
    • Quality of Family Relationships: Rated quality of relationships within the family.
    • Free Time After School: Amount of free time available after school hours.
    • Going Out with Friends: Frequency of going out with friends.
    • Workday Alcohol Consumption: Level of alcohol consumption during workdays.
    • Weekend Alcohol Consumption: Level of alcohol consumption during weekends.
    • Current Health Status: Self-reported health status of the student.
    • Number of School Absences: Total number of school days missed.
    • First Period Grade: Grade received during the first grading period.
    • Second Period Grade: Grade received during the second grading period.
    • Final Grade: Final grade achieved by the student.

    Distribution:

    Figure: Student Performance Dataset Distribution (https://storage.googleapis.com/opendatabay_public/images/image_725529a8-e4cb-4bee-bcca-a9adc2658dbd.png)

    Figure: Student Performance Data (https://storage.googleapis.com/opendatabay_public/images/image_55f1fa29-442d-49ea-89a1-e90b85d8c95f.png)

    Usage:

    This dataset is useful for a variety of applications, including:

    • Student Performance Analysis: To explore relationships between family background, study habits, and academic outcomes.
    • Educational Research: To identify key factors influencing student success and well-being.
    • Predictive Modeling: To build models that predict student grades or identify students at risk of underperforming.
    • Policy Making: To analyze how socioeconomic factors and family structure impact education outcomes.

    Coverage:

    This dataset is synthetic and anonymized, ensuring that it is safe for experimentation and learning without compromising any real student data.

    License:

    CC0 (Public Domain)

    Who can use it:

    • Data science learners: For practising data manipulation, visualization, and predictive modelling.
    • Educators and researchers: For academic studies or teaching purposes in student analytics and education research.
    • Education professionals: For analyzing factors that influence student success and tailoring interventions to improve outcomes.

  6. ‘Predicting Student Performance’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Mar 2, 2015
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2015). ‘Predicting Student Performance’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-predicting-student-performance-ec1b/b7296868/?iid=058-803&v=presentation
    Dataset updated
    Mar 2, 2015
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Predicting Student Performance’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/student-performance on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    • This data set addresses student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data were collected using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).

    How to use this dataset

    • Predict Student's future performance
    • Understand the root causes for low performance
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit ewenme

    --- Original source retains full ownership of the source dataset ---

  7. Student Performance Predictions

    • kaggle.com
    Updated Aug 17, 2024
    Cite
    Haseeb_in_Data (2024). Student Performance Predictions [Dataset]. https://www.kaggle.com/datasets/haseebindata/student-performance-predictions/code
    Dataset updated
    Aug 17, 2024
    Dataset provided by
    Kaggle
    Authors
    Haseeb_in_Data
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The Student Performance Dataset is designed to evaluate and predict student outcomes based on various factors that can influence academic success. This synthetic dataset includes features that are commonly considered in educational research and real-world scenarios, such as attendance, study habits, previous academic performance, and participation in extracurricular activities. The goal is to understand how these factors correlate with the final grades of students and to build a predictive model that can forecast student performance.

    Dataset Features:

    • StudentID: A unique identifier for each student.
    • Name: The name of the student.
    • Gender: The gender of the student (Male/Female).
    • AttendanceRate: The percentage of classes attended by the student.
    • StudyHoursPerWeek: The number of hours the student spends studying each week.
    • PreviousGrade: The grade the student achieved in the previous semester (out of 100).
    • ExtracurricularActivities: The number of extracurricular activities the student is involved in.
    • ParentalSupport: A qualitative assessment of the level of support provided by the student's parents (High/Medium/Low).
    • FinalGrade: The final grade of the student (out of 100), which serves as the target variable for prediction.

    Use Cases:

    Predicting Student Performance: The dataset can be used to build machine learning models that predict the final grade of students based on the other features. This can help educators identify students who may need additional support to improve their outcomes.

    Exploratory Data Analysis: Researchers and data scientists can explore the relationships between different factors (like attendance or study habits) and student performance. For example, understanding whether higher attendance correlates with better grades.

    Feature Importance Analysis: The dataset allows for the examination of which features are most predictive of student success, providing insights into key areas of focus for educational interventions.

    Educational Interventions: By identifying patterns in the data, schools and educational institutions can implement targeted interventions to help students improve in specific areas, such as increasing study hours or encouraging participation in extracurricular activities.

    Potential Insights:

    Correlation Between Study Habits and Performance: The dataset can be used to determine how much study time contributes to academic success.

    Impact of Attendance on Grades: Analysis can reveal the extent to which regular attendance influences final grades.

    Role of Extracurricular Activities: The dataset can help assess whether participation in extracurricular activities positively or negatively impacts academic performance.

    Influence of Parental Support: The data allows for the examination of how different levels of parental support affect student outcomes.
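    Questions such as whether study time or attendance tracks final grades can be approached with a Pearson correlation coefficient. The sketch below is a plain-Python version under stated assumptions: the two lists are invented sample values standing in for the StudyHoursPerWeek and FinalGrade columns, and `pearson` is a hypothetical helper rather than part of the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented sample values standing in for StudyHoursPerWeek and FinalGrade.
study_hours = [5, 10, 15, 20]
final_grade = [60, 70, 80, 90]

r = pearson(study_hours, final_grade)
```

    A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship; correlation alone does not show that more study time causes higher grades.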

    Conclusion: The Student Performance Dataset is a versatile tool for educators, data scientists, and researchers interested in understanding and predicting student success. By analyzing this data, stakeholders can gain valuable insights into the factors that contribute to academic performance and develop strategies to enhance educational outcomes.

  8. student-performance-data

    • kaggle.com
    Updated Jun 14, 2025
    Cite
    Muhammad Azam (2025). student-performance-data [Dataset]. http://doi.org/10.34740/kaggle/dsv/12160820
    Dataset updated
    Jun 14, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muhammad Azam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Student Performance Data

    This dataset provides insights into various factors influencing the academic performance of students. It is curated for use in educational research, data analytics projects, and predictive modeling. The data reflects a combination of personal, familial, and academic-related variables gathered through observation or survey.

    The dataset includes a diverse range of students and captures key characteristics such as study habits, family background, school attendance, and overall performance. It is well-suited for exploring correlations, visualizing trends, and training machine learning models related to academic outcomes.

    Highlights:

    • Clean, structured format suitable for immediate use
    • Designed for beginner to intermediate-level data analysis
    • Valuable for classification, regression, and data storytelling projects

    File Format:

    • Type: CSV (Comma-Separated Values)
    • Encoding: UTF-8
    • Structure: Each row represents a student record

    Applications

    • Student performance prediction
    • Educational policy planning
    • Identification of performance gaps and influencing factors
    • Exploratory data analysis and visualization

  9. Student Performance Dataset

    • cubig.ai
    Updated May 28, 2025
    Cite
    CUBIG (2025). Student Performance Dataset [Dataset]. https://cubig.ai/store/products/358/student-performance-dataset
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction

    • The Student Performance Dataset is a survey of secondary school mathematics students. It is a tabular dataset containing a variety of information, including student demographics, family environment, parents' education and occupation, health, family relationships, and grades.

    2) Data Utilization

    (1) Characteristics of the Student Performance Dataset:
    • Each row contains a total of 33 different characteristics, including school ID, gender, age, family size, parents' educational level and occupation, family relationship, health status, and grades.
    • It is suitable for a variety of data analysis and prediction exercises, including regression analysis and categorical variable imbalance analysis, with Grade as the target variable.

    (2) Uses of the Student Performance Dataset:
    • Analyzing academic achievement prediction and influencing factors: It can be used to analyze the impact of various factors, such as the student's background, family environment, and parental characteristics, on grades and to develop a grade prediction model.
    • Establishing educational policies and customized support strategies: Based on student-specific characteristics and grade data, it can be applied to establishing educational policies such as closing educational gaps, supporting vulnerable student groups, and providing customized learning guidance.

  10. Students performance prediction data set - traditional vs. online learning

    • figshare.com
    txt
    Updated Mar 28, 2021
    Cite
    Gabriela Czibula; Maier Mariana; Zsuzsanna Onet-Marian (2021). Students performance prediction data set - traditional vs. online learning [Dataset]. http://doi.org/10.6084/m9.figshare.14330447.v5
    Available download formats: txt
    Dataset updated
    Mar 28, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Gabriela Czibula; Maier Mariana; Zsuzsanna Onet-Marian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The six data sets were created for an undergraduate course at the Babes-Bolyai University, Faculty of Mathematics and Computer Science, held for second-year students in the autumn semester. The course is taught in both Romanian and English, with the same content and evaluation rules in both languages. The six data sets are the following:

    • FirstCaseStudy_RO_traditional_2019-2020.txt: contains data about the grades from the 2019-2020 academic year (when the traditional face-to-face teaching method was used) for the Romanian language.
    • FirstCaseStudy_RO_online_2020-2021.txt: contains data about the grades from the 2020-2021 academic year (when online teaching was used) for the Romanian language.
    • SecondCaseStudy_EN_traditional_2019-2020.txt: contains data about the grades from the 2019-2020 academic year (when the traditional face-to-face teaching method was used) for the English language.
    • SecondCaseStudy_EN_online_2020-2021.txt: contains data about the grades from the 2020-2021 academic year (when online teaching was used) for the English language.
    • ThirdCaseStudy_Both_traditional_2019-2020.txt: the concatenation of the two data sets for the 2019-2020 academic year (so all instances from FirstCaseStudy_RO_traditional_2019-2020 and SecondCaseStudy_EN_traditional_2019-2020 together).
    • ThirdCaseStudy_Both_online_2020-2021.txt: the concatenation of the two data sets for the 2020-2021 academic year (so all instances from FirstCaseStudy_RO_online_2020-2021 and SecondCaseStudy_EN_online_2020-2021 together).

    Instances from the data sets for the 2019-2020 academic year contain 12 attributes (in this order):

    • The grades received by the student for 7 laboratory assignments that were presented during the semester. For assignments that were not turned in, a grade of 0 was given. Possible values are between 0 and 10.
    • The grades received by the student for 2 practical exams. If a student did not participate in a practical exam, the grade was 0. Possible values are between 0 and 10.
    • The number of seminar activities that the student had. Possible values are between 0 and 7.
    • The final grade the student received for the course. It is a value between 4 and 10.
    • The category of the final grade: E for grades 10 or 9, G for grades 8 or 7, S for grades 6 or 5, F for grade 4.

    Instances from the data sets for the 2020-2021 academic year contain 10 attributes (in this order):

    • The grades received by the student for 7 laboratory assignments that were presented during the semester. For assignments that were not turned in, a grade of 0 was given. Possible values are between 0 and 10.
    • A seminar bonus computed based on the number of seminar activities the student had during the semester, which was added to the final grade. Possible values are between 0 and 0.5.
    • The final grade the student received for the course. It is a value between 4 and 10.
    • The category of the final grade: E for grades 10 or 9, G for grades 8 or 7, S for grades 6 or 5, F for grade 4.
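    The mapping from final grade to category described above can be written as a small function. This is a sketch that assumes final grades are whole numbers from 4 to 10; the function name `grade_category` is hypothetical and does not come from the data set files.

```python
def grade_category(final_grade):
    """Map an integer final grade (4..10) to its category letter.

    E for grades 10 or 9, G for 8 or 7, S for 6 or 5, F for 4,
    following the description of the data sets.
    """
    if final_grade not in range(4, 11):
        raise ValueError(f"final grade must be an integer from 4 to 10, got {final_grade!r}")
    if final_grade >= 9:
        return "E"
    if final_grade >= 7:
        return "G"
    if final_grade >= 5:
        return "S"
    return "F"


categories = [grade_category(g) for g in range(4, 11)]
```

    Such a function is mainly useful as a consistency check: the category column in each file should agree with the value recomputed from the final grade column.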

  11. Student Performance Predictions

    • kaggle.com
    Updated Sep 4, 2024
    Cite
    Youssef Ayman (2024). Student Performance Predictions [Dataset]. https://www.kaggle.com/datasets/youssefayman22/student-performance-predictions/suggestions
    Dataset updated
    Sep 4, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Youssef Ayman
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Youssef Ayman

    Released under Apache 2.0

    Contents

  12. Data from: Student grade Prediction

    • data.mendeley.com
    Updated Mar 24, 2025
    Cite
    Neelamcadhab Padhy (2025). Student grade Prediction [Dataset]. http://doi.org/10.17632/6dgkv6kpr2.1
    Dataset updated
    Mar 24, 2025
    Authors
    Neelamcadhab Padhy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains semester-wise academic performance data of BTech students from GIET University. It includes the grades of students from their 1st to 4th semesters, along with their corresponding 5th-semester grades. The dataset is intended for use in educational data mining and machine learning applications, specifically for predicting the 5th-semester grades of students based on their past performance.

    The dataset consists of 379 student records, with each record containing the following attributes:

    SEM 1: Grade obtained in the 1st semester.

    SEM 2: Grade obtained in the 2nd semester.

    SEM 3: Grade obtained in the 3rd semester.

    SEM 4: Grade obtained in the 4th semester.

    SEM 5: Grade obtained in the 5th semester (target variable for prediction).

    The grades are represented on a scale of 0 to 10, where 10 is the highest achievable grade. This dataset can be used to develop predictive models for academic performance, identify trends in student performance, and support decision-making in educational institutions.
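    As a starting point for the prediction task described above, a simple baseline predicts the 5th-semester grade as the mean of the first four semesters and scores it with mean absolute error. This is only an illustrative baseline, not a method from the dataset's authors; the two records below are invented, and only the attribute names SEM 1 to SEM 5 come from the description.

```python
PREVIOUS = ("SEM 1", "SEM 2", "SEM 3", "SEM 4")
TARGET = "SEM 5"

# Invented sample records on the 0-10 grade scale.
records = [
    {"SEM 1": 8.0, "SEM 2": 7.0, "SEM 3": 9.0, "SEM 4": 8.0, "SEM 5": 8.5},
    {"SEM 1": 6.0, "SEM 2": 6.0, "SEM 3": 6.0, "SEM 4": 6.0, "SEM 5": 5.0},
]


def predict_sem5(record):
    """Baseline: predict SEM 5 as the mean of SEM 1..SEM 4."""
    return sum(record[name] for name in PREVIOUS) / len(PREVIOUS)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted grades."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


predictions = [predict_sem5(r) for r in records]
mae = mean_absolute_error([r[TARGET] for r in records], predictions)
```

    Any learned model for this dataset should beat this baseline's error on held-out records to be worth its added complexity.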

    Keywords: Grade Prediction, Student Performance, Educational Data Mining, Academic Analytics, Machine Learning, GIET University

    Potential Applications:

    Predicting student performance in future semesters.

    Identifying at-risk students for early intervention.

    Analyzing trends in academic performance over time.

  13. VN Student Performance Dataset

    • kaggle.com
    Updated Apr 20, 2025
    Cite
    Hoàng Ngọc Tiến (2025). VN Student Performance Dataset [Dataset]. https://www.kaggle.com/datasets/hongngctin/vn-student-performance-dataset/discussion
    Dataset updated
    Apr 20, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hoàng Ngọc Tiến
    Description

    This is a synthesized dataset based on real academic performance data of high school students in several schools in Vietnam. This data can be useful for analysis, training prediction models on academic performance, personalized study planning, and career counseling, among other applications.

    The data used contains only anonymized and non-identifiable information collected from high school students, including demographic and academic performance attributes. No personally identifying information was collected or used. The data is used exclusively for academic research purposes under ethical guidelines, and no attempt is made to trace or analyze individual-level outcomes.

  14. Student Performance

    • kaggle.com
    Updated Aug 30, 2020
    Cite
    BARKHA VERMA (2020). Student Performance [Dataset]. https://www.kaggle.com/barkhaverma/student-performance/discussion
    Dataset updated
    Aug 30, 2020
    Dataset provided by
    Kaggle
    Authors
    BARKHA VERMA
    Description

    Context

    This data set addresses student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data were collected using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).

    Content

    Attributes for both student-mat.csv (Math course) and student-por.csv (Portuguese language course) datasets:

    1 school - student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
    2 sex - student's sex (binary: 'F' - female or 'M' - male)
    3 age - student's age (numeric: from 15 to 22)
    4 address - student's home address type (binary: 'U' - urban or 'R' - rural)
    5 famsize - family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
    6 Pstatus - parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
    7 Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
    8 Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
    9 Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
    10 Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
    11 reason - reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
    12 guardian - student's guardian (nominal: 'mother', 'father' or 'other')
    13 traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
    14 studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
    15 failures - number of past class failures (numeric: n if 1<=n<3, else 4)
    16 schoolsup - extra educational support (binary: yes or no)
    17 famsup - family educational support (binary: yes or no)
    18 paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
    19 activities - extra-curricular activities (binary: yes or no)
    20 nursery - attended nursery school (binary: yes or no)
    21 higher - wants to take higher education (binary: yes or no)
    22 internet - Internet access at home (binary: yes or no)
    23 romantic - with a romantic relationship (binary: yes or no)
    24 famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
    25 freetime - free time after school (numeric: from 1 - very low to 5 - very high)
    26 goout - going out with friends (numeric: from 1 - very low to 5 - very high)
    27 Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
    28 Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
    29 health - current health status (numeric: from 1 - very bad to 5 - very good)
    30 absences - number of school absences (numeric: from 0 to 93)

    These grades relate to the course subject, Math or Portuguese:

    31 G1 - first period grade (numeric: from 0 to 20)
    32 G2 - second period grade (numeric: from 0 to 20)
    33 G3 - final grade (numeric: from 0 to 20, output target)
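    As an illustration of how the attribute coding above might be applied when preparing records, here is a small sketch. The column names and code labels come from the attribute list; the function `encode_row`, the constant names, and the sample record are hypothetical and not part of the dataset or its documentation.

```python
# Labels for the Medu/Fedu education codes, copied from the attribute list.
EDUCATION_LEVELS = {
    0: "none",
    1: "primary education (4th grade)",
    2: "5th to 9th grade",
    3: "secondary education",
    4: "higher education",
}

# Attributes documented as "binary: yes or no".
BINARY_YES_NO = (
    "schoolsup", "famsup", "paid", "activities",
    "nursery", "higher", "internet", "romantic",
)

GRADE_COLUMNS = ("G1", "G2", "G3")


def encode_row(row):
    """Convert one raw record (all values as strings) into typed values."""
    out = dict(row)
    for col in BINARY_YES_NO:
        if col in out:
            out[col] = 1 if out[col] == "yes" else 0
    for col in ("Medu", "Fedu"):
        if col in out:
            out[col] = int(out[col])
    for col in GRADE_COLUMNS:
        if col in out:
            grade = int(out[col])
            # Grades are documented as numeric from 0 to 20.
            if not 0 <= grade <= 20:
                raise ValueError(f"{col} out of range: {grade}")
            out[col] = grade
    return out


# Hypothetical record for illustration; not a row from the real files.
raw = {
    "school": "GP", "Medu": "4", "Fedu": "2",
    "internet": "yes", "romantic": "no",
    "G1": "12", "G2": "13", "G3": "14",
}
encoded = encode_row(raw)
print(encoded)
print(EDUCATION_LEVELS[encoded["Medu"]])
```

    Keeping Medu and Fedu as integers preserves their ordering, while the lookup table recovers the human-readable label when needed.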

    Source

    Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez

    Citation Request:

    Please include this citation if you plan to use this database:

    P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.

  15. Student Performance (Multiple Linear Regression) Dataset

    • cubig.ai
    Updated May 29, 2025
    Cite
    CUBIG (2025). Student Performance (Multiple Linear Regression) Dataset [Dataset]. https://cubig.ai/store/products/392/student-performance-multiple-linear-regression-dataset
    Explore at:
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The Student Performance (Multiple Linear Regression) Dataset is designed to analyze the relationship between students’ learning habits and academic performance. Each sample includes key indicators related to learning, such as study hours, sleep duration, previous test scores, and the number of practice exams completed.

    2) Data Utilization
    (1) Characteristics of the Student Performance (Multiple Linear Regression) Dataset:
    • The target variable, Hours Studied, quantitatively represents the amount of time a student has invested in studying. The dataset is structured to allow modeling and inference of learning behaviors based on correlations with other variables.

    (2) Applications of the Student Performance (Multiple Linear Regression) Dataset:
    • AI-Based Study Time Prediction Models: The dataset can be used to develop regression models that estimate a student’s expected study time based on inputs like academic performance, sleep habits, and engagement patterns.
    • Behavioral Analysis and Personalized Learning Strategies: It can be applied to identify students with insufficient study time and design personalized study interventions based on academic and lifestyle patterns.
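    To make the regression use case concrete, the sketch below fits a multiple linear regression by ordinary least squares using only the standard library. It is an assumption-laden illustration: the two features (sleep hours, previous score), the generating rule for the target, and all function names are hypothetical, and the toy rows are synthetic rather than drawn from the dataset.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(features, target):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    x = [[1.0] + list(row) for row in features]  # leading 1.0 is the intercept
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, row):
    """Apply the fitted intercept and coefficients to one feature row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Synthetic rows: (sleep_hours, previous_score). Values are made up.
features = [(6, 60), (7, 80), (8, 70), (5, 90), (9, 50)]
# Hypothetical generating rule so the fit can be checked exactly:
# hours_studied = 1.0 + 0.5 * sleep_hours + 0.05 * previous_score
target = [1.0 + 0.5 * s + 0.05 * p for s, p in features]

weights = fit_linear_regression(features, target)
print(weights)
print(predict(weights, (7, 70)))
```

    Because the toy target is generated from an exact linear rule, the fitted weights should match that rule up to floating-point error; real data would leave residuals, and a numerical library would usually be preferable to hand-rolled elimination.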

  16. Data from: Dataset of Student Level Prediction in UAE

    • data.mendeley.com
    Updated Dec 18, 2020
    Cite
    shatha Ghareeb (2020). Dataset of Student Level Prediction in UAE [Dataset]. http://doi.org/10.17632/3g8dtwbjjy.1
    Explore at:
    Dataset updated
    Dec 18, 2020
    Authors
    shatha Ghareeb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United Arab Emirates
    Description

    The dataset offers novel aspects, specifically in terms of student grading across diverse educational cultures within multiple countries. Researchers and others in the education sector will be able to see the impact of having varied curricula in one country. The dataset compares different levelling cases that arise when students transfer from one curriculum to another, and the unreliable levelling criteria that schools currently set in an international school. The collected data can be used with intelligent algorithms, specifically machine learning and pattern analysis methods, to develop an intelligent framework applicable in multicultural educational systems. Such a framework would aid the smooth transition (“levelling”, hereafter) of students who relocate from one education curriculum to another and minimize the impact of switching on the students’ educational performance. The preliminary variables taken into consideration when deciding which data to collect depended on the variables. The UAE is a multicultural country with many expats relocating from regions such as Asia, Europe and America. To meet expats' needs, the UAE has established many international private schools; the UAE was therefore chosen as the location of study, based on the many levelling cases and struggles declared by the Ministry of Education and schools. For the first time, we present this dataset comprising students’ records for two academic years, covering math, English, and science for 3 terms. The selection of subject areas and number of terms was influenced by other researchers working on similar subject matter.

  17. Synthetic Student Profiles with Academic Outcomes Dataset

    • opendatabay.com
    Updated Jun 17, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Student Profiles with Academic Outcomes Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/41933042-6ec7-49c4-b151-508fc8f5592b
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset authored and provided by
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Education & Learning Analytics
    Description

    The Synthetic Student Performance Dataset is designed to support research, analytics, and educational projects focused on academic performance, family background, and behavioral factors affecting students. It mirrors real-world educational data and offers diverse features to explore student success patterns.

    Dataset Features

    • student_id: Unique identifier for each student.
    • school: Attended school (e.g., GP or MS).
    • sex: Gender of the student (F/M).
    • age: Student's age in years.
    • address_type: Urban or Rural home location.
    • family_size: Family size (Less than or equal to 3 / Greater than 3).
    • parent_status: Parental cohabitation status (Living together / Apart).
    • mother_education / father_education: Highest education level completed (e.g., Primary, Secondary, Higher).
    • mother_job / father_job: Occupation of the student's parents.
    • school_choice_reason: Reason for choosing the school (e.g., Reputation, Proximity).
    • guardian: Primary caregiver (e.g., Mother, Father, Other).
    • travel_time: Daily travel time to school.
    • study_time: Weekly study time outside school.
    • class_failures: Number of past class failures.
    • school_support / family_support: Extra academic support received at school and from family (Yes/No).
    • extra_paid_classes: Attending paid private tutoring (Yes/No).
    • activities: Participation in extracurricular activities (Yes/No).
    • nursery_school: Attended preschool (Yes/No).
    • higher_ed: Desire to pursue higher education (Yes/No).
    • internet_access: Access to the internet at home (Yes/No).
    • romantic_relationship: Currently in a romantic relationship (Yes/No).
    • family_relationship: Quality of family relationships (numeric scale).
    • free_time: Amount of free time after school (numeric scale).
    • social: Frequency of social activities with peers (numeric scale).
    • weekday_alcohol / weekend_alcohol: Alcohol consumption levels on weekdays and weekends.
    • health: Current health status (1–5 scale).
    • absences: Number of school absences.
    • grade_1 / grade_2 / final_grade: First and second period grades and final academic performance.

    Distribution

    [Figure: Synthetic student performance data visuals and distribution. https://storage.googleapis.com/opendatabay_public/41933042-6ec7-49c4-b151-508fc8f5592b/7537d999da0b_student_performance_visuals.png]

    Usage

    This dataset is ideal for:

    • Academic Performance Prediction: Predict final grades based on behavioral and background features.
    • Feature Importance Analysis: Identify key influences on student success.
    • Sociological Insights: Understand the impact of family, relationship, and lifestyle factors on education.
    • Model Training: Suitable for classification, regression, and clustering tasks in educational data mining.

    Coverage

    Captures a comprehensive view of student life, including family background, academic history, health, and lifestyle. The dataset supports multi-disciplinary research across education, sociology, and data science.

    License

    CC0 (Public Domain)

    Who Can Use It

    • Educational Researchers: For testing interventions and identifying risk factors.
    • Data Scientists and ML Practitioners: For building predictive models in education.
    • Instructors and Students: For coursework in data analysis, machine learning, and statistics.

  18. Attribute information for student performance data set.

    • figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Yinqiu Song; Xianqiu Meng; Jianhua Jiang (2023). Attribute information for student performance data set. [Dataset]. http://doi.org/10.1371/journal.pone.0276943.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Yinqiu Song; Xianqiu Meng; Jianhua Jiang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Attribute information for student performance data set.

  19. Student Exam Performance

    • test.dbrepo.tuwien.ac.at
    Updated Jun 15, 2025
    Cite
    Azan (2025). Student Exam Performance [Dataset]. http://doi.org/10.82556/egd2-q295
    Explore at:
    Dataset updated
    Jun 15, 2025
    Authors
    Azan
    Time period covered
    2025
    Description

    This dataset is part of a student exam performance prediction use case and contains structured data such as study hours, attendance percentage, previous exam scores, and final exam scores. The data has been split into subsets (training, validation, and test) for use in a machine learning workflow. Each subset is used for a specific phase of model development: training, tuning, and evaluation. The dataset supports a regression-based model and follows FAIR data principles.
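    The three-way split described above can be sketched as follows. This is an illustrative assumption rather than the procedure used for this dataset: the 60/20/20 fractions, the fixed seed, and the function name `train_val_test_split` are all hypothetical, and the records are placeholder integers.

```python
import random


def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and cut it into three disjoint subsets."""
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("validation and test fractions must sum to less than 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Placeholder record IDs standing in for real rows.
records = list(range(100))
train, val, test = train_val_test_split(records)
print(len(train), len(val), len(test))
```

    The training subset is used to fit the model, the validation subset to tune it, and the test subset is held back for the final evaluation, matching the three phases named in the description.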

  20. College Placement Predictor Dataset

    • kaggle.com
    Updated Dec 28, 2023
    Cite
    SameerProgrammer (2023). College Placement Predictor Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/7298157
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SameerProgrammer
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    1. About the Dataset:

    Description: Dive into the world of college placements with this dataset designed to unravel the factors influencing student placement outcomes. The dataset comprises crucial parameters such as IQ scores, CGPA (Cumulative Grade Point Average), and placement status. Aspiring data scientists, researchers, and enthusiasts can leverage this dataset to uncover patterns and insights that contribute to a deeper understanding of successful college placements.

    2. Project Ideas:

    Project Idea 1: Predictive Modeling for College Placements. Utilize machine learning algorithms to build a predictive model that forecasts a student's likelihood of placement based on their IQ scores and CGPA. Evaluate and compare the effectiveness of different algorithms to enhance prediction accuracy.

    Project Idea 2: Feature Importance Analysis. Conduct a feature importance analysis to identify the key factors that significantly influence placement outcomes. Gain insights into whether IQ, CGPA, or a combination of both plays a more dominant role in determining success.

    Project Idea 3: Clustering Analysis of Placement Trends. Apply clustering techniques to group students based on their placement outcomes. Explore whether distinct clusters emerge, shedding light on common characteristics or trends among students who secure placements.

    Project Idea 4: Correlation Analysis with External Factors. Investigate the correlation between the provided data (IQ, CGPA, placement) and external factors such as internship experience, extracurricular activities, or industry demand. Assess how these external factors may complement or influence placement success.

    Project Idea 5: Visualization of Placement Dynamics Over Time. Create dynamic visualizations to illustrate how placement trends evolve over time. Analyze trends, patterns, and fluctuations in placement rates to identify potential cyclical or seasonal influences on student placements.

    3. Columns Explanation:

    • IQ:

      • Definition: Intelligence Quotient, a measure of a person's intellectual abilities.
      • Data Type: Numeric
      • Range: Typically, IQ scores range from 70 to 130, with 100 being the average.
    • CGPA:

      • Definition: Cumulative Grade Point Average, a measure of a student's overall academic performance.
      • Data Type: Numeric
      • Range: Typically, CGPA is on a scale of 0 to 4, with 4 being the highest possible score.
    • Placement:

      • Definition: Binary variable indicating whether a student secured a placement (1) or not (0).
      • Data Type: Categorical (Binary)
      • Values: 1 (Placement secured) or 0 (No placement).

    These columns collectively provide a comprehensive snapshot of a student's intellectual abilities, academic performance, and their success in securing a placement. Analyzing this dataset can offer valuable insights into the dynamics of college placements and inform strategies for optimizing student outcomes.
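    As one possible starting point for Project Idea 1, the sketch below trains a logistic regression on the three columns described above using plain gradient descent. Everything beyond the column meanings is an assumption: the eight sample rows are invented, the learning rate and epoch count are arbitrary, and the function names are hypothetical. A library implementation would normally be used in practice.

```python
import math


def standardize(column):
    """Scale a column to zero mean and unit variance; return the stats too."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column)) or 1.0
    return [(v - mean) / std for v in column], mean, std


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log loss."""
    n, k = len(xs), len(xs[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented rows (IQ, CGPA, Placement); NOT taken from the dataset.
data = [
    (85, 2.1, 0), (90, 2.5, 0), (95, 2.4, 0), (100, 2.8, 0),
    (105, 3.2, 1), (110, 3.0, 1), (115, 3.6, 1), (120, 3.8, 1),
]
iq_z, iq_mean, iq_std = standardize([row[0] for row in data])
cgpa_z, cgpa_mean, cgpa_std = standardize([row[1] for row in data])
xs = list(zip(iq_z, cgpa_z))
ys = [row[2] for row in data]
w, b = fit_logistic(xs, ys)


def placement_probability(iq, cgpa):
    """Estimated probability of placement for one student."""
    x = ((iq - iq_mean) / iq_std, (cgpa - cgpa_mean) / cgpa_std)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


accuracy = sum(
    (placement_probability(iq, cgpa) >= 0.5) == bool(placed)
    for iq, cgpa, placed in data
) / len(data)
print(accuracy)
```

    The features are standardized first because IQ and CGPA sit on very different scales, which would otherwise slow gradient descent. The accuracy here is measured on the training rows only, so it says nothing about generalization.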


