100+ datasets found
  1. Student Performance Data Set

    • kaggle.com
    Updated Mar 27, 2020
    Cite
    Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
    Dataset updated
    Mar 27, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Data-Science Sean
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    If this dataset is useful, an upvote is appreciated. These data describe student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, and were collected using school reports and questionnaires. Two datasets are provided covering performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd period grades. It is more difficult to predict G3 without G2 and G1, but such a prediction is much more useful (see the source paper for more details).

  2. student data analysis

    • kaggle.com
    Updated Nov 17, 2023
    Cite
    maira javeed (2023). student data analysis [Dataset]. https://www.kaggle.com/datasets/mairajaveed/student-data-analysis
    Dataset updated
    Nov 17, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    maira javeed
    Description

    In this project, we aim to analyze and gain insights into the performance of students based on various factors that influence their academic achievements. We have collected data related to students' demographic information, family background, and their exam scores in different subjects.

    Key Objectives:

    1. Performance Evaluation: Evaluate and understand the academic performance of students by analyzing their scores in various subjects.

    2. Identifying Underlying Factors: Investigate factors that might contribute to variations in student performance, such as parental education, family size, and student attendance.

    3. Visualizing Insights: Create data visualizations to present the findings effectively and intuitively.

    Dataset Details:

    • The dataset used in this analysis contains information about students, including their age, gender, parental education, lunch type, and test scores in subjects like mathematics, reading, and writing.

    Analysis Highlights:

    • We will perform a comprehensive analysis of the dataset, including data cleaning, exploration, and visualization to gain insights into various aspects of student performance.

    • By employing statistical methods and machine learning techniques, we will determine the significant factors that affect student performance.
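    A first pass at identifying influential factors is often a simple group comparison. This minimal sketch computes the mean score within each level of a categorical variable; the column names `parental_education` and `math_score` are hypothetical, since the description does not give the exact headers.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_group(rows, group_col, score_col):
    """Average a numeric score column within each level of a categorical column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[score_col]))
    # Sort levels so the output order is deterministic.
    return {level: mean(scores) for level, scores in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical rows; the column names are assumptions, not the dataset's headers.
    rows = [
        {"parental_education": "high school", "math_score": 62},
        {"parental_education": "bachelor's degree", "math_score": 71},
        {"parental_education": "high school", "math_score": 58},
    ]
    print(mean_score_by_group(rows, "parental_education", "math_score"))
```

    Differences between group means only suggest candidate factors; a statistical test or model is still needed before calling a factor significant.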

    Why This Matters:

    Understanding the factors that influence student performance is crucial for educators, policymakers, and parents. This analysis can help in making informed decisions to improve educational outcomes and provide support where it is most needed.

    Acknowledgments:

    We would like to express our gratitude to [mention any data sources or collaborators] for making this dataset available.

    Please Note:

    This project is meant for educational and analytical purposes. The dataset used is fictitious and does not represent any specific educational institution or individuals.

  3. Students performance prediction data set - traditional vs. online learning

    • figshare.com
    txt
    Updated Mar 28, 2021
    Cite
    Gabriela Czibula; Maier Mariana; Zsuzsanna Onet-Marian (2021). Students performance prediction data set - traditional vs. online learning [Dataset]. http://doi.org/10.6084/m9.figshare.14330447.v5
    Available download formats: txt
    Dataset updated
    Mar 28, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Gabriela Czibula; Maier Mariana; Zsuzsanna Onet-Marian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The six data sets were created for an undergraduate course at the Babes-Bolyai University, Faculty of Mathematics and Computer Science, held for second-year students in the autumn semester. The course is taught in both Romanian and English, with the same content and evaluation rules in both languages. The six data sets are the following:

    - FirstCaseStudy_RO_traditional_2019-2020.txt - grades from the 2019-2020 academic year (when traditional face-to-face teaching was used) for the Romanian language
    - FirstCaseStudy_RO_online_2020-2021.txt - grades from the 2020-2021 academic year (when online teaching was used) for the Romanian language
    - SecondCaseStudy_EN_traditional_2019-2020.txt - grades from the 2019-2020 academic year (when traditional face-to-face teaching was used) for the English language
    - SecondCaseStudy_EN_online_2020-2021.txt - grades from the 2020-2021 academic year (when online teaching was used) for the English language
    - ThirdCaseStudy_Both_traditional_2019-2020.txt - the concatenation of the two data sets for the 2019-2020 academic year (all instances from FirstCaseStudy_RO_traditional_2019-2020 and SecondCaseStudy_EN_traditional_2019-2020 together)
    - ThirdCaseStudy_Both_online_2020-2021.txt - the concatenation of the two data sets for the 2020-2021 academic year (all instances from FirstCaseStudy_RO_online_2020-2021 and SecondCaseStudy_EN_online_2020-2021 together)

    Instances from the data sets for the 2019-2020 academic year contain 12 attributes (in this order):

    - the grades received by the student for 7 laboratory assignments presented during the semester. Assignments that were not turned in received a grade of 0. Possible values are between 0 and 10.
    - the grades received by the student for 2 practical exams. If a student did not participate in a practical exam, the grade was 0. Possible values are between 0 and 10.
    - the number of seminar activities the student had. Possible values are between 0 and 7.
    - the final grade the student received for the course, a value between 4 and 10.
    - the category of the final grade: E for grades 10 or 9; G for grades 8 or 7; S for grades 6 or 5; F for grade 4.

    Instances from the data sets for the 2020-2021 academic year contain 10 attributes (in this order):

    - the grades received by the student for 7 laboratory assignments presented during the semester. Assignments that were not turned in received a grade of 0. Possible values are between 0 and 10.
    - a seminar bonus, computed from the number of seminar activities the student had during the semester and added to the final grade. Possible values are between 0 and 0.5.
    - the final grade the student received for the course, a value between 4 and 10.
    - the category of the final grade: E for grades 10 or 9; G for grades 8 or 7; S for grades 6 or 5; F for grade 4.
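    The attribute order above can be captured in a small parser. This is a minimal sketch, assuming each line of a .txt file holds one instance with values separated by commas or whitespace (the description does not state the delimiter); the field names in the returned dict are hypothetical labels for the listed attributes.

```python
import re

LAB_COUNT = 7  # seven laboratory assignment grades come first in both layouts


def grade_category(final_grade):
    """Map a final grade (4..10) to E/G/S/F as listed in the description.

    Integer grades follow the listed mapping exactly; applying the same
    >= thresholds to non-integer grades is an assumption.
    """
    if final_grade >= 9:
        return "E"
    if final_grade >= 7:
        return "G"
    if final_grade >= 5:
        return "S"
    return "F"


def parse_instance(line, online):
    """Split one instance into named fields following the documented attribute order.

    online=False -> 2019-2020 layout (12 attributes)
    online=True  -> 2020-2021 layout (10 attributes)
    The comma/whitespace delimiter is an assumption about the .txt files.
    """
    values = [v for v in re.split(r"[,\s]+", line.strip()) if v]
    expected = 10 if online else 12
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    record = {"labs": [float(v) for v in values[:LAB_COUNT]]}
    rest = values[LAB_COUNT:]
    if online:
        record["seminar_bonus"] = float(rest[0])
    else:
        record["practical_exams"] = [float(rest[0]), float(rest[1])]
        record["seminar_activities"] = int(float(rest[2]))
        rest = rest[2:]  # now [seminar activities, final grade, category]
    record["final_grade"] = float(rest[1])
    record["category"] = rest[2]
    return record
```

    Keeping both layouts in one function makes it easy to load the ThirdCaseStudy files, which mix only instances of the same academic year.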

  4. student-performance-data

    • kaggle.com
    Updated Jun 14, 2025
    Cite
    Muhammad Azam (2025). student-performance-data [Dataset]. http://doi.org/10.34740/kaggle/dsv/12160820
    Dataset updated
    Jun 14, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muhammad Azam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Student Performance Data

    This dataset provides insights into various factors influencing the academic performance of students. It is curated for use in educational research, data analytics projects, and predictive modeling. The data reflects a combination of personal, familial, and academic-related variables gathered through observation or survey.

    The dataset includes a diverse range of students and captures key characteristics such as study habits, family background, school attendance, and overall performance. It is well-suited for exploring correlations, visualizing trends, and training machine learning models related to academic outcomes.

    Highlights:

    • Clean, structured format suitable for immediate use
    • Designed for beginner to intermediate-level data analysis
    • Valuable for classification, regression, and data storytelling projects

    File Format:

    • Type: CSV (Comma-Separated Values)
    • Encoding: UTF-8
    • Structure: Each row represents a student record
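    Given the stated format (CSV, UTF-8, one student per row), a loader needs only the standard library. This is a minimal sketch; the column headers are read from the file's first row rather than assumed.

```python
import csv
from pathlib import Path


def load_student_records(path):
    """Read a UTF-8 CSV file into a list of dicts, one per student record."""
    # newline="" is how the csv module expects files to be opened.
    with Path(path).open(encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```

    For example, `load_student_records("student_performance.csv")` (a hypothetical file name) returns one dict per row keyed by the header names. All values come back as strings, so numeric columns need explicit conversion before modeling.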

    Applications

    • Student performance prediction
    • Educational policy planning
    • Identification of performance gaps and influencing factors
    • Exploratory data analysis and visualization

  5. Data for: Integrating open education practices with data analysis of open...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1 more
    Updated Jul 27, 2024
    Cite
    Marja Bakermans (2024). Data for: Integrating open education practices with data analysis of open science in an undergraduate course [Dataset]. http://doi.org/10.5061/dryad.37pvmcvst
    Dataset updated
    Jul 27, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Marja Bakermans
    Description

    The open science movement produces vast quantities of openly published data connected to journal articles, creating an enormous resource for educators to engage students in current topics and analyses. However, educators face challenges using these materials to meet course objectives. I present a case study using open science (published articles and their corresponding datasets) and open educational practices in a capstone course. While engaging in current topics of conservation, students trace connections in the research process, learn statistical analyses, and recreate analyses using the programming language R. I assessed the presence of best practices in open articles and datasets, examined student selection in the open grading policy, surveyed students on their perceived learning gains, and conducted a thematic analysis on student reflections. First, articles and datasets met just over half of the assessed FAIRness practices, but this increased with the publication date. There was a...

    Article and dataset FAIRness: To assess the utility of open articles and their datasets as an educational tool in an undergraduate academic setting, I measured the congruence of each pair to a set of best practices and guiding principles. I assessed ten guiding principles and best practices (Table 1), where each category was scored ‘1’ or ‘0’ based on whether it met that criterion, with a total possible score of ten.

    Open grading policies: Students were allowed to specify the percentage weight for each assessment category in the course, including 1) six coding exercises (Exercises), 2) one lead exercise (Lead Exercise), 3) fourteen annotation assignments of readings (Annotations), 4) one final project (Final Project), 5) five discussion board posts and a statement of learning reflection (Discussion), and 6) attendance and participation (Participation). I examined whether assessment categories (independent variable) were weighted (dependent variable) differently by students using an analysis of ...

    # Data for: Integrating open education practices with data analysis of open science in an undergraduate course

    Author: Marja H Bakermans
    Affiliation: Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA 01609 USA
    ORCID: https://orcid.org/0000-0002-4879-7771
    Institutional IRB approval: IRB-24–0314

    Data and file overview

    The full dataset file, OEPandOSdata (.xlsx), contains 8 files. Below are descriptions of the name and contents of each file. NA = not applicable or no data available.

    1. BestPracticesData.csv
      • Description: Data to assess the adherence of articles and datasets to open science best practices.
      • Column headers and descriptions:
        • Article: articles used in the study, numbered randomly
        • F1: Findable, Data are assigned a unique and persistent doi
        • F2: Findable, Metadata includes an identifier of data
        • F3: Findable, Data are registered in a searchable database
        • A1: ...
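    The scoring rule described earlier (each of ten criteria scored 1 or 0, total possible score of ten) reduces to a row sum. This is a minimal sketch, assuming rows shaped like BestPracticesData.csv with 0/1 values in criterion columns. Only F1, F2, F3 and A1 are named in this excerpt, so the criterion list is passed in rather than hard-coded, and treating NA as 0 is an assumption rather than the author's documented rule.

```python
def best_practice_score(row, criteria):
    """Sum 0/1 criterion scores for one article/dataset pair.

    "NA" (not applicable or no data) counts as 0 here, which is an assumption.
    """
    total = 0
    for name in criteria:
        value = str(row.get(name, "NA")).strip()
        if value == "NA":
            continue
        score = float(value)
        if score not in (0.0, 1.0):
            raise ValueError(f"{name} must be 0 or 1, got {value!r}")
        total += int(score)
    return total


def proportion_met(row, criteria):
    """Fraction of the listed criteria met (0.5 means half)."""
    return best_practice_score(row, criteria) / len(criteria)
```

    With all ten criteria supplied, `best_practice_score` returns the 0 to 10 total described above, and `proportion_met` expresses it as the share of practices met.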
  6. Data from: Student grade Prediction

    • data.mendeley.com
    Updated Mar 24, 2025
    Cite
    Neelamcadhab Padhy (2025). Student grade Prediction [Dataset]. http://doi.org/10.17632/6dgkv6kpr2.1
    Dataset updated
    Mar 24, 2025
    Authors
    Neelamcadhab Padhy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains semester-wise academic performance data of BTech students from GIET University. It includes the grades of students from their 1st to 4th semesters, along with their corresponding 5th-semester grades. The dataset is intended for use in educational data mining and machine learning applications, specifically for predicting the 5th-semester grades of students based on their past performance. The dataset consists of 379 student records, with each record containing the following attributes:

    SEM 1: Grade obtained in the 1st semester.

    SEM 2: Grade obtained in the 2nd semester.

    SEM 3: Grade obtained in the 3rd semester.

    SEM 4: Grade obtained in the 4th semester.

    SEM 5: Grade obtained in the 5th semester (target variable for prediction).

    The grades are represented on a scale of 0 to 10, where 10 is the highest achievable grade. This dataset can be used to develop predictive models for academic performance, identify trends in student performance, and support decision-making in educational institutions.

    Keywords: Grade Prediction, Student Performance, Educational Data Mining, Academic Analytics, Machine Learning, GIET University

    Potential Applications:

    Predicting student performance in future semesters.

    Identifying at-risk students for early intervention.

    Analyzing trends in academic performance over time.

  7. Performance of models using CNN features.

    • plos.figshare.com
    xls
    Updated Nov 8, 2023
    Cite
    Nihal Abuzinadah; Muhammad Umer; Abid Ishaq; Abdullah Al Hejaili; Shtwai Alsubai; Ala’ Abdulmajid Eshmawi; Abdullah Mohamed; Imran Ashraf (2023). Performance of models using CNN features. [Dataset]. http://doi.org/10.1371/journal.pone.0293061.t005
    Available download formats: xls
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Nihal Abuzinadah; Muhammad Umer; Abid Ishaq; Abdullah Al Hejaili; Shtwai Alsubai; Ala’ Abdulmajid Eshmawi; Abdullah Mohamed; Imran Ashraf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Predicting student performance automatically is of utmost importance given the substantial volume of data in educational databases. Educational data mining (EDM) devises techniques to uncover insights from data originating in educational settings. Artificial intelligence (AI) can mine educational data to predict student performance and provide measures to help students avoid failing and learn better. Learning platforms complement traditional learning settings by analyzing student performance, which can help reduce the chance of student failure. Existing methods for student performance prediction in educational data mining faced challenges such as limited accuracy, imbalanced data, and difficulties in feature engineering. These issues hindered effective adaptability and generalization across diverse educational contexts. This study proposes a machine learning-based system with deep convoluted features for the prediction of students’ academic performance. The proposed framework is applied to predict student academic performance on both imbalanced datasets and datasets balanced with the synthetic minority oversampling technique (SMOTE). In addition, performance is evaluated using both the original and the deep convoluted features. Experimental results indicate that deep convoluted features provide improved prediction accuracy compared to the original features. The extra tree classifier with convoluted features achieved the highest classification accuracy, 99.9%. In comparison with state-of-the-art approaches, the proposed approach achieved higher performance. This research introduces a powerful AI-driven system for student performance prediction, offering substantial advancements in accuracy compared to existing approaches.
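    SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal pure-Python sketch of that core step (Euclidean distance, uniform interpolation factor); it illustrates the technique and is not the study's implementation, and the function and parameter names are our own.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class feature vectors.

    For each synthetic sample: pick a random minority sample x, pick one of its
    k nearest minority neighbours x_nn, and return x + u * (x_nn - x) with
    u drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        x_nn = rng.choice(neighbours)
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, x_nn)])
    return synthetic
```

    Because each new point lies on the segment between two real minority samples, oversampling fills in the minority region rather than duplicating rows; it should be applied to training folds only, so synthetic points never leak into evaluation data.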

  8. Educational Process Mining (EPM): A Learning Analytics Data Set Data Set

    • academictorrents.com
    bittorrent
    Updated Feb 11, 2016
    Cite
    Mehrnoosh Vahdat and Luca Oneto and Davide Anguita and Mathias Funk and Matthias Rauterberg (2016). Educational Process Mining (EPM): A Learning Analytics Data Set Data Set [Dataset]. https://academictorrents.com/details/e24e083cc337695bb84a2b68707695579c0ab4d8
    Available download formats: bittorrent (4934446)
    Dataset updated
    Feb 11, 2016
    Dataset authored and provided by
    Mehrnoosh Vahdat and Luca Oneto and Davide Anguita and Mathias Funk and Matthias Rauterberg
    License

    No license specified (https://academictorrents.com/nolicensespecified)

    Description

    Data Set Information: The experiments were carried out with a group of 115 first-year undergraduate Engineering students at the University of Genoa. We carried out this study using a simulation environment named Deeds (Digital Electronics Education and Design Suite), which is used for e-learning in digital electronics. The environment provides learning materials through specialized browsers for the students and asks them to solve various problems with different levels of difficulty. For more information about the Deeds simulator used for this course, see: [Web Link]; to learn more about the exercise contents of each session, see exercises_info.txt. Our data set contains the students' time series of activities during six laboratory sessions of the digital electronics course. There are 6 folders containing the students’ data per session. Each Session folder contains up to 99 CSV files, each dedicated to a specific student log during that ses
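    The layout described above (six session folders, each with up to 99 per-student CSV logs) can be inventoried with `pathlib`. This is a minimal sketch, assuming the session folders' names start with "Session" and that each log is a `.csv` file; the exact folder and file names are not given in this excerpt.

```python
from pathlib import Path


def count_logs_per_session(root):
    """Map each session folder name to the number of CSV log files it contains."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        # Folder naming ("Session ...") is an assumption about the archive layout.
        if folder.is_dir() and folder.name.lower().startswith("session"):
            counts[folder.name] = sum(1 for f in folder.glob("*.csv") if f.is_file())
    return counts
```

    Comparing each count with the 115 enrolled students shows how many students have no log for a given session.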

  9. Data from: Dataset of Student Level Prediction in UAE

    • data.mendeley.com
    Updated Dec 18, 2020
    Cite
    shatha Ghareeb (2020). Dataset of Student Level Prediction in UAE [Dataset]. http://doi.org/10.17632/3g8dtwbjjy.1
    Dataset updated
    Dec 18, 2020
    Authors
    shatha Ghareeb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United Arab Emirates
    Description

    The dataset comprises novel aspects, specifically in terms of student grading across the diverse educational cultures of multiple countries. Researchers and other education sectors will be able to see the impact of having varied curriculums in one country. The dataset compares different levelling cases when students transfer from one curriculum to another, and the unreliable levelling criteria currently set by schools in an international school context. The collected data can be used with intelligent algorithms, specifically machine learning and pattern analysis methods, to develop an intelligent framework applicable in multicultural educational systems. Such a framework would aid the smooth transition ("levelling", hereafter) of students who relocate from one education curriculum to another and minimize the impact of switching on the students’ educational performance. The preliminary variables taken into consideration when deciding which data to collect depended on the variables. The UAE is a multicultural country with many expats relocating from regions such as Asia, Europe and America. To meet expats' needs, the UAE has established many international private schools; therefore, the UAE was chosen as the study location, based on the many levelling cases and struggles declared by the Ministry of Education and schools. For the first time, we present this dataset comprising students’ records for two academic years, covering math, English, and science for 3 terms. The selection of subject areas and number of terms was based on influence from other researchers in similar subject matters.

  10. ‘Student Performance Data Set’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Mar 2, 2015
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2015). ‘Student Performance Data Set’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-student-performance-data-set-f14e/0580d4d6/?iid=079-283&v=presentation
    Dataset updated
    Mar 2, 2015
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Student Performance Data Set’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/impapan/student-performance-data-set on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    These data describe student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, and were collected using school reports and questionnaires. Two datasets are provided covering performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd period grades.
    

    Attribute Information:

    # Attributes for both student-mat.csv (Math course) and student-por.csv (Portuguese language course) datasets:
    1 school - student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
    2 sex - student's sex (binary: 'F' - female or 'M' - male)
    3 age - student's age (numeric: from 15 to 22)
    4 address - student's home address type (binary: 'U' - urban or 'R' - rural)
    5 famsize - family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
    6 Pstatus - parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
    7 Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
    8 Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
    9 Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
    10 Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
    11 reason - reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
    12 guardian - student's guardian (nominal: 'mother', 'father' or 'other')
    13 traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
    14 studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
    15 failures - number of past class failures (numeric: n if 1<=n<3, else 4)
    16 schoolsup - extra educational support (binary: yes or no)
    17 famsup - family educational support (binary: yes or no)
    18 paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
    19 activities - extra-curricular activities (binary: yes or no)
    20 nursery - attended nursery school (binary: yes or no)
    21 higher - wants to take higher education (binary: yes or no)
    22 internet - Internet access at home (binary: yes or no)
    23 romantic - with a romantic relationship (binary: yes or no)
    24 famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
    25 freetime - free time after school (numeric: from 1 - very low to 5 - very high)
    26 goout - going out with friends (numeric: from 1 - very low to 5 - very high)
    27 Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
    28 Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
    29 health - current health status (numeric: from 1 - very bad to 5 - very good)
    30 absences - number of school absences (numeric: from 0 to 93)
    
    # these grades are related with the course subject, Math or Portuguese:
    31 G1 - first period grade (numeric: from 0 to 20)
    32 G2 - second period grade (numeric: from 0 to 20)
    33 G3 - final grade (numeric: from 0 to 20, output target)
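    The binary and five-level classification tasks mentioned above derive their labels from G3. This is a minimal sketch, assuming the cut-offs commonly attributed to Cortez and Silva (2008): pass when G3 >= 10 for the binary task and, for five levels, I = 16-20, II = 14-15, III = 12-13, IV = 10-11, V = 0-9. Check these thresholds against the paper before relying on them. The semicolon delimiter in `load_g3` is also an assumption about how student-mat.csv and student-por.csv are written.

```python
import csv


def binary_label(g3):
    """'pass' if G3 >= 10 on the 0-20 scale, else 'fail' (assumed cut-off)."""
    return "pass" if g3 >= 10 else "fail"


def five_level_label(g3):
    """Five-level label from G3 (assumed cut-offs: I=16-20, II=14-15, III=12-13, IV=10-11, V=0-9)."""
    if not 0 <= g3 <= 20:
        raise ValueError(f"G3 must be between 0 and 20, got {g3}")
    for threshold, label in ((16, "I"), (14, "II"), (12, "III"), (10, "IV")):
        if g3 >= threshold:
            return label
    return "V"


def load_g3(path, delimiter=";"):
    """Read the G3 column from student-mat.csv or student-por.csv (delimiter assumed)."""
    with open(path, encoding="utf-8", newline="") as f:
        return [int(row["G3"]) for row in csv.DictReader(f, delimiter=delimiter)]
```

    The regression task uses G3 directly, so these helpers only matter for the two classification settings.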
    

    Acknowledgements

    If you use this dataset in your research, please credit the authors

    Citations

    P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.
    

    --- Original source retains full ownership of the source dataset ---

  11. Dropout and Success: Student Data Analysis

    • kaggle.com
    Updated Dec 31, 2023
    Cite
    Marouan daghmoumi (2023). Dropout and Success: Student Data Analysis [Dataset]. https://www.kaggle.com/datasets/marouandaghmoumi/dropout-and-success-student-data-analysis
    Dataset updated
    Dec 31, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Marouan daghmoumi
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Summary

    Dataset created from a higher education institution (acquired from several disjoint databases) related to students enrolled in different undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social service, and technologies. The dataset includes information known at the time of student enrollment (academic path, demographics, and social-economic factors) and the students' academic performance at the end of the first and second semesters. The data is used to build classification models to predict students' dropout and academic success. The problem is formulated as a three-category classification task with a strong imbalance towards one of the classes.
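    Because the three-category target is strongly imbalanced, a common first step is to weight classes by inverse frequency. This is a minimal sketch of the weighting n_samples / (n_classes * count), the same "balanced" heuristic scikit-learn documents for class weights; the label strings in the example (Dropout, Enrolled, Graduate) are assumptions, since this excerpt does not list the target's categories.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical label list; the category names are assumptions.
    y = ["Graduate"] * 6 + ["Dropout"] * 3 + ["Enrolled"] * 1
    print(balanced_class_weights(y))
```

    The rarest class receives the largest weight, so misclassifying it costs more during training; the weights average to 1 over samples, which keeps the overall loss scale unchanged.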

    Introduction

    This dataset delves into the correlation between dropout rates and student success in various educational settings. It includes comprehensive information on student demographics, academic performance, and factors contributing to dropout incidents. The dataset aims to provide valuable insights for educators, policymakers, and researchers to enhance strategies for fostering student retention and academic achievement.


    Dataset

    The dataset includes information known at the time of student enrollment – academic path, demographics, and social-economic factors.

    - Marital status: Categorical variable indicating the marital status of the individual. (1 – single; 2 – married; 3 – widower; 4 – divorced; 5 – facto union; 6 – legally separated)

    - Application mode: Categorical variable indicating the mode of application. (1 - 1st phase - general contingent; 2 - Ordinance No. 612/93; 5 - 1st phase - special contingent (Azores Island); 7 - Holders of other higher courses; 10 - Ordinance No. 854-B/99; 15 - International student (bachelor); 16 - 1st phase - special contingent (Madeira Island); 17 - 2nd phase - general contingent; 18 - 3rd phase - general contingent; 26 - Ordinance No. 533-A/99, item b2) (Different Plan); 27 - Ordinance No. 533-A/99, item b3 (Other Institution); 39 - Over 23 years old; 42 - Transfer; 43 - Change of course; 44 - Technological specialization diploma holders; 51 - Change of institution/course; 53 - Short cycle diploma holders; 57 - Change of institution/course (International)).

    - Application order: Numeric variable indicating the order of application (between 0 - first choice and 9 - last choice).

    - Course: Categorical variable indicating the chosen course. (33 - Biofuel Production Technologies; 171 - Animation and Multimedia Design; 8014 - Social Service (evening attendance); 9003 - Agronomy; 9070 - Communication Design; 9085 - Veterinary Nursing; 9119 - Informatics Engineering; 9130 - Equinculture; 9147 - Management; 9238 - Social Service; 9254 - Tourism; 9500 - Nursing; 9556 - Oral Hygiene; 9670 - Advertising and Marketing Management; 9773 - Journalism and Communication; 9853 - Basic Education; 9991 - Management (evening attendance)).

    - evening attendance: Binary variable indicating whether the individual attends classes during the daytime or evening. (1 for daytime, 0 for evening).

    - Previous qualification: Numeric variable indicating the level of the previous qualification. (1 - Secondary education; 2 - Higher education - bachelor's degree; 3 - Higher education - degree; 4 - Higher education - master's; 5 - Higher education - doctorate; 6 - Frequency of higher education; 9 - 12th year of schooling - not completed; 10 - 11th year of schooling - not completed; 12 - Other - 11th year of schooling; 14 - 10th year of schooling; 15 - 10th year of schooling - not completed; 19 - Basic education 3rd cycle (9th/10th/11th year) or equiv.; 38 - Basic education 2nd cycle (6th/7th/8th year) or equiv.; 39 - Technological specialization course; 40 - Higher education - degree (1st cycle); 42 - Professional higher technical course; 43 - Higher education - master (2nd cycle)).

    - Nationality: Categorical variable indicating the nationality of the individual. (1 - Portuguese; 2 - German; 6 - Spanish; 11 - Italian; 13 - Dutch; 14 - English; 17 - Lithuanian; 21 - Angolan; 22 - Cape Verdean; 24 - Guinean; 25 - Mozambican; 26 - Santomean; 32 - Turkish; 41 - Brazilian; 62 - Romanian; 100 - Moldova (Republic of); 101 - Mexican; 103 - Ukrainian; 105 - Russian; 108 - Cuban; 109 - Colombian).

    - Mother's qualification: Numeric variable indicating the level of the mother's qualification. (1 - Secondary Education - 12th Year of Schooling or Eq.; 2 - Higher Education - Bachelor's Degree; 3 - Higher Education - Degree; 4 - Higher Education - Master's; 5 - Higher Education - Doctorate; 6 - Frequency of Higher Education; 9 - 12th Year of Schooling - Not Completed; 10 - 11th Year of Schooling - Not Completed; 11 - 7th Year (...
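
Because these attributes are stored as integer codes, an analysis usually decodes them back to readable labels. The sketch below is a minimal illustration, assuming the codes arrive as Python integers; the lookup tables copy only a few of the codes listed above, and the record keys and the helpers `decode` and `decode_attendance` are hypothetical names, not part of the dataset.

```python
# Hypothetical lookup tables holding only a subset of the documented codes.
NATIONALITY = {
    1: "Portuguese",
    2: "German",
    6: "Spanish",
    41: "Brazilian",
    103: "Ukrainian",
}

APPLICATION_MODE = {
    1: "1st phase - general contingent",
    17: "2nd phase - general contingent",
    39: "Over 23 years old",
    42: "Transfer",
}


def decode(code: int, table: dict) -> str:
    """Return the label for an integer code, or a marker string for unmapped codes."""
    return table.get(code, f"unknown code {code}")


def decode_attendance(flag: int) -> str:
    """Translate the binary attendance flag (1 = daytime, 0 = evening)."""
    if flag not in (0, 1):
        raise ValueError(f"expected 0 or 1, got {flag!r}")
    return "daytime" if flag == 1 else "evening"


if __name__ == "__main__":
    # Hypothetical record using the integer codes described above.
    record = {"nationality": 41, "application_mode": 17, "attendance": 0}
    print(decode(record["nationality"], NATIONALITY))
    print(decode(record["application_mode"], APPLICATION_MODE))
    print(decode_attendance(record["attendance"]))
```

Returning a marker instead of raising keeps unmapped codes visible during exploration; a stricter pipeline might prefer to raise.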

  12. SPHERE: Students' performance dataset of conceptual understanding,...

    • data.mendeley.com
    Updated Jan 15, 2025
    Purwoko Haryadi Santoso (2025). SPHERE: Students' performance dataset of conceptual understanding, scientific ability, and learning attitude in physics education research (PER) [Dataset]. http://doi.org/10.17632/88d7m2fv7p.2
    Dataset updated
    Jan 15, 2025
    Authors
    Purwoko Haryadi Santoso
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPHERE is a dataset of students' performance for physics education research (PER). It is presented as a multi-domain learning dataset of students' performance in physics, collected through several research-based assessments (RBAs) established by the PER community. A total of 497 eleventh-grade students took part, drawn from three large public high schools and one small public high school in a suburban district of a highly populated province in Indonesia. Variables related to demographics, access to literature resources, and students' physics identity were also investigated. The RBAs used were selected based on the concepts students learn in the Indonesian physics curriculum. We began by surveying students' understanding of Newtonian mechanics at the end of the first semester using the Force Concept Inventory (FCI) and the Force and Motion Conceptual Evaluation (FMCE). In the second semester, we assessed the students' scientific abilities and learning attitudes through the Scientific Abilities Assessment Rubrics (SAAR) and the Colorado Learning Attitudes about Science Survey (CLASS), respectively. The conceptual assessments continued in the second semester with the Rotational and Rolling Motion Conceptual Survey (RRMCS), Fluid Mechanics Concept Inventory (FMCI), Mechanical Waves Conceptual Survey (MWCS), Thermal Concept Evaluation (TCE), and Survey of Thermodynamic Processes and First and Second Laws (STPFaSL). We expect SPHERE to be a valuable dataset for advancing the PER field, particularly in quantitative studies. For example, research on applying machine learning and data mining techniques in PER may be held back by the lack of datasets built specifically for PER studies.
    SPHERE can be reused as a dataset of students' performance in physics, intended specifically for PER scholars who wish to apply machine learning techniques in physics education.

  13. Student Performance Dataset

    • cubig.ai
    Updated May 28, 2025
    CUBIG (2025). Student Performance Dataset [Dataset]. https://cubig.ai/store/products/358/student-performance-dataset
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    CUBIG
    License

    CUBIG terms of service: https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training; privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction: The Student Performance Dataset is a survey of secondary school mathematics students. It is a tabular dataset covering student demographics, family environment, parents' education and occupation, health, family relationships, and grades.

    2) Data Utilization
    (1) Characteristics of the Student Performance Dataset:
    • Each row contains 33 attributes, including school ID, gender, age, family size, parents' educational level and occupation, family relationship, health status, and grades.
    • It is suitable for a variety of data analysis and prediction exercises, including regression analysis and analysis of imbalance in categorical variables such as the target variable, Grade.
    (2) Uses of the Student Performance Dataset:
    • Analyzing academic achievement and its influencing factors: the data can be used to analyze how factors such as a student's background, family environment, and parental characteristics affect grades, and to build a grade prediction model.
    • Establishing educational policies and customized support strategies: student-specific characteristics and grade data can inform educational policies such as closing educational gaps, supporting vulnerable student groups, and providing customized learning guidance.
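
The listing states only that the data were produced with AI-based synthetic generation and differential privacy; it does not describe CUBIG's actual mechanism. As a generic, hedged illustration of what differential privacy means, the sketch below implements the textbook Laplace mechanism for a single counting query. It is not CUBIG's method, the function names are hypothetical, and the grades are made up.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    A counting query changes by at most 1 when one record is added or removed
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    grades = [12, 8, 15, 10, 6, 18, 9, 11]  # made-up grades for illustration only
    noisy = private_count(grades, lambda g: g < 10, epsilon=1.0, rng=rng)
    print(f"noisy count of grades below 10: {noisy:.2f}")
```

Smaller `epsilon` values give stronger privacy but noisier answers; full synthetic-data generation under differential privacy is considerably more involved than this single-query example.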

  14. Student oriented subset of the Open University Learning Analytics dataset

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Sep 30, 2021
    Gabriella Casalino; Giovanna Castellano; Gennaro Vessio (2021). Student oriented subset of the Open University Learning Analytics dataset [Dataset]. http://doi.org/10.5281/zenodo.4264397
    Available download formats: csv
    Dataset updated
    Sep 30, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gabriella Casalino; Giovanna Castellano; Gennaro Vessio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Open University (OU) dataset is an open database containing student demographic data and click-stream interactions with the virtual learning platform. The data are split across several CSV files. More information about the original dataset is available at: https://analyse.kmi.open.ac.uk/open_dataset.

    We extracted a subset of the original dataset that focuses on student information. It contains 25,819 records, each referring to a specific student, course, and semester. Each record is described by the following 20 attributes: code_module, code_presentation, gender, highest_education, imd_band, age_band, num_of_prev_attempts, studies_credits, disability, resource, homepage, forum, glossary, outcontent, subpage, url, outcollaborate, quiz, AvgScore, count.

    Two target classes were considered, Fail and Pass, obtained by merging the original four classes: Fail and Withdrawn became Fail, and Pass and Distinction became Pass. The final_result attribute contains the target values.
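
The class merge described above amounts to a simple lookup. A minimal sketch, assuming the original OULAD `final_result` strings are exactly "Fail", "Withdrawn", "Pass", and "Distinction"; the helper name `binarize_outcome` is hypothetical, and the numeric coding follows the outcome mapping given in this description ('Fail'=0, 'Pass'=1).

```python
from collections import Counter

# Original four final_result classes merged into the two target classes,
# then coded numerically as Fail=0, Pass=1.
MERGED_OUTCOME = {
    "Fail": 0,
    "Withdrawn": 0,
    "Pass": 1,
    "Distinction": 1,
}


def binarize_outcome(final_result: str) -> int:
    """Map an original four-class final_result value to the binary target."""
    try:
        return MERGED_OUTCOME[final_result]
    except KeyError:
        raise ValueError(f"unexpected final_result value: {final_result!r}") from None


if __name__ == "__main__":
    # Made-up outcomes for illustration.
    raw = ["Pass", "Withdrawn", "Distinction", "Fail", "Pass"]
    print(Counter(binarize_outcome(r) for r in raw))
```

Raising on unexpected labels, rather than defaulting to a class, makes spelling variants in the raw files surface immediately.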

    All features have been converted to numbers for automatic processing.

    Below is the mapping used to convert categorical values to numeric:

    • code_module: 'AAA'=0, 'BBB'=1, 'CCC'=2, 'DDD'=3, 'EEE'=4, 'FFF'=5, 'GGG'=6
    • code_presentation: '2013B'=0, '2013J'=1, '2014B'=2, '2014J'=3
    • gender: 'F'=0, 'M'=1
    • highest_education: 'No_Formal_quals'=0, 'Post_Graduate_Qualification'=1, 'HE_Qualification'=2, 'Lower_Than_A_Level'=3, 'A_level_or_Equivalent'=4
    • imd_band: 'unknown'=0, 'between_0_and_10_percent'=1, 'between_10_and_20_percent'=2, 'between_20_and_30_percent'=3, 'between_30_and_40_percent'=4, 'between_40_and_50_percent'=5, 'between_50_and_60_percent'=6, 'between_60_and_70_percent'=7, 'between_70_and_80_percent'=8, 'between_80_and_90_percent'=9, 'between_90_and_100_percent'=10
    • age_band: 'between_0_and_35'=0, 'between_35_and_55'=1, 'higher_than_55'=2
    • disability: 'N'=0, 'Y'=1
    • student's outcome: 'Fail'=0, 'Pass'=1
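
The mapping list above can be applied directly to a raw record. A minimal sketch, assuming raw rows arrive as dictionaries keyed by the attribute names listed in this description (with the deprivation band under `imd_band`); `encode_record` is a hypothetical helper, and unmapped values raise an error instead of being silently coded.

```python
# Categorical-to-numeric mappings transcribed from the list above.
ENCODERS = {
    "code_module": {m: i for i, m in enumerate(["AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG"])},
    "code_presentation": {p: i for i, p in enumerate(["2013B", "2013J", "2014B", "2014J"])},
    "gender": {"F": 0, "M": 1},
    "highest_education": {
        "No_Formal_quals": 0,
        "Post_Graduate_Qualification": 1,
        "HE_Qualification": 2,
        "Lower_Than_A_Level": 3,
        "A_level_or_Equivalent": 4,
    },
    # 'unknown'=0, then ten deciles: between_0_and_10_percent=1 ... between_90_and_100_percent=10
    "imd_band": {
        "unknown": 0,
        **{f"between_{lo}_and_{lo + 10}_percent": lo // 10 + 1 for lo in range(0, 100, 10)},
    },
    "age_band": {"between_0_and_35": 0, "between_35_and_55": 1, "higher_than_55": 2},
    "disability": {"N": 0, "Y": 1},
}


def encode_record(record: dict) -> dict:
    """Replace categorical values with their numeric codes; pass other fields through."""
    encoded = {}
    for key, value in record.items():
        table = ENCODERS.get(key)
        if table is None:
            encoded[key] = value  # numeric or unmapped column: keep as-is
        elif value in table:
            encoded[key] = table[value]
        else:
            raise ValueError(f"unmapped value {value!r} for column {key!r}")
    return encoded


if __name__ == "__main__":
    # Hypothetical raw record using the category labels listed above.
    raw = {"code_module": "CCC", "gender": "F", "imd_band": "between_20_and_30_percent", "num_of_prev_attempts": 0}
    print(encode_record(raw))
```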

    For more detailed information, please refer to:


    Casalino G., Castellano G., Vessio G. (2021) Exploiting Time in Adaptive Learning from Educational Data. In: Agrati L.S. et al. (eds) Bridges and Mediation in Higher Distance Education. HELMeTO 2020. Communications in Computer and Information Science, vol 1344. Springer, Cham. https://doi.org/10.1007/978-3-030-67435-9_1

  15. ChatGPT vs. Student: A Dataset for Source Classification of Computer...

    • ieee-dataport.org
    Updated Jul 19, 2023
    ALI ABDULLAH S ALQAHTANI (2023). ChatGPT vs. Student: A Dataset for Source Classification of Computer Science Answers [Dataset]. https://ieee-dataport.org/documents/chatgpt-vs-student-dataset-source-classification-computer-science-answers
    Dataset updated
    Jul 19, 2023
    Authors
    ALI ABDULLAH S ALQAHTANI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    along with the corresponding answers from students and ChatGPT.

  16. Data from: DIPSEER: A Dataset for In-Person Student Emotion and Engagement...

    • scidb.cn
    • observatorio-cientifico.ua.es
    Updated Sep 4, 2024
    Luis Márquez-Carpintero; Sergio Suescun-Ferrandiz; Carolina Lorenzo Álvarez; Jorge Fernandez-Herrero; Diego Viejo; Rosabel Roig-Vila; Miguel Cazorla (2024). DIPSEER: A Dataset for In-Person Student Emotion and Engagement Recognition in the Wild [Dataset]. http://doi.org/10.57760/sciencedb.11541
    Croissant metadata available (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
    Dataset updated
    Sep 4, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Luis Márquez-Carpintero; Sergio Suescun-Ferrandiz; Carolina Lorenzo Álvarez; Jorge Fernandez-Herrero; Diego Viejo; Rosabel Roig-Vila; Miguel Cazorla
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Description
    The DIPSER dataset is designed to assess student attention and emotion in in-person classroom settings, consisting of RGB camera data, smartwatch sensor data, and labeled attention and emotion metrics. It includes multiple camera angles per student to capture posture and facial expressions, complemented by smartwatch data for inertial and biometric metrics. Attention and emotion labels are derived from self-reports and expert evaluations. The dataset includes diverse demographic groups, with data collected in real-world classroom environments, facilitating the training of machine learning models for predicting attention and correlating it with emotional states.

    Data Collection and Generation Procedures
    The dataset was collected in a natural classroom environment at the University of Alicante, Spain. The recording setup consisted of six general cameras positioned to capture the overall classroom context and individual cameras placed at each student's desk. Additionally, smartwatches were used to collect biometric data, such as heart rate, accelerometer, and gyroscope readings.

    Experimental Sessions
    Nine distinct educational activities were designed to ensure a comprehensive range of engagement scenarios:
    • News Reading: Students read projected or device-displayed news.
    • Brainstorming Session: Idea generation for problem-solving.
    • Lecture: Passive listening to an instructor-led session.
    • Information Organization: Synthesizing information from different sources.
    • Lecture Test: Assessment of lecture content via mobile devices.
    • Individual Presentations: Students present their projects.
    • Knowledge Test: Conducted using Kahoot.
    • Robotics Experimentation: Hands-on session with robotics.
    • MTINY Activity Design: Development of educational activities with computational thinking.

    Technical Specifications
    • RGB Cameras: Individual cameras recorded at 640×480 pixels, while context cameras captured at 1280×720 pixels.
    • Frame Rate: 9-10 FPS depending on the setup.
    • Smartwatch Sensors: Collected heart rate, accelerometer, gyroscope, rotation vector, and light sensor data at a frequency of 1–100 Hz.

    Data Organization and Formats
    The dataset follows a structured directory format: /groupX/experimentY/subjectZ.zip. Each subject-specific folder contains:
    • images/ (individual facial images)
    • watch_sensors/ (sensor readings in JSON format)
    • labels/ (engagement & emotion annotations)
    • metadata/ (subject demographics & session details)

    Annotations and Labeling
    Each data entry includes engagement levels (1-5) and emotional states (9 categories) based on both self-reported labels and evaluations by four independent experts. A custom annotation tool was developed to ensure consistency across evaluations.

    Missing Data and Data Quality
    • Synchronization: A centralized server ensured time alignment across devices. Brightness changes were used to verify synchronization.
    • Completeness: No major missing data, except for occasional random frame drops due to embedded device performance.
    • Data Consistency: Uniform collection methodology across sessions, ensuring high reliability.

    Data Processing Methods
    To enhance usability, the dataset includes preprocessed bounding boxes for face, body, and hands, along with gaze estimation and head pose annotations. These were generated using YOLO, MediaPipe, and DeepFace.

    File Formats and Accessibility
    • Images: Stored in standard JPEG format.
    • Sensor Data: Provided as structured JSON files.
    • Labels: Available as CSV files with timestamps.
    The dataset is publicly available under the CC-BY license and can be accessed along with the necessary processing scripts via the DIPSER GitHub repository.

    Potential Errors and Limitations
    • Due to camera angles, some student movements may be out of frame in collaborative sessions.
    • Lighting conditions vary slightly across experiments.
    • Sensor latency variations are minimal but exist due to embedded device constraints.

    Citation
    If you find this project helpful for your research, please cite our work using the following BibTeX entry:
    @misc{marquezcarpintero2025dipserdatasetinpersonstudent1,
      title={DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild},
      author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Carolina Lorenzo Álvarez and Jorge Fernandez-Herrero and Diego Viejo and Rosabel Roig-Vila and Miguel Cazorla},
      year={2025},
      eprint={2502.20209},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.20209},
    }

    Usage and Reproducibility
    Researchers can utilize standard tools like OpenCV, TensorFlow, and PyTorch for analysis. The dataset supports research in machine learning, affective computing, and education analytics, offering a unique resource for engagement and attention studies in real-world classroom environments.
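
Camera frames arrive at roughly 9-10 FPS while the smartwatch sensors report at 1-100 Hz, so pairing each frame with a sensor reading becomes a nearest-timestamp lookup. The sketch below is a generic illustration, not the DIPSER processing scripts. It assumes both streams have already been converted to sorted timestamps in seconds on the shared server clock; the function names and the `max_gap` tolerance are hypothetical.

```python
from bisect import bisect_left


def nearest_index(sorted_ts: list, t: float) -> int:
    """Index of the entry in sorted_ts closest to t; ties go to the earlier entry."""
    if not sorted_ts:
        raise ValueError("no timestamps to match against")
    i = bisect_left(sorted_ts, t)
    if i == 0:
        return 0
    if i == len(sorted_ts):
        return len(sorted_ts) - 1
    before, after = sorted_ts[i - 1], sorted_ts[i]
    return i - 1 if t - before <= after - t else i


def align(frame_ts: list, sensor_ts: list, sensor_values: list, max_gap: float = 0.1) -> list:
    """Pair each frame timestamp with the nearest sensor value.

    Frames with no sensor reading within max_gap seconds get None, which keeps
    dropped or sparse readings visible instead of silently reusing stale data.
    """
    pairs = []
    for t in frame_ts:
        j = nearest_index(sensor_ts, t)
        value = sensor_values[j] if abs(sensor_ts[j] - t) <= max_gap else None
        pairs.append((t, value))
    return pairs


if __name__ == "__main__":
    frames = [0.0, 0.1, 0.2, 0.3]  # ~10 FPS camera timestamps (made up)
    heart_ts = [0.0, 1.0]          # 1 Hz heart-rate timestamps (made up)
    heart_bpm = [72, 74]
    print(align(frames, heart_ts, heart_bpm, max_gap=0.5))
```

Binary search keeps each lookup logarithmic in the number of sensor samples, which matters for the 100 Hz streams; the tolerance should be chosen per sensor rate.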

  17. Student Performance Factors

    • kaggle.com
    Updated Nov 26, 2024
    Practice Data Analysis With Me (2024). Student Performance Factors [Dataset]. https://www.kaggle.com/datasets/lainguyn123/student-performance-factors
    Croissant metadata available (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Practice Data Analysis With Me
    License

    CC0 1.0 (public domain dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides a comprehensive overview of various factors affecting student performance in exams. It includes information on study habits, attendance, parental involvement, and other aspects influencing academic success.

    Column Descriptions

    • Hours_Studied: Number of hours spent studying per week.
    • Attendance: Percentage of classes attended.
    • Parental_Involvement: Level of parental involvement in the student's education (Low, Medium, High).
    • Access_to_Resources: Availability of educational resources (Low, Medium, High).
    • Extracurricular_Activities: Participation in extracurricular activities (Yes, No).
    • Sleep_Hours: Average number of hours of sleep per night.
    • Previous_Scores: Scores from previous exams.
    • Motivation_Level: Student's level of motivation (Low, Medium, High).
    • Internet_Access: Availability of internet access (Yes, No).
    • Tutoring_Sessions: Number of tutoring sessions attended per month.
    • Family_Income: Family income level (Low, Medium, High).
    • Teacher_Quality: Quality of the teachers (Low, Medium, High).
    • School_Type: Type of school attended (Public, Private).
    • Peer_Influence: Influence of peers on academic performance (Positive, Neutral, Negative).
    • Physical_Activity: Average number of hours of physical activity per week.
    • Learning_Disabilities: Presence of learning disabilities (Yes, No).
    • Parental_Education_Level: Highest education level of parents (High School, College, Postgraduate).
    • Distance_from_Home: Distance from home to school (Near, Moderate, Far).
    • Gender: Gender of the student (Male, Female).
    • Exam_Score: Final exam score.
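
Several of these columns hold ordered levels (Low, Medium, High) or Yes/No flags, so they need numeric codes before correlation or regression. A minimal sketch, assuming rows are dictionaries keyed by the column names above; the 0/1/2 ordinal coding, the helper names, and the sample rows are illustrative assumptions, not part of the dataset.

```python
import math

# Illustrative ordinal codes; the dataset itself stores the text labels.
LEVEL = {"Low": 0, "Medium": 1, "High": 2}
YES_NO = {"No": 0, "Yes": 1}
LEVEL_COLUMNS = ("Parental_Involvement", "Access_to_Resources", "Motivation_Level",
                 "Family_Income", "Teacher_Quality")
YES_NO_COLUMNS = ("Extracurricular_Activities", "Internet_Access", "Learning_Disabilities")


def encode_row(row: dict) -> dict:
    """Return a copy of row with Low/Medium/High and Yes/No columns mapped to integers."""
    out = dict(row)
    for col in LEVEL_COLUMNS:
        if col in out:
            out[col] = LEVEL[out[col]]
    for col in YES_NO_COLUMNS:
        if col in out:
            out[col] = YES_NO[out[col]]
    return out


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        {"Hours_Studied": 5, "Motivation_Level": "Low", "Exam_Score": 60},
        {"Hours_Studied": 12, "Motivation_Level": "Medium", "Exam_Score": 66},
        {"Hours_Studied": 20, "Motivation_Level": "High", "Exam_Score": 71},
    ]
    encoded = [encode_row(r) for r in rows]
    for col in ("Hours_Studied", "Motivation_Level"):
        r = pearson([row[col] for row in encoded], [row["Exam_Score"] for row in encoded])
        print(f"{col} vs Exam_Score: r = {r:.2f}")
```

Nominal columns such as School_Type or Peer_Influence are deliberately left out; an ordinal code would impose an order they may not have.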

  18. Detailed description of the dataset.

    • plos.figshare.com
    xls
    Updated Nov 8, 2023
    + more versions
    Nihal Abuzinadah; Muhammad Umer; Abid Ishaq; Abdullah Al Hejaili; Shtwai Alsubai; Ala’ Abdulmajid Eshmawi; Abdullah Mohamed; Imran Ashraf (2023). Detailed description of the dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0293061.t001
    Available download formats: xls
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Nihal Abuzinadah; Muhammad Umer; Abid Ishaq; Abdullah Al Hejaili; Shtwai Alsubai; Ala’ Abdulmajid Eshmawi; Abdullah Mohamed; Imran Ashraf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Predicting student performance automatically is of utmost importance, given the substantial volume of data in educational databases. Educational data mining (EDM) devises techniques to uncover insights from data originating in educational settings. Artificial intelligence (AI) can mine educational data to predict student performance and suggest measures that help students avoid failing and learn better. Learning platforms complement traditional learning settings by analyzing student performance, which can help reduce the chance of student failure. Existing methods for student performance prediction in educational data mining faced challenges such as limited accuracy, imbalanced data, and difficulties in feature engineering, which hindered their adaptability and generalization across diverse educational contexts. This study proposes a machine learning-based system with deep convoluted features for predicting students' academic performance. The proposed framework is evaluated on both imbalanced data and data balanced with the synthetic minority oversampling technique (SMOTE), using both the original and the deep convoluted features. Experimental results indicate that deep convoluted features improve prediction accuracy compared with the original features. The extra tree classifier with convoluted features achieves the highest classification accuracy, 99.9%. Compared with state-of-the-art approaches, the proposed approach achieves higher performance. This research introduces an AI-driven system for student performance prediction that offers substantial accuracy gains over existing approaches.
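
SMOTE balances classes by creating synthetic minority samples along line segments between a minority sample and one of its nearest minority neighbours. The pure-Python sketch below illustrates that idea under simple assumptions (numeric feature vectors, Euclidean distance, a fixed seed). It is not the authors' pipeline; in practice a maintained implementation such as `imblearn.over_sampling.SMOTE` from the imbalanced-learn library would normally be used. The function name and parameters are hypothetical.

```python
import math
import random


def smote(minority: list, n_new: int, k: int = 5, seed: int = 0) -> list:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    For each synthetic point: pick a random minority sample, pick one of its
    k nearest minority neighbours, and interpolate at a random fraction along
    the segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)

    # Precompute the k nearest neighbours (Euclidean) of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]  # made-up feature vectors
    for point in smote(minority, n_new=3, k=2, seed=42):
        print([round(v, 3) for v in point])
```

The brute-force neighbour search is quadratic in the number of minority samples, which is fine for a sketch but not for large datasets.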

  19. Student_feedback_analysis_dataset

    • huggingface.co
    Updated Aug 26, 2022
    The National Languages Processing Centre (2022). Student_feedback_analysis_dataset [Dataset]. https://huggingface.co/datasets/NLPC-UOM/Student_feedback_analysis_dataset
    Dataset updated
    Aug 26, 2022
    Dataset authored and provided by
    The National Languages Processing Centre
    Description

    README

      If you use this dataset, cite Herath, M., Chamindu, K., Maduwantha, H., & Ranathunga, S. (2022, June). Dataset and Baseline for Automatic Student Feedback Analysis. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 2042-2049).
    

    Annotated Student Feedback

    Language: English (en). License: MIT.

    This resource contains 3,000 student feedback entries annotated for aspect terms, opinion terms… See the full description on the dataset page: https://huggingface.co/datasets/NLPC-UOM/Student_feedback_analysis_dataset.

  20. Performance of machine learning models using SMOTE-balanced dataset.

    • plos.figshare.com
    xls
    Updated Nov 8, 2023
    Nihal Abuzinadah; Muhammad Umer; Abid Ishaq; Abdullah Al Hejaili; Shtwai Alsubai; Ala’ Abdulmajid Eshmawi; Abdullah Mohamed; Imran Ashraf (2023). Performance of machine learning models using SMOTE-balanced dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0293061.t004
    Available download formats: xls
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Nihal Abuzinadah; Muhammad Umer; Abid Ishaq; Abdullah Al Hejaili; Shtwai Alsubai; Ala’ Abdulmajid Eshmawi; Abdullah Mohamed; Imran Ashraf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of machine learning models using SMOTE-balanced dataset.

Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set

Student Performance Data Set

Student achievement in secondary education of two Portuguese schools.

6 scholarly articles cite this dataset (see Google Scholar).
Croissant metadata available (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Mar 27, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Data-Science Sean
License

CC0 1.0 (public domain dedication): https://creativecommons.org/publicdomain/zero/1.0/

Description

If this dataset is useful, an upvote is appreciated. These data describe student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, and were collected using school reports and questionnaires. Two datasets are provided, covering performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 is strongly correlated with attributes G2 and G1. This is because G3 is the final-year grade (issued in the 3rd period), while G1 and G2 are the 1st- and 2nd-period grades. Predicting G3 without G2 and G1 is more difficult, but such a prediction is much more useful (see the source paper for more details).
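
The note about G3 can be checked directly by correlating each period grade with the final grade. A minimal stdlib sketch, assuming the semicolon-separated layout of the original UCI files (student-mat.csv, student-por.csv) with numeric G1, G2, and G3 columns; the delimiter of the Kaggle copy is not stated here, so it is a parameter. The helper names are hypothetical and the inline sample rows are made up.

```python
import csv
import io
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def grade_correlations(csv_text: str, delimiter: str = ";") -> dict:
    """Correlate the period grades G1 and G2 with the final grade G3."""
    rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=delimiter))
    g3 = [float(row["G3"]) for row in rows]
    return {col: pearson([float(row[col]) for row in rows], g3) for col in ("G1", "G2")}


if __name__ == "__main__":
    # Made-up rows in the assumed semicolon-separated layout.
    sample = 'school;G1;G2;G3\n"GP";10;11;11\n"GP";12;13;14\n"MS";8;8;7\n"MS";15;15;16\n'
    for col, r in grade_correlations(sample).items():
        print(f"{col} vs G3: r = {r:.3f}")
```

On a real file, the text would be read first (for example with `open(path, encoding="utf-8").read()`). A high G2 correlation is expected, which is why models that exclude G1 and G2 are the harder and more useful case.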
