https://creativecommons.org/publicdomain/zero/1.0/
If this dataset is useful, an upvote is appreciated. This data addresses student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data was collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).
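The note about G1 and G2 suggests two modelling setups: one that keeps the earlier period grades as inputs and a harder one that leaves them out. A minimal sketch of that split follows, assuming each student is held as a plain dictionary keyed by attribute name. The helper name `split_features_target` and the sample record are illustrative and not part of the dataset.

```python
PRIOR_GRADES = ("G1", "G2")  # earlier period grades, strongly correlated with G3
TARGET = "G3"                # final grade, the prediction target


def split_features_target(record, use_prior_grades=True):
    """Return (features, target) for one student record.

    With use_prior_grades=False, G1 and G2 are dropped, which gives the
    harder setup described in the note above.
    """
    features = {
        name: value
        for name, value in record.items()
        if name != TARGET and (use_prior_grades or name not in PRIOR_GRADES)
    }
    return features, record[TARGET]


# Made-up record for illustration only.
example = {"studytime": 2, "absences": 4, "G1": 11, "G2": 12, "G3": 12}
with_grades, target = split_features_target(example)
without_grades, _ = split_features_target(example, use_prior_grades=False)
```

Comparing a model trained on `with_grades`-style inputs against one trained on `without_grades`-style inputs shows how much of the predictive power comes from the earlier grades alone.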
In this project, we aim to analyze and gain insights into the performance of students based on various factors that influence their academic achievements. We have collected data related to students' demographic information, family background, and their exam scores in different subjects.
Key Objectives:
Performance Evaluation: Evaluate and understand the academic performance of students by analyzing their scores in various subjects.
Identifying Underlying Factors: Investigate factors that might contribute to variations in student performance, such as parental education, family size, and student attendance.
Visualizing Insights: Create data visualizations to present the findings effectively and intuitively.
Dataset Details:
Analysis Highlights:
We will perform a comprehensive analysis of the dataset, including data cleaning, exploration, and visualization to gain insights into various aspects of student performance.
By employing statistical methods and machine learning techniques, we will determine the significant factors that affect student performance.
Why This Matters:
Understanding the factors that influence student performance is crucial for educators, policymakers, and parents. This analysis can help in making informed decisions to improve educational outcomes and provide support where it is most needed.
Acknowledgments:
We would like to express our gratitude to [mention any data sources or collaborators] for making this dataset available.
Please Note:
This project is meant for educational and analytical purposes. The dataset used is fictitious and does not represent any specific educational institution or individuals.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The six data sets were created for an undergraduate course at the Babes-Bolyai University, Faculty of Mathematics and Computer Science, held for second-year students in the autumn semester. The course is taught in both Romanian and English, with the same content and evaluation rules in both languages. The six data sets are the following:
- FirstCaseStudy_RO_traditional_2019-2020.txt - contains data about the grades from the 2019-2020 academic year (when the traditional face-to-face teaching method was used) for the Romanian language
- FirstCaseStudy_RO_online_2020-2021.txt - contains data about the grades from the 2020-2021 academic year (when online teaching was used) for the Romanian language
- SecondCaseStudy_EN_traditional_2019-2020.txt - contains data about the grades from the 2019-2020 academic year (when the traditional face-to-face teaching method was used) for the English language
- SecondCaseStudy_EN_online_2020-2021.txt - contains data about the grades from the 2020-2021 academic year (when online teaching was used) for the English language
- ThirdCaseStudy_Both_traditional_2019-2020.txt - the concatenation of the two data sets for the 2019-2020 academic year (so all instances from FirstCaseStudy_RO_traditional_2019-2020 and SecondCaseStudy_EN_traditional_2019-2020 together)
- ThirdCaseStudy_Both_online_2020-2021.txt - the concatenation of the two data sets for the 2020-2021 academic year (so all instances from FirstCaseStudy_RO_online_2020-2021 and SecondCaseStudy_EN_online_2020-2021 together)
Instances from the data sets for the 2019-2020 academic year contain 12 attributes (in this order):
- the grades received by the student for 7 laboratory assignments that were presented during the semester. For assignments that were not turned in, a grade of 0 was given. Possible values are between 0 and 10.
- the grades received by the student for 2 practical exams. If a student did not participate in a practical exam, the grade was 0. Possible values are between 0 and 10.
- the number of seminar activities that the student had. Possible values are between 0 and 7.
- the final grade the student received for the course. It is a value between 4 and 10.
- the category of the final grade:
  - E for grades 10 or 9
  - G for grades 8 or 7
  - S for grades 6 or 5
  - F for grade 4
Instances from the data sets for the 2020-2021 academic year contain 10 attributes (in this order):
- the grades received by the student for 7 laboratory assignments that were presented during the semester. For assignments that were not turned in, a grade of 0 was given. Possible values are between 0 and 10.
- a seminar bonus computed based on the number of seminar activities the student had during the semester, which was added to the final grade. Possible values are between 0 and 0.5.
- the final grade the student received for the course. It is a value between 4 and 10.
- the category of the final grade:
  - E for grades 10 or 9
  - G for grades 8 or 7
  - S for grades 6 or 5
  - F for grade 4
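The attribute order and the grade-to-category rule above can be sketched in a few lines. This is an illustration under stated assumptions: the description does not say how fields are delimited in the .txt files, so `parse_traditional` takes an already-split list of 12 strings, and it assumes the final grade is stored as an integer. Both function names are hypothetical.

```python
def grade_category(final_grade: int) -> str:
    """Map a final grade (4..10) to its category letter (E, G, S or F)."""
    if not 4 <= final_grade <= 10:
        raise ValueError(f"final grade out of range: {final_grade}")
    if final_grade >= 9:
        return "E"
    if final_grade >= 7:
        return "G"
    if final_grade >= 5:
        return "S"
    return "F"


def parse_traditional(fields):
    """Split one 2019-2020 instance (12 attributes) into named parts.

    Assumption: `fields` is a list of 12 strings in the documented order.
    """
    if len(fields) != 12:
        raise ValueError(f"expected 12 attributes, got {len(fields)}")
    return {
        "labs": [float(x) for x in fields[0:7]],       # 7 laboratory grades
        "practical": [float(x) for x in fields[7:9]],  # 2 practical exams
        "seminar_activities": int(fields[9]),
        "final_grade": int(fields[10]),
        "category": fields[11],
    }
```

A quick consistency check on a parsed row is `grade_category(row["final_grade"]) == row["category"]`. The 2020-2021 files would need a 10-attribute variant with the seminar bonus in place of the practical exams and seminar count.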
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Student Performance Data
This dataset provides insights into various factors influencing the academic performance of students. It is curated for use in educational research, data analytics projects, and predictive modeling. The data reflects a combination of personal, familial, and academic-related variables gathered through observation or survey.
The dataset includes a diverse range of students and captures key characteristics such as study habits, family background, school attendance, and overall performance. It is well-suited for exploring correlations, visualizing trends, and training machine learning models related to academic outcomes.
Highlights:
- Clean, structured format suitable for immediate use
- Designed for beginner to intermediate-level data analysis
- Valuable for classification, regression, and data storytelling projects
File Format:
- Type: CSV (Comma-Separated Values)
- Encoding: UTF-8
- Structure: Each row represents a student record
Applications
- Student performance prediction
- Educational policy planning
- Identification of performance gaps and influencing factors
- Exploratory data analysis and visualization
The open science movement produces vast quantities of openly published data connected to journal articles, creating an enormous resource for educators to engage students in current topics and analyses. However, educators face challenges using these materials to meet course objectives. I present a case study using open science (published articles and their corresponding datasets) and open educational practices in a capstone course. While engaging in current topics of conservation, students trace connections in the research process, learn statistical analyses, and recreate analyses using the programming language R. I assessed the presence of best practices in open articles and datasets, examined student selection in the open grading policy, surveyed students on their perceived learning gains, and conducted a thematic analysis on student reflections. First, articles and datasets met just over half of the assessed fairness practices, but this increased with the publication date. There was a...
Article and dataset fairness
To assess the utility of open articles and their datasets as an educational tool in an undergraduate academic setting, I measured the congruence of each pair to a set of best practices and guiding principles. I assessed ten guiding principles and best practices (Table 1), where each category was scored ‘1’ or ‘0’ based on whether it met that criterion, with a total possible score of ten.
Open grading policies
Students were allowed to specify the percentage weight for each assessment category in the course, including 1) six coding exercises (Exercises), 2) one lead exercise (Lead Exercise), 3) fourteen annotation assignments of readings (Annotations), 4) one final project (Final Project), 5) five discussion board posts and a statement of learning reflection (Discussion), and 6) attendance and participation (Participation). I examined if assessment categories (independent variable) were weighted (dependent variable) differently by students using an analysis of ...
# Data for: Integrating open education practices with data analysis of open science in an undergraduate course
Author: Marja H Bakermans
Affiliation: Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA 01609 USA
ORCID: https://orcid.org/0000-0002-4879-7771
Institutional IRB approval: IRB-24–0314
The full dataset file called OEPandOSdata (.xlsx extension) contains 8 files. Below are descriptions of the name and contents of each file. NA = not applicable or no data available
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains semester-wise academic performance data of BTech students from GIET University. It includes the grades of students from their 1st to 4th semesters, along with their corresponding 5th-semester grades. The dataset is intended for use in educational data mining and machine learning applications, specifically for predicting the 5th-semester grades of students based on their past performance. The dataset consists of 379 student records, with each record containing the following attributes:
SEM 1: Grade obtained in the 1st semester.
SEM 2: Grade obtained in the 2nd semester.
SEM 3: Grade obtained in the 3rd semester.
SEM 4: Grade obtained in the 4th semester.
SEM 5: Grade obtained in the 5th semester (target variable for prediction).
The grades are represented on a scale of 0 to 10, where 10 is the highest achievable grade. This dataset can be used to develop predictive models for academic performance, identify trends in student performance, and support decision-making in educational institutions.
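Before fitting a learned model to the SEM 1 to SEM 4 grades, it can help to have a naive baseline to beat. The sketch below predicts SEM 5 as the mean of the four earlier semesters and scores it with mean absolute error. The baseline, the function names, and the sample rows are illustrative choices, not something the dataset description prescribes.

```python
from statistics import mean


def predict_sem5(sem_grades):
    """Baseline: predict the SEM 5 grade as the mean of SEM 1..SEM 4."""
    if len(sem_grades) != 4:
        raise ValueError("expected four semester grades")
    return mean(sem_grades)


def mean_absolute_error(records):
    """records: sequence of (sem1, sem2, sem3, sem4, sem5) tuples."""
    errors = [abs(predict_sem5(r[:4]) - r[4]) for r in records]
    return mean(errors)


# Made-up rows for illustration only; not taken from the dataset.
sample = [
    (8.0, 7.5, 8.5, 8.0, 8.0),
    (6.0, 6.5, 7.0, 6.5, 7.0),
]
baseline_mae = mean_absolute_error(sample)
```

Any regression model trained on the 379 records should achieve a lower error than this baseline on held-out students to be worth using.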
Keywords: Grade Prediction, Student Performance, Educational Data Mining, Academic Analytics, Machine Learning, GIET University
Potential Applications:
Predicting student performance in future semesters.
Identifying at-risk students for early intervention.
Analyzing trends in academic performance over time.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Predicting student performance automatically is of utmost importance, due to the substantial volume of data within educational databases. Educational data mining (EDM) devises techniques to uncover insights from data originating in educational settings. Artificial intelligence (AI) can mine educational data to predict student performance and provide measures to help students avoid failing and learn better. Learning platforms complement traditional learning settings by analyzing student performance, which can help reduce the chance of student failure. Existing methods for student performance prediction in educational data mining faced challenges such as limited accuracy, imbalanced data, and difficulties in feature engineering. These issues hindered effective adaptability and generalization across diverse educational contexts. This study proposes a machine learning-based system with deep convoluted features for the prediction of students’ academic performance. The proposed framework is employed to predict student academic performance using balanced as well as imbalanced datasets, with balancing performed by the synthetic minority oversampling technique (SMOTE). In addition, the performance is also evaluated using the original and deep convoluted features. Experimental results indicate that the use of deep convoluted features provides improved prediction accuracy compared to original features. Results obtained using the extra tree classifier with convoluted features show the highest classification accuracy of 99.9%. In comparison with the state-of-the-art approaches, the proposed approach achieved higher performance. This research introduces a powerful AI-driven system for student performance prediction, offering substantial advancements in accuracy compared to existing approaches.
https://academictorrents.com/nolicensespecified
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset offers novel aspects, specifically in terms of student grading across diverse educational cultures within multiple countries: researchers and other education sectors will be able to see the impact of having varied curricula in a country. The dataset compares different levelling cases when students transfer from one curriculum to another, and the unreliable levelling criteria currently set by schools in an international school context. The collected data can be used with intelligent algorithms, specifically machine learning and pattern analysis methods, to develop an intelligent framework applicable in multi-cultural educational systems to aid the smooth transition (“levelling”, hereafter) of students who relocate from one education curriculum to another, and to minimize the impact of switching on the students’ educational performance. The preliminary variables taken into consideration when deciding which data to collect depended on the variables. The UAE is a multicultural country with many expats relocating from regions such as Asia, Europe and America. To meet expats' needs, the UAE has established many international private schools; the UAE was therefore chosen as the location of the study, based on the many levelling cases and struggles declared by the Ministry of Education and schools. For the first time, we present this dataset comprising students’ records for two academic years, covering math, English, and science for 3 terms. The selection of subject areas and number of terms was influenced by other researchers working on similar subject matter.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Student Performance Data Set’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/impapan/student-performance-data-set on 13 February 2022.
--- Dataset description provided by original source is as follows ---
This data addresses student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data was collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades.
# Attributes for both student-mat.csv (Math course) and student-por.csv (Portuguese language course) datasets:
1 school - student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
2 sex - student's sex (binary: 'F' - female or 'M' - male)
3 age - student's age (numeric: from 15 to 22)
4 address - student's home address type (binary: 'U' - urban or 'R' - rural)
5 famsize - family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
6 Pstatus - parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
7 Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
8 Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
9 Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
10 Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
11 reason - reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
12 guardian - student's guardian (nominal: 'mother', 'father' or 'other')
13 traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
14 studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
15 failures - number of past class failures (numeric: n if 1<=n<3, else 4)
16 schoolsup - extra educational support (binary: yes or no)
17 famsup - family educational support (binary: yes or no)
18 paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
19 activities - extra-curricular activities (binary: yes or no)
20 nursery - attended nursery school (binary: yes or no)
21 higher - wants to take higher education (binary: yes or no)
22 internet - Internet access at home (binary: yes or no)
23 romantic - with a romantic relationship (binary: yes or no)
24 famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
25 freetime - free time after school (numeric: from 1 - very low to 5 - very high)
26 goout - going out with friends (numeric: from 1 - very low to 5 - very high)
27 Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
28 Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
29 health - current health status (numeric: from 1 - very bad to 5 - very good)
30 absences - number of school absences (numeric: from 0 to 93)
# these grades are related with the course subject, Math or Portuguese:
31 G1 - first period grade (numeric: from 0 to 20)
32 G2 - second period grade (numeric: from 0 to 20)
33 G3 - final grade (numeric: from 0 to 20, output target)
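The description mentions binary and five-level classification tasks built on G3 but does not state the cut-offs. The sketch below shows one way such targets could be derived from the 0 to 20 grade. It assumes a pass mark of 10 and band boundaries at 16, 14, 12 and 10. These thresholds and both function names are assumptions for illustration and should be checked against [Cortez and Silva, 2008] before use.

```python
def _check_grade(g3: int) -> None:
    """G3 is documented as a numeric grade from 0 to 20."""
    if not 0 <= g3 <= 20:
        raise ValueError(f"G3 out of range: {g3}")


def binary_target(g3: int) -> str:
    """Binary task: 'pass' or 'fail' (assumed pass mark: 10 out of 20)."""
    _check_grade(g3)
    return "pass" if g3 >= 10 else "fail"


def five_level_target(g3: int) -> int:
    """Five-level task: 1 (highest band) to 5 (fail), with assumed cut-offs."""
    _check_grade(g3)
    if g3 >= 16:
        return 1
    if g3 >= 14:
        return 2
    if g3 >= 12:
        return 3
    if g3 >= 10:
        return 4
    return 5
```

Under these assumed cut-offs, level 5 covers exactly the grades that `binary_target` labels as fail, so the two targets stay consistent with each other.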
If you use this dataset in your research, please credit the authors
P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.
--- Original source retains full ownership of the source dataset ---
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
A dataset created from a higher education institution (acquired from several disjoint databases) related to students enrolled in different undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social service, and technologies. The dataset includes information known at the time of student enrollment (academic path, demographics, and social-economic factors) and the students' academic performance at the end of the first and second semesters. The data is used to build classification models to predict students' dropout and academic success. The problem is formulated as a three-category classification task, in which there is a strong imbalance towards one of the classes.
This dataset delves into the correlation between dropout rates and student success in various educational settings. It includes comprehensive information on student demographics, academic performance, and factors contributing to dropout incidents. The dataset aims to provide valuable insights for educators, policymakers, and researchers to enhance strategies for fostering student retention and academic achievement.
The dataset includes information known at the time of student enrollment – academic path, demographics, and social-economic factors.
- Marital status: Categorical variable indicating the marital status of the individual. (1 – single 2 – married 3 – widower 4 – divorced 5 – facto union 6 – legally separated)
- Application mode: Categorical variable indicating the mode of application. (1 - 1st phase - general contingent 2 - Ordinance No. 612/93 5 - 1st phase - special contingent (Azores Island) 7 - Holders of other higher courses 10 - Ordinance No. 854-B/99 15 - International student (bachelor) 16 - 1st phase - special contingent (Madeira Island) 17 - 2nd phase - general contingent 18 - 3rd phase - general contingent 26 - Ordinance No. 533-A/99, item b2) (Different Plan) 27 - Ordinance No. 533-A/99, item b3 (Other Institution) 39 - Over 23 years old 42 - Transfer 43 - Change of course 44 - Technological specialization diploma holders 51 - Change of institution/course 53 - Short cycle diploma holders 57 - Change of institution/course (International)).
- Application order: Numeric variable indicating the order of application. (between 0 - first choice; and 9 last choice).
- Course: Categorical variable indicating the chosen course. (33 - Biofuel Production Technologies 171 - Animation and Multimedia Design 8014 - Social Service (evening attendance) 9003 - Agronomy 9070 - Communication Design 9085 - Veterinary Nursing 9119 - Informatics Engineering 9130 - Equinculture 9147 - Management 9238 - Social Service 9254 - Tourism 9500 - Nursing 9556 - Oral Hygiene 9670 - Advertising and Marketing Management 9773 - Journalism and Communication 9853 - Basic Education 9991 - Management (evening attendance)).
- evening attendance: Binary variable indicating whether the individual attends classes during the daytime or evening. (1 for daytime, 0 for evening).
- Previous qualification: Numeric variable indicating the level of the previous qualification. (1 - Secondary education 2 - Higher education - bachelor's degree 3 - Higher education - degree 4 - Higher education - master's 5 - Higher education - doctorate 6 - Frequency of higher education 9 - 12th year of schooling - not completed 10 - 11th year of schooling - not completed 12 - Other - 11th year of schooling 14 - 10th year of schooling 15 - 10th year of schooling - not completed 19 - Basic education 3rd cycle (9th/10th/11th year) or equiv. 38 - Basic education 2nd cycle (6th/7th/8th year) or equiv. 39 - Technological specialization course 40 - Higher education - degree (1st cycle) 42 - Professional higher technical course 43 - Higher education - master (2nd cycle)).
- Nationality: Categorical variable indicating the nationality of the individual. (1 - Portuguese; 2 - German; 6 - Spanish; 11 - Italian; 13 - Dutch; 14 - English; 17 - Lithuanian; 21 - Angolan; 22 - Cape Verdean; 24 - Guinean; 25 - Mozambican; 26 - Santomean; 32 - Turkish; 41 - Brazilian; 62 - Romanian; 100 - Moldova (Republic of); 101 - Mexican; 103 - Ukrainian; 105 - Russian; 108 - Cuban; 109 - Colombian).
- Mother's qualification: Numeric variable indicating the level of the mother's qualification.
(1 - Secondary Education - 12th Year of Schooling or Eq. 2 - Higher Education - Bachelor's Degree 3 - Higher Education - Degree 4 - Higher Education - Master's 5 - Higher Education - Doctorate 6 - Frequency of Higher Education 9 - 12th Year of Schooling - Not Completed 10 - 11th Year of Schooling - Not Completed 11 - 7th Year (...
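Because the categorical attributes above are stored as integer codes, a small lookup table makes them readable during exploration. The sketch below covers only two attributes whose codes are fully listed above (marital status and daytime/evening attendance), with the labels copied as given. The helper name `decode` is hypothetical.

```python
# Code-to-label tables copied from the attribute descriptions above.
MARITAL_STATUS = {
    1: "single",
    2: "married",
    3: "widower",
    4: "divorced",
    5: "facto union",
    6: "legally separated",
}
ATTENDANCE = {1: "daytime", 0: "evening"}


def decode(value, mapping):
    """Translate a numeric code (int or numeric string) into its label."""
    code = int(value)
    if code not in mapping:
        raise ValueError(f"unknown code: {code}")
    return mapping[code]
```

For example, `decode("2", MARITAL_STATUS)` gives `"married"`. The same pattern extends to the longer code lists such as application mode or course.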
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SPHERE (Students' Performance in Physics Education Research) is a multi-domain learning dataset of students’ performance in physics, collected through several research-based assessments (RBAs) established by the physics education research (PER) community. A total of 497 eleventh-grade students were involved, from three large public high schools and one small one, located in a suburban district of a highly populated province in Indonesia. Some variables related to demographics, accessibility to literature resources, and students’ physics identity are also investigated. The RBAs used in this data were selected based on concepts learned by the students in the Indonesian physics curriculum. We began by surveying students’ understanding of Newtonian mechanics at the end of the first semester using the Force Concept Inventory (FCI) and the Force and Motion Conceptual Evaluation (FMCE). In the second semester, we assessed the students’ scientific abilities and learning attitude through the Scientific Abilities Assessment Rubrics (SAAR) and the Colorado Learning Attitudes about Science Survey (CLASS), respectively. The conceptual assessments continued in the second semester with the Rotational and Rolling Motion Conceptual Survey (RRMCS), Fluid Mechanics Concept Inventory (FMCI), Mechanical Waves Conceptual Survey (MWCS), Thermal Concept Evaluation (TCE), and Survey of Thermodynamic Processes and First and Second Laws (STPFaSL). We expect SPHERE to be a valuable dataset for supporting the advancement of the PER field, particularly in quantitative studies. For example, there is a need to advance research on using machine learning and data mining techniques in PER, which may face challenges because no dataset is available for the specific purpose of PER studies.
SPHERE can be reused as a dataset of students’ performance in physics, specifically dedicated to PER scholars who may wish to implement machine learning techniques in physics education.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Student Performance Dataset is a survey of secondary school mathematics students. It is provided in table format and contains a variety of information, including student demographics, family environment, parents' education and occupation, health, family relationships, and grades.
2) Data Utilization
(1) The Student Performance Dataset has the following characteristics:
• Each row contains a total of 33 different characteristics, including school ID, gender, age, family size, parents' educational level and occupation, family relationship, health status, and grades.
• It is suitable for a variety of data analysis and prediction exercises, including regression analysis and categorical variable imbalance analysis, including the target variable Grade.
(2) The Student Performance Dataset can be used for:
• Analyzing academic achievement prediction and influencing factors: It can be used to analyze the impact of various factors such as a student's background, family environment, and parental characteristics on grades, and to develop a grade prediction model.
• Establishing educational policies and customized support strategies: Based on student-specific characteristics and grade data, it can be applied to establishing educational policies such as closing educational gaps, supporting vulnerable student groups, and providing customized learning guidance.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Open University (OU) dataset is an open database containing student demographic and click-stream interaction with the virtual learning platform. The available data are structured in different CSV files. You can find more information about the original dataset at the following link: https://analyse.kmi.open.ac.uk/open_dataset.
We extracted a subset of the original dataset that focuses on student information. It contains 25,819 records, each referring to a specific student, course and semester. Each record is described by the following 20 attributes: code_module, code_presentation, gender, highest_education, imd_band, age_band, num_of_prev_attempts, studies_credits, disability, resource, homepage, forum, glossary, outcontent, subpage, url, outcollaborate, quiz, AvgScore, count.
Two target classes were considered, namely Fail and Pass, obtained by combining the original four classes (Fail with Withdrawn, and Pass with Distinction, respectively). The final_result attribute contains the target values.
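The merging of the four original outcomes into two classes can be written as a lookup table. This is a minimal sketch that works on the original string labels. The numeric codes used in the published subset are not reproduced here, and the names `BINARY_TARGET` and `to_binary` are illustrative.

```python
from collections import Counter

# Original four outcomes collapsed into the two target classes described above.
BINARY_TARGET = {
    "Fail": "Fail",
    "Withdrawn": "Fail",
    "Pass": "Pass",
    "Distinction": "Pass",
}


def to_binary(final_result: str) -> str:
    """Map an original final_result label to 'Fail' or 'Pass'."""
    if final_result not in BINARY_TARGET:
        raise ValueError(f"unexpected final_result: {final_result!r}")
    return BINARY_TARGET[final_result]


# Made-up labels for illustration only.
labels = ["Pass", "Withdrawn", "Distinction", "Fail", "Pass"]
class_counts = Counter(to_binary(label) for label in labels)
```

Counting the merged labels, as in `class_counts`, is a quick way to see how balanced the two classes are before training.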
All features have been converted to numbers for automatic processing.
Below is the mapping used to convert categorical values to numeric:
For more detailed information, please refer to:
Casalino G., Castellano G., Vessio G. (2021) Exploiting Time in Adaptive Learning from Educational Data. In: Agrati L.S. et al. (eds) Bridges and Mediation in Higher Distance Education. HELMeTO 2020. Communications in Computer and Information Science, vol 1344. Springer, Cham. https://doi.org/10.1007/978-3-030-67435-9_1
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
along with the corresponding answers from students and ChatGPT.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data Description
The DIPSER dataset is designed to assess student attention and emotion in in-person classroom settings, consisting of RGB camera data, smartwatch sensor data, and labeled attention and emotion metrics. It includes multiple camera angles per student to capture posture and facial expressions, complemented by smartwatch data for inertial and biometric metrics. Attention and emotion labels are derived from self-reports and expert evaluations. The dataset includes diverse demographic groups, with data collected in real-world classroom environments, facilitating the training of machine learning models for predicting attention and correlating it with emotional states.

Data Collection and Generation Procedures
The dataset was collected in a natural classroom environment at the University of Alicante, Spain. The recording setup consisted of six general cameras positioned to capture the overall classroom context and individual cameras placed at each student’s desk. Additionally, smartwatches were used to collect biometric data, such as heart rate, accelerometer, and gyroscope readings.

Experimental Sessions
Nine distinct educational activities were designed to ensure a comprehensive range of engagement scenarios:
News Reading – Students read projected or device-displayed news.
Brainstorming Session – Idea generation for problem-solving.
Lecture – Passive listening to an instructor-led session.
Information Organization – Synthesizing information from different sources.
Lecture Test – Assessment of lecture content via mobile devices.
Individual Presentations – Students present their projects.
Knowledge Test – Conducted using Kahoot.
Robotics Experimentation – Hands-on session with robotics.
MTINY Activity Design – Development of educational activities with computational thinking.

Technical Specifications
RGB Cameras: Individual cameras recorded at 640×480 pixels, while context cameras captured at 1280×720 pixels.
Frame Rate: 9-10 FPS depending on the setup.
Smartwatch Sensors: Collected heart rate, accelerometer, gyroscope, rotation vector, and light sensor data at a frequency of 1–100 Hz.

Data Organization and Formats
The dataset follows a structured directory format:
/groupX/experimentY/subjectZ.zip
Each subject-specific folder contains:
images/ (individual facial images)
watch_sensors/ (sensor readings in JSON format)
labels/ (engagement & emotion annotations)
metadata/ (subject demographics & session details)

Annotations and Labeling
Each data entry includes engagement levels (1-5) and emotional states (9 categories) based on both self-reported labels and evaluations by four independent experts. A custom annotation tool was developed to ensure consistency across evaluations.

Missing Data and Data Quality
Synchronization: A centralized server ensured time alignment across devices. Brightness changes were used to verify synchronization.
Completeness: No major missing data, except for occasional random frame drops due to embedded device performance.
Data Consistency: Uniform collection methodology across sessions, ensuring high reliability.

Data Processing Methods
To enhance usability, the dataset includes preprocessed bounding boxes for face, body, and hands, along with gaze estimation and head pose annotations. These were generated using YOLO, MediaPipe, and DeepFace.

File Formats and Accessibility
Images: Stored in standard JPEG format.
Sensor Data: Provided as structured JSON files.
Labels: Available as CSV files with timestamps.
The dataset is publicly available under the CC-BY license and can be accessed along with the necessary processing scripts via the DIPSER GitHub repository.

Potential Errors and Limitations
Due to camera angles, some student movements may be out of frame in collaborative sessions.
Lighting conditions vary slightly across experiments.
Sensor latency variations are minimal but exist due to embedded device constraints.

Citation
If you find this project helpful for your research, please cite our work using the following bibtex entry:
@misc{marquezcarpintero2025dipserdatasetinpersonstudent1,
  title={DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild},
  author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Carolina Lorenzo Álvarez and Jorge Fernandez-Herrero and Diego Viejo and Rosabel Roig-Vila and Miguel Cazorla},
  year={2025},
  eprint={2502.20209},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.20209},
}

Usage and Reproducibility
Researchers can utilize standard tools like OpenCV, TensorFlow, and PyTorch for analysis. The dataset supports research in machine learning, affective computing, and education analytics, offering a unique resource for engagement and attention studies in real-world classroom environments.
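As a rough illustration of the DIPSER directory layout described above, the following Python sketch parses an archive path of the form /groupX/experimentY/subjectZ.zip and counts the files found under each of the four listed folders. It assumes that X, Y, and Z are integer indices and that the folder names appear as path components inside each archive; neither detail is confirmed by the description, and the helper names `parse_subject_path` and `count_files_per_folder` are hypothetical.

```python
import re

# Assumption: X, Y, Z in /groupX/experimentY/subjectZ.zip are integer indices.
_PATH_RE = re.compile(r"group(\d+)/experiment(\d+)/subject(\d+)\.zip$")

# Folder names as listed in the dataset description.
EXPECTED_FOLDERS = ("images", "watch_sensors", "labels", "metadata")


def parse_subject_path(path):
    """Return (group, experiment, subject) parsed from an archive path."""
    match = _PATH_RE.search(path.replace("\\", "/"))
    if match is None:
        raise ValueError(f"unexpected archive path: {path!r}")
    return tuple(int(part) for part in match.groups())


def count_files_per_folder(member_names):
    """Count files under each expected folder, given archive member names."""
    counts = {folder: 0 for folder in EXPECTED_FOLDERS}
    for member in member_names:
        if member.endswith("/"):
            continue  # skip directory entries
        # Directory components only; the last part is the file name.
        directories = member.split("/")[:-1]
        for folder in EXPECTED_FOLDERS:
            if folder in directories:
                counts[folder] += 1
                break
    return counts
```

The member names could come from `zipfile.ZipFile(path).namelist()` in the standard library. A folder with a count of zero is a quick hint that an archive is incomplete or organized differently than assumed here.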
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive overview of various factors affecting student performance in exams. It includes information on study habits, attendance, parental involvement, and other aspects influencing academic success.
Attribute | Description |
---|---|
Hours_Studied | Number of hours spent studying per week. |
Attendance | Percentage of classes attended. |
Parental_Involvement | Level of parental involvement in the student's education (Low, Medium, High). |
Access_to_Resources | Availability of educational resources (Low, Medium, High). |
Extracurricular_Activities | Participation in extracurricular activities (Yes, No). |
Sleep_Hours | Average number of hours of sleep per night. |
Previous_Scores | Scores from previous exams. |
Motivation_Level | Student's level of motivation (Low, Medium, High). |
Internet_Access | Availability of internet access (Yes, No). |
Tutoring_Sessions | Number of tutoring sessions attended per month. |
Family_Income | Family income level (Low, Medium, High). |
Teacher_Quality | Quality of the teachers (Low, Medium, High). |
School_Type | Type of school attended (Public, Private). |
Peer_Influence | Influence of peers on academic performance (Positive, Neutral, Negative). |
Physical_Activity | Average number of hours of physical activity per week. |
Learning_Disabilities | Presence of learning disabilities (Yes, No). |
Parental_Education_Level | Highest education level of parents (High School, College, Postgraduate). |
Distance_from_Home | Distance from home to school (Near, Moderate, Far). |
Gender | Gender of the student (Male, Female). |
Exam_Score | Final exam score. |
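Many of the attributes above are ordered categories, so a common first step before modeling is to map them to integers. The sketch below uses only the Python standard library and takes column names and category values from the table. The orderings chosen for Peer_Influence, Parental_Education_Level, and Distance_from_Home are assumptions, and `encode_row` is a hypothetical helper, not part of the dataset.

```python
LEVELS = {"Low": 0, "Medium": 1, "High": 2}
YES_NO = {"No": 0, "Yes": 1}

# Category-to-integer mappings per column (orderings are assumptions).
ORDINAL_COLUMNS = {
    "Parental_Involvement": LEVELS,
    "Access_to_Resources": LEVELS,
    "Motivation_Level": LEVELS,
    "Family_Income": LEVELS,
    "Teacher_Quality": LEVELS,
    "Peer_Influence": {"Negative": 0, "Neutral": 1, "Positive": 2},
    "Parental_Education_Level": {"High School": 0, "College": 1, "Postgraduate": 2},
    "Distance_from_Home": {"Near": 0, "Moderate": 1, "Far": 2},
    "Extracurricular_Activities": YES_NO,
    "Internet_Access": YES_NO,
    "Learning_Disabilities": YES_NO,
}

NUMERIC_COLUMNS = (
    "Hours_Studied", "Attendance", "Sleep_Hours", "Previous_Scores",
    "Tutoring_Sessions", "Physical_Activity", "Exam_Score",
)


def encode_row(row):
    """Encode one record (a dict of strings) into numeric values where possible."""
    encoded = {}
    for column, value in row.items():
        if column in ORDINAL_COLUMNS:
            # Unknown or missing categories become None instead of raising.
            encoded[column] = ORDINAL_COLUMNS[column].get(value)
        elif column in NUMERIC_COLUMNS:
            encoded[column] = float(value) if value not in ("", None) else None
        else:
            # Nominal columns (School_Type, Gender) are passed through unchanged.
            encoded[column] = value
    return encoded
```

Rows read with `csv.DictReader` can be passed to `encode_row` directly. School_Type and Gender are left untouched because they have no natural order; one-hot encoding would be a reasonable choice for them.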
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Predicting student performance automatically is of utmost importance, due to the substantial volume of data within educational databases. Educational data mining (EDM) devises techniques to uncover insights from data originating in educational settings. Artificial intelligence (AI) can mine educational data to predict student performance and provide measures to help students avoid failing and learn better. Learning platforms complement traditional learning settings by analyzing student performance, which can help reduce the chance of student failure. Existing methods for student performance prediction in educational data mining face challenges such as limited accuracy, imbalanced data, and difficulties in feature engineering. These issues hinder effective adaptability and generalization across diverse educational contexts. This study proposes a machine learning-based system with deep convoluted features for the prediction of students’ academic performance. The proposed framework is used to predict student academic performance on both the imbalanced dataset and a dataset balanced with the synthetic minority oversampling technique (SMOTE). In addition, performance is evaluated using both the original and the deep convoluted features. Experimental results indicate that deep convoluted features provide better prediction accuracy than the original features. The extra tree classifier with convoluted features achieves the highest classification accuracy, 99.9%. The proposed approach also outperforms the state-of-the-art approaches it is compared against. This research introduces an AI-driven system for student performance prediction that offers substantial gains in accuracy over existing approaches.
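To make the balancing step concrete, here is a minimal pure-Python sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is an illustration only, not the authors' pipeline; the function name `smote_oversample` and its parameters are hypothetical, and a maintained library implementation would normally be preferred in practice.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducible output
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies between two real minority samples, the oversampled class stays inside the region already occupied by that class. Oversampling should be applied to the training split only, so that evaluation data contains no synthetic points.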
README
If you use this dataset, cite Herath, M., Chamindu, K., Maduwantha, H., & Ranathunga, S. (2022, June). Dataset and Baseline for Automatic Student Feedback Analysis. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 2042-2049).
annotations_creators: []
language:
- en
license:
- mit
This resource contains 3000 student feedback data that have been annotated for aspect terms, opinion terms… See the full description on the dataset page: https://huggingface.co/datasets/NLPC-UOM/Student_feedback_analysis_dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance of machine learning models using SMOTE-balanced dataset.