Data Description
The DIPSER dataset is designed to assess student attention and emotion in in-person classroom settings, consisting of RGB camera data, smartwatch sensor data, and labeled attention and emotion metrics. It includes multiple camera angles per student to capture posture and facial expressions, complemented by smartwatch data for inertial and biometric metrics. Attention and emotion labels are derived from self-reports and expert evaluations. The dataset includes diverse demographic groups, with data collected in real-world classroom environments, facilitating the training of machine learning models for predicting attention and correlating it with emotional states.

Data Collection and Generation Procedures
The dataset was collected in a natural classroom environment at the University of Alicante, Spain. The recording setup consisted of six general cameras positioned to capture the overall classroom context and individual cameras placed at each student’s desk. Additionally, smartwatches were used to collect biometric data, such as heart rate, accelerometer, and gyroscope readings.

Experimental Sessions
Nine distinct educational activities were designed to ensure a comprehensive range of engagement scenarios:
- News Reading – Students read projected or device-displayed news.
- Brainstorming Session – Idea generation for problem-solving.
- Lecture – Passive listening to an instructor-led session.
- Information Organization – Synthesizing information from different sources.
- Lecture Test – Assessment of lecture content via mobile devices.
- Individual Presentations – Students present their projects.
- Knowledge Test – Conducted using Kahoot.
- Robotics Experimentation – Hands-on session with robotics.
- MTINY Activity Design – Development of educational activities with computational thinking.

Technical Specifications
- RGB Cameras: Individual cameras recorded at 640×480 pixels, while context cameras captured at 1280×720 pixels.
- Frame Rate: 9-10 FPS depending on the setup.
- Smartwatch Sensors: Collected heart rate, accelerometer, gyroscope, rotation vector, and light sensor data at a frequency of 1–100 Hz.

Data Organization and Formats
The dataset follows a structured directory format:
/groupX/experimentY/subjectZ.zip
Each subject-specific folder contains:
- images/ (individual facial images)
- watch_sensors/ (sensor readings in JSON format)
- labels/ (engagement & emotion annotations)
- metadata/ (subject demographics & session details)

Annotations and Labeling
Each data entry includes engagement levels (1-5) and emotional states (9 categories) based on both self-reported labels and evaluations by four independent experts. A custom annotation tool was developed to ensure consistency across evaluations.

Missing Data and Data Quality
- Synchronization: A centralized server ensured time alignment across devices. Brightness changes were used to verify synchronization.
- Completeness: No major missing data, except for occasional random frame drops due to embedded device performance.
- Data Consistency: Uniform collection methodology across sessions, ensuring high reliability.

Data Processing Methods
To enhance usability, the dataset includes preprocessed bounding boxes for face, body, and hands, along with gaze estimation and head pose annotations. These were generated using YOLO, MediaPipe, and DeepFace.

File Formats and Accessibility
- Images: Stored in standard JPEG format.
- Sensor Data: Provided as structured JSON files.
- Labels: Available as CSV files with timestamps.
The dataset is publicly available under the CC-BY license and can be accessed along with the necessary processing scripts via the DIPSER GitHub repository.

Potential Errors and Limitations
- Due to camera angles, some student movements may be out of frame in collaborative sessions.
- Lighting conditions vary slightly across experiments.
- Sensor latency variations are minimal but exist due to embedded device constraints.

Citation
If you find this project helpful for your research, please cite our work using the following bibtex entry:
@misc{marquezcarpintero2025dipserdatasetinpersonstudent1,
  title={DIPSER: A Dataset for In-Person Student1 Engagement Recognition in the Wild},
  author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Carolina Lorenzo Álvarez and Jorge Fernandez-Herrero and Diego Viejo and Rosabel Roig-Vila and Miguel Cazorla},
  year={2025},
  eprint={2502.20209},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.20209},
}

Usage and Reproducibility
Researchers can utilize standard tools like OpenCV, TensorFlow, and PyTorch for analysis. The dataset supports research in machine learning, affective computing, and education analytics, offering a unique resource for engagement and attention studies in real-world classroom environments.
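As a minimal sketch of how one subject archive in the directory format described above (/groupX/experimentY/subjectZ.zip with images/, watch_sensors/, labels/ and metadata/) might be inspected, the following uses only Python's standard library. The archive is synthetic and built in memory; the file names inside each folder and the function names are assumptions made for illustration, not names confirmed by the dataset.

```python
import io
import zipfile
from collections import defaultdict

# Folder names taken from the dataset description.
EXPECTED_FOLDERS = ("images", "watch_sensors", "labels", "metadata")


def summarize_subject_archive(zip_bytes):
    """Count the files found under each expected top-level folder."""
    counts = defaultdict(int)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            top_level = name.split("/", 1)[0]
            counts[top_level] += 1
    return {folder: counts.get(folder, 0) for folder in EXPECTED_FOLDERS}


def build_synthetic_archive():
    """Build a tiny stand-in archive; every file name here is hypothetical."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("images/frame_0001.jpg", b"")
        archive.writestr("images/frame_0002.jpg", b"")
        archive.writestr("watch_sensors/heart_rate.json", "[]")
        archive.writestr("labels/labels.csv", "timestamp,engagement,emotion\n")
        archive.writestr("metadata/subject.json", "{}")
    return buffer.getvalue()


summary = summarize_subject_archive(build_synthetic_archive())
print(summary)
```

A folder that reports zero files is a quick signal that an archive is incomplete or uses a different internal layout than the one assumed here.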
This dataset was created by Rishabh Tewari
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As education increasingly relies on data-driven methodologies, accurately predicting student performance is essential for implementing timely and effective interventions. The California Student Performance Dataset offers a distinctive basis for analyzing complex elements that affect educational results, such as student demographics, academic behaviours, and emotional health. This study presents the GNN-Transformer-InceptionNet (GNN-TINet) model to overcome the constraints of prior models that fail to effectively capture intricate interactions in multi-label contexts, where students may display numerous performance categories concurrently. The GNN-TINet utilizes InceptionNet, transformer architectures, and graph neural networks (GNN) to improve precision in multi-label student performance forecasting. Advanced preprocessing approaches, such as Contextual Frequency Encoding (CFI) and Contextual Adaptive Imputation (CAI), were used on a dataset of 97,000 occurrences. The model achieved exceptional outcomes, exceeding current standards with a Predictive Consistency Score (PCS) of 0.92 and an accuracy of 98.5%. Exploratory data analysis revealed significant relationships between GPA, homework completion, and parental involvement, emphasizing the complex nature of academic achievement. The results illustrate the GNN-TINet’s potential to identify at-risk pupils, providing a robust resource for educators and policymakers to improve learning outcomes. This study enhances educational data mining by enabling focused interventions that promote educational equality, tackling significant challenges in the domain.
https://www.etalab.gouv.fr/licence-ouverte-open-licence
The "Euroscol" label recognizes the commitment of public schools and state-contracted private schools that take part in a European dynamic, by leading and participating in projects and by building European learning pathways, with a view to creating a European Education Area. More information at https://eduscol.education.fr/1098/euroscol-le-label-des-ecoles-et-des-etablissements-scolaires
Unified school districts provide education to children of all school ages in their service areas. In general, where there is a unified school district, no elementary or secondary school district exists, and where there is an elementary school district the secondary school district may or may not exist.
Launched in 2021 by the High Council for Artistic and Cultural Education (HCEAC), the 100% EAC label recognizes a territory’s commitment to making artistic and cultural education (EAC) available to all. Awarded for a renewable period of 5 years, the label recognizes local authorities and intermunicipal bodies that offer artistic and cultural education to all young people in their territory, from early childhood to adulthood. It comes with methodological tools for drawing up an inventory and a strategy, and it helps to strengthen the coherence of local action, bring stakeholders together, mobilize other partners, and sustain and develop existing schemes. The Ministers of Culture and National Education, who co-chair the HCEAC, have entrusted prefects and rectors with awarding the label, after consulting the devolved departments of the two ministries. In the first session in 2022, 79 territories, spread across all regions, were awarded the 100% EAC label; 78 more (including two overseas) were awarded it in 2023. After the first two calls for applications, 157 territories hold the 100% EAC label. NB: - The map shows the areas of the departments; each may include one or more labeled communities, in which case the arrows let you view them all. - Data on partnerships and schemes are valid only for the year of labeling, as they may change over time. For more information: https://www.culture.gouv.fr/catalogue-des-demarches-et-subventions/appels-a-projets-candidatures/Label-100-EAC
The elementary school districts provide education to the lower grade/age levels.
The secondary school districts provide education to the upper grade/age levels.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over recent decades, machine learning, an integral subfield of artificial intelligence, has revolutionized diverse sectors, enabling data-driven decisions with minimal human intervention. In particular, the field of educational assessment emerges as a promising area for machine learning applications, where students can be classified and diagnosed using their performance data. The objectives of Diagnostic Classification Models (DCMs), which provide a suite of methods for diagnosing students’ cognitive states in relation to the mastery of necessary cognitive attributes for solving problems in a test, can be effectively addressed through machine learning techniques. However, the challenge lies in the latent nature of cognitive status, which makes it difficult to obtain labels for the training dataset. Consequently, the application of machine learning methods to DCMs often assumes smaller training sets with labels derived either from theoretical considerations or human experts. In this study, the authors propose a supervised diagnostic classification model with data augmentation (SDCM-DA). This method is designed to utilize the augmented data using a data generation model constructed by leveraging the probability of correct responses for each attribute mastery pattern derived from the expert-labeled dataset. To explore the benefits of data augmentation, a simulation study is carried out, contrasting it with classification methods that rely solely on the expert-labeled dataset for training. The findings reveal that utilizing data augmentation with the estimated probabilities of correct responses substantially enhances classification accuracy. This holds true even when the augmentation originates from a small labeled sample with occasional labeling errors, and when the tests contain lower-quality items that may inaccurately measure students’ true cognitive status. 
Moreover, the study demonstrates that leveraging augmented data for learning can enable the successful classification of students, thereby eliminating the necessity for specifying an underlying response model.
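The augmentation idea described above can be sketched roughly as follows: estimate, from a small expert-labeled sample, the probability of a correct response to each item for each attribute mastery pattern, then draw synthetic response vectors from those probabilities. This is an illustration under stated assumptions, not the authors' implementation; the function names, the add-one smoothing, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def estimate_correct_probs(responses, patterns, smoothing=1.0):
    """Estimate P(correct | mastery pattern) for every item.

    responses: list of 0/1 lists, one per student.
    patterns:  list of tuples, the expert-assigned mastery pattern per student.
    smoothing: additive smoothing (an assumption) to avoid probabilities of 0 or 1.
    """
    n_items = len(responses[0])
    correct = defaultdict(lambda: [0.0] * n_items)
    totals = defaultdict(int)
    for resp, pat in zip(responses, patterns):
        totals[pat] += 1
        for j, r in enumerate(resp):
            correct[pat][j] += r
    return {
        pat: [
            (correct[pat][j] + smoothing) / (totals[pat] + 2 * smoothing)
            for j in range(n_items)
        ]
        for pat in totals
    }


def augment(probs, n_per_pattern, rng):
    """Draw synthetic (response vector, pattern) pairs from the estimates."""
    data = []
    for pat, item_probs in probs.items():
        for _ in range(n_per_pattern):
            resp = [1 if rng.random() < p else 0 for p in item_probs]
            data.append((resp, pat))
    return data


# Toy expert-labeled sample: 4 students, 3 items, 2 attributes.
labeled_responses = [[0, 0, 1], [0, 1, 0], [1, 1, 1], [1, 1, 0]]
labeled_patterns = [(0, 0), (0, 0), (1, 1), (1, 1)]

probs = estimate_correct_probs(labeled_responses, labeled_patterns)
augmented = augment(probs, n_per_pattern=50, rng=random.Random(0))
print(probs[(0, 0)], probs[(1, 1)], len(augmented))
```

The augmented pairs could then be fed to any supervised classifier in place of, or alongside, the small labeled sample.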
The study was designed to determine whether a city-mandated policy requiring calorie labeling at fast food restaurants was associated with consumer awareness of labels, calories purchased, and number of fast food restaurant visits. Point-of-purchase receipts, in-person interviews, and telephone surveys via random-digit dialing were collected as part of this study on calorie labeling in fast food restaurants. Data was collected in Philadelphia before and after calorie labeling was implemented and in Baltimore, where calorie labeling was not implemented. Baseline data collection took place in December 2009 in both Baltimore and Philadelphia. Data was collected after calorie labeling took effect in Philadelphia in February 2010. Further follow-up data collection occurred in June 2010.
Researchers collected data on whether or not consumers reported seeing calorie labeling in the restaurant, whether they bought fewer or more calories as a result of the labeling, and how frequently they went to fast food restaurants. They also collected data on consumer age, gender, race, education, income, and BMI category. A total of 2,083 usable observations across both cities and data collection periods are included in the dataset.
https://www.etalab.gouv.fr/licence-ouverte-open-licence
The "Génération 2024" label for schools and educational institutions aims to build bridges between the school system and the sports movement in order to encourage young people to take part in physical activity and sport. This dataset contains the institutions whose label was awarded on or before 28 March 2024. More information at https://eduscol.education.fr/cid131907/le-label-generation-2024.html
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Between 1947 and 1956 a study led by Mick Olsen resulted in 6502 school and 587 gummy sharks being tagged in south-east Australia. Most of the school shark were tagged in inshore bays and estuaries, notably Port Phillip Bay, Port Sorell, Georges Bay and Pittwater. Most of the gummy shark were tagged in inshore areas around Flinders Island and the north coast of Tasmania. A total of 594 school shark and 60 gummy shark were recaptured. This data set includes field sheets and the tags returned to CSIRO.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of files
Ontology
For more details, please refer to http://w3id.org/ckgg
The pupil to teacher ratio data includes figures for both elementary and high schools in Champaign County. This indicator includes the following school districts: Champaign Community Unit School District #4, Fisher Community Unit School District #1, Gifford Community Consolidated Grade School District #188, Ludlow Community Consolidated School District #142, Mahomet-Seymour Community Unit School District #3, Rantoul City School District #137, Rantoul Township High School District #193, St. Joseph Community Consolidated School District #169, St. Joseph-Ogden Community High School District #305, Tolono Community Unit School District #7, and Urbana School District #116. How many pupils per teacher there are in a district can reflect a number of other conditions. We included this indicator to provide some information on classroom size and instruction.
The pupil to teacher ratio shifts slightly from year to year in most districts, but the changes are often relatively small. Most districts’ ratios hover between 15:1 and 25:1 for most or all of the measured time period, with a few districts consistently below 15:1. The average ratio for all Champaign County schools was 16:1 every year from 2008 through 2020, reaching a new low of 15:1 for three of the four years between 2021 and 2024. There is no county-wide unifying trend.
This data, along with a variety of other school district data, is available on the Illinois Report Card, an Illinois State Board of Education and Northern Illinois University website.
Sources:
- Illinois Report Card. (2023-2024). Champaign CUSD 4. Illinois State Board of Education. (Accessed 6 December 2024).
- Illinois Report Card. (2023-2024). Fisher CUSD 1. Illinois State Board of Education. (Accessed 6 December 2024).
- Illinois Report Card. (2023-2024). Gifford CCSD 188. Illinois State Board of Education. (Accessed 6 December 2024).
- Illinois Report Card. (2023-2024). Ludlow CCSD 142. Illinois State Board of Education. (Accessed 6 December 2024).
- Illinois Report Card. (2023-2024). Mahomet-Seymour CUSD 3. Illinois State Board of Education. (Accessed 6 December 2024).
- Illinois Report Card. (2023-2024). Prairieview-Ogden CCSD 197. Illinois State Board of Education. (Accessed 6 December 2024).
- Illinois Report Card. (2023-2024). Rantoul City SD 137. Illinois State Board of Education. (Accessed 6 December 2024).
- Illinois Report Card. (2023-2024). Rantoul Township HSD 193. Illinois State Board of Education. (Accessed 6 December 2024).
- Illinois Report Card. (2023-2024). St. Joseph CCSD 169. Illinois State Board of Education. (Accessed 6 December 2024).
- Illinois Report Card. (2023-2024). St. Joseph Ogden CHSD 305. Illinois State Board of Education. (Accessed 6 December 2024).
- Illinois Report Card. (2023-2024). Thomasboro CCSD 130. Illinois State Board of Education. (Accessed 6 December 2024).
- Illinois Report Card. (2023-2024). Tolono CUSD 7. Illinois State Board of Education. (Accessed 6 December 2024).
- Illinois Report Card. (2023-2024). Urbana SD 116. Illinois State Board of Education. (Accessed 6 December 2024).
https://creativecommons.org/publicdomain/zero/1.0/
We collected EEG signal data from 10 college students while they watched MOOC video clips. We extracted online education videos that are assumed not to be confusing for college students, such as introductory videos on basic algebra or geometry. We also prepared videos that are expected to confuse a typical college student who is not familiar with the topic, such as Quantum Mechanics and Stem Cell Research. We prepared 20 videos, 10 in each category. Each video was about 2 minutes long. We cut each two-minute clip from the middle of a topic to make the videos more confusing. The students wore a single-channel wireless MindSet that measured activity over the frontal lobe. The MindSet measures the voltage between an electrode resting on the forehead and two electrodes (one ground and one reference) each in contact with an ear. After each session, the student rated his/her confusion level on a scale of 1-7, where one corresponded to the least confusing and seven corresponded to the most confusing. These ratings are further normalized into binary labels indicating whether the student is confused or not. This label is offered as the self-labelled confusion, in addition to our predefined label of confusion.
These data are collected from ten students, each watching ten videos. The 12000+ rows can therefore be seen as only 100 data points. Viewed this way, each data point consists of 120+ rows sampled every 0.5 seconds (so each data point is a one-minute video). Signals with a higher sampling frequency are reported as the mean value over each 0.5-second interval.
EEG_data.csv: Contains the EEG data recorded from 10 students
demographic.csv: Contains demographic information for each student
video data: Each video is roughly two minutes long; we remove the first 30 seconds and the last 30 seconds and only collect the EEG data during the middle 1 minute.
The data is collected using software that we implemented ourselves. See HaohanWang/Bioimaging for the source code.
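To make the "100 data points" view concrete, the sketch below groups rows by (subject, video) pair and recovers each trial's duration from the 0.5-second sampling interval. The rows are synthetic, and the field names `subject_id` and `video_id` are assumptions for illustration; the actual column names in EEG_data.csv may differ.

```python
from collections import defaultdict

SAMPLE_INTERVAL_S = 0.5  # sampling interval stated in the description


def group_into_trials(rows):
    """Group row dicts by (subject, video) so each group is one data point."""
    trials = defaultdict(list)
    for row in rows:
        trials[(row["subject_id"], row["video_id"])].append(row)
    return dict(trials)


# Synthetic stand-in: 10 subjects x 10 videos x 120 rows per trial.
rows = [
    {"subject_id": s, "video_id": v, "signal": 0.0}
    for s in range(10)
    for v in range(10)
    for _ in range(120)
]

trials = group_into_trials(rows)
durations = {key: len(group) * SAMPLE_INTERVAL_S for key, group in trials.items()}
print(len(rows), len(trials), durations[(0, 0)])
```

With real data the per-trial row counts will vary slightly (the description says "120+"), so durations are better treated as approximate.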
This is an extremely challenging dataset for binary classification.
It is an interesting dataset for variable selection (causal inference) tasks that may help further research. Past research has indicated that the theta signal is correlated with confusion level.
It is also an interesting dataset for models that correct for confounding factors, because we offer two labels (subject id and video id) that could profoundly confound the results.
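Because subject id and video id can confound a classifier, one common precaution (an assumption here, not something the dataset authors prescribe, and not itself a confounder-correction model) is to split by subject so that no subject contributes rows to both the training and the test set. A minimal standard-library sketch with synthetic rows and hypothetical field names:

```python
def leave_one_group_out(rows, group_key):
    """Yield (held_out_group, train_rows, test_rows) for each distinct group."""
    groups = sorted({row[group_key] for row in rows})
    for held_out in groups:
        train = [r for r in rows if r[group_key] != held_out]
        test = [r for r in rows if r[group_key] == held_out]
        yield held_out, train, test


# Synthetic stand-in: one row per (subject, video) trial.
rows = [{"subject_id": s, "video_id": v} for s in range(10) for v in range(10)]

folds = list(leave_one_group_out(rows, "subject_id"))
for held_out, train, test in folds:
    train_subjects = {r["subject_id"] for r in train}
    # The held-out subject never appears in the training rows.
    assert held_out not in train_subjects

print(len(folds), len(folds[0][1]), len(folds[0][2]))
```

Passing `"video_id"` as `group_key` gives the analogous split by video, which checks whether a model generalizes to unseen videos rather than unseen students.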
This dataset lists the schools in the Montpellier academy (académie de Montpellier) that hold a digital label. Since 2018, the school digital label (label numérique école) has been a process for structuring the dialogue between the schools of the Montpellier academy and local authorities.
The school digital label is awarded to elementary and primary schools for 2 school years. It makes it possible to:
- certify digital uses and the equipment levels that enable those uses;
- structure and strengthen the dialogue between the school and the municipality;
- share common reference frameworks for developing relevant pedagogical uses.
It has 3 levels and positions the school against 5 criteria:
- Identification of digital contact persons at school and municipality level
- Levels of use of the academy's primary-level digital workspace (ENT), ENT-école
- Presence of digital projects in the classes
- The school's participation in the support offered by the school district (circonscription)
- Equipment put in place by the local authority
https://www.etalab.gouv.fr/licence-ouverte-open-licence
Launched in 2021 by the High Council for Artistic and Cultural Education (HCEAC), the 100% EAC label recognizes a territory's commitment to making artistic and cultural education (EAC) available to all. Awarded for a renewable period of 5 years, the label recognizes local authorities and intermunicipal bodies that offer artistic and cultural education to all young people in their territory, from early childhood to adulthood. It comes with methodological tools for drawing up an inventory and a strategy, and it helps to strengthen the coherence of local action, bring stakeholders together, mobilize other partners, and sustain and develop existing schemes. The Ministers of Culture and National Education, who co-chair the HCEAC, have entrusted prefects and rectors with awarding the label, after consulting the devolved departments of the two ministries. In the first session in 2022, 79 territories, spread across all regions, were awarded the 100% EAC label; 78 more (including two overseas) were awarded it in 2023. After the first two calls for applications, 157 territories hold the 100% EAC label. NB: - The map shows the areas of the departments; each may include one or more labeled communities, in which case the arrows let you view them all. - Data on partnerships and schemes are valid only for the year of labeling, as they may change over time. For more information: https://www.culture.gouv.fr/catalogue-des-demarches-et-subventions/appels-a-projets-candidatures/Label-100-EAC
Between 1947 and 1956 a study led by Mick Olsen resulted in 6502 school and 587 gummy sharks being tagged in south-east Australia. Most of the school shark were tagged in inshore bays and estuaries, notably Port Phillip Bay, Port Sorell, Georges Bay and Pittwater. Most of the gummy shark were tagged in inshore areas around Flinders Island and the north coast of Tasmania. A total of 594 school shark and 60 gummy shark were recaptured. This data set includes field sheets and the tags returned to CSIRO. These records are cataloged in the TRIM Records database, as follows: AB2008/1038: CMAR - School and Gummy Shark Tagging by CSIRO in Southern Australia 1947-1956 - Mick Olsen and Grant West - MarLIN record 8218. This Archive Box number incorporates 2 containers: "C2008/6921-01: CMAR - School and Gummy Shark Tagging by CSIRO in Southern Australia 1947-1956 - Mick Olsen and Grant West - MarLIN record 8218 - Part 1 - Tag Data Field Sheets" [associated files lodged within as separate objects]; and "C2008/6921-02: CMAR - School and Gummy Shark Tagging by CSIRO in Southern Australia 1947-1956 - Mick Olsen and Grant West - MarLIN record 8218 - Part 2 - Tags and Olsen Card Index [in metal filing cabinet]"
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Fields of this dataset: UAI code (registered administrative unit); SIRET number; type of establishment; name; acronym; status (public, private...); supervisory authority (ministries, consular chambers...); university of attachment (label and direct link to the onisep.fr record, with unique Onisep identifier); related establishments (labels and direct links to the onisep.fr records, with unique Onisep identifiers); contact and location details (post office box, postal address, postal code, municipality, common identifier, cedex mention, telephone number, district, department, academy, region, longitude X, latitude Y); open days; Génération 2024 label (unexpired labels from the dataset 'https://data.education.gouv.fr/explore/dataset/fr-en-etablissements-labellises-generation-2024' as of 25/04/2022; see also HTTPS://GENERATION.PARIS2024.ORG/LABEL-GENERATION-2024); direct link to the onisep.fr page and unique Onisep identifier. Updates: this dataset is updated approximately ten times a year, in step with the orientation calendar and updates to the Onisep.fr site.
This dataset lists the lower secondary schools (collèges) in the Occitanie region that have been awarded the digital label since 2017. From 2017 to 2023, only collèges in the Montpellier academy were concerned. In 2024, 4 departments of the Toulouse academy joined the digital labeling scheme: Ariège, Haute-Garonne, Gers and Hautes-Pyrénées. Labeled collèges are ranked on 3 levels, which gauge their involvement and also serve as the basis for allocating differentiated resources.
The digital label award committee reaches its decision by scoring a set of criteria grouped into 4 areas:
- Digital leadership within the school
- The school's digital infrastructure and equipment
- The school's digital pedagogical services and uses
- Support and training in digital technology for teaching teams
The departments and the academic authorities commit to providing award winners with additional resources to facilitate the use of digital technology for teaching purposes.
Commitments of the departments to award winners:
- Development of infrastructure: network, Wi-Fi, servers,
- Additional equipment: video projectors, desktop computers, special furniture…,
- Support for use of the digital workspace (ENT),
- Support for digital projects: educational activities and specific projects.
Commitments of the academies to award winners:
- Staff training,
- Provision of suitable teaching resources,
- Support for digital uses from the inspectorate,
- Support for pedagogical innovation.