75 datasets found
  1. Student Performance Data Set

    • kaggle.com
    Updated Mar 27, 2020
    Cite
    Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
    Dataset updated
    Mar 27, 2020
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Data-Science Sean
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If this data set is useful, an upvote is appreciated. This data addresses student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data was collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such a prediction is much more useful (see paper source for more details).
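The note about G1, G2, and G3 can be illustrated with a minimal sketch. The rows below are made-up toy values rather than records from the dataset, and the `absences` column and the helper names are assumptions used only for illustration. The sketch measures how strongly G2 tracks G3 and builds a feature list that leaves the earlier period grades out, which is the harder but more useful setting the description mentions.

```python
from math import sqrt

# Hypothetical toy rows (NOT taken from the dataset). "absences" is an
# assumed example of a non-grade feature.
rows = [
    {"G1": 8, "G2": 9, "G3": 9, "absences": 6},
    {"G1": 12, "G2": 13, "G3": 13, "absences": 2},
    {"G1": 15, "G2": 14, "G3": 15, "absences": 0},
    {"G1": 10, "G2": 10, "G3": 11, "absences": 4},
    {"G1": 17, "G2": 18, "G3": 18, "absences": 1},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def feature_names(row, target="G3", prior_grades=("G1", "G2")):
    """Column names usable as features when earlier grades are excluded."""
    return [name for name in row if name != target and name not in prior_grades]


r_g2_g3 = pearson([r["G2"] for r in rows], [r["G3"] for r in rows])
features_without_grades = feature_names(rows[0])
```

On real data the same two helpers could be applied to the mat and por files separately; the correlation threshold at which G1 and G2 count as "too informative" is a modelling choice, not something the dataset prescribes.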

  2. student-performance-data

    • kaggle.com
    Updated Jun 14, 2025
    Cite
    Muhammad Azam (2025). student-performance-data [Dataset]. http://doi.org/10.34740/kaggle/dsv/12160820
    Dataset updated
    Jun 14, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Muhammad Azam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Student Performance Data

    This dataset provides insights into various factors influencing the academic performance of students. It is curated for use in educational research, data analytics projects, and predictive modeling. The data reflects a combination of personal, familial, and academic-related variables gathered through observation or survey.

    The dataset includes a diverse range of students and captures key characteristics such as study habits, family background, school attendance, and overall performance. It is well-suited for exploring correlations, visualizing trends, and training machine learning models related to academic outcomes.

    Highlights:

    • Clean, structured format suitable for immediate use
    • Designed for beginner to intermediate-level data analysis
    • Valuable for classification, regression, and data storytelling projects

    File Format:

    • Type: CSV (Comma-Separated Values)
    • Encoding: UTF-8
    • Structure: Each row represents a student record

    Applications

    • Student performance prediction
    • Educational policy planning
    • Identification of performance gaps and influencing factors
    • Exploratory data analysis and visualization

  3. data: Make a Guess - An experiment on Academic Performance

    • figshare.com
    txt
    Updated Jun 17, 2025
    Cite
    Rui Silva (2025). data: Make a Guess - An experiment on Academic Performance [Dataset]. http://doi.org/10.6084/m9.figshare.29336903.v1
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Rui Silva
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and do file

  4. Data_Sheet_1_Advanced large language models and visualization tools for data...

    • frontiersin.figshare.com
    txt
    Updated Aug 8, 2024
    Cite
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez (2024). Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv [Dataset]. http://doi.org/10.3389/feduc.2024.1418006.s001
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    Frontiers
    Authors
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to computational sciences. However, the desired outcomes have not been consistently achieved. This study aims to analyze the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the solutions generated. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.

    Methods

    This article examines how learners perform and their perspectives when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants among students and professionals without computational thinking skills. These participants developed a data analytics project in the context of a Data Analytics short session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

    Results

    The results show the transformative potential of approaches based on integrating advanced generative AI tools like GPT with specialized frameworks such as LIDA. The higher levels of participant preference indicate the superiority of these approaches over traditional development methods. Additionally, our findings suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Our findings suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

    Discussion

    It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT. However, users still need advanced programming knowledge to properly configure this connection via API. There is a significant opportunity for generative AI tools to improve their performance, providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.

  5. Data from: The Impact of Homework Deadline Times on College Student...

    • tandf.figshare.com
    csv
    Updated Jun 30, 2025
    Cite
    Charlie Smith (2025). The Impact of Homework Deadline Times on College Student Performance and Stress: A Quasi-Experiment in Business Statistics [Dataset]. http://doi.org/10.6084/m9.figshare.28027731.v1
    Dataset updated
    Jun 30, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Charlie Smith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Impact of Homework Deadline Times on College Student Performance and Stress: A Quasi-Experiment in Business Statistics

  6. Randomized Experiment of Playworks Analytic Files for 2010-2011 and...

    • icpsr.umich.edu
    ascii, sas, spss
    Updated Feb 14, 2024
    Cite
    James-Burdurmy, Susanne (2024). Randomized Experiment of Playworks Analytic Files for 2010-2011 and 2011-2012 Cohorts in Six United States Cities [Dataset]. http://doi.org/10.3886/ICPSR35638.v2
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    Inter-university Consortium for Political and Social Research: https://www.icpsr.umich.edu/web/pages/
    Authors
    James-Burdurmy, Susanne
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/35638/terms

    Area covered
    United States
    Description

    The Robert Wood Johnson Foundation (RWJF) contracted with Mathematica Policy Research and its subcontractor, the John W. Gardner Center for Youth and Their Communities (JGC) at Stanford University, to conduct a rigorous evaluation of Playworks, a program for structured play during recess, class time and after school in low-income school districts. These data were collected as part of the evaluation. Twenty-nine urban schools interested in implementing Playworks were randomly assigned to treatment and control groups during the 2010-2011 (cohort 1) or 2011-2012 (cohort 2) school years. During the one-year study period for each cohort, treatment schools received Playworks and control schools were not eligible to implement Playworks. Mathematica and JGC collected data from students, teachers and school staff at 25 cohort 1 schools in spring 2011 and an additional four cohort 2 schools in spring 2012 to document the implementation of Playworks and assess the program's impact on key outcomes related to school climate; conflict resolution and aggression; learning and academic performance; youth development; student behavior; and play, physical activity and recess. Data collection activities included administration of student and teacher surveys, collection of physical activity data via accelerometers, structured observations of recess periods and collection of administrative records.

  7. Description of the data.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Laia Subirats; Aina Palacios Corral; Sofía Pérez-Ruiz; Santi Fort; Gómez-Moñivas Sacha (2023). Description of the data. [Dataset]. http://doi.org/10.1371/journal.pone.0282306.t001
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Laia Subirats; Aina Palacios Corral; Sofía Pérez-Ruiz; Santi Fort; Gómez-Moñivas Sacha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study provides the profiles and success predictions of students considering data before, during, and after the COVID-19 pandemic. Using a field experiment of 396 students and more than 7400 instances, we have analyzed students’ performance considering the temporal distribution of autonomous learning during courses from 2016/2017 to 2020/2021. After applying unsupervised learning, results show 3 main profiles from the clusters obtained in the simulations: students who work continuously, those who work at the last minute, and those with low performance throughout the autonomous learning. We have found that the highest success ratio is related to students who work on a continuous basis. However, last-minute working is not necessarily linked to failure. We have also found that students’ marks can be predicted successfully when taking into account the whole data sets. However, predictions are worse when removing data from the month before the final exam. These predictions are useful to prevent students’ wrong learning strategies, and to detect malpractices such as copying. We have done all these analyses taking into account the effect of the COVID-19 pandemic, finding that students worked on a more continuous basis during the confinement. This effect was still present one year later. Finally, we have also included an analysis of the techniques that could be more effective for keeping, in a future non-pandemic scenario, the good habits that were detected during the confinement.

  8. College Placement Predictor Dataset

    • kaggle.com
    Updated Dec 28, 2023
    Cite
    SameerProgrammer (2023). College Placement Predictor Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/7298157
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    SameerProgrammer
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    1. About the Dataset:

    Description: Dive into the world of college placements with this dataset designed to unravel the factors influencing student placement outcomes. The dataset comprises crucial parameters such as IQ scores, CGPA (Cumulative Grade Point Average), and placement status. Aspiring data scientists, researchers, and enthusiasts can leverage this dataset to uncover patterns and insights that contribute to a deeper understanding of successful college placements.

    2. Project Ideas:

    Project Idea 1: Predictive Modeling for College Placements
    Utilize machine learning algorithms to build a predictive model that forecasts a student's likelihood of placement based on their IQ scores and CGPA. Evaluate and compare the effectiveness of different algorithms to enhance prediction accuracy.

    Project Idea 2: Feature Importance Analysis
    Conduct a feature importance analysis to identify the key factors that significantly influence placement outcomes. Gain insights into whether IQ, CGPA, or a combination of both plays a more dominant role in determining success.

    Project Idea 3: Clustering Analysis of Placement Trends
    Apply clustering techniques to group students based on their placement outcomes. Explore whether distinct clusters emerge, shedding light on common characteristics or trends among students who secure placements.

    Project Idea 4: Correlation Analysis with External Factors
    Investigate the correlation between the provided data (IQ, CGPA, placement) and external factors such as internship experience, extracurricular activities, or industry demand. Assess how these external factors may complement or influence placement success.

    Project Idea 5: Visualization of Placement Dynamics Over Time
    Create dynamic visualizations to illustrate how placement trends evolve over time. Analyze trends, patterns, and fluctuations in placement rates to identify potential cyclical or seasonal influences on student placements.

    3. Columns Explanation:

    • IQ:

      • Definition: Intelligence Quotient, a measure of a person's intellectual abilities.
      • Data Type: Numeric
      • Range: Typically, IQ scores range from 70 to 130, with 100 being the average.
    • CGPA:

      • Definition: Cumulative Grade Point Average, a measure of a student's overall academic performance.
      • Data Type: Numeric
      • Range: Typically, CGPA is on a scale of 0 to 4, with 4 being the highest possible score.
    • Placement:

      • Definition: Binary variable indicating whether a student secured a placement (1) or not (0).
      • Data Type: Categorical (Binary)
      • Values: 1 (Placement secured) or 0 (No placement).

    These columns collectively provide a comprehensive snapshot of a student's intellectual abilities, academic performance, and their success in securing a placement. Analyzing this dataset can offer valuable insights into the dynamics of college placements and inform strategies for optimizing student outcomes.
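The column definitions above can be turned into a small sanity check. This is only a sketch: the function name and the message strings are hypothetical, and the 70 to 130 IQ range is treated as "typical" rather than as a hard limit, since the description itself hedges it that way.

```python
def check_placement_record(iq, cgpa, placement):
    """Return a list of issues for one (IQ, CGPA, Placement) record.

    The ranges follow the column explanation: IQ typically 70-130,
    CGPA on a 0-4 scale, Placement coded as 1 or 0.
    """
    issues = []
    if not isinstance(iq, (int, float)):
        issues.append("IQ must be numeric")
    elif not 70 <= iq <= 130:
        issues.append("IQ outside the typical 70-130 range")
    if not isinstance(cgpa, (int, float)):
        issues.append("CGPA must be numeric")
    elif not 0 <= cgpa <= 4:
        issues.append("CGPA outside the 0-4 scale")
    if placement not in (0, 1):
        issues.append("Placement must be 0 or 1")
    return issues


clean = check_placement_record(105, 3.4, 1)
flagged = check_placement_record(105, 4.7, 2)
```

A record with no issues yields an empty list, so the check can be used as a filter before any of the project ideas are attempted.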

  9. Data from: DIPSEER: A Dataset for In-Person Student Emotion and Engagement...

    • scidb.cn
    • observatorio-cientifico.ua.es
    Updated Sep 4, 2024
    Cite
    Luis Márquez-Carpintero; Sergio Suescun-Ferrandiz; Carolina Lorenzo Álvarez; Jorge Fernandez-Herrero; Diego Viejo; Rosabel Roig-Vila; Miguel Cazorla (2024). DIPSEER: A Dataset for In-Person Student Emotion and Engagement Recognition in the Wild [Dataset]. http://doi.org/10.57760/sciencedb.11541
    Dataset updated
    Sep 4, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Luis Márquez-Carpintero; Sergio Suescun-Ferrandiz; Carolina Lorenzo Álvarez; Jorge Fernandez-Herrero; Diego Viejo; Rosabel Roig-Vila; Miguel Cazorla
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Description

    The DIPSER dataset is designed to assess student attention and emotion in in-person classroom settings, consisting of RGB camera data, smartwatch sensor data, and labeled attention and emotion metrics. It includes multiple camera angles per student to capture posture and facial expressions, complemented by smartwatch data for inertial and biometric metrics. Attention and emotion labels are derived from self-reports and expert evaluations. The dataset includes diverse demographic groups, with data collected in real-world classroom environments, facilitating the training of machine learning models for predicting attention and correlating it with emotional states.

    Data Collection and Generation Procedures

    The dataset was collected in a natural classroom environment at the University of Alicante, Spain. The recording setup consisted of six general cameras positioned to capture the overall classroom context and individual cameras placed at each student’s desk. Additionally, smartwatches were used to collect biometric data, such as heart rate, accelerometer, and gyroscope readings.

    Experimental Sessions

    Nine distinct educational activities were designed to ensure a comprehensive range of engagement scenarios:

    • News Reading – Students read projected or device-displayed news.
    • Brainstorming Session – Idea generation for problem-solving.
    • Lecture – Passive listening to an instructor-led session.
    • Information Organization – Synthesizing information from different sources.
    • Lecture Test – Assessment of lecture content via mobile devices.
    • Individual Presentations – Students present their projects.
    • Knowledge Test – Conducted using Kahoot.
    • Robotics Experimentation – Hands-on session with robotics.
    • MTINY Activity Design – Development of educational activities with computational thinking.

    Technical Specifications

    • RGB Cameras: Individual cameras recorded at 640×480 pixels, while context cameras captured at 1280×720 pixels.
    • Frame Rate: 9-10 FPS depending on the setup.
    • Smartwatch Sensors: Collected heart rate, accelerometer, gyroscope, rotation vector, and light sensor data at a frequency of 1–100 Hz.

    Data Organization and Formats

    The dataset follows a structured directory format:

    /groupX/experimentY/subjectZ.zip

    Each subject-specific folder contains:

    • images/ (individual facial images)
    • watch_sensors/ (sensor readings in JSON format)
    • labels/ (engagement & emotion annotations)
    • metadata/ (subject demographics & session details)

    Annotations and Labeling

    Each data entry includes engagement levels (1-5) and emotional states (9 categories) based on both self-reported labels and evaluations by four independent experts. A custom annotation tool was developed to ensure consistency across evaluations.

    Missing Data and Data Quality

    • Synchronization: A centralized server ensured time alignment across devices. Brightness changes were used to verify synchronization.
    • Completeness: No major missing data, except for occasional random frame drops due to embedded device performance.
    • Data Consistency: Uniform collection methodology across sessions, ensuring high reliability.

    Data Processing Methods

    To enhance usability, the dataset includes preprocessed bounding boxes for face, body, and hands, along with gaze estimation and head pose annotations. These were generated using YOLO, MediaPipe, and DeepFace.

    File Formats and Accessibility

    • Images: Stored in standard JPEG format.
    • Sensor Data: Provided as structured JSON files.
    • Labels: Available as CSV files with timestamps.

    The dataset is publicly available under the CC-BY license and can be accessed along with the necessary processing scripts via the DIPSER GitHub repository.

    Potential Errors and Limitations

    • Due to camera angles, some student movements may be out of frame in collaborative sessions.
    • Lighting conditions vary slightly across experiments.
    • Sensor latency variations are minimal but exist due to embedded device constraints.

    Citation

    If you find this project helpful for your research, please cite our work using the following bibtex entry:

    @misc{marquezcarpintero2025dipserdatasetinpersonstudent1,
      title={DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild},
      author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Carolina Lorenzo Álvarez and Jorge Fernandez-Herrero and Diego Viejo and Rosabel Roig-Vila and Miguel Cazorla},
      year={2025},
      eprint={2502.20209},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.20209},
    }

    Usage and Reproducibility

    Researchers can utilize standard tools like OpenCV, TensorFlow, and PyTorch for analysis. The dataset supports research in machine learning, affective computing, and education analytics, offering a unique resource for engagement and attention studies in real-world classroom environments.
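The /groupX/experimentY/subjectZ.zip layout mentioned in the description lends itself to a small path parser. The sketch below assumes that X, Y, and Z are decimal integers and that the leading slash is optional; neither detail is confirmed by the description, and the function name is hypothetical.

```python
import re

# Assumption: X, Y and Z in /groupX/experimentY/subjectZ.zip are integers.
SUBJECT_PATH = re.compile(r"^/?group(\d+)/experiment(\d+)/subject(\d+)\.zip$")


def parse_subject_path(path):
    """Split an archive path into group, experiment and subject numbers.

    Returns None when the path does not follow the expected layout.
    """
    match = SUBJECT_PATH.match(path)
    if match is None:
        return None
    group, experiment, subject = (int(part) for part in match.groups())
    return {"group": group, "experiment": experiment, "subject": subject}


parsed = parse_subject_path("/group1/experiment3/subject12.zip")
rejected = parse_subject_path("/group1/subject12.zip")
```

Returning None for non-matching paths keeps the helper easy to use as a filter when walking a download directory that may also contain unrelated files.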

  10. Replication Data for: Racial Social Norms among Brazilian Students: Academic...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 12, 2023
    Cite
    Portella, Alysson; Kirschbaum, Charles (2023). Replication Data for: Racial Social Norms among Brazilian Students: Academic Performance, Social Status and Racial Identification [Dataset]. http://doi.org/10.7910/DVN/ZHCTCK
    Dataset updated
    Nov 12, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Portella, Alysson; Kirschbaum, Charles
    Description

    This dataset contains files for the replication of "Racial Social Norms among Brazilian Students: Academic Performance, Social Status and Racial Identification". Data comes from the project Attitudes and Relationships among Primary and High School Students. The project interviewed more than 4 thousand students in five Brazilian public schools. It contains information about the students, their beliefs, and friendship ties between them. Paper Abstract: Studies in the United States show that minority students might face a trade-off between better academic performance and peer acceptance, the so-called "acting white". This paper investigates the relationship between grades and social status in five Brazilian schools and how it differs between racial groups. Social status is measured using friendship ties among students, assigning higher status to students more central in the network. The racial composition of friendship ties is diverse, although friendships tend to favor racial peers, especially for black students. We find a positive correlation between grades and social status of nonwhite students that is driven by their status among their white classmates. This differs from the pattern observed in the US, where a negative correlation between minorities' grades and their status among racial peers is not compensated by their status among white students. We also investigate how academic performance is associated with racial identity choice conditional on skin color, finding a weak negative relationship between higher grades and the odds of classification as mixed-race.

  11. Assessing the impact of hints in learning formal specification: Research...

    • data.niaid.nih.gov
    Updated Jan 29, 2024
    Cite
    Margolis, Iara (2024). Assessing the impact of hints in learning formal specification: Research artifact [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10450608
    Dataset updated
    Jan 29, 2024
    Dataset provided by
    Margolis, Iara
    Macedo, Nuno
    Campos, José Creissac
    Cunha, Alcino
    Sousa, Emanuel
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This artifact accompanies the SEET@ICSE article "Assessing the impact of hints in learning formal specification", which reports on a user study investigating the impact of different types of automated hints while learning a formal specification language, in terms of immediate performance and learning retention as well as the emotional response of the students. This research artifact provides all the material required to replicate this study (except for the proprietary questionnaires used to assess the emotional response and user experience), as well as the collected data and the data analysis scripts used for the discussion in the paper.

    Dataset

    The artifact contains the resources described below.

    Experiment resources

    The resources needed for replicating the experiment, namely in directory experiment:

    alloy_sheet_pt.pdf: the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment. The sheet was passed in Portuguese due to the population of the experiment.

    alloy_sheet_en.pdf: an English translation of the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment.

    docker-compose.yml: a Docker Compose configuration file to launch Alloy4Fun populated with the tasks in directory data/experiment for the 2 sessions of the experiment.

    api and meteor: directories with source files for building and launching the Alloy4Fun platform for the study.

    Experiment data

    The task database used in our application of the experiment, namely in directory data/experiment:

    Model.json, Instance.json, and Link.json: JSON files used to populate Alloy4Fun with the tasks for the 2 sessions of the experiment.

    identifiers.txt: the list of all (104) available participant identifiers that can participate in the experiment.

    Collected data

    Data collected in the application of the experiment as a simple one-factor randomised experiment in 2 sessions involving 85 undergraduate students majoring in CSE. The experiment was validated by the Ethics Committee for Research in Social and Human Sciences of the Ethics Council of the University of Minho, where the experiment took place. Data is shared in the form of JSON and CSV files with a header row, namely in directory data/results:

    data_sessions.json: data collected from task-solving in the 2 sessions of the experiment, used to calculate variables productivity (PROD1 and PROD2, between 0 and 12 solved tasks) and efficiency (EFF1 and EFF2, between 0 and 1).

    data_socio.csv: data collected from socio-demographic questionnaire in the 1st session of the experiment, namely:

    participant identification: participant's unique identifier (ID);

    socio-demographic information: participant's age (AGE), sex (SEX, 1 through 4 for female, male, prefer not to disclose, and other, respectively), and average academic grade (GRADE, from 0 to 20, NA denotes preference not to disclose).

    data_emo.csv: detailed data collected from the emotional questionnaire in the 2 sessions of the experiment, namely:

    participant identification: participant's unique identifier (ID) and the assigned treatment (column HINT, either N, L, E or D);

    detailed emotional response data: the differential in the 5-point Likert scale for each of the 14 measured emotions in the 2 sessions, ranging from -5 to -1 if decreased, 0 if maintained, from 1 to 5 if increased, or NA denoting failure to submit the questionnaire. Half of the emotions are positive (Admiration1 and Admiration2, Desire1 and Desire2, Hope1 and Hope2, Fascination1 and Fascination2, Joy1 and Joy2, Satisfaction1 and Satisfaction2, and Pride1 and Pride2), and half are negative (Anger1 and Anger2, Boredom1 and Boredom2, Contempt1 and Contempt2, Disgust1 and Disgust2, Fear1 and Fear2, Sadness1 and Sadness2, and Shame1 and Shame2). This detailed data was used to compute the aggregate data in data_emo_aggregate.csv and in the detailed discussion in Section 6 of the paper.

    data_umux.csv: data collected from the user experience questionnaires in the 2 sessions of the experiment, namely:

    participant identification: participant's unique identifier (ID);

    user experience data: summarised user experience data from the UMUX surveys (UMUX1 and UMUX2, as a usability metric ranging from 0 to 100).

    participants.txt: the list of participant identifiers that have registered for the experiment.
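The encoding described for data_socio.csv can be decoded with a few lines of Python. This is a sketch under stated assumptions: the column names ID, AGE, SEX, and GRADE come from the description, but the column order, the sample rows, and the second identifier (1XYL) are made up for illustration, and AGE is assumed to always be an integer.

```python
import csv
import io

# SEX codes as listed for data_socio.csv (1 through 4).
SEX_LABELS = {
    "1": "female",
    "2": "male",
    "3": "prefer not to disclose",
    "4": "other",
}


def decode_socio_row(row):
    """Convert one raw CSV row (a dict of strings) into typed values."""
    grade = None if row["GRADE"] == "NA" else float(row["GRADE"])
    if grade is not None and not 0 <= grade <= 20:
        raise ValueError(f"GRADE out of the 0-20 range: {grade}")
    return {
        "id": row["ID"],
        "age": int(row["AGE"]),
        "sex": SEX_LABELS[row["SEX"]],
        "grade": grade,  # None stands for the NA marker
    }


# Hypothetical sample content; the real file is in data/results.
sample = "ID,AGE,SEX,GRADE\n0CAN,20,1,14.5\n1XYL,22,3,NA\n"
records = [decode_socio_row(r) for r in csv.DictReader(io.StringIO(sample))]
```

Mapping NA to None rather than to a number keeps undisclosed grades from silently entering averages; the analysis scripts in the artifact may handle this differently.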

    Analysis scripts

    The analysis scripts required to replicate the analysis of the results of the experiment as reported in the paper, namely in directory analysis:

    analysis.r: An R script to analyse the data in the provided CSV files; each performed analysis is documented within the file itself.

    requirements.r: An R script to install the required libraries for the analysis script.

    normalize_task.r: A Python script to normalize the task JSON data from file data_sessions.json into the CSV format required by the analysis script.

    normalize_emo.r: A Python script to compute the aggregate emotional response in the CSV format required by the analysis script from the detailed emotional response data in the CSV format of data_emo.csv.

    Dockerfile: Docker script to automate the analysis script from the collected data.

    Setup

    To replicate the experiment and the analysis of the results, only Docker is required.

    If you wish to manually replicate the experiment and collect your own data, you'll need to install:

    A modified version of the Alloy4Fun platform, which is built in the Meteor web framework. This version of Alloy4Fun is publicly available in branch study of its repository at https://github.com/haslab/Alloy4Fun/tree/study.

    If you wish to manually replicate the analysis of the data collected in our experiment, you'll need to install:

    Python to manipulate the JSON data collected in the experiment. Python is freely available for download at https://www.python.org/downloads/, with distributions for most platforms.

    R software for the analysis scripts. R is freely available for download at https://cran.r-project.org/mirrors.html, with binary distributions available for Windows, Linux and Mac.

    Usage

    Experiment replication

    This section describes how to replicate our user study experiment, and collect data about how different hints impact the performance of participants.

    To launch the Alloy4Fun platform populated with tasks for each session, run the following commands from the root directory of the artifact. The Meteor server may take a few minutes to launch; wait for the "Started your app" message to appear.

    cd experiment
    docker-compose up

    This will launch Alloy4Fun at http://localhost:3000. The tasks are accessed through permalinks assigned to each participant. The experiment allows for up to 104 participants, and the list of available identifiers is given in file identifiers.txt. The group of each participant is determined by the last character of the identifier, either N, L, E or D. The task database can be consulted in directory data/experiment, in Alloy4Fun JSON files.

    In the 1st session, each participant was given one permalink that gives access to 12 sequential tasks. The permalink is simply the participant's identifier, so participant 0CAN would access http://localhost:3000/0CAN. The next task becomes available after a correct submission to the current task or when a time-out occurs (5 minutes). Each participant was assigned to one treatment group, so different kinds of hints are provided depending on the permalink. Below are 4 permalinks, one for each hint group:

    Group N (no hints): http://localhost:3000/0CAN

    Group L (error locations): http://localhost:3000/CA0L

    Group E (counter-example): http://localhost:3000/350E

    Group D (error description): http://localhost:3000/27AD

    In the 2nd session, as in the 1st session, each permalink gave access to 12 sequential tasks, and the next task becomes available after a correct submission or a time-out (5 minutes). The permalink is constructed by prefixing the participant's identifier with P-, so participant 0CAN would access http://localhost:3000/P-0CAN. In the 2nd session all participants were expected to solve the tasks without any hints, so the permalinks of different groups are not differentiated.
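    The permalink scheme described above can be sketched in Python as follows. This is a minimal illustration, not part of the artifact: the names `hint_group`, `permalink`, `GROUPS` and `BASE_URL` are hypothetical, and only the rules stated in the text are assumed (the group is given by the last character of the identifier, and 2nd-session permalinks carry the P- prefix).

```python
BASE_URL = "http://localhost:3000"  # address where Alloy4Fun is launched, per the text

# Hint groups keyed by the last character of the participant identifier.
GROUPS = {
    "N": "no hints",
    "L": "error locations",
    "E": "counter-example",
    "D": "error description",
}


def hint_group(identifier: str) -> str:
    """Return the hint group encoded in the last character of the identifier."""
    if not identifier or identifier[-1] not in GROUPS:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    return GROUPS[identifier[-1]]


def permalink(identifier: str, session: int) -> str:
    """Build the permalink for a participant in session 1 or 2."""
    if session == 1:
        return f"{BASE_URL}/{identifier}"
    if session == 2:
        # 2nd-session permalinks prepend "P-" to the identifier.
        return f"{BASE_URL}/P-{identifier}"
    raise ValueError("session must be 1 or 2")
```

    For example, `permalink("0CAN", 2)` yields the 2nd-session address quoted in the text for participant 0CAN.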

    Before the 1st session, the participants should answer the socio-demographic questionnaire, which should ask for the following information: unique identifier, age, sex, familiarity with the Alloy language, and average academic grade.

    Before and after both sessions the participants should answer the standard PrEmo 2 questionnaire. PrEmo 2 is published under an Attribution-NonCommercial-NoDerivatives 4.0 International Creative Commons license (CC BY-NC-ND 4.0). This means that you are free to use the tool for non-commercial purposes as long as you give appropriate credit, provide a link to the license, and do not modify the original material. The original material, namely the depictions of the different emotions, can be downloaded from https://diopd.org/premo/. The questionnaire should ask for the unique user identifier, and for the attachment to each of the 14 depicted emotions, expressed on a 5-point Likert scale.

    After both sessions the participants should also answer the standard UMUX questionnaire. This questionnaire can be used freely, and should ask for the unique user identifier and for answers to the standard 4 questions on a 7-point Likert scale. For information about the questions, how to implement the questionnaire, and how to compute the usability metric (a score ranging from 0 to 100) from the answers, please see the original paper:

    Kraig Finstad. 2010. The usability metric for user experience. Interacting with computers 22, 5 (2010), 323–327.
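    As a rough illustration of how a 0 to 100 score can be derived from the 4 answers, the sketch below follows the scoring commonly described for UMUX: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 7 minus the response, and the sum is scaled by 100/24. The function name `umux_score` is hypothetical, and the scoring rule should be checked against the original paper before use.

```python
def umux_score(responses):
    """Compute a 0-100 UMUX score from four 7-point Likert responses (1..7)."""
    if len(responses) != 4 or any(not 1 <= r <= 7 for r in responses):
        raise ValueError("expected four responses in the range 1..7")
    total = 0
    for position, response in enumerate(responses, start=1):
        if position % 2 == 1:
            total += response - 1   # items 1 and 3: higher agreement is better
        else:
            total += 7 - response   # items 2 and 4: higher agreement is worse
    return total * 100 / 24         # each item contributes 0..6, so the maximum sum is 24
```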

    Analysis of other applications of the experiment

    This section describes how to replicate the analysis of the data collected in an application of the experiment described in Experiment replication.

    The analysis script expects data in 4 CSV files,

  12. Pre-experiment with data on a serious game

    • search.dataone.org
    Updated Nov 9, 2023
    Cite
    Soto, Manuel (2023). Pre-experiment with data on a serious game [Dataset]. http://doi.org/10.7910/DVN/PYSDCN
    Explore at:
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Soto, Manuel
    Description

    The files contain data from a set of 119 students who were administered a serious game to identify whether their preferences for experiential learning influence their self-efficacy, motivation for learning, and self-perceived academic performance. The R code used for data analysis is also included.

  13. Showing Life Opportunities 2020-2021, Data from Experiment 3: Coastal...

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Jan 8, 2024
    Cite
    David McKenzie (2024). Showing Life Opportunities 2020-2021, Data from Experiment 3: Coastal Educational Regime (Régimen Costa) - Ecuador [Dataset]. https://microdata.worldbank.org/index.php/catalog/6111
    Explore at:
    Dataset updated
    Jan 8, 2024
    Dataset provided by
    Mona Mensmann
    David McKenzie
    Francisco Flores
    Thomas Astebro
    Bruno Crepon
    Igor Asanov
    Guido Buenstorf
    Mathis Schulte
    Time period covered
    2020
    Area covered
    Ecuador
    Description

    Abstract

    Opportunity-focused, high-growth entrepreneurship and science-led innovation are crucial for continued economic growth and productivity. Working in these fields offers the opportunity for rewarding and high-paying careers. However, the majority of youth in developing countries do not consider either as job options, affecting their choices of what to study. Youth may not select these educational and career paths due to lack of knowledge, lack of appropriate skills, and lack of role models. We provide a scalable approach to overcoming these constraints through an online education course for secondary school students that covers entrepreneurial soft skills, scientific methods, and interviews with role models.

    The study comprises three experimental trials conducted before and during the COVID-19 pandemic in different regions of Ecuador. This catalog entry includes data from Experiment 3: Coastal Educational Regime (Régimen Costa) 2020/2021. The data from the other two experiments are also available in the catalog.

    Experiment 3: Coastal Educational Regime (Régimen Costa) 2020/2021

    A randomized experiment conducted in high schools in Ecuador as a rapid response to the hurdles of COVID-19 for schools in the Coastal Educational Regime (Régimen Costa); students finish the program in December 2020. The intervention is an online education course that covers entrepreneurial soft skills, scientific methods, and interviews with role models. This course is taken by students at home during the COVID-19 pandemic under teachers' supervision. We work mostly with 14-22-year-old students (16,441 students) in 598 schools assigned to the program. We randomly assign schools either to treatment (receiving the entrepreneurship courses online) or placebo-control (receiving a placebo treatment of online courses from standard curricula) groups. We also cross-randomize the role models and evaluate a set of nimble interventions to increase take-up. The details of the intervention can be found in the AEA registry: Asanov, Igor and David McKenzie. 2021. Scaling up virtual learning of online learning in high schools. AEA RCT Registry. March 23.

    Merged datasets from the baseline, midline, and endline surveys for each experiment, administered through an online learning platform in school during normal educational hours before the COVID-19 pandemic or at the student's home during the pandemic, are documented here. Detailed information about the questionnaire and each item can be found in the codebooks (Baseline 1, Baseline 2, Midline, Endline 1, Endline 2) for the corresponding experiments.
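    School-level (cluster) assignment of the kind described in this entry could be sketched as below. This is only an illustration under assumed details: the function name `assign_schools`, the school identifiers, the equal split between arms, and the fixed seed are all hypothetical and are not taken from the study's registered procedure. The cross-randomization of role models is not covered.

```python
import random


def assign_schools(school_ids, seed=0):
    """Randomly split schools into treatment and placebo-control arms.

    Assignment is done per school, so every student in a school
    shares that school's arm.
    """
    rng = random.Random(seed)          # fixed seed makes the assignment reproducible
    shuffled = sorted(school_ids)      # sort first so input order does not matter
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    assignment = {school: "treatment" for school in shuffled[:half]}
    assignment.update({school: "placebo-control" for school in shuffled[half:]})
    return assignment
```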

    Geographic coverage

    Experiment 3: Coastal Educational Regime (Régimen Costa) 2020/2021. We cover students in the last year of K12 school education with a technical specialization (Bachillerato técnico) who study in the Coastal Educational Regime (Régimen Costa) 2020/2021, are supposed to finish their school education in March 2021, and were able to register on the online platform. The schools in highlands educational regime covered in this experiment are scattered over the following educational zones: 1, 2, 3, 4, 5, 6, 7, 8, 9.
    Taken together, in Experiments 2 and 3 we offered the program across all of Ecuador to schools that have a technical specialization track.

    Analysis unit

    Student

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    All students in selected schools who were present in classes filled out the baseline questionnaire

    Mode of data collection

    Internet [int]

    Research instrument

    Questionnaires. We execute three main sets of questionnaires.

    A. Internet (Online-Based Survey)

    The survey consists of a multi-topic questionnaire administered to the students through an online learning platform in school during normal educational hours before the COVID-19 pandemic or at home during the COVID-19 pandemic. We collect the following information:

    1. Subject-specific knowledge tests. Spanish, English, Statistics, Personal Initiative (only endline), Negotiations (only endline).
    2. Career intentions, preferences, beliefs, expectations, and attitudes. STEM and entrepreneurial intentions, preferences, beliefs, expectations, and attitudes.
    3. Psychological characteristics. Personal Initiative, Negotiations, General Cognitions (General Self-Efficacy, Youth Self-Efficacy, Perceived Subsidiary Self-Efficacy Scale, Self-Regulatory Focus, Short Grit Scale), Entrepreneurial Cognitions (Business Self-Efficacy, Identifying Opportunities, Business Attitudes, Social Entrepreneurship Standards).
    4. Behavior in (incentivized) games. Other-regarding preferences (dictator game), tendency to cooperate (Prisoner's Dilemma), perseverance (triangle game), preference for honesty, creativity (unscramble game).
    5. Other background information. Socioeconomic level, language spoken, risk and time preferences, trust level, parents' background, big-five personality traits of the student, cognitive abilities. Background information (5) is collected only at the baseline.

    B. First Follow-up Phone-Based Survey, Zone 2, Summer.

    The survey replicates by phone a shorter version of the internet-based survey above. We collect the following information:

    1. Subject-specific knowledge tests.
    2. Career intentions, preferences, beliefs, expectations, and attitudes.
    3. Psychological characteristics.

    C. (Second) Follow-up Phone-Based Survey, Winter, Zone 2, Highlands Educational Regime.

    We execute a multi-topic questionnaire by phone to capture the first life outcomes of students who finished school. We collect the following information:

    1. Life Outcome 1- Education. The set of questions that aims to measure the learning success, career/study intentions, propensity to plan and approach others with studying tasks, entrepreneurial intentions.
    2. Life Outcome 2- Labor. The set of questions that aims to measure employment status and income, job searching behavior, time devoted for working/business, salary expectations and knowledge about the careers, self-initiated contribution to the family.
    3. Personal Initiative/Negotiations related and other measures. The set of questions that aim to measure level of personal initiative, negotiation strategies, pregnancy rate, gender stereotypes, math/STEM self-efficacy, gender attitudes, parent-student communication effects.

    Cleaning operations

    Data Editing

    A. Internet, online-based surveys. We extracted the raw data generated on the online platform from each experiment and prepared it for research purposes. We applied several pre-processing steps to the data: 1. We transformed the raw data generated on the platform into a format readable by standard statistical software (R/STATA). 2. We extracted the answer for each item for each student for each survey (Baseline, Midline, Endline). 3. We removed duplicated students and duplicated answers for each item in each survey, based on administrative data, performance, and information given by students on the platform. 4. In the case of the baseline survey, we standardized items/scales but also kept the raw items.

    B. Phone-based surveys. The phone-based surveys were collected with the help of an advanced CATI kit. The data contain all cases (attempts to call) and an indication of whether the survey was effective. The data are cleaned to be ready for analysis. The data are anonymized but contain a unique anonymous student ID for merging across datasets.
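    Two of the cleaning steps described for the online surveys, removing duplicated students and standardizing a scale while keeping the raw item, might look as follows. The field names `student_id` and `score`, the keep-first rule for duplicates, and the z-score standardization are assumptions made for illustration; the text does not specify how duplicates were resolved or how scales were standardized.

```python
from statistics import mean, pstdev


def drop_duplicate_students(records, key="student_id"):
    """Keep the first record seen for each student id."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


def add_standardized(records, field="score"):
    """Add a z-scored copy of `field` (suffix `_std`) and keep the raw value."""
    values = [record[field] for record in records]
    mu, sigma = mean(values), pstdev(values)
    return [
        {**record, f"{field}_std": (record[field] - mu) / sigma if sigma else 0.0}
        for record in records
    ]
```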

  14. Different science fair experiences of high school (HS) and post high school...

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Frederick Grinnell; Simon Dalley; Karen Shepherd; Joan Reisch (2023). Different science fair experiences of high school (HS) and post high school (PHS) students depending upon whether or not they received help from scientists. [Dataset]. http://doi.org/10.1371/journal.pone.0202320.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Frederick Grinnell; Simon Dalley; Karen Shepherd; Joan Reisch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Different science fair experiences of high school (HS) and post high school (PHS) students depending upon whether or not they received help from scientists.

  15. Instructor-Student Relationship Experiment: Data Package

    • openicpsr.org
    stata
    Updated Apr 22, 2020
    Cite
    Carly D. Robinson; Whitney Scott; Michael A. Gottfried (2020). Instructor-Student Relationship Experiment: Data Package [Dataset]. http://doi.org/10.3886/E119071V2
    Explore at:
    Available download formats: stata
    Dataset updated
    Apr 22, 2020
    Dataset provided by
    Harvard University
    Authors
    Carly D. Robinson; Whitney Scott; Michael A. Gottfried
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These files contain the anonymized data and analysis files used to create the results found in "Taking It to the Next Level: A Field Experiment to Improve Instructor-Student Relationships in College." The paper can be found here: https://journals.sagepub.com/doi/10.1177/2332858419839707

    The paper's abstract: Competing in today’s workforce increasingly requires earning a college degree, yet almost half of all enrolled undergraduates do not graduate. As the costs of dropping out of college continue to rise, instructor-student relationships may be a critical yet underexplored avenue for improving college student outcomes. The present study attempts to replicate and extend a prior study that improved teacher-student relationships at the high school level in a college setting. In this registered report, we test whether an intervention that highlights instructor-student commonalities improves similarity, instructor-student relationships, academic achievement, and persistence for undergraduate students in a large, diverse public university. We found that the intervention increased perceptions of similarity but not downstream relational or academic outcomes. Our exploratory analyses provide one of the first investigations suggesting that instructor-student relationships predict an array of consequential student outcomes in college. These findings show a notable relationship gap: instructors perceived less positive relationships with certain student groups, but on average, students perceived equally positive relationships with their instructors.

  16. Replication data for study: Understanding the Relation Between Study...

    • dataone.org
    • dataverse.azure.uit.no
    • +1more
    Updated Jan 5, 2024
    + more versions
    Cite
    Lorås, Madeleine (2024). Replication data for study: Understanding the Relation Between Study Behaviors and Educational Design (Study 2) [Dataset]. http://doi.org/10.18710/7TUIJL
    Explore at:
    Dataset updated
    Jan 5, 2024
    Dataset provided by
    DataverseNO
    Authors
    Lorås, Madeleine
    Description

    Some research has indicated that the relationship between students' study behavior and their academic performance is as strong as the relationship to more common predictors such as past performance and test scores. However, knowledge about students' study behavior, how behavior develops and is influenced by program and course design, and consequently, the effect various design parameters have on learning is limited. This data is part of a PhD project and relates to Study 2. This mixed-method study followed a population of computing students through their first year. Results from in-depth interviews with students throughout their first year found that the educational structure and organization of a study program conditions the students' study behavior. In order to further investigate these tendencies, two surveys (N=215) were conducted within the whole first-year student population at the beginning and end of the year. The dataset for this analysis is included in this repository. A significant difference found was in the use of surface and deep strategies at the beginning and end for the first year, indicating that students shift from deep to surface learning during the year. Even if students initially seek a deep content-driven approach to learning, the structure of the education and other organizational factors may be the cause of a more surface and task-focused approach towards the end of the first year. Students' study behavior is constrained by the educational design, which furthermore may lead to different learning outcomes than desired. Researching and developing learning goals, course content, lectures and assignments is one way to improve computing education; however, this research suggests that taking a comprehensive and integrated approach to educational design might also lead to improvements.

  17. Data Related to the Rwandan Quality Basic Education for Human Capital...

    • data.mendeley.com
    Updated Oct 6, 2022
    + more versions
    Cite
    Pheneas Nkundabakura (2022). Data Related to the Rwandan Quality Basic Education for Human Capital Development Project: Teacher performance, attitude, and classroom observation [Dataset]. http://doi.org/10.17632/6xv7z8rh54.1
    Explore at:
    Dataset updated
    Oct 6, 2022
    Authors
    Pheneas Nkundabakura
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Rwanda
    Description

    The Rwanda Quality Basic Education for Human Capital Development (RQBEHCD) is a World Bank Group-financed project, run through the government of Rwanda, to support Mathematics and Science teachers from upper primary and lower secondary schools. The project was confirmed in 2019 and initiated in 2020. The dataset deposited here comprises three types of data: (1) teacher performance scores per subject taught [Math (for both primary and secondary school teachers), Physics, Chemistry, Biology, and Science and Elementary Technology (SET) taught in upper primary school], (2) teacher belief scores, and (3) classroom observation data. The data were collected before (June - July 2021) and after (June 2022) training on continuous professional development (CPD) comprising ICT integration in teaching math and science, content knowledge (SCK), Math and Science laboratory activities, and innovative pedagogy. The data were collected from the first cohort (2021-2022), covering ten of the 30 districts of Rwanda.

  18. Showing Life Opportunities 2019-2020, Data from Experiment 1: Municipality...

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Jan 8, 2024
    Cite
    Igor Asanov (2024). Showing Life Opportunities 2019-2020, Data from Experiment 1: Municipality of Quito and Educational Zone 2 - Ecuador [Dataset]. https://microdata.worldbank.org/index.php/catalog/6110
    Explore at:
    Dataset updated
    Jan 8, 2024
    Dataset provided by
    Mona Mensmann
    David McKenzie
    Francisco Flores
    Thomas Astebro
    Bruno Crepon
    Igor Asanov
    Guido Buenstorf
    Mathis Schulte
    Time period covered
    2019 - 2021
    Area covered
    Ecuador
    Description

    Abstract

    Opportunity-focused, high-growth entrepreneurship and science-led innovation are crucial for continued economic growth and productivity. Working in these fields offers the opportunity for rewarding and high-paying careers. However, the majority of youth in developing countries do not consider either as job options, affecting their choices of what to study. Youth may not select these educational and career paths due to lack of knowledge, lack of appropriate skills, and lack of role models. We provide a scalable approach to overcoming these constraints through an online education course for secondary school students that covers entrepreneurial soft skills, scientific methods, and interviews with role models.

    The study comprises three experimental trials conducted before and during the COVID-19 pandemic in different regions of Ecuador. This catalog entry includes data from Experiment 1: Educational Zone 2/Municipality of Quito 2019-2020. The data from the other two experiments are also available in the catalog.

    Experiment 1: Educational Zone 2/Municipality of Quito 2019-2020. In the course of the Showing Life Opportunities project, we conducted a randomized controlled trial in high schools in Educational Zone 2, Ecuador, and the Municipality of Quito, Ecuador, in 2019-2020; students finish the program in July 2020. The intervention is an online education course that covers entrepreneurial soft skills, scientific methods, and interviews with role models. This course is taken by students at school (some students finish the program at school during the COVID-19 outbreak). We work mostly with 14-19-year-old students (16,570 students). The experimental program covers 126 schools in Educational Zone 2 and 11 schools in the Municipality of Quito. We randomly assign schools either to treatment (receiving the entrepreneurship courses online) or placebo-control (receiving a placebo treatment of online courses from standard curricula) groups. We also cross-randomize the role models and evaluate a set of nimble interventions to increase take-up.

    The details of the intervention can be found in the AEA registry: Asanov, Igor and David McKenzie. 2020. Showing Life Opportunities: Increasing opportunity-driven entrepreneurship and STEM careers through online courses in schools. AEA RCT Registry. July 19.

    Geographic coverage

    Experiment 1: Municipality of Quito and Educational Zone 2. Educational Zone 2 has its administrative headquarters in the city of Tena, Napo province. It covers the provinces of Napo, Orellana and Pichincha, 8 districts (15D01, 22D01, 17D10, 17D11, 15D02, 17D12, 22D02, 22D03), 16 cantons and 68 parishes. It has an area of 39,542.58 km². Educational Zone 2 spreads from the east to the western border of Ecuador. We cover students aged 14-18 in schools that have sufficient access to the internet and classes of K10, K11, or K12. We included the Municipality of Quito in the study to enrich the coverage of the program by having a large (capital) city in the sample.

    Analysis unit

    Student

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    All students in selected schools who were present in classes filled out the baseline questionnaire

    Mode of data collection

    Internet [int]

    Research instrument

    Questionnaires. We execute three main sets of questionnaires.

    A. Internet (Online-Based Survey)

    The survey consists of a multi-topic questionnaire administered to the students through an online learning platform in school during normal educational hours before the COVID-19 pandemic or at home during the COVID-19 pandemic. We collect the following information:

    1. Subject-specific knowledge tests. Spanish, English, Statistics, Personal Initiative (only endline), Negotiations (only endline).
    2. Career intentions, preferences, beliefs, expectations, and attitudes. STEM and entrepreneurial intentions, preferences, beliefs, expectations, and attitudes.
    3. Psychological characteristics. Personal Initiative, Negotiations, General Cognitions (General Self-Efficacy, Youth Self-Efficacy, Perceived Subsidiary Self-Efficacy Scale, Self-Regulatory Focus, Short Grit Scale), Entrepreneurial Cognitions (Business Self-Efficacy, Identifying Opportunities, Business Attitudes, Social Entrepreneurship Standards).
    4. Behavior in (incentivized) games. Other-regarding preferences (dictator game), tendency to cooperate (Prisoner's Dilemma), perseverance (triangle game), preference for honesty, creativity (unscramble game).
    5. Other background information. Socioeconomic level, language spoken, risk and time preferences, trust level, parents' background, big-five personality traits of the student, cognitive abilities. Background information (5) is collected only at the baseline.

    B. First Follow-up Phone-Based Survey, Zone 2, Summer.

    The survey replicates by phone a shorter version of the internet-based survey above. We collect the following information:

    1. Subject-specific knowledge tests.
    2. Career intentions, preferences, beliefs, expectations, and attitudes.
    3. Psychological characteristics.

    C. (Second) Follow-up Phone-Based Survey, Winter, Zone 2, Highlands Educational Regime.

    We execute a multi-topic questionnaire by phone to capture the first life outcomes of students who finished school. We collect the following information:

    1. Life Outcome 1- Education. The set of questions that aims to measure the learning success, career/study intentions, propensity to plan and approach others with studying tasks, entrepreneurial intentions.
    2. Life Outcome 2- Labor. The set of questions that aims to measure employment status and income, job searching behavior, time devoted for working/business, salary expectations and knowledge about the careers, self-initiated contribution to the family.
    3. Personal Initiative/Negotiations related and other measures. The set of questions that aim to measure level of personal initiative, negotiation strategies, pregnancy rate, gender stereotypes, math/STEM self-efficacy, gender attitudes, parent-student communication effects.

    Cleaning operations

    Data Editing

    A. Internet, online-based surveys. We extracted the raw data generated on the online platform from each experiment and prepared it for research purposes. We applied several pre-processing steps to the data: 1. We transformed the raw data generated on the platform into a format readable by standard statistical software (R/STATA). 2. We extracted the answer for each item for each student for each survey (Baseline, Midline, Endline). 3. We removed duplicated students and duplicated answers for each item in each survey, based on administrative data, performance, and information given by students on the platform. 4. In the case of the baseline survey, we standardized items/scales but also kept the raw items.

    B. Phone-based surveys. The phone-based surveys were collected with the help of an advanced CATI kit. The data contain all cases (attempts to call) and an indication of whether the survey was effective. The data are cleaned to be ready for analysis. The data are anonymized but contain a unique anonymous student ID for merging across datasets.

  19. Description of the three clusters using KMeans.

    • plos.figshare.com
    xls
    Updated Jun 19, 2023
    Cite
    Laia Subirats; Aina Palacios Corral; Sofía Pérez-Ruiz; Santi Fort; Gómez-Moñivas Sacha (2023). Description of the three clusters using KMeans. [Dataset]. http://doi.org/10.1371/journal.pone.0282306.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 19, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Laia Subirats; Aina Palacios Corral; Sofía Pérez-Ruiz; Santi Fort; Gómez-Moñivas Sacha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mean is used for the dataset attributes for each cluster.

  20. College Student Placement Factors Dataset

    • kaggle.com
    Updated Jul 2, 2025
    Cite
    Sahil Islam007 (2025). College Student Placement Factors Dataset [Dataset]. https://www.kaggle.com/datasets/sahilislam007/college-student-placement-factors-dataset
    Explore at:
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sahil Islam007
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📘 College Student Placement Dataset

    A realistic, large-scale synthetic dataset of 10,000 students designed to analyze factors affecting college placements.

    📄 Dataset Description

    This dataset simulates the academic and professional profiles of 10,000 college students, focusing on factors that influence placement outcomes. It includes features like IQ, academic performance, CGPA, internships, communication skills, and more.

    The dataset is ideal for:

    • Predictive modeling of placement outcomes
    • Educational exercises in classification
    • Feature importance analysis
    • End-to-end machine learning projects

    📊 Columns Description

    Column Name             Description
    College_ID              Unique ID of the college (e.g., CLG0001 to CLG0100)
    IQ                      Student’s IQ score (normally distributed around 100)
    Prev_Sem_Result         GPA from the previous semester (range: 5.0 to 10.0)
    CGPA                    Cumulative Grade Point Average (range: ~5.0 to 10.0)
    Academic_Performance    Annual academic rating (scale: 1 to 10)
    Internship_Experience   Whether the student has completed any internship (Yes/No)
    Extra_Curricular_Score  Involvement in extracurriculars (score from 0 to 10)
    Communication_Skills    Soft skill rating (scale: 1 to 10)
    Projects_Completed      Number of academic/technical projects completed (0 to 5)
    Placement               Final placement result (Yes = Placed, No = Not Placed)
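
    The documented ranges can double as a sanity check when loading the file. The following is a minimal sketch using only the Python standard library. The sample rows, the names NUMERIC_RANGES, validate_row and validate_csv, and the decision to treat the approximate CGPA bound as a hard limit are all assumptions made for illustration; none of them come from the dataset page.

```python
import csv
import io

# Hypothetical sample rows: the values are invented for illustration and are
# not taken from the real file. The third row has a deliberately bad CGPA.
SAMPLE_CSV = """College_ID,IQ,Prev_Sem_Result,CGPA,Academic_Performance,Internship_Experience,Extra_Curricular_Score,Communication_Skills,Projects_Completed,Placement
CLG0001,104,7.2,7.4,8,Yes,5,7,3,Yes
CLG0042,97,6.1,6.0,5,No,2,4,1,No
CLG0100,118,9.3,11.2,9,Yes,9,9,5,Yes
"""

# (low, high) bounds copied from the column descriptions. The CGPA lower
# bound is documented as approximate, so a violation means "review this
# row", not necessarily "this row is wrong".
NUMERIC_RANGES = {
    "Prev_Sem_Result": (5.0, 10.0),
    "CGPA": (5.0, 10.0),
    "Academic_Performance": (1, 10),
    "Extra_Curricular_Score": (0, 10),
    "Communication_Skills": (1, 10),
    "Projects_Completed": (0, 5),
}
YES_NO_COLUMNS = ("Internship_Experience", "Placement")


def validate_row(row):
    """Return a list of human-readable problems found in one CSV row."""
    problems = []
    for column, (low, high) in NUMERIC_RANGES.items():
        value = float(row[column])
        if not low <= value <= high:
            problems.append(f"{column}={value} outside [{low}, {high}]")
    for column in YES_NO_COLUMNS:
        if row[column] not in ("Yes", "No"):
            problems.append(f"{column}={row[column]!r} is not Yes/No")
    return problems


def validate_csv(text):
    """Map 1-based data row numbers to their problems; clean rows are omitted."""
    report = {}
    for number, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        problems = validate_row(row)
        if problems:
            report[number] = problems
    return report


report = validate_csv(SAMPLE_CSV)
print(report)
```

    To run the same check on the downloaded file, read its contents into a string and pass that to validate_csv in place of SAMPLE_CSV.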

    🎯 Target Variable

    • Placement: This is the binary classification target (Yes/No) that you can try to predict based on the other features.
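
    Before any classifier can use the Placement target, the Yes/No labels need a numeric encoding and the features need to be separated from the target. Here is a minimal sketch with the Python standard library. The sample rows are invented, the helper names encode_yes_no and split_features_target are hypothetical, and leaving College_ID out of the feature list is an assumption (it is an identifier, not a student attribute), not something the dataset page prescribes.

```python
import csv
import io

# Hypothetical sample rows that follow the documented column layout; the
# values are invented for illustration.
SAMPLE_CSV = """College_ID,IQ,Prev_Sem_Result,CGPA,Academic_Performance,Internship_Experience,Extra_Curricular_Score,Communication_Skills,Projects_Completed,Placement
CLG0001,104,7.2,7.4,8,Yes,5,7,3,Yes
CLG0042,97,6.1,6.0,5,No,2,4,1,No
CLG0077,111,8.8,8.5,9,Yes,8,8,4,Yes
CLG0013,92,5.5,5.9,4,No,1,3,0,No
"""

# Feature columns in a fixed order. College_ID is left out on purpose
# (assumption: an identifier carries no signal about the student).
FEATURES = [
    "IQ",
    "Prev_Sem_Result",
    "CGPA",
    "Academic_Performance",
    "Internship_Experience",
    "Extra_Curricular_Score",
    "Communication_Skills",
    "Projects_Completed",
]


def encode_yes_no(value):
    """Encode 'Yes' as 1 and 'No' as 0; any other label raises KeyError."""
    return {"Yes": 1, "No": 0}[value]


def split_features_target(text):
    """Return (X, y): numeric feature rows and the encoded Placement target."""
    X, y = [], []
    for row in csv.DictReader(io.StringIO(text)):
        features = []
        for name in FEATURES:
            if name == "Internship_Experience":
                features.append(float(encode_yes_no(row[name])))
            else:
                features.append(float(row[name]))
        X.append(features)
        y.append(encode_yes_no(row["Placement"]))
    return X, y


X, y = split_features_target(SAMPLE_CSV)

# Share of placed students: the accuracy a "predict the majority class"
# baseline would need to beat.
placement_rate = sum(y) / len(y)
print(y, placement_rate)
```

    The resulting X and y lists can be handed to any classifier that accepts row-wise numeric features and a binary target.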

    🧠 Use Cases

    • 📈 Classification Modeling (Logistic Regression, Decision Trees, Random Forest, etc.)
    • 🔍 Exploratory Data Analysis (EDA)
    • 🎯 Feature Engineering and Selection
    • 🧪 Model Evaluation Practice
    • 👩‍🏫 Academic Projects & Capstone Use

    📦 Dataset Size

    • Rows: 10,000
    • Columns: 10
    • File Format: .csv

    📚 Context

    This dataset was generated to resemble real-world data in academic institutions for research and machine learning use. While it is synthetic, the variables and relationships are crafted to mimic authentic trends observed in student placements.

    📜 License

    MIT

    🔗 Source

    Created using Python (NumPy, Pandas) with data logic designed for educational and ML experimentation purposes.


Student Performance Data Set

Student achievement in secondary education of two Portuguese schools.

6 scholarly articles cite this dataset.
