100+ datasets found
  1. Students Data Analysis

    • kaggle.com
    zip
    Updated Jul 20, 2022
    Cite
    MOMONO (2022). Students Data Analysis [Dataset]. https://www.kaggle.com/datasets/erqizhou/students-data-analysis
    Available download formats: zip (2174 bytes)
    Dataset updated
    Jul 20, 2022
    Authors
    MOMONO
    Description

    A small excerpt from a real dataset, with a few minor changes to protect students' private information. Permission has been granted.

    Goals

    You are going to help teachers using only the data:
    1. Prediction: identify what makes a brilliant student who can apply for a graduate school, whether abroad or not.
    2. Application: help those who fail in their graduate school application with advice on job searching.

    Tips

    1. Educational data may have subtle structure; hierarchies and heterogeneity are probably involved. Simple regressions are unlikely to be adequate. Also keep an eye on collinearity among indicators collected by teachers who have already forgotten their statistics.
    2. Not all students are free to choose whether to apply for a graduate school, while some were born with privileges.
    3. Some students have been trying (or planning to try) to get into a graduate school for years; you are responsible for giving advice that is accurate for their circumstances.

    About the Data

    Some of the original structure has been deleted or censored. What is left includes basic data:
    - ID
    - class: categorical; students were initially divided into 2 classes, and teachers suspect that students in different classes may perform significantly differently
    - gender
    - race: categorical and censored
    - GPA: real numbers, float

    Some teachers assume that scores in math courses represent one's likelihood of success perfectly:
    - Algebra: real numbers, Advanced Algebra
    - ......

    Some assume that students' backgrounds significantly affect their choices and likelihood of success; these indicators are all censored:
    - from1: students' home locations
    - from2: a probably poor indicator of preference for mathematics
    - from3: how students applied to this university (undergraduate)
    - from4: a probably poor indicator of family background; 0 means more wealth, 4 more poverty

    The final indicator y:
    - 0: the application to graduate school failed; the student may apply again or look for a job in the future
    - 1: success, inland
    - 2: success, abroad

  2. Student Performance

    • kaggle.com
    zip
    Updated Oct 7, 2022
    + more versions
    Cite
    Aman Chauhan (2022). Student Performance [Dataset]. https://www.kaggle.com/datasets/whenamancodes/student-performance
    Available download formats: zip (106753 bytes)
    Dataset updated
    Oct 7, 2022
    Authors
    Aman Chauhan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data approaches student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social and school-related features, and were collected using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd period grades. It is more difficult to predict G3 without G2 and G1, but such a prediction is much more useful (see paper source for more details).

    Attributes for both Maths.csv (Math course) and Portuguese.csv (Portuguese language course) datasets:

    Column - Description
    school - student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
    sex - student's sex (binary: 'F' - female or 'M' - male)
    age - student's age (numeric: from 15 to 22)
    address - student's home address type (binary: 'U' - urban or 'R' - rural)
    famsize - family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
    Pstatus - parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
    Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
    Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
    Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
    Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
    reason - reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
    guardian - student's guardian (nominal: 'mother', 'father' or 'other')
    traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
    studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
    failures - number of past class failures (numeric: n if 1<=n<3, else 4)
    schoolsup - extra educational support (binary: yes or no)
    famsup - family educational support (binary: yes or no)
    paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
    activities - extra-curricular activities (binary: yes or no)
    nursery - attended nursery school (binary: yes or no)
    higher - wants to take higher education (binary: yes or no)
    internet - Internet access at home (binary: yes or no)
    romantic - with a romantic relationship (binary: yes or no)
    famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
    freetime - free time after school (numeric: from 1 - very low to 5 - very high)
    goout - going out with friends (numeric: from 1 - very low to 5 - very high)
    Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
    Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
    health - current health status (numeric: from 1 - very bad to 5 - very good)
    absences - number of school absences (numeric: from 0 to 93)

    These grades are related with the course subject, Math or Portuguese:

    Grade - Description
    G1 - first period grade (numeric: from 0 to 20)
    G2 - second period grade (numeric: from 0 to 20)
    G3 - final grade (numeric: from 0 to 20, output target)
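
    The note about predicting G3 with or without G1 and G2 can be made concrete with a small helper. This is only a sketch: the function name `split_features_target` and the example record are hypothetical, and only the column names G1, G2 and G3 come from the description.

```python
# Column names taken from the grade description; everything else is illustrative.
EARLIER_GRADES = ("G1", "G2")
TARGET = "G3"


def split_features_target(row, use_earlier_grades=False):
    """Return (features, target) for one student record given as a dict.

    By default the earlier period grades are dropped, which gives the
    harder but more useful prediction setting described in the note.
    """
    excluded = {TARGET}
    if not use_earlier_grades:
        excluded.update(EARLIER_GRADES)
    features = {name: value for name, value in row.items() if name not in excluded}
    return features, row[TARGET]


# Made-up record for illustration only; not taken from the dataset.
example = {"school": "GP", "age": 17, "studytime": 2, "G1": 11, "G2": 12, "G3": 12}
features, target = split_features_target(example)
features_with_grades, _ = split_features_target(example, use_earlier_grades=True)
```

    Comparing a model trained on `features` with one trained on `features_with_grades` would show how much of the predictive power comes from G1 and G2 alone.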


  3. Student Performance Dataset: Academic Insights 10K

    • kaggle.com
    zip
    Updated Dec 1, 2024
    Cite
    Nadeem Majeed (2024). Student Performance Dataset: Academic Insights 10K [Dataset]. https://www.kaggle.com/datasets/nadeemajeedch/students-performance-10000-clean-data-eda
    Available download formats: zip (129033 bytes)
    Dataset updated
    Dec 1, 2024
    Authors
    Nadeem Majeed
    License

    https://cdla.io/sharing-1-0/

    Description

    The dataset includes:

    Roll Number: Represents the roll number of the student.

    Gender: Useful for analyzing performance differences between male and female students.

    Race/Ethnicity: Allows analysis of academic performance trends across different racial or ethnic groups.

    Parental Level of Education: Indicates the educational background of the student's family.

    Lunch: Shows whether students receive a free or reduced lunch, which is often a socioeconomic indicator.

    Test Preparation Course: This tells whether students completed a test prep course, which could impact their performance.

    Math Score: Provides a measure of each student’s performance in math, used to calculate averages or trends across various demographics.

    Science Score: Evaluates students' science knowledge, which can be analyzed to assess the student's overall scientific knowledge.

    Reading Score: Measures performance in reading, allowing for insights into literacy and comprehension levels among students.

    Writing Score: Evaluates students' writing skills, which can be analyzed to assess overall literacy and expression.

    Total Score: Shows the total number achieved by the student out of 400.

    Grade: Grade achieved by the student. "A" grade if Total marks >= 320, "B" grade if Total marks >= 250, "C" grade if Total marks >= 200, "D" grade if Total marks >= 150 and Fail if < 150.
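
    The grading rule above can be sketched as a small function. The thresholds are those given in the description; the function name `letter_grade` and the label string "Fail" are assumptions, since the description does not say how a failing grade is stored in the file.

```python
def letter_grade(total_score):
    """Map a total score (out of 400) to a grade using the stated thresholds."""
    if total_score >= 320:
        return "A"
    if total_score >= 250:
        return "B"
    if total_score >= 200:
        return "C"
    if total_score >= 150:
        return "D"
    # The exact label used for failing students in the file is an assumption.
    return "Fail"


grades = [letter_grade(score) for score in (400, 320, 319, 250, 200, 150, 149)]
```

    Recomputing the grade from Total Score in this way is one possible consistency check on the Grade column.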

  4. 2019 Public Data File - Students

    • catalog.data.gov
    • data.cityofnewyork.us
    • +2more
    Updated Nov 29, 2024
    + more versions
    Cite
    data.cityofnewyork.us (2024). 2019 Public Data File - Students [Dataset]. https://catalog.data.gov/dataset/2019-public-data-file-students
    Dataset updated
    Nov 29, 2024
    Dataset provided by
    data.cityofnewyork.us
    Description

    The NYC School Survey collects feedback on the learning environment from families, students and teachers. It helps in understanding the perceptions of families, students, and teachers regarding their school. School leaders use feedback from the survey to reflect and make improvements to schools and programs. Each year all parents, teachers and students in grades 6-12 take the NYC School Survey. The survey is aligned to the DOE's Framework for Great Schools. It is designed to collect important information about each school's ability to support student success.

  5. Fictional Student Performance Dataset

    • kaggle.com
    zip
    Updated Nov 4, 2023
    Cite
    Muhammad Bin Imran (2023). Fictional Student Performance Dataset [Dataset]. https://www.kaggle.com/datasets/muhammadbinimran/fictional-student-performance-dataset
    Available download formats: zip (14161 bytes)
    Dataset updated
    Nov 4, 2023
    Authors
    Muhammad Bin Imran
    Description

    Dataset Name: Fictional Student Performance Dataset

    Description: The "Fictional Student Performance Dataset" is a comprehensive collection of fictional student records designed for educational and analytical purposes. This dataset comprises 500 student profiles and their associated attributes, making it a valuable resource for exploring various aspects of student performance and data analysis.

    Attributes:

    - StudentID: A unique identifier for each student, facilitating individual tracking and analysis.
    - Name: The name of each student, ensuring the dataset's personalization.
    - Age: The age of each student, providing demographic information.
    - Gender: The gender of each student, offering insights into gender-based performance trends.
    - Grade: A continuous variable representing the academic performance of students, which can be used for regression and prediction tasks.
    - Attendance: A percentage value denoting the attendance rate of each student, enabling attendance-related analyses.
    - FinalExamScore: A continuous variable indicating the final exam score achieved by each student, making it suitable for evaluation and prediction tasks.

    Use Cases:

    - Educational Research: This dataset is ideal for educational institutions and researchers to analyze student performance and identify factors that influence academic outcomes.
    - Machine Learning Practice: It is an excellent resource for data science enthusiasts and students looking to practice various machine learning techniques, such as regression, classification, and clustering.
    - Predictive Modeling: The "Grade" and "FinalExamScore" attributes can be used to develop predictive models to forecast student performance.
    - Gender-Based Analysis: Explore gender-based trends in student performance and attendance.
    - Attendance Impact: Investigate the correlation between attendance and academic success.

    Disclaimer: Please note that this dataset is entirely fictional and created for educational and practice purposes. Any resemblance to real individuals or institutions is purely coincidental.

    Citation: If you use this dataset in your research or projects, kindly acknowledge its source as the "Fictional Student Performance Dataset"

    Data Generation: The dataset was generated using a combination of randomization and scripting to ensure that it does not contain any real or personally identifiable information.

    Feel free to explore and utilize this dataset for educational purposes, data analysis, or machine learning exercises. It is intended to foster learning and experimentation in data science.

  6. School Attendance by Student Group and District, 2021-2022

    • catalog.data.gov
    • data.ct.gov
    • +2more
    Updated Jun 21, 2025
    + more versions
    Cite
    data.ct.gov (2025). School Attendance by Student Group and District, 2021-2022 [Dataset]. https://catalog.data.gov/dataset/school-attendance-by-student-group-and-district-2021-2022
    Dataset updated
    Jun 21, 2025
    Dataset provided by
    data.ct.gov
    Description

    This dataset includes the attendance rate for public school students PK-12 by student group and by district during the 2021-2022 school year. Student groups include:
    - Students experiencing homelessness
    - Students with disabilities
    - Students who qualify for free/reduced lunch
    - English learners
    - All high needs students
    - Non-high needs students
    - Students by race/ethnicity (Hispanic/Latino of any race, Black or African American, White, All other races)

    Attendance rates are provided for each student group by district and for the state. Students who are considered high needs include students who are English language learners, who receive special education, or who qualify for free and reduced lunch. When no attendance data is displayed in a cell, data have been suppressed to safeguard student confidentiality, or to ensure that statistics based on a very small sample size are not interpreted as equally representative as those based on a sufficiently larger sample size. For more information on CSDE data suppression policies, please visit http://edsight.ct.gov/relatedreports/BDCRE%20Data%20Suppression%20Rules.pdf.

  7. High School Student Performance & Demographics

    • kaggle.com
    zip
    Updated Nov 10, 2023
    Cite
    Dillon Myrick (2023). High School Student Performance & Demographics [Dataset]. https://www.kaggle.com/datasets/dillonmyrick/high-school-student-performance-and-demographics
    Available download formats: zip (24581 bytes)
    Dataset updated
    Nov 10, 2023
    Authors
    Dillon Myrick
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains student achievement data for two Portuguese high schools. The data was collected using school reports and questionnaires, and includes student grades, demographics, social, parent, and school-related features.

    Two datasets are provided regarding performance in two distinct subjects: Mathematics and Portuguese language. I have cleaned the original datasets so that they are easier to read and use.

    Attributes for both student_math_cleaned.csv (Math course) and student_portuguese_cleaned.csv (Portuguese language course) datasets:

    1. school - student's school (binary: "GP" - Gabriel Pereira or "MS" - Mousinho da Silveira)
    2. sex - student's sex (binary: "F" - female or "M" - male)
    3. age - student's age (numeric: from 15 to 22)
    4. address_type - student's home address type (binary: "Urban" or "Rural")
    5. family_size - family size (binary: "Less or equal to 3" or "Greater than 3")
    6. parent_status - parent's cohabitation status (binary: "Living together" or "Apart")
    7. mother_education - mother's education (ordinal: "none", "primary education (4th grade)", "5th to 9th grade", "secondary education" or "higher education")
    8. father_education - father's education (ordinal: "none", "primary education (4th grade)", "5th to 9th grade", "secondary education" or "higher education")
    9. mother_job - mother's job (nominal: "teacher", "health" care related, civil "services" (e.g. administrative or police), "at_home" or "other")
    10. father_job - father's job (nominal: "teacher", "health" care related, civil "services" (e.g. administrative or police), "at_home" or "other")
    11. reason - reason to choose this school (nominal: close to "home", school "reputation", "course" preference or "other")
    12. guardian - student's guardian (nominal: "mother", "father" or "other")
    13. travel_time - home to school travel time (ordinal: "<15 min.", "15 to 30 min.", "30 min. to 1 hour", or ">1 hour")
    14. study_time - weekly study time (ordinal: "<2 hours", "2 to 5 hours", "5 to 10 hours", or ">10 hours")
    15. class_failures - number of past class failures (numeric: n if 1<=n<3, else 4)
    16. school_support - extra educational support (binary: yes or no)
    17. family_support - family educational support (binary: yes or no)
    18. extra_paid_classes - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
    19. activities - extra-curricular activities (binary: yes or no)
    20. nursery - attended nursery school (binary: yes or no)
    21. higher_ed - wants to take higher education (binary: yes or no)
    22. internet - Internet access at home (binary: yes or no)
    23. romantic_relationship - with a romantic relationship (binary: yes or no)
    24. family_relationship - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
    25. free_time - free time after school (numeric: from 1 - very low to 5 - very high)
    26. social - going out with friends (numeric: from 1 - very low to 5 - very high)
    27. weekday_alcohol - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
    28. weekend_alcohol - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
    29. health - current health status (numeric: from 1 - very bad to 5 - very good)
    30. absences - number of school absences (numeric: from 0 to 93)

    These grades are related with the course subject, Math or Portuguese:

    1. grade_1 - first period grade (numeric: from 0 to 20)
    2. grade_2 - second period grade (numeric: from 0 to 20)
    3. final_grade - final grade (numeric: from 0 to 20, output target)

    Important note: the target attribute final_grade has a strong correlation with attributes grade_2 and grade_1. This occurs because final_grade is the final year grade (issued at the 3rd period), while grade_1 and grade_2 correspond to the 1st and 2nd period grades. It is more difficult to predict final_grade without grade_2 and grade_1, but these predictions will be much more useful.

    Additional note: there are 382 students who belong to both datasets, though the IDs do not match. These students can be identified by searching for identical attributes that characterize each student.
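
    One way to search for identical attributes is to index one dataset by a tuple of identifying columns and look up each record of the other. The sketch below is an assumption-laden illustration: the description does not say which attributes to compare, so `KEY_COLUMNS` is a hypothetical selection of the cleaned column names listed above, and `match_students` is a hypothetical helper operating on rows held as dicts.

```python
from collections import defaultdict

# Hypothetical choice of identifying columns (names from the attribute list);
# the description does not specify which attributes characterize a student.
KEY_COLUMNS = (
    "school", "sex", "age", "address_type", "family_size", "parent_status",
    "mother_education", "father_education", "mother_job", "father_job",
    "reason", "nursery", "internet",
)


def match_students(math_rows, portuguese_rows, key_columns=KEY_COLUMNS):
    """Return (math_index, portuguese_index) pairs whose key columns are identical."""
    index = defaultdict(list)
    for i, row in enumerate(portuguese_rows):
        index[tuple(row[c] for c in key_columns)].append(i)

    matches = []
    for j, row in enumerate(math_rows):
        key = tuple(row[c] for c in key_columns)
        for i in index.get(key, []):
            matches.append((j, i))
    return matches


# Made-up rows using a reduced key, for illustration only.
math_rows = [{"school": "GP", "sex": "F", "age": 16}, {"school": "MS", "sex": "M", "age": 17}]
por_rows = [{"school": "MS", "sex": "M", "age": 17}, {"school": "GP", "sex": "M", "age": 15}]
pairs = match_students(math_rows, por_rows, key_columns=("school", "sex", "age"))
```

    Because several students can share the same attribute values, a key may match more than one record; the number of pairs found therefore depends on the chosen columns and need not equal 382.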

    Please include this citation if you plan to use this database: P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.

  8. Data from: Student grade prediction dataset

    • data.mendeley.com
    Updated Jun 16, 2022
    + more versions
    Cite
    Nonso Nnamoko (2022). Student grade prediction dataset [Dataset]. http://doi.org/10.17632/wf8568hxb7.1
    Dataset updated
    Jun 16, 2022
    Authors
    Nonso Nnamoko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset provides a collection of 160 instances belonging to two classes ('pass' = 136 and 'fail' = 24). The data is an anonymised, statistically sound and reliable representation of the original data collected from students studying computer science modules at a UK University. Each instance is made up of 19 features plus the class label. Eight of the features represent students' online behaviour including bio information retrieved from Virtual Learning Environment. Eleven of the features represent students' neighbourhood influence retrieved from Office for Students database. The data has been compiled and made available in de-facto/de-jure standard open formats (CSV and JSON).

    This data was collected and used in a research study undertaken by academics and researchers at Computer Science Department, Edge Hill University, United Kingdom. To encourage reproducibility of the experiments and results reported, the data is provided in the exact training-validation-testing splits used in the experiments.
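
    The class counts given in the description imply a strong imbalance, which can be quantified with a short calculation. This is a sketch over the whole dataset: the counts are from the description, the variable names are illustrative, and the proportions within each individual training, validation or testing split may differ.

```python
# Class counts as stated in the description.
class_counts = {"pass": 136, "fail": 24}

total = sum(class_counts.values())  # 160 instances
majority_label = max(class_counts, key=class_counts.get)

# Accuracy obtained by always predicting the majority class.
baseline_accuracy = class_counts[majority_label] / total
```

    A classifier evaluated on this data would need to beat this baseline (0.85 over the full dataset) to show it has learned anything beyond the class frequencies, so per-class metrics are likely more informative than plain accuracy.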

  9. Data on students' group project preferences

    • datarepository.eur.nl
    • dataverse.nl
    Updated May 30, 2023
    Cite
    Tim M. Benning (2023). Data on students' group project preferences [Dataset]. http://doi.org/10.25397/eur.20342649.v1
    Dataset updated
    May 30, 2023
    Dataset provided by
    Erasmus University Rotterdam (EUR)
    Authors
    Tim M. Benning
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data files contain information about the preferences of bachelor 1 and 2 students obtained via a discrete choice experiment (12 choice tasks per respondent), demographic characteristics of the sample and population, experiences with free-riding, attitude towards teamwork, and a measure of individualism/collectivism. Students were presented a different grade weight before each choice task (i.e., 10%, 30%, or 100%). The data was collected from mid-June to mid-July 2021.

    Access to the data is subject to the approval of a data sharing agreement due to the personal information contained in the dataset.

    A summary of the publication can be found below: Reducing free-riding is an important challenge for educators who use group projects. In this study, we measure students’ preferences for group project characteristics and investigate if characteristics that better help to reduce free-riding become more important for students when stakes increase. We used a discrete choice experiment based on twelve choice tasks in which students chose between two group projects that differed on five characteristics of which each level had its own effect on free-riding. A different group project grade weight was presented before each choice task to manipulate how much there was at stake for students in the group project. Data of 257 student respondents were used in the analysis. Based on random parameter logit model estimates we find that students prefer (in order of importance) assignment based on schedule availability and motivation or self-selection (instead of random assignment), the use of one or two peer process evaluations (instead of zero), a small team size of three or two students (instead of four), a common grade (instead of a divided grade), and a discussion with the course coordinator without a sanction as a method to handle free-riding (instead of member expulsion). Furthermore, we find that the characteristic team formation approach becomes even more important (especially self-selection) when student stakes increase. Educators can use our findings to design group projects that better help to reduce free-riding by (1) avoiding random assignment as team formation approach, (2) using (one or two) peer process evaluations, and (3) creating small(er) teams.

  10. Residential School Locations Dataset (CSV Format)

    • borealisdata.ca
    • search.dataone.org
    Updated Jun 5, 2019
    Cite
    Rosa Orlandini (2019). Residential School Locations Dataset (CSV Format) [Dataset]. http://doi.org/10.5683/SP2/RIYEMU
    Dataset updated
    Jun 5, 2019
    Dataset provided by
    Borealis
    Authors
    Rosa Orlandini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1863 - Jun 30, 1998
    Area covered
    Canada
    Description

    The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential School Settlement Agreement are included in this dataset, as well as several Industrial schools and residential schools that were not part of the IRSSA. This version of the dataset doesn’t include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconciliation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative), and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed either to RSIM, Morgan Hite, NCTR or Rosa Orlandini.

    Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its original location to another property, then the school is considered to have two unique locations in this dataset, the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School. When the precise location is known, the coordinates of the main building are provided, and when the precise location of the building isn’t known, an approximate location is provided.

    For each residential school institution location, the following information is provided: official names, alternative name, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school) and list of references used to determine the location of the main buildings or sites.

  11. First Generation College Students Experiences - Qualitative Dataset 2021

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 14, 2023
    Cite
    Watts, Gavin (2023). First Generation College Students Experiences - Qualitative Dataset 2021 [Dataset]. http://doi.org/10.7910/DVN/YCXBNF
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Watts, Gavin
    Description

    The experiences of first-generation college students (FGCS) can guide the development of effective practices for supporting and retaining this population in higher education settings. Multiple themes emerged via qualitative interviews with ten FGCS participants, including: challenges/barriers within instruction/classroom communication, financial struggles, academic strategies, and perseverance/motivations related to family and academics. Findings show needs for clear communication/expectations within higher education settings, social supports/relationships outside of the campus settings, as well as acknowledgment and reinforcement for academic successes. Additionally, these findings align with previous research showing FGCS to be underprepared and under-supported in applying for, enrolling in, and paying for college.

  12. Student Study Performance

    • kaggle.com
    zip
    Updated Mar 7, 2024
    Cite
    Bhavik Jikadara (2024). Student Study Performance [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/student-study-performance
    Available download formats: zip (8907 bytes)
    Dataset updated
    Mar 7, 2024
    Authors
    Bhavik Jikadara
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Problem Statement:

    This project aims to understand how students' performance (test scores) is affected by other variables such as gender, ethnicity, parental level of education, lunch and test preparation course.

    Content

    This data set consists of the marks secured by the students in various subjects.
    - gender: sex of students (male/female)
    - race/ethnicity: ethnicity of students (Group A, B, C, D, E)
    - parental level of education: parents' final education (bachelor's degree, some college, master's degree, associate's degree, high school)
    - lunch: having lunch before test (standard or free/reduced)
    - test preparation course: completed or not completed before test
    - math score
    - reading score
    - writing score

    Inspiration:

    To understand the influence of the parents' background, test preparation, etc. on students' performance.

  13. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Campos Arias (repository creator) (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Available download formats: csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julie R. Campos Arias (repository creator)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset contains ratings and reviews for 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
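Because the rows are ordered by label, a shuffle step is needed before any train/test split. The sketch below shows one way to do it with the standard library only. It assumes the CSV header uses the column names `reviews` and `labels` exactly as described above; the sample rows and the helper names `read_reviews` and `shuffle_rows` are hypothetical.

```python
import csv
import io
import random


def read_reviews(handle):
    """Read (review, label) pairs from a CSV with a header row."""
    reader = csv.DictReader(handle)
    return [(row["reviews"], int(row["labels"])) for row in reader]


def shuffle_rows(rows, seed=0):
    """Return a shuffled copy of rows; the input list is left untouched."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Made-up stand-in for data_rt.csv: negatives first, then positives.
sample = io.StringIO(
    "reviews,labels\n"
    "dull and lifeless,0\n"
    "a tedious mess,0\n"
    "warm and funny,1\n"
    "a quiet triumph,1\n"
)
rows = read_reviews(sample)
mixed = shuffle_rows(rows, seed=42)
print(mixed)
```

Fixing the seed keeps the shuffled order reproducible between runs, which helps when comparing experiments.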

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
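The description does not say how the polarity score was turned into the division label. As a hedged illustration only, the sketch below maps a polarity in the range -1 to 1 (the range TextBlob reports) to a three-way label. The label names, the `neutral_band` threshold and the function name `polarity_to_division` are all assumptions, not the dataset author's actual rule.

```python
def polarity_to_division(polarity, neutral_band=0.05):
    """Map a polarity score in [-1, 1] to a categorical label.

    The threshold and label names are illustrative assumptions.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


labels = [polarity_to_division(p) for p in (0.6, -0.4, 0.0)]
print(labels)
```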

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9930 Amazon reviews with star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  14. Data from: Universal Access to Free School Meals and Student Achievement:...

    • openicpsr.org
    Updated Nov 28, 2020
    Cite
    Krista Ruffini (2020). Universal Access to Free School Meals and Student Achievement: Evidence from the Community Eligibility Provision [Dataset]. http://doi.org/10.3886/E127581V1
    Explore at:
    Dataset updated
    Nov 28, 2020
    Authors
    Krista Ruffini
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Replication materials for Ruffini, Krista. "Universal Access to Free School Meals and Student Achievement: Evidence from the Community Eligibility Provision." Journal of Human Resources. A previously published version of this project contained 0 byte files. Please reference the latest version of the project to access the most current data.

  15. Dataset for Paper "Towards Increased Diversity in STEM Education: Five...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jul 17, 2024
    Cite
    Anonymous; Anonymous (2024). Dataset for Paper "Towards Increased Diversity in STEM Education: Five archetypes Derived through a Data-Driven Approach Examining a Computer Science Student Cohort" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_4737551
    Explore at:
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Anonymous
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset for Paper "Towards Increased Diversity in STEM Education: Five archetypes Derived through a Data-Driven Approach Examining a Computer Science Student Cohort" - Rev #1

    This is the dataset for the paper titled "Towards Increased Diversity in STEM Education: Five archetypes Derived through a Data-Driven Approach Examining a Computer Science Student Cohort".

    In case of questions, feel free to contact the authors, anonymised, ORCID: https://orcid.org/*anonymised*, current affiliation and email: anonymised

    Survey 2019

    The raw survey data for the initial 2019 survey is available in the file survey2019_anon.csv. Note that the data is anonymised as free-text comments have been removed. Explanations on the variables and their levels are given in the files variables_survey2019.csv and values_survey2019.csv. The questionnaire for the 2019 survey is contained in survey2019_instrument.pdf.

    Survey 2020

    The raw survey data for the 2020 survey is available in the file rdata_anon_survey2020.csv. Additional scripts are supplied to reproduce the exploratory factor analysis. The main entry is the file EFA.R, which imports the data. The file contains some comments on the process. The questionnaire for the 2020 survey is contained in survey2020_instrument.pdf.

    Interviews

    The interview guide used for the five interviews is available in the file interview_instrument.pdf.

  16. Predict students' dropout and academic success

    • kaggle.com
    zip
    Updated Jan 3, 2023
    Cite
    The Devastator (2023). Predict students' dropout and academic success [Dataset]. https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention
    Explore at:
    zip(89332 bytes)Available download formats
    Dataset updated
    Jan 3, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Predict students' dropout and academic success

    Investigating the Impact of Social and Economic Factors

    By [source]

    About this dataset

    This dataset provides a comprehensive view of students enrolled in various undergraduate degrees offered at a higher education institution. It includes demographic data, socio-economic factors and academic performance information that can be used to analyze the possible predictors of student dropout and academic success. This dataset contains multiple disjoint databases consisting of relevant information available at the time of enrollment, such as application mode, marital status, course chosen and more. Additionally, this data can be used to estimate overall student performance at the end of each semester by assessing curricular units credited/enrolled/evaluated/approved as well as their respective grades. Finally, it includes the unemployment rate, inflation rate and GDP of the region, which can help show how economic factors play into student dropout rates or academic success outcomes. The data can provide insight into what motivates students to stay in school or abandon their studies across a wide range of disciplines such as agronomy, design, education, nursing, journalism, management, social service or technologies.

    How to use the dataset

    This dataset can be used to understand and predict student dropouts and academic outcomes. The data includes a variety of demographic, social-economic and academic performance factors related to the students enrolled in higher education institutions. The dataset provides valuable insights into the factors that affect student success and could be used to guide interventions and policies related to student retention.

    Using this dataset, researchers can investigate two key questions:
    • Which specific predictive factors are linked with student dropout or completion?
    • How do different features interact with each other?

    For example, researchers could explore whether any demographic characteristics (e.g., gender, age at enrollment) or environmental conditions (e.g., unemployment rate in the region) are associated with higher student success rates, and what implications poverty has for educational outcomes. Answering these questions can give administrators the information they need to formulate strategies that promote successful degree completion among students from diverse backgrounds in their institutions.

    To use this dataset effectively, users should familiarize themselves with all of its variables, including categorical (qualitative) variables such as gender or application mode; numerical variables such as the number of curricular units at the beginning of each semester or age at enrollment; ordinal variables such as marital status; variables tracking trends over time such as inflation rate or GDP; and frequency measures such as the percentage of scholarship holders. Users should also be aware of potential bias in the data before running any analysis, for example whether one population is underrepresented compared to another, since this could lead to unexpected results if not taken into account. Finally, note that this Kaggle dataset contains only one semester's worth of information for each admission intake; studies covering a longer time period and several admission cycles might provide more accurate results.

    Research Ideas

    • Prediction of Student Retention: This dataset can be used to develop predictive models that identify risk factors for student dropout and support early interventions to improve the student retention rate.
    • Improved Academic Performance: By using this data, higher education institutions could better understand their students' academic progress and identify areas of improvement from both an individual and institutional perspective. This will enable them to develop targeted courses, activities, or initiatives that enhance academic performance more effectively and efficiently.
    • Accessibility Assistance: Using the demographic information included in the dataset, institutions could develop s...
  17. A level and other 16 to 18 results - Retention - student characteristics

    • explore-education-statistics.service.gov.uk
    Updated Mar 27, 2025
    Cite
    Department for Education (2025). A level and other 16 to 18 results - Retention - student characteristics [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/a953f076-29c6-409f-b952-10fb63b86717
    Explore at:
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    Department for Educationhttps://gov.uk/dfe
    License

    Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Retention metrics by student characteristic. Student characteristics include sex, ethnicity, disadvantage status, free school meal provision, first language, special educational needs (SEN) provision, and KS4 prior attainment.

  18. Data from: Racial Economic Segregation across U.S. Public Schools, 1991-2022...

    • openicpsr.org
    Updated Jul 3, 2024
    Cite
    Heewon Jang (2024). Racial Economic Segregation across U.S. Public Schools, 1991-2022 [Dataset]. http://doi.org/10.3886/E207521V2
    Explore at:
    Dataset updated
    Jul 3, 2024
    Dataset provided by
    University of Alabama
    Authors
    Heewon Jang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1991 - 2022
    Area covered
    United States
    Description

    This data archive shares publicly available datasets and syntax files used to produce results in the paper "Racial Economic Segregation across U.S. Public Schools, 1991-2022," where I describe trends in racial economic segregation over the last three decades and decompose these trends into different geographic scales (e.g., between-state, between-district, and within-district segregation). In doing so, I use the Longitudinal Imputed Student Dataset, a newly released dataset that imputes low-quality free lunch eligibility enrollment data in the Common Core of Data.

  19. Dataset: Fitbits, field-tests, and grades. The effects of a healthy and...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Allie Broaddus; Brandon Jaquis; Colt Jones; Scarlet Jost; Andrew Lang; Ailin Li; Qiwen Li; Philip Nelson; Esther Spear (2023). Dataset: Fitbits, field-tests, and grades. The effects of a healthy and physically active lifestyle on the academic performance of first year college students. [Dataset]. http://doi.org/10.6084/m9.figshare.7218497.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Allie Broaddus; Brandon Jaquis; Colt Jones; Scarlet Jost; Andrew Lang; Ailin Li; Qiwen Li; Philip Nelson; Esther Spear
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The data were collected during the Fall semester of 2017 from 581 freshmen enrolled at Oral Roberts University in a class entitled “Introduction to Whole Person Education” which has a required health and physical exercise component consisting of: Steps and Active Minutes goals, a 1-mile field test, and a lifestyle assessment survey. Students utilize a Fitbit to help keep track of their steps and active minutes which are synced with the course gradebook. The student’s semester grade point average was added once the semester was complete. As the grades were retrieved and stored, the dataset was de-identified to ensure confidentiality.

  20. A level and other 16 to 18 results - Student counts and Results - A level by...

    • explore-education-statistics.service.gov.uk
    Updated Apr 18, 2024
    Department for Education (2024). A level and other 16 to 18 results - Student counts and Results - A level by subject and student characteristics (end of 16-18 study) [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/1e9c2d22-aa6e-4af8-b070-b937d2937b5e
    Explore at:
    Dataset updated
    Apr 18, 2024
    Dataset authored and provided by
    Department for Educationhttps://gov.uk/dfe
    License

    Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    A level student counts by grade achieved, subject and student characteristics. Student characteristics include gender, ethnicity, disadvantage status, free school meal provision, first language, special educational needs (SEN) provision, and KS4 prior attainment. Includes students triggered for inclusion in performance tables who completed A levels during 16-18 study, after discounting of exams.

Cite
MOMONO (2022). Students Data Analysis [Dataset]. https://www.kaggle.com/datasets/erqizhou/students-data-analysis

Students Data Analysis

For student group structures and predictions, this is only fictional

Explore at:
zip(2174 bytes)Available download formats
Dataset updated
Jul 20, 2022
Authors
MOMONO
Description

A little paragraph from one real dataset, with a few little changes to protect students' private information. Permissions are given.

Goals

You are going to help teachers using only the data:
  1. Prediction: To tell what makes a brilliant student who can apply for a graduate school, whether abroad or not.
  2. Application: To help those who fail to apply for a graduate school with advice on job searching.

Tips

  1. Educational data may have subtle structure; hierarchies and heterogeneity are probably involved. Simple regressions can hardly make any difference. Also, keep an eye on collinearity in some indicators collected by teachers who have already forgotten their statistics.
  2. Not all students are free to choose to apply for a graduate school, but some were born with privileges.
  3. Some of the students have been trying (or planning to try) to apply for a graduate school for years; you are responsible for giving accurate advice suited to their circumstances.

About the Data

Some of the original structure has been deleted or censored. The remaining basic fields are:
- ID
- class: categorical. Students were initially divided into 2 classes, and teachers suspect that students in different classes may perform significantly differently.
- gender
- race: categorical and censored
- GPA: real number (float)

Some teachers assume that scores of math curriculums can represent one's likelihood perfectly: - Algebra: real numbers, Advanced Algebra - ......

Some assume that students' backgrounds can significantly affect their choices and likelihood. These fields are all censored:
- from1: students' home locations
- from2: a probably bad indicator of preference for mathematics
- from3: how students applied to this university (undergraduate)
- from4: a probably bad indicator of family background, from 0 (more wealth) to 4 (more poverty)

The final indicator y:
- 0: failed to apply for graduate school; may apply again or search for jobs in the future
- 1: success, inland
- 2: success, abroad
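As a small illustration of working with the three-valued target, the sketch below decodes y and counts each outcome. The code-to-label mapping follows the description above; the label wording, the sample values and the helper name `summarize_outcomes` are assumptions for illustration.

```python
from collections import Counter

# Mapping taken from the description of y; label wording is illustrative.
OUTCOME_LABELS = {
    0: "failed to apply",
    1: "success, inland",
    2: "success, abroad",
}


def summarize_outcomes(y_values):
    """Count how many students fall into each outcome category."""
    counts = Counter(int(value) for value in y_values)
    return {OUTCOME_LABELS[code]: counts.get(code, 0) for code in OUTCOME_LABELS}


# Made-up target values, not taken from the dataset.
summary = summarize_outcomes([0, 1, 1, 2, 0, 1])
print(summary)
```

Checking these counts first shows how unbalanced the classes are, which matters when choosing a model and an evaluation metric.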
