100+ datasets found

Students Data Analysis
kaggle.com
zip
Updated Jul 20, 2022
MOMONO (2022). Students Data Analysis [Dataset]. https://www.kaggle.com/datasets/erqizhou/students-data-analysis
Available download formats: zip (2174 bytes)
Dataset updated
Jul 20, 2022
Authors
MOMONO
Description
A small excerpt from a real dataset, with a few changes to protect students' private information. Permission has been granted.

Goals

You are going to help teachers using only the data:
1. Prediction: identify what makes a brilliant student who can get into graduate school, whether abroad or at home.
2. Application: give job-search advice to students who fail to get into graduate school.

Tips

Educational data may have subtle structure: hierarchies and heterogeneity are probably involved, so simple regressions are unlikely to capture much. Also, keep an eye on collinearity among some indicators collected by teachers who have long since forgotten their statistics.
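One standard way to quantify the collinearity mentioned above (our suggestion, not a method prescribed by the dataset author) is the variance inflation factor: regress indicator j on all the other indicators and compute VIF_j = 1 / (1 - R_j^2). The NumPy-only sketch below assumes the indicators have already been loaded as numeric columns of a matrix; the function name is ours, and the synthetic columns merely stand in for real indicators such as GPA or Algebra.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X (shape: n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        if ss_tot == 0.0:
            raise ValueError(f"column {j} is constant; its VIF is undefined")
        r2 = 1.0 - ss_res / ss_tot
        # A perfectly explained column has an infinite VIF.
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Synthetic example: the third column is almost the sum of the first two.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + rng.normal(scale=0.05, size=200)
print(variance_inflation_factors(np.column_stack([a, b, c])))
```

A common rule of thumb treats VIFs above roughly 5 to 10 as a sign of problematic collinearity; in the synthetic example all three columns exceed that by a wide margin, while independent columns stay close to 1.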

Not all students are free to choose whether to apply to graduate school; some were born with privileges.

Some students have been trying (or plan to keep trying) to get into graduate school for years, so take care to give accurate advice that fits their circumstances.

About the Data

Some of the original structure has been deleted or censored. The remaining fields are as follows.

Basic data:
- ID
- class: categorical; students were initially divided into 2 classes, and teachers suspect that students in different classes may perform significantly differently.
- gender
- race: categorical and censored
- GPA: real number (float)

Some teachers assume that math course scores perfectly represent a student's likelihood of success:
- Algebra: real number
- Advanced Algebra
- ......

Some assume that students' backgrounds significantly affect their choices and likelihood of success; these fields are all censored:
- from1: students' home locations
- from2: a probably poor indicator of preference for mathematics
- from3: how students applied to this university (undergraduate)
- from4: a probably poor indicator of family background, from 0 (more wealth) to 4 (more poverty)

The final indicator y:
- 0: failed to get into graduate school; the student may reapply or look for a job in the future
- 1: success, domestic
- 2: success, abroad
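Because teachers suspect that the two classes perform differently, a quick first check is to tabulate the outcome y within each class before any modelling. A minimal sketch, assuming each row is a dict keyed by the column names listed above (`class`, `y`), as `csv.DictReader` would produce; the function and label names are ours.

```python
from collections import Counter, defaultdict

# Labels for the target indicator y, following the description above.
OUTCOME_LABELS = {0: "not admitted", 1: "admitted (domestic)", 2: "admitted (abroad)"}


def outcome_shares_by_class(rows):
    """Return {class_value: {outcome_label: share}} for rows with 'class' and 'y' keys."""
    counts = defaultdict(Counter)
    for row in rows:
        # CSV readers yield strings, so coerce y to int before counting.
        code = int(row["y"])
        if code not in OUTCOME_LABELS:
            raise ValueError(f"unexpected value of y: {row['y']!r}")
        counts[row["class"]][code] += 1
    shares = {}
    for cls, counter in counts.items():
        total = sum(counter.values())
        shares[cls] = {label: counter[code] / total for code, label in OUTCOME_LABELS.items()}
    return shares
```

For example, inside `with open("students.csv") as f:` (a hypothetical name for the unzipped file), call `outcome_shares_by_class(csv.DictReader(f))`. Differences in these shares are only descriptive; on their own they do not establish a class effect.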

Student Performance

kaggle.com

zip

Updated Oct 7, 2022


Aman Chauhan (2022). Student Performance [Dataset]. https://www.kaggle.com/datasets/whenamancodes/student-performance

Available download formats: zip (106753 bytes)

Dataset updated

Oct 7, 2022

Authors

Aman Chauhan

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

These data cover student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and were collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 is strongly correlated with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see the paper for more details).
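The note on G1 and G2 suggests two modelling setups. The sketch below derives classification targets from G3 and selects feature columns with or without the period grades. The pass mark of 10 and the five-level cut-offs (16, 14, 12, 10) are our assumption, based on the common Portuguese 0-20 grading scale; check them against [Cortez and Silva, 2008] before relying on them. All function names are ours.

```python
def _check_grade(g3):
    """Reject grades outside the 0-20 scale used for G1, G2 and G3."""
    if not 0 <= g3 <= 20:
        raise ValueError(f"grade must be between 0 and 20, got {g3}")


def binary_target(g3):
    """Return 1 (pass) if G3 >= 10, else 0 (fail). Cut-off is an assumption."""
    _check_grade(g3)
    return 1 if g3 >= 10 else 0


def five_level_target(g3):
    """Map G3 to I = 16-20, II = 14-15, III = 12-13, IV = 10-11, V = 0-9 (assumed cut-offs)."""
    _check_grade(g3)
    for cutoff, level in ((16, "I"), (14, "II"), (12, "III"), (10, "IV")):
        if g3 >= cutoff:
            return level
    return "V"


def feature_columns(columns, use_period_grades=False):
    """Drop the target G3 and, unless requested, the period grades G1 and G2."""
    excluded = {"G3"} if use_period_grades else {"G1", "G2", "G3"}
    return [c for c in columns if c not in excluded]
```

Calling `feature_columns(columns)` gives the harder but more useful setup without G1 and G2; `use_period_grades=True` gives the easier one.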

Attributes for both Maths.csv (Math course) and Portuguese.csv (Portuguese language course) datasets:

Columns	Description
school	student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
sex	student's sex (binary: 'F' - female or 'M' - male)
age	student's age (numeric: from 15 to 22)
address	student's home address type (binary: 'U' - urban or 'R' - rural)
famsize	family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
Pstatus	parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
Medu	mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
Fedu	father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
Mjob	mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
Fjob	father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
reason	reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
guardian	student's guardian (nominal: 'mother', 'father' or 'other')
traveltime	home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
studytime	weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
failures	number of past class failures (numeric: n if 1<=n<3, else 4)
schoolsup	extra educational support (binary: yes or no)
famsup	family educational support (binary: yes or no)
paid	extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
activities	extra-curricular activities (binary: yes or no)
nursery	attended nursery school (binary: yes or no)
higher	wants to take higher education (binary: yes or no)
internet	Internet access at home (binary: yes or no)
romantic	with a romantic relationship (binary: yes or no)
famrel	quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
freetime	free time after school (numeric: from 1 - very low to 5 - very high)
goout	going out with friends (numeric: from 1 - very low to 5 - very high)
Dalc	workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
Walc	weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
health	current health status (numeric: from 1 - very bad to 5 - very good)
absences	number of school absences (numeric: from 0 to 93)

These grades relate to the course subject, Math or Portuguese:

Grade	Description
G1	first period grade (numeric: from 0 to 20)
G2	second period grade (numeric: from 0 to 20)
G3	final grade (numeric: from 0 to 20, output target)


Student Performance Dataset: Academic Insights 10K
kaggle.com
zip
Updated Dec 1, 2024
Nadeem Majeed (2024). Student Performance Dataset: Academic Insights 10K [Dataset]. https://www.kaggle.com/datasets/nadeemajeedch/students-performance-10000-clean-data-eda
Available download formats: zip (129033 bytes)
Dataset updated
Dec 1, 2024
Authors
Nadeem Majeed
License
Community Data License Agreement - Sharing 1.0: https://cdla.io/sharing-1-0/
Description
The dataset includes:

Roll Number: The student's roll number.

Gender: Useful for analyzing performance differences between male and female students.

Race/Ethnicity: Allows analysis of academic performance trends across different racial or ethnic groups.

Parental Level of Education: Indicates the educational background of the student's family.

Lunch: Shows whether students receive a free or reduced lunch, which is often a socioeconomic indicator.

Test Preparation Course: This tells whether students completed a test prep course, which could impact their performance.

Math Score: Provides a measure of each student’s performance in math, used to calculate averages or trends across various demographics.

Science Score: Evaluates students' science knowledge, which can be analyzed to assess each student's overall scientific knowledge.

Reading Score: Measures performance in reading, allowing for insights into literacy and comprehension levels among students.

Writing Score: Evaluates students' writing skills, which can be analyzed to assess overall literacy and expression.

Total Score: The total marks achieved by the student, out of 400.

Grade: The grade achieved by the student: "A" if total marks >= 320, "B" if total marks >= 250, "C" if total marks >= 200, "D" if total marks >= 150, and "Fail" if total marks < 150.
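The grading rule above can be written as a small function. A sketch under two assumptions of ours: the thresholds apply to the total score exactly as stated, and the total is the sum of the four subject scores (math, science, reading, writing), each out of 100, which is how we read "out of 400". The function names are hypothetical.

```python
def total_score(math, science, reading, writing):
    """Sum the four subject scores (assumed to be out of 100 each, so 400 in total)."""
    return math + science + reading + writing


def letter_grade(total):
    """Apply the stated thresholds: A >= 320, B >= 250, C >= 200, D >= 150, else Fail."""
    if not 0 <= total <= 400:
        raise ValueError(f"total must be between 0 and 400, got {total}")
    for cutoff, grade in ((320, "A"), (250, "B"), (200, "C"), (150, "D")):
        if total >= cutoff:
            return grade
    return "Fail"
```

Checking thresholds from highest to lowest means each total gets the best grade whose cut-off it meets, which matches the ">=" wording of the rule.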
2019 Public Data File - Students
catalog.data.gov
data.cityofnewyork.us
Updated Nov 29, 2024
data.cityofnewyork.us (2024). 2019 Public Data File - Students [Dataset]. https://catalog.data.gov/dataset/2019-public-data-file-students
Dataset updated
Nov 29, 2024
Dataset provided by
data.cityofnewyork.us
Description
The survey collects feedback on the learning environment from families, students, and teachers, and helps school leaders understand how families, students, and teachers perceive their school. School leaders use feedback from the survey to reflect and make improvements to schools and programs. Each year, all parents and teachers, along with students in grades 6-12, take the NYC School Survey. The survey is aligned to the DOE's Framework for Great Schools. It is designed to collect important information about each school's ability to support student success.
Fictional Student Performance Dataset
kaggle.com
zip
Updated Nov 4, 2023
Muhammad Bin Imran (2023). Fictional Student Performance Dataset [Dataset]. https://www.kaggle.com/datasets/muhammadbinimran/fictional-student-performance-dataset
Available download formats: zip (14161 bytes)
Dataset updated
Nov 4, 2023
Authors
Muhammad Bin Imran
Description
Dataset Name: Fictional Student Performance Dataset

Description: The "Fictional Student Performance Dataset" is a comprehensive collection of fictional student records designed for educational and analytical purposes. This dataset comprises 500 student profiles and their associated attributes, making it a valuable resource for exploring various aspects of student performance and data analysis.

Attributes:

StudentID: A unique identifier for each student, facilitating individual tracking and analysis.
Name: The name of each student, which personalizes the dataset.
Age: The age of each student, providing demographic information.
Gender: The gender of each student, offering insights into gender-based performance trends.
Grade: A continuous variable representing the academic performance of students, which can be used for regression and prediction tasks.
Attendance: A percentage value denoting the attendance rate of each student, enabling attendance-related analyses.
FinalExamScore: A continuous variable indicating the final exam score achieved by each student, making it suitable for evaluation and prediction tasks.

Use Cases:

Educational Research: This dataset is ideal for educational institutions and researchers to analyze student performance and identify factors that influence academic outcomes.
Machine Learning Practice: It is an excellent resource for data science enthusiasts and students looking to practice various machine learning techniques, such as regression, classification, and clustering.
Predictive Modeling: The "Grade" and "FinalExamScore" attributes can be used to develop predictive models to forecast student performance.
Gender-Based Analysis: Explore gender-based trends in student performance and attendance.
Attendance Impact: Investigate the correlation between attendance and academic success.

Disclaimer: Please note that this dataset is entirely fictional and created for educational and practice purposes. Any resemblance to real individuals or institutions is purely coincidental.
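For the "Attendance Impact" use case, the Pearson correlation coefficient is a natural first measure. A dependency-free sketch; the function name is ours, and the toy values are illustrative rather than taken from the dataset. On Python 3.10 and later, `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy values standing in for Attendance (%) and FinalExamScore.
attendance = [70, 80, 90, 95]
final_exam = [60, 65, 80, 85]
print(round(pearson_r(attendance, final_exam), 3))
```

Since the data are generated by a script, any correlation found here reflects how the generator was written rather than a real effect of attendance.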

Citation: If you use this dataset in your research or projects, kindly acknowledge its source as the "Fictional Student Performance Dataset".

Data Generation: The dataset was generated using a combination of randomization and scripting to ensure that it does not contain any real or personally identifiable information.

Feel free to explore and utilize this dataset for educational purposes, data analysis, or machine learning exercises. It is intended to foster learning and experimentation in data science.
School Attendance by Student Group and District, 2021-2022
catalog.data.gov
data.ct.gov
Updated Jun 21, 2025
data.ct.gov (2025). School Attendance by Student Group and District, 2021-2022 [Dataset]. https://catalog.data.gov/dataset/school-attendance-by-student-group-and-district-2021-2022
Dataset updated
Jun 21, 2025
Dataset provided by
data.ct.gov
Description
This dataset includes the attendance rate for public school students in grades PK-12, by student group and by district, during the 2021-2022 school year. Student groups include:
- Students experiencing homelessness
- Students with disabilities
- Students who qualify for free/reduced lunch
- English learners
- All high needs students
- Non-high needs students
- Students by race/ethnicity (Hispanic/Latino of any race, Black or African American, White, All other races)

Attendance rates are provided for each student group by district and for the state. Students who are considered high needs include students who are English language learners, who receive special education, or who qualify for free and reduced lunch. When no attendance data is displayed in a cell, data have been suppressed to safeguard student confidentiality, or to ensure that statistics based on a very small sample size are not interpreted as equally representative as those based on a sufficiently larger sample size. For more information on CSDE data suppression policies, please visit http://edsight.ct.gov/relatedreports/BDCRE%20Data%20Suppression%20Rules.pdf.
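Because suppressed cells are left blank, they should be treated as missing rather than as zero when summarizing the rates. A minimal sketch, assuming rates are read as text cells that may carry a trailing percent sign; the exact file layout is not described here, so the helper names and cell format are hypothetical.

```python
def parse_rate(cell):
    """Return the attendance rate as a float, or None if the cell is blank (suppressed)."""
    text = cell.strip()
    if not text:
        return None
    # Tolerate an optional trailing percent sign (assumed format).
    return float(text.rstrip("%"))


def mean_reported(cells):
    """Unweighted mean of the non-suppressed rates, or None if every cell is suppressed."""
    values = [v for v in (parse_rate(c) for c in cells) if v is not None]
    return sum(values) / len(values) if values else None
```

The mean is unweighted across cells; a properly weighted district or state rate would need enrollment counts, which this description does not mention.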
High School Student Performance & Demographics
kaggle.com
zip
Updated Nov 10, 2023
Dillon Myrick (2023). High School Student Performance & Demographics [Dataset]. https://www.kaggle.com/datasets/dillonmyrick/high-school-student-performance-and-demographics
Available download formats: zip (24581 bytes)
Dataset updated
Nov 10, 2023
Authors
Dillon Myrick
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains student achievement data for two Portuguese high schools. The data was collected using school reports and questionnaires, and includes student grades, demographics, social, parent, and school-related features.

Two datasets are provided regarding performance in two distinct subjects: Mathematics and Portuguese language. I have cleaned the original datasets so that they are easier to read and use.

Attributes for both student_math_cleaned.csv (Math course) and student_portuguese_cleaned.csv (Portuguese language course) datasets:

school - student's school (binary: "GP" - Gabriel Pereira or "MS" - Mousinho da Silveira)

sex - student's sex (binary: "F" - female or "M" - male)

age - student's age (numeric: from 15 to 22)

address_type - student's home address type (binary: "Urban" or "Rural")

family_size - family size (binary: "Less or equal to 3" or "Greater than 3")

parent_status - parent's cohabitation status (binary: "Living together" or "Apart")

mother_education - mother's education (ordinal: "none", "primary education (4th grade)", "5th to 9th grade", "secondary education" or "higher education")

father_education - father's education (ordinal: "none", "primary education (4th grade)", "5th to 9th grade", "secondary education" or "higher education")

mother_job - mother's job (nominal: "teacher", "health" care related, civil "services" (e.g. administrative or police), "at_home" or "other")

father_job - father's job (nominal: "teacher", "health" care related, civil "services" (e.g. administrative or police), "at_home" or "other")

reason - reason to choose this school (nominal: close to "home", school "reputation", "course" preference or "other")

guardian - student's guardian (nominal: "mother", "father" or "other")

travel_time - home to school travel time (ordinal: "<15 min.", "15 to 30 min.", "30 min. to 1 hour", or ">1 hour")

study_time - weekly study time (ordinal: "<2 hours", "2 to 5 hours", "5 to 10 hours", or ">10 hours")

class_failures - number of past class failures (numeric: n if 1<=n<3, else 4)

school_support - extra educational support (binary: yes or no)

family_support - family educational support (binary: yes or no)

extra_paid_classes - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)

activities - extra-curricular activities (binary: yes or no)

nursery - attended nursery school (binary: yes or no)

higher_ed - wants to take higher education (binary: yes or no)

internet - Internet access at home (binary: yes or no)

romantic_relationship - with a romantic relationship (binary: yes or no)

family_relationship - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)

free_time - free time after school (numeric: from 1 - very low to 5 - very high)

social - going out with friends (numeric: from 1 - very low to 5 - very high)

weekday_alcohol - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)

weekend_alcohol - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)

health - current health status (numeric: from 1 - very bad to 5 - very good)

absences - number of school absences (numeric: from 0 to 93)

These grades relate to the course subject, Math or Portuguese:

grade_1 - first period grade (numeric: from 0 to 20)

grade_2 - second period grade (numeric: from 0 to 20)

final_grade - final grade (numeric: from 0 to 20, output target)

Important note: the target attribute final_grade has a strong correlation with attributes grade_2 and grade_1. This occurs because final_grade is the final year grade (issued at the 3rd period), while grade_1 and grade_2 correspond to the 1st and 2nd period grades. It is more difficult to predict final_grade without grade_2 and grade_1, but these predictions will be much more useful.

Additional note: there are 382 students who belong to both datasets, though the IDs do not match. These students can be identified by searching for identical attributes that characterize each student.
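One way to find the shared students is to build a key from attributes that should not differ between the two course files and intersect the keys. In the sketch below, the key columns are our assumption (background attributes from the list above, not course-specific ones such as grades, paid classes or absences); the number of matches depends on that choice, so compare it with the stated 382 before using the result. Rows are dicts, as produced by `csv.DictReader`, and the function names are ours.

```python
# Hypothetical key: background attributes expected to be identical in both files.
KEY_COLUMNS = (
    "school", "sex", "age", "address_type", "family_size", "parent_status",
    "mother_education", "father_education", "mother_job", "father_job",
    "reason", "nursery", "internet",
)


def student_key(row, key_columns=KEY_COLUMNS):
    """Build a hashable key from the selected attribute values of one row."""
    return tuple(row[col] for col in key_columns)


def shared_students(math_rows, portuguese_rows, key_columns=KEY_COLUMNS):
    """Return the math rows whose key also appears among the Portuguese rows."""
    portuguese_keys = {student_key(row, key_columns) for row in portuguese_rows}
    return [row for row in math_rows if student_key(row, key_columns) in portuguese_keys]
```

Students whose keys collide within a single file cannot be told apart this way, so it is worth checking for duplicate keys before joining the two files.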

Please include this citation if you plan to use this database: P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.
Data from: Student grade prediction dataset
data.mendeley.com
Updated Jun 16, 2022
Nonso Nnamoko (2022). Student grade prediction dataset [Dataset]. http://doi.org/10.17632/wf8568hxb7.1
Unique identifier
https://doi.org/10.17632/wf8568hxb7.1
Dataset updated
Jun 16, 2022
Authors
Nonso Nnamoko
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset provides a collection of 160 instances belonging to two classes ('pass' = 136 and 'fail' = 24). The data is an anonymised, statistically sound and reliable representation of the original data collected from students studying computer science modules at a UK university. Each instance is made up of 19 features plus the class label. Eight of the features represent students' online behaviour, including bio information, retrieved from the Virtual Learning Environment. Eleven of the features represent students' neighbourhood influence, retrieved from the Office for Students database. The data has been compiled and made available in de-facto/de-jure standard open formats (CSV and JSON).

This data was collected and used in a research study undertaken by academics and researchers at the Computer Science Department, Edge Hill University, United Kingdom. To encourage reproducibility of the reported experiments and results, the data is provided in the exact training-validation-testing splits used in the experiments.
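With 136 'pass' and 24 'fail' instances, a classifier that always predicts 'pass' is already 136 / 160 = 85% accurate, so accuracy alone says little here. One common remedy (our suggestion, not something the dataset description prescribes) is to weight each class by n_samples / (n_classes * n_class), the same heuristic scikit-learn documents for `class_weight="balanced"`. A sketch of the arithmetic, using the overall counts; in practice the counts should come from the provided training split only. The function name is ours.

```python
def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * n_class)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


counts = {"pass": 136, "fail": 24}
weights = balanced_class_weights(counts)
# Accuracy of always predicting the majority class.
baseline = counts["pass"] / sum(counts.values())
print(weights, baseline)
```

This gives 160 / 272 ≈ 0.588 for 'pass' and 160 / 48 ≈ 3.333 for 'fail', so each class contributes a weighted total of 80.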
Data on students' group project preferences
datarepository.eur.nl
dataverse.nl
Updated May 30, 2023
Tim M. Benning (2023). Data on students' group project preferences [Dataset]. http://doi.org/10.25397/eur.20342649.v1
Unique identifier
https://doi.org/10.25397/eur.20342649.v1
Dataset updated
May 30, 2023
Dataset provided by
Erasmus University Rotterdam (EUR)
Authors
Tim M. Benning
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data files contain information about the preferences of first- and second-year bachelor's students obtained via a discrete choice experiment (12 choice tasks per respondent), demographic characteristics of the sample and population, experiences with free-riding, attitudes towards teamwork, and a measure of individualism/collectivism. Students were presented with a different grade weight before each choice task (i.e., 10%, 30%, or 100%). The data was collected from mid-June to mid-July 2021.

Access to the data is subject to the approval of a data sharing agreement due to the personal information contained in the dataset.

A summary of the publication can be found below: Reducing free-riding is an important challenge for educators who use group projects. In this study, we measure students’ preferences for group project characteristics and investigate if characteristics that better help to reduce free-riding become more important for students when stakes increase. We used a discrete choice experiment based on twelve choice tasks in which students chose between two group projects that differed on five characteristics, each level of which had its own effect on free-riding. A different group project grade weight was presented before each choice task to manipulate how much was at stake for students in the group project. Data from 257 student respondents were used in the analysis. Based on random parameter logit model estimates, we find that students prefer (in order of importance) assignment based on schedule availability and motivation or self-selection (instead of random assignment), the use of one or two peer process evaluations (instead of zero), a small team size of three or two students (instead of four), a common grade (instead of a divided grade), and a discussion with the course coordinator without a sanction as a method to handle free-riding (instead of member expulsion). Furthermore, we find that the team formation approach becomes even more important (especially self-selection) when student stakes increase. Educators can use our findings to design group projects that better help to reduce free-riding by (1) avoiding random assignment as the team formation approach, (2) using (one or two) peer process evaluations, and (3) creating small(er) teams.
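To make the choice setup concrete, the sketch below computes the probability of choosing one group project over the other under a simple fixed-coefficient binary logit: P(A) = exp(V_A) / (exp(V_A) + exp(V_B)), where each utility V is a weighted sum of the project's dummy-coded attribute levels. This is a simplification of our own; the study estimated a random parameter logit, which additionally lets coefficients vary across students. The attribute names and coefficient values are hypothetical, not estimates from the paper.

```python
import math


def utility(attributes, coefficients):
    """Linear utility: sum of coefficient * attribute level (dummy-coded levels)."""
    return sum(coefficients[name] * level for name, level in attributes.items())


def choice_probability(utility_a, utility_b):
    """Probability of choosing A over B in a binary logit."""
    # Subtract the larger utility before exponentiating to avoid overflow.
    m = max(utility_a, utility_b)
    exp_a = math.exp(utility_a - m)
    exp_b = math.exp(utility_b - m)
    return exp_a / (exp_a + exp_b)


# Hypothetical coefficients and dummy-coded project attributes.
coefficients = {"self_selection": 0.8, "peer_evaluation": 0.5, "team_of_four": -0.4}
project_a = {"self_selection": 1, "peer_evaluation": 1, "team_of_four": 0}
project_b = {"self_selection": 0, "peer_evaluation": 0, "team_of_four": 1}
print(choice_probability(utility(project_a, coefficients), utility(project_b, coefficients)))
```

Only the difference in utilities matters in this model, which is why subtracting the larger utility leaves the probability unchanged.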
Residential School Locations Dataset (CSV Format)
borealisdata.ca
search.dataone.org
Updated Jun 5, 2019
Rosa Orlandini (2019). Residential School Locations Dataset (CSV Format) [Dataset]. http://doi.org/10.5683/SP2/RIYEMU
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.5683/SP2/RIYEMU
Dataset updated
Jun 5, 2019
Dataset provided by
Borealis
Authors
Rosa Orlandini
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1863 - Jun 30, 1998
Area covered
Canada
Description
The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential Schools Settlement Agreement are included in this dataset, as well as several Industrial schools and residential schools that were not part of the IRSSA. This version of the dataset doesn’t include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconciliation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative), and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed either to RSIM, Morgan Hite, NCTR or Rosa Orlandini. Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its original location to another property, then the school is considered to have two unique locations in this dataset, the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake.
If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School. When the precise location is known, the coordinates of the main building are provided, and when the precise location of the building isn’t known, an approximate location is provided. For each residential school institution location, the following information is provided: official names, alternative name, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school) and list of references used to determine the location of the main buildings or sites.
First Generation College Students Experiences - Qualitative Dataset 2021
search.dataone.org
dataverse.harvard.edu
Updated Nov 14, 2023
Watts, Gavin (2023). First Generation College Students Experiences - Qualitative Dataset 2021 [Dataset]. http://doi.org/10.7910/DVN/YCXBNF
Unique identifier
https://doi.org/10.7910/DVN/YCXBNF
Dataset updated
Nov 14, 2023
Dataset provided by
Harvard Dataverse
Authors
Watts, Gavin
Description
The experiences of first-generation college students (FGCS) can guide the development of effective practices for supporting and retaining this population in higher education settings. Multiple themes emerged via qualitative interviews with ten FGCS participants, including: challenges/barriers within instruction/classroom communication, financial struggles, academic strategies, and perseverance/motivations related to family and academics. Findings show needs for clear communication/expectations within higher education settings, social supports/relationships outside of the campus settings, as well as acknowledgment and reinforcement for academic successes. Additionally, these findings align with previous research showing FGCS to be underprepared and under-supported in applying for, enrolling in, and paying for college.
Student Study Performance
kaggle.com
zip
Updated Mar 7, 2024
Bhavik Jikadara (2024). Student Study Performance [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/student-study-performance
Available download formats: zip (8907 bytes)
Dataset updated
Mar 7, 2024
Authors
Bhavik Jikadara
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Problem Statement:

This project examines how students' performance (test scores) is affected by other variables such as gender, ethnicity, parental level of education, lunch, and test preparation course.

Content

This data set consists of the marks secured by the students in various subjects.
- gender: sex of students (male/female)
- race/ethnicity: ethnicity of students (group A, B, C, D, E)
- parental level of education: parents' final education (bachelor's degree, some college, master's degree, associate's degree, high school)
- lunch: having lunch before test (standard or free/reduced)
- test preparation course: complete or not complete before test
- math score
- reading score
- writing score

Inspiration:

To understand the influence of parents' background, test preparation, etc. on students' performance.
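A first look at these questions is to compare average scores across the levels of one variable, such as the test preparation course. A minimal sketch, assuming rows are dicts keyed by the column names listed above (as `csv.DictReader` yields them); the function name and the toy rows are ours, and the toy category labels may not match the file's exact spelling.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_group(rows, group_col, score_col):
    """Average score_col within each level of group_col."""
    groups = defaultdict(list)
    for row in rows:
        # Scores arrive as strings from a CSV reader, so convert them.
        groups[row[group_col]].append(float(row[score_col]))
    return {level: mean(scores) for level, scores in groups.items()}


toy_rows = [
    {"test preparation course": "complete", "math score": "70"},
    {"test preparation course": "complete", "math score": "80"},
    {"test preparation course": "not complete", "math score": "60"},
]
print(mean_score_by_group(toy_rows, "test preparation course", "math score"))
```

Raw group means do not separate the effect of test preparation from other variables such as parental education, so they are a starting point rather than an answer.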
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Julie R. Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Campos Arias
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. Its purpose is to store the datasets used in some of the studies that served as research material for this Master's thesis, as well as the datasets used in the experimental part of this work.
The datasets are listed below, along with details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for 1K+ Amazon products, as listed on Amazon's official website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description of the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
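The features list suggests that discount_percentage can be cross-checked against discounted_price and actual_price. The sketch below assumes, without confirmation from the description, that prices are stored as strings with a currency symbol and thousands separators (such as "₹1,099") and percentages as strings such as "64%". The example row and the helper names (parse_price, parse_percentage, implied_discount) are hypothetical.

```python
import re


def parse_price(text: str) -> float:
    """Convert a price string such as '₹1,099' to a float.

    Assumes the only non-numeric characters are a currency symbol and
    thousands separators (an assumption about the file's formatting).
    """
    return float(re.sub(r"[^0-9.]", "", text))


def parse_percentage(text: str) -> float:
    """Convert a percentage string such as '64%' to a float."""
    return float(text.strip().rstrip("%"))


def implied_discount(discounted: float, actual: float) -> float:
    """Percentage discount implied by the discounted and actual prices."""
    return (actual - discounted) / actual * 100


# Hypothetical row, invented for illustration (not taken from amazon.csv).
row = {"discounted_price": "₹399", "actual_price": "₹1,099", "discount_percentage": "64%"}
discounted = parse_price(row["discounted_price"])
actual = parse_price(row["actual_price"])
print(round(implied_discount(discounted, actual)), parse_percentage(row["discount_percentage"]))
```

If the recomputed discount and the stored discount_percentage differ by more than rounding, that row may deserve a closer look before analysis.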
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
The data was collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a CSV file. The file consists of 2 columns: reviews and labels (1 for fresh/good, 0 for rotten/bad).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
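Because all negative rows come before all positive rows, taking the first N rows as a training or test set would yield a single class. A minimal sketch of a seeded shuffle followed by a split, using only the standard library: the column names reviews and labels come from the description above, while the function names, the 80/20 split, and the seed are illustrative choices.

```python
import csv
import random


def load_rows(lines):
    """Read rows with header 'reviews,labels' from any iterable of CSV text lines."""
    return [
        {"reviews": r["reviews"], "labels": int(r["labels"])}
        for r in csv.DictReader(lines)
    ]


def shuffle_and_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then split off a test set."""
    shuffled = list(rows)  # leave the caller's list in its original order
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]
```

With the real file, the rows could be read via `with open("data_rt.csv", newline="", encoding="utf-8") as f: rows = load_rows(f)`; the UTF-8 encoding is an assumption. Fixing the seed keeps the split reproducible across runs.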
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
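TextBlob polarity scores range from -1.0 (most negative) to 1.0 (most positive); with TextBlob installed, a score can be obtained from `TextBlob(text).sentiment.polarity`. The description does not state which thresholds produced the division column, so the sketch below assumes a symmetric neutral band around zero. The band width and the label strings are hypothetical.

```python
def polarity_to_division(polarity: float, neutral_band: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to a categorical label.

    Scores within +/- neutral_band of zero (inclusive) count as neutral.
    The band and the label names are assumptions, not the dataset's documented rule.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity out of range: {polarity}")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"
```

Comparing this function's output with the stored division values is one way to infer the thresholds that were actually used.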
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, collected for learning how to train machine learning models for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, Unix time), reviewTime (time of the review, raw), and division (manually added; categorical label generated using the overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
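Both edited files carry a division column derived from a star score (ReviewStar in AllProductReviews2.csv, overall in Musical_instruments_reviews2.csv). The exact mapping is not documented here, so the sketch below assumes 1 or 2 stars are negative, 3 is neutral, and 4 or 5 are positive; the mapping, label names, and helper names are hypothetical.

```python
from collections import Counter


def star_to_division(stars) -> str:
    """Map a 1-5 star score to a label (assumed: 1-2 negative, 3 neutral, 4-5 positive)."""
    value = int(float(stars))  # CSV values may arrive as "5" or "5.0"; fractions are truncated
    if not 1 <= value <= 5:
        raise ValueError(f"expected a score between 1 and 5, got {stars!r}")
    if value <= 2:
        return "negative"
    if value == 3:
        return "neutral"
    return "positive"


def add_division(rows, score_column):
    """Add a 'division' key to each row dict and return the label counts."""
    for row in rows:
        row["division"] = star_to_division(row[score_column])
    return Counter(row["division"] for row in rows)
```

Passing "ReviewStar" or "overall" as score_column lets the same helper serve both files.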
Data from: Universal Access to Free School Meals and Student Achievement:...
openicpsr.org
Updated Nov 28, 2020
Krista Ruffini (2020). Universal Access to Free School Meals and Student Achievement: Evidence from the Community Eligibility Provision [Dataset]. http://doi.org/10.3886/E127581V1
Unique identifier
https://doi.org/10.3886/E127581V1
Dataset updated
Nov 28, 2020
Authors
Krista Ruffini
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication materials for Ruffini, Krista. "Universal Access to Free School Meals and Student Achievement: Evidence from the Community Eligibility Provision." Journal of Human Resources. A previously published version of this project contained 0 byte files. Please reference the latest version of the project to access the most current data.
Dataset for Paper "Towards Increased Diversity in STEM Education: Five...
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Jul 17, 2024
Anonymous; Anonymous (2024). Dataset for Paper "Towards Increased Diversity in STEM Education: Five archetypes Derived through a Data-Driven Approach Examining a Computer Science Student Cohort" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_4737551
Dataset updated
Jul 17, 2024
Dataset provided by
Anonymous
Authors
Anonymous; Anonymous
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset for Paper "Towards Increased Diversity in STEM Education: Five archetypes Derived through a Data-Driven Approach Examining a Computer Science Student Cohort" - Rev #1

This is the dataset for the paper titled "Towards Increased Diversity in STEM Education: Five archetypes Derived through a Data-Driven Approach Examining a Computer Science Student Cohort".

In case of questions, feel free to contact the authors (names anonymised; ORCID: https://orcid.org/*anonymised*; current affiliation and email: anonymised).

Survey 2019

The raw survey data for the initial 2019 survey is available in the file survey2019_anon.csv. The data is anonymised: free-text comments have been removed. Explanations of the variables and their levels are given in the files variables_survey2019.csv and values_survey2019.csv. The questionnaire for the 2019 survey is contained in survey2019_instrument.pdf.
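The 2019 survey stores coded answers separately from the labels that explain them. The layout of values_survey2019.csv is not described here, so the sketch below assumes a long format with one row per variable, value, and label; those column names and the helper names are hypothetical and would need adjusting to the real file.

```python
import csv


def load_value_labels(lines):
    """Build {variable: {code: label}} from codebook lines.

    Assumes a header row of: variable,value,label (hypothetical layout).
    """
    labels = {}
    for row in csv.DictReader(lines):
        labels.setdefault(row["variable"], {})[row["value"]] = row["label"]
    return labels


def decode_row(row, labels):
    """Replace coded answers with labels; variables or codes without a label are kept as-is."""
    return {var: labels.get(var, {}).get(code, code) for var, code in row.items()}
```

The same lookup could then be applied to each row of survey2019_anon.csv as read by csv.DictReader.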

Survey 2020

The raw survey data for the 2020 survey is available in the file rdata_anon_survey2020.csv. Additional scripts are supplied to reproduce the exploratory factor analysis; the main entry point is EFA.R, which imports the data and contains some comments on the process. The questionnaire for the 2020 survey is contained in survey2020_instrument.pdf.

Interviews

The interview guide used for the five interviews is available in the file interview_instrument.pdf.
Predict students' dropout and academic success
kaggle.com
zip
Updated Jan 3, 2023
The Devastator (2023). Predict students' dropout and academic success [Dataset]. https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention
Available download formats: zip (89,332 bytes)
Dataset updated
Jan 3, 2023
Authors
The Devastator
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Description
Predict students' dropout and academic success

Investigating the Impact of Social and Economic Factors

By [source]

About this dataset

This dataset provides a comprehensive view of students enrolled in various undergraduate degrees offered at a higher education institution. It includes demographic data, socioeconomic factors, and academic performance information that can be used to analyze possible predictors of student dropout and academic success. It combines multiple disjoint databases containing information available at the time of enrollment, such as application mode, marital status, and course chosen. The data can also be used to estimate overall student performance at the end of each semester from the curricular units credited, enrolled, evaluated, and approved, along with their respective grades. Finally, the regional unemployment rate, inflation rate, and GDP help show how economic factors relate to dropout and academic success. The dataset covers a wide range of disciplines, such as agronomy, design, education, nursing, journalism, management, social service, and technologies, and can offer insight into why students stay in school or abandon their studies.

How to use the dataset

This dataset can be used to understand and predict student dropout and academic outcomes. The data includes a variety of demographic, socioeconomic, and academic performance factors for students enrolled in higher education institutions. It provides insight into the factors that affect student success and could be used to guide interventions and policies related to student retention.

Using this dataset, researchers can investigate two key questions: - which specific predictive factors are linked with student dropout or completion? - how do different features interact with each other? For example, researchers could explore whether demographic characteristics (e.g., gender or age at enrollment) or contextual conditions (e.g., the regional unemployment rate) are associated with higher success rates, and what implications poverty has for educational outcomes. Answers to these questions can give administrators critical information for designing strategies that promote degree completion among students from diverse backgrounds.

To use this dataset effectively, researchers should familiarize themselves with all of its variables: categorical (qualitative) variables such as gender, application mode, or marital status; numerical variables such as the number of curricular units at the beginning of each semester or age at enrollment; economic indicators tracked over time, such as the inflation rate and GDP; and proportion variables such as the percentage of scholarship holders. They should also check the data for potential biases before running any analysis, for example whether one population is underrepresented compared with another, since this can lead to unexpected results if not taken into account. Finally, practitioners should note that this Kaggle dataset contains only one semester's worth of information on each admission intake; studies covering longer periods and several admission years may give more accurate estimates of retention and achievement.
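As a first descriptive step toward the questions above, one can tabulate the share of students with a given outcome within each level of a categorical variable. The sketch assumes the outcome is stored in a column named Target with values such as "Dropout", "Graduate", and "Enrolled"; these names are assumptions to check against the downloaded file, and the helper name is hypothetical. Raw rates like these describe associations only and do not control for other variables.

```python
from collections import defaultdict


def outcome_rate_by(rows, group_col, target_col="Target", outcome="Dropout"):
    """Return {group value: share of rows in that group whose target equals outcome}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        group = row[group_col]
        totals[group] += 1
        if row[target_col] == outcome:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}
```

The rows could come from csv.DictReader over the CSV inside the downloaded zip; the file name inside the archive is not given here.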

Research Ideas

Prediction of Student Retention: This dataset can be used to develop predictive models that can identify student risk factors for dropout and take early interventions to improve student retention rate.

Improved Academic Performance: By using this data, higher education institutions could better understand their students' academic progress and identify areas of improvement from both an individual and institutional perspective. This will enable them to develop targeted courses, activities, or initiatives that enhance academic performance more effectively and efficiently.

Accessibility Assistance: Using the demographic information included in the dataset, institutions could develop s...
A level and other 16 to 18 results - Retention - student characteristics
explore-education-statistics.service.gov.uk
Updated Mar 27, 2025
Department for Education (2025). A level and other 16 to 18 results - Retention - student characteristics [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/a953f076-29c6-409f-b952-10fb63b86717
Dataset updated
Mar 27, 2025
Dataset authored and provided by
Department for Education (https://gov.uk/dfe)
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Retention metrics by student characteristic. Student characteristics include sex, ethnicity, disadvantage status, free school meal provision, first language, special educational needs (SEN) provision, and KS4 prior attainment.
Data from: Racial Economic Segregation across U.S. Public Schools, 1991-2022...
openicpsr.org
Updated Jul 3, 2024
Heewon Jang (2024). Racial Economic Segregation across U.S. Public Schools, 1991-2022 [Dataset]. http://doi.org/10.3886/E207521V2
Unique identifier
https://doi.org/10.3886/E207521V2
Dataset updated
Jul 3, 2024
Dataset provided by
University of Alabama
Authors
Heewon Jang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1991 - 2022
Area covered
United States
Description
This data archive shares the publicly available datasets and syntax files used to produce the results in the paper "Racial Economic Segregation across U.S. Public Schools, 1991-2022," where I describe trends in racial economic segregation over the last three decades and decompose these trends into different geographic scales (e.g., between-state, between-district, and within-district segregation). To do so, I use the Longitudinal Imputed Student Dataset, a newly released dataset that imputes low-quality free lunch eligibility enrollment data in the Common Core of Data.
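The description does not say which segregation index the paper uses. One common choice for multi-scale decompositions like this is Theil's information index H = Σ_j t_j (E - E_j) / (T·E), where t_j and E_j are unit j's enrollment and group entropy and T and E are the overall enrollment and entropy. H is additively decomposable: total school-level segregation equals between-district segregation plus a weighted sum of within-district segregation. The sketch below covers two nested levels (schools within districts), assuming per-school student counts by group; the function names and the toy data are hypothetical, and the overall population must contain at least two groups for H to be defined.

```python
import math


def entropy(counts):
    """Shannon entropy (natural log) of group counts; empty groups contribute 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def column_totals(units):
    """Sum per-group counts across units (each unit is a list of group counts)."""
    return [sum(col) for col in zip(*units)]


def theil_h(units):
    """Theil's H over units: enrollment-weighted entropy shortfall, normalised by T*E."""
    totals = column_totals(units)
    t_all, e_all = sum(totals), entropy(totals)
    return sum(sum(u) * (e_all - entropy(u)) for u in units) / (t_all * e_all)


def decompose(schools_by_district):
    """Return (total, between_district, within_district) for school-level H.

    schools_by_district maps a district id to a list of per-school group counts.
    """
    districts = list(schools_by_district.values())
    all_schools = [school for schools in districts for school in schools]
    grand = column_totals(all_schools)
    t_all, e_all = sum(grand), entropy(grand)

    district_counts = [column_totals(schools) for schools in districts]
    between = theil_h(district_counts)

    within = 0.0
    for counts, schools in zip(district_counts, districts):
        e_d = entropy(counts)
        if e_d > 0:  # a single-group district has no internal segregation to measure
            within += sum(counts) * e_d / (t_all * e_all) * theil_h(schools)
    return theil_h(all_schools), between, within
```

A between-state level could be added the same way, one level up: split state-level H into between-state and within-state parts, then split each state's part across its districts.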
Dataset: Fitbits, field-tests, and grades. The effects of a healthy and...
figshare.com
txt
Updated May 30, 2023
Allie Broaddus; Brandon Jaquis; Colt Jones; Scarlet Jost; Andrew Lang; Ailin Li; Qiwen Li; Philip Nelson; Esther Spear (2023). Dataset: Fitbits, field-tests, and grades. The effects of a healthy and physically active lifestyle on the academic performance of first year college students. [Dataset]. http://doi.org/10.6084/m9.figshare.7218497.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.7218497.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Allie Broaddus; Brandon Jaquis; Colt Jones; Scarlet Jost; Andrew Lang; Ailin Li; Qiwen Li; Philip Nelson; Esther Spear
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The data were collected during the Fall 2017 semester from 581 freshmen enrolled at Oral Roberts University in a class entitled “Introduction to Whole Person Education”, which has a required health and physical exercise component consisting of Steps and Active Minutes goals, a 1-mile field test, and a lifestyle assessment survey. Students use a Fitbit to track their steps and active minutes, which are synced with the course gradebook. Each student's semester grade point average was added once the semester was complete. When the grades were retrieved and stored, the dataset was de-identified to ensure confidentiality.
A level and other 16 to 18 results - Student counts and Results - A level by...
explore-education-statistics.service.gov.uk
Updated Apr 18, 2024
Department for Education (2024). A level and other 16 to 18 results - Student counts and Results - A level by subject and student characteristics (end of 16-18 study) [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/1e9c2d22-aa6e-4af8-b070-b937d2937b5e
Dataset updated
Apr 18, 2024
Dataset authored and provided by
Department for Education (https://gov.uk/dfe)
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
A level student counts by grade achieved, subject and student characteristics. Student characteristics include gender, ethnicity, disadvantage status, free school meal provision, first language, special educational needs (SEN) provision, and KS4 prior attainment. Includes students triggered for inclusion in performance tables who completed A levels during 16-18 study, after discounting of exams.