The data here is from the report entitled Trends in Enrollment, Credit Attainment, and Remediation at Connecticut Public Universities and Community Colleges: Results from P20WIN for the High School Graduating Classes of 2010 through 2016. The report answers three questions:
1. Enrollment: What percentage of the graduating class enrolled in a Connecticut public university or community college (UCONN, the four Connecticut State Universities, and 12 Connecticut community colleges) within 16 months of graduation?
2. Credit Attainment: What percentage of those who enrolled in a Connecticut public university or community college within 16 months of graduation earned at least one year’s worth of credits (24 or more) within two years of enrollment?
3. Remediation: What percentage of those who enrolled in one of the four Connecticut State Universities or one of the 12 community colleges within 16 months of graduation took a remedial course within two years of enrollment?
Notes on the data:
- School Credit: % Earning 24 Credits is a subset of the % Enrolled in 16 Months.
- School Remediation: % Enrolled in Remediation is a subset of the % Enrolled in 16 Months.
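The three questions use different denominators, which is easy to miss when comparing the percentages. The following is a minimal sketch of that arithmetic; all counts and variable names are hypothetical and are not taken from the report.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0

# Hypothetical counts for one high school's graduating class (not real data).
graduates = 200         # size of the graduating class
enrolled_16mo = 80      # enrolled in a CT public institution within 16 months
earned_24_credits = 50  # of enrolled_16mo, earned 24+ credits within two years
enrolled_csu_cc = 70    # of enrolled_16mo, attended a CSU or community college
took_remedial = 28      # of enrolled_csu_cc, took a remedial course within two years

# Question 1 is measured against the whole graduating class.
pct_enrolled = pct(enrolled_16mo, graduates)
# Question 2 is measured against those who enrolled within 16 months.
pct_credit = pct(earned_24_credits, enrolled_16mo)
# Question 3 is measured against those who enrolled at a CSU or community college.
pct_remediation = pct(took_remedial, enrolled_csu_cc)

print(pct_enrolled, pct_credit, pct_remediation)
```

Because the credit and remediation percentages are computed only over enrolled students, they cannot be compared directly with the enrollment percentage.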
We know that students at elite universities tend to be from high-income families, and that graduates are more likely to end up in high-status or high-income jobs. But very little public data has been available on university admissions practices. This dataset, collected by Opportunity Insights, gives extensive detail on college application and admission rates for 139 colleges and universities across the United States, including data on the incomes of students. How do admissions practices vary by institution, and are wealthy students overrepresented?
Education equality is one of the most contested topics in society today. It can be defined and explored in many ways, from accessible education for disabled, low-income, or rural students to the cross-generational influence of doctorate degrees and tenure-track positions. One aspect of equality is the institutions students attend. Consider the “Ivy Plus” universities, which are all eight Ivy League schools plus MIT, Stanford, Duke, and Chicago. Although less than half of one percent of Americans attend Ivy-Plus colleges, they account for more than 10% of Fortune 500 CEOs, a quarter of U.S. Senators, half of all Rhodes scholars, and three-fourths of Supreme Court justices appointed in the last half-century.
A 2023 study (Chetty et al., 2023) tried to understand how these elite institutions affect educational equality:
Do highly selective private colleges amplify the persistence of privilege across generations by taking students from high-income families and helping them obtain high-status, high-paying leadership positions? Conversely, to what extent could such colleges diversify the socioeconomic backgrounds of society’s leaders by changing their admissions policies?
To answer these questions, they assembled a dataset documenting the admission and attendance rates for 13 different income bins at 139 selective universities around the country. They were able to access and link not only student SAT/ACT scores and high school grades, but also parents’ income through their tax records, students’ post-college graduate school enrollment or employment (including earnings, employers, and occupations), and, for some selected colleges, their internal admission ratings for each student. This dataset covers students in the entering classes of 2010–2015, or roughly 2.4 million domestic students.
They found that children from families in the top 1% (by income) are more than twice as likely to attend an Ivy-Plus college as those from middle-class families with comparable SAT/ACT scores. Two-thirds of this gap can be attributed to higher admission rates at similar scores, with the remaining third due to differences in rates of application and matriculation (enrollment conditional on admission). This is not a shocking conclusion, but we can further explore elite college admissions by socioeconomic status to understand the differences between the admission practices of elite private colleges and public flagships, to reflect on the privilege we have here, and to envision what a fairer higher education system could look like.
The data has been aggregated by university and by parental income level, grouped into 13 income brackets. The income brackets are grouped by percentile relative to the US national income distribution, so for instance the 75.0 bin represents parents whose incomes are between the 70th and 80th percentile. The top two bins overlap: the 99.4 bin represents parents between the 99 and 99.9th percentiles, while the 99.5 bin represents parents in the top 1%.
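The bin labels can be read as a lookup from label to percentile range. The sketch below encodes only the three bins spelled out in the description; the other ten are not listed there and are left out. The dictionary name, the helper function, and the choice of an inclusive lower bound and exclusive upper bound are assumptions for illustration.

```python
# Only the three bins named in the description; the remaining ten are omitted.
# Each entry maps a bin label to (lower percentile, upper percentile).
INCOME_BINS = {
    75.0: (70.0, 80.0),
    99.4: (99.0, 99.9),
    99.5: (99.0, 100.0),  # top 1%; overlaps the 99.4 bin
}

def bins_containing(percentile: float) -> list:
    """Return labels of the listed bins whose range includes the percentile.

    Assumes lower bounds are inclusive and upper bounds exclusive.
    """
    return [
        label
        for label, (low, high) in INCOME_BINS.items()
        if low <= percentile < high
    ]

print(bins_containing(75.0))   # falls in one listed bin
print(bins_containing(99.5))   # falls in both overlapping top bins
```

Because the top two bins overlap, their rows should not be summed together when aggregating across income levels.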
Each row represents students’ admission and matriculation outcomes from one income bracket at a given university. There are 139 colleges covered in this dataset.
The variables include college-level, income-binned estimates of attendance rate (both raw and reweighted by SAT/ACT scores), application rate, and relative attendance rate conditional on application, also broken out by specific test score bands for each college and by in-state versus out-of-state status. Colleges are categorized into six tiers: Ivy Plus, other elite schools (public and private), highly selective public/private, and selective public/private, with selectivity generally in descending order. The dataset also notes whether a college is public and/or flagship, where “flagship” means public flagship universities. Furthermore, the relative application rate is reported for each income bin within specific test bands, which are the 50-point bands that had the most attendees in each school tier/category.
Several values are reported in “test-score-reweighted” form. These values control for SAT score: they are calculated separately for each SAT score value, then averaged with weights based on the distribution of SAT scores at the institution.
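The reweighting described above amounts to a weighted average of per-score rates. A minimal sketch follows, assuming the per-score rates for one income bin and the institution's score distribution are available as dictionaries; the function name and all numbers are hypothetical and do not reproduce the study's actual procedure in detail.

```python
def reweighted_rate(rate_by_score: dict, students_by_score: dict) -> float:
    """Average per-score rates, weighted by the institution's score distribution.

    rate_by_score: rate for one income bin, computed separately at each SAT score.
    students_by_score: number of students at the institution with each SAT score.
    """
    total = sum(students_by_score.values())
    return sum(
        rate_by_score[score] * count / total
        for score, count in students_by_score.items()
    )

# Hypothetical inputs: two score values only, for readability.
rates = {1400: 0.10, 1500: 0.20}           # rate at each score for one income bin
distribution = {1400: 300, 1500: 100}      # institution-wide score counts

result = reweighted_rate(rates, distribution)
print(result)
```

Using the same institution-wide weights for every income bin is what makes the reweighted values comparable across bins: differences no longer reflect differences in the bins' score distributions.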
Note that since private schools typically don’t differentiate between in-...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled out by students in all fields and levels of study offered by the university. In the period analysed, the university was operating entirely online amid the Covid-19 pandemic. While the expected learning outcomes were not formally changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper.
The average SET scores were matched with the characteristics of the teacher (degree, seniority, gender, and SET scores in the past six semesters); the course characteristics (time of day, day of the week, course type, course breadth, class duration, and class size); the attributes of the SET survey responses (the percentage of students providing SET feedback); and the grades of the course (mean, standard deviation, and percentage failed). Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section.
The unit of observation, or a single row in the data set, is identified by three parameters: teacher unique id (j), course unique id (k), and the question number in the SET questionnaire (n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}). This means that for each pair (j,k) we have nine rows, one for each SET survey question, or sometimes fewer when students did not answer one of the SET questions at all.
For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j=John Smith, k=Calculus, n=2) is calculated as the average of all Likert-scale answers to question no. 2 in the SET survey distributed to all students who took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached files section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}.
Two attachments:
- Word file with the variables description
- Rdata file with the data set (for R language)
Appendix 1. The SET questionnaire used for this paper.
Evaluation survey of the teaching staff of [university name]
Please complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don’t agree; 1 - I strongly don’t agree.
Questions (each answered on the scale 1 2 3 4 5):
1. I learnt a lot during the course.
2. I think that the knowledge acquired during the course is very useful.
3. The professor used activities to make the class more engaging.
4. If it was possible, I would enroll for the course conducted by this lecturer again.
5. The classes started on time.
6. The lecturer always used time efficiently.
7. The lecturer delivered the class content in an understandable and efficient way.
8. The lecturer was available when we had doubts.
9. The lecturer treated all students equally regardless of their race, background and ethnicity.
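The definition of SET_score_avg(j,k,n) is a grouped mean over the triplet (teacher, course, question). The sketch below illustrates that aggregation on made-up raw answers; the published data set already contains the averages, so the raw-response layout, the ids, and the function name here are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw responses: (teacher_id j, course_id k, question n, Likert answer).
responses = [
    ("T1", "C1", 2, 5),
    ("T1", "C1", 2, 4),
    ("T1", "C1", 2, 3),
    ("T1", "C1", 1, 4),
    ("T2", "C1", 2, 2),
]

def set_score_avg(rows):
    """Return {(j, k, n): mean Likert answer} for every triplet that has answers."""
    grouped = defaultdict(list)
    for teacher, course, question, answer in rows:
        grouped[(teacher, course, question)].append(answer)
    return {key: mean(values) for key, values in grouped.items()}

scores = set_score_avg(responses)
for triplet, value in sorted(scores.items()):
    print(triplet, value)
```

A triplet with no answers simply never appears in the result, which mirrors the note that some (j,k) pairs have fewer than nine rows.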
There were approximately 18.58 million college students in the U.S. in 2022, with around 13.49 million enrolled in public colleges and a further 5.09 million students enrolled in private colleges. The figures are projected to remain relatively constant over the next few years.
What is the most expensive college in the U.S.?
The overall number of higher education institutions in the U.S. totals around 4,000, and California is the state with the most. One important factor that students, and their parents, must consider before choosing a college is cost. With annual expenses totaling almost 78,000 U.S. dollars, Harvey Mudd College in California was the most expensive college for the 2021-2022 academic year. There are three major costs of college: tuition, room, and board. The difference in on-campus and off-campus accommodation costs is often negligible, but they can change greatly depending on the college town.
The differences between public and private colleges
Public colleges, also called state colleges, are mostly funded by state governments. Private colleges, on the other hand, are not funded by the government but by private donors and endowments. Typically, private institutions are much more expensive. Public colleges tend to offer different tuition fees for students based on whether they live in-state or out-of-state, while private colleges have the same tuition cost for every student.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows the Number of students in Public University by Academic Qualification, Malaysia, 2000 - 2021.
Note:
1) 2000-2007: Data is not available for Certificate, Matriculation and Professional.
2) 2000-2006: Data is not available for Others.
3) 2010-2021: Certificate not offered.
4) Others include the Advance Diploma, Pre-Diploma, and Pre-Session.
Source: Ministry of Higher Education Malaysia
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset UnitelmaSapienza - Version 1.0 (May 2021)
Description: This is the dataset from the article "Hidden Space Deep Sequential Risk Prediction on Student Trajectories" by Bardh Prenkaj, Damiano Distante, Stefano Faralli and Paola Velardi (under review).
Disclaimer: Due to privacy protection regulations, all data are aggregated and fully anonymized.
Dataset Description: The zip contains 13 folders; each folder contains the time series (timeseries.csv) of a specific learning course from the University of Rome Unitelma Sapienza. Each row of a timeseries.csv file corresponds to a student. Each column is a two-dimensional numpy array, which corresponds to 365 days of interactions of the student with the e-learning platform. The last column corresponds to the probability of drop-out (see the above indicated paper for more information on how this probability is thresholded to obtain a binary ground truth). This dataset can be directly used as input for the code we released at http://iim.di.uniroma1.it/projects/hsdsrpst/
When using this dataset in your work, please cite the paper it is associated with.
Authors:
- Bardh Prenkaj - Sapienza University of Rome, Italy
- Damiano Distante - University of Rome Unitelma Sapienza, Italy
- Stefano Faralli - University of Rome Unitelma Sapienza, Italy
- Paola Velardi - Sapienza University of Rome, Italy
License: Attribution 4.0 International (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/deed.en
For any question regarding the dataset and the source code, please contact: Bardh Prenkaj (prenkaj@di.uniroma1.it)
This dataset contains college enrollment information for the state of Michigan. College enrollment was defined as the number of public high school students who graduated in 2017, who enrolled in a college or university. This dataset includes enrollment in two-year and four-year institutions of higher education.
Number of home institution students attending a SUNY campus by level (Undergraduate/Graduate) and load status (full-time, part-time). SUNY System combined annual enrollment since 1948.
Data product is provided by ASL Marketing. It contains current college students who are attending colleges and universities nationwide. Connect with this market by: Class Year, Field of Study, Home/School address, College Attending, Ethnicity, School Type, Region, Sports Conference, Gender, eSports, Email.
https://creativecommons.org/publicdomain/zero/1.0/
Student Performance Dataset 2024
Overview
This dataset comprises detailed information about high school students in China, collected from various universities and schools. It is designed to analyze the factors influencing student performance, well-being, and engagement. The data includes a wide range of features such as demographic details, academic performance, health status, parental support, and more. The participating institutions include prominent universities such as Tsinghua University, Peking University, Fudan University, Shanghai Jiao Tong University, and Zhejiang University.
Dataset Description
Features
- Student ID: Unique identifier for each student.
- Gender: Gender of the student (Male/Female).
- Age: Age of the student.
- Grade Level: The grade level of the student (e.g., 9, 10, 11, 12).
- Attendance Rate: The percentage of days the student attended school.
- Study Hours: Average number of hours the student spends studying daily.
- Parental Education Level: The highest level of education attained by the student's parents.
- Parental Involvement: The level of parental involvement in the student's education (High, Medium, Low).
- Extracurricular Activities: Whether the student participates in extracurricular activities (Yes/No).
- Socioeconomic Status: Socioeconomic status of the student's family (High, Medium, Low).
- Previous Academic Performance: Previous academic performance level (High, Medium, Low).
- Class Participation: The level of participation in class (High, Medium, Low).
- Health Status: General health status of the student (Good, Average, Poor).
- Access to Learning Resources: Whether the student has access to necessary learning resources (Yes/No).
- Internet Access: Whether the student has access to the internet (Yes/No).
- Learning Style: Preferred learning style of the student (Visual, Auditory, Kinesthetic).
- Teacher-Student Relationship: Quality of the relationship between the student and teachers (Positive, Neutral, Negative).
- Peer Influence: Influence of peers on the student's behavior and performance (Positive, Neutral, Negative).
- Motivation Level: Student's level of motivation (High, Medium, Low).
- Hours of Sleep: Average number of hours the student sleeps per night.
- Diet Quality: Quality of the student's diet (Good, Average, Poor).
- Transportation Mode: Mode of transportation used by the student to commute to school (Bus, Car, Walk, Bike).
- School Type: Type of school attended by the student (Public, Private).
- School Location: Location of the school (Urban, Rural).
- Homework Completion Rate: The rate at which the student completes homework assignments.
- Reading Proficiency: Proficiency level in reading.
- Math Proficiency: Proficiency level in mathematics.
- Science Proficiency: Proficiency level in science.
- Language Proficiency: Proficiency level in language.
- Physical Activity Level: The level of physical activity (High, Medium, Low).
- Screen Time: Average daily screen time in hours.
- Bullying Incidents: Number of bullying incidents the student has experienced.
- Special Education Services: Whether the student receives special education services (Yes/No).
- Counseling Services: Whether the student receives counseling services (Yes/No).
- Learning Disabilities: Whether the student has any learning disabilities (Yes/No).
- Behavioral Issues: Whether the student has any behavioral issues (Yes/No).
- Attendance of Tutoring Sessions: Whether the student attends tutoring sessions (Yes/No).
- School Climate: Overall perception of the school's environment (Positive, Neutral, Negative).
- Parental Employment Status: Employment status of the student's parents (Employed, Unemployed).
- Household Size: Number of people living in the student's household.
- Performance Score: Overall performance score of the student (Low, Medium, High).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes the University-Wise Student Enrollment by Province in 2074 BS from Ministry of Education.
This dataset contains the total annual FTE and unduplicated headcount enrollment for undergraduate and graduate students at public community colleges and state universities in Massachusetts since 2014.
This dataset is 1 of 2 datasets that are also published in the interactive Annual Enrollment dashboard on the Department of Higher Education Data Center:
- Public Postsecondary Annual Enrollment
- Public Postsecondary Annual Enrollment by Race and Gender
Related datasets:
- Public Postsecondary Fall Enrollment
- Public Postsecondary Fall Enrollment by Race and Gender
Notes:
- Data appear as reported to the Massachusetts Department of Higher Education.
- Annual enrollment refers to a 12 month enrollment period over one fiscal year (July 1 through June 30).
- Figures published by DHE may differ slightly from figures published by other institutions and organizations due to differences in timing of publication, data definitions, and calculation logic.
- Data for the University of Massachusetts are not included due to unique reporting requirements. See Fall Enrollment for HEIRS data on UMass enrollment.
- The most common measure of enrollment is headcount of enrolled students. Annual headcount enrollment is unduplicated, meaning any individual student is only counted once per institution and fiscal year, even if they are enrolled in multiple terms. Enrollment can also be measured as full-time equivalent (FTE) students, a calculation based on the sum of credits carried by all enrolled students. In a fiscal year, 30 undergraduate credits = 1 undergraduate FTE, and 24 graduate credits = 1 graduate FTE at a state university.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
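The difference between unduplicated headcount and FTE can be sketched in a few lines. The example assumes term-level records for a single institution and fiscal year; the record layout, student ids, and function name are hypothetical, while the credit divisors (30 undergraduate, 24 graduate) come from the notes above.

```python
UNDERGRAD_CREDITS_PER_FTE = 30  # from the notes: 30 undergraduate credits = 1 FTE
GRAD_CREDITS_PER_FTE = 24       # from the notes: 24 graduate credits = 1 FTE

# Hypothetical term-level records for one institution and one fiscal year:
# (student_id, level, credits carried in that term)
records = [
    ("s1", "undergraduate", 15),
    ("s1", "undergraduate", 15),  # same student, second term
    ("s2", "undergraduate", 12),
    ("s3", "graduate", 9),
    ("s3", "graduate", 9),
    ("s3", "graduate", 6),
]

def annual_enrollment(rows):
    """Return (unduplicated headcount, FTE by level) for the given records."""
    students = set()
    credits = {"undergraduate": 0, "graduate": 0}
    for student_id, level, term_credits in rows:
        students.add(student_id)        # a student counts once, however many terms
        credits[level] += term_credits  # credits are summed across all terms
    fte = {
        "undergraduate": credits["undergraduate"] / UNDERGRAD_CREDITS_PER_FTE,
        "graduate": credits["graduate"] / GRAD_CREDITS_PER_FTE,
    }
    return len(students), fte

headcount, fte = annual_enrollment(records)
print(headcount, fte)
```

Headcount counts people, while FTE counts credit load, so a campus with many part-time students will show a headcount well above its FTE.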
License information was derived automatically
This dataset presents student enrollment figures at Hamad Bin Khalifa University, categorized by college, nationality (Qatari, Non-Qatari), and gender. It includes male, female, and total enrollment counts per group, offering insights into demographic and academic trends across the university’s colleges.
The National Survey of College Graduates is a repeated cross-sectional biennial survey that provides data on the nation's college graduates, with a focus on those in the science and engineering workforce. This survey is a unique source for examining the relationship of degree field and occupation in addition to other characteristics of college-educated individuals, including work activities, salary, and demographic information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides the number of students enrolled in private colleges and universities in Qatar, categorized by educational institution, nationality, and gender. The data includes institutions such as Education City Universities, Hamad Bin Khalifa University, and Lusail University. It allows for the analysis of student enrollment trends across different institutions, nationalities (Qatari and Non-Qatari), and genders. This dataset is useful for understanding the distribution of students in Qatar's higher education institutions, as well as the participation of male and female students within these institutions.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is crafted for beginners to practice data cleaning and preprocessing techniques in machine learning. It contains 157 rows of student admission records, including duplicate rows, missing values, and some data inconsistencies (e.g., outliers, unrealistic values). It’s ideal for practicing common data preparation steps before applying machine learning algorithms.
The dataset simulates a university admission record system, where each student’s admission profile includes test scores, high school percentages, and admission status. The data contains realistic flaws often encountered in raw data, offering hands-on experience in data wrangling.
The dataset contains the following columns:
- Name: Student's first name (Pakistani names).
- Age: Age of the student (some outliers and missing values).
- Gender: Gender (Male/Female).
- Admission Test Score: Score obtained in the admission test (includes outliers and missing values).
- High School Percentage: Student's high school final score percentage (includes outliers and missing values).
- City: City of residence in Pakistan.
- Admission Status: Whether the student was accepted or rejected.
https://creativecommons.org/publicdomain/zero/1.0/
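The cleaning steps this dataset is meant for (removing duplicates, handling missing values, and filtering unrealistic values) can be sketched with the standard library alone. The rows below are made up, and the valid ranges (including a 0 to 100 test score scale) are assumptions, not properties documented for the dataset. Dropping incomplete rows is only one option; imputing missing values is another.

```python
# Hypothetical rows using a subset of the column names listed above.
rows = [
    {"Name": "Ali", "Age": 18, "Admission Test Score": 85, "High School Percentage": 78.5},
    {"Name": "Ali", "Age": 18, "Admission Test Score": 85, "High School Percentage": 78.5},   # duplicate
    {"Name": "Sara", "Age": None, "Admission Test Score": 92, "High School Percentage": 88.0},  # missing age
    {"Name": "Omar", "Age": 19, "Admission Test Score": 150, "High School Percentage": 70.0},   # unrealistic score
]

# Assumed plausible ranges; adjust after inspecting the real data.
VALID_RANGES = {
    "Age": (15, 30),
    "Admission Test Score": (0, 100),
    "High School Percentage": (0, 100),
}

def clean(records):
    """Drop exact duplicates, rows with missing values, and out-of-range rows."""
    seen = set()
    kept = []
    for record in records:
        key = tuple(sorted(record.items()))  # hashable fingerprint of the row
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if any(record.get(column) is None for column in VALID_RANGES):
            continue  # missing value in a numeric column
        if any(not low <= record[column] <= high
               for column, (low, high) in VALID_RANGES.items()):
            continue  # value outside the assumed plausible range
        kept.append(record)
    return kept

cleaned = clean(rows)
print(cleaned)
```

Checking for missing values before the range test matters here, since comparing None with a number would raise an error.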
It's no secret that US university students often graduate with debt repayment obligations that far outstrip their employment and income prospects. While it's understood that students from elite colleges tend to earn more than graduates from less prestigious universities, the finer relationships between future income and university attendance are quite murky. In an effort to make educational investments less speculative, the US Department of Education has matched information from the student financial aid system with federal tax returns to create the College Scorecard dataset.
Kaggle is hosting the College Scorecard dataset in order to facilitate shared learning and collaboration. Insights from this dataset can help make the returns on higher education more transparent and, in turn, more fair.
Here's a script showing an exploratory overview of some of the data.
college-scorecard-release-*.zip contains a compressed version of the same data available through Kaggle Scripts.
It consists of three components:
New to data exploration in R? Take the free, interactive DataCamp course, "Data Exploration With Kaggle Scripts," to learn the basics of visualizing data with ggplot. You'll also create your first Kaggle Scripts along the way.
This dataset contains college enrollment information, by county subdivision, for the state of Michigan. College enrollment was defined as the number of public high school students who graduated in 2017, who enrolled in a college or university. This dataset includes enrollment in two-year and four-year institutions of higher education.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive overview of various factors affecting student performance in exams. It includes information on study habits, attendance, parental involvement, and other aspects influencing academic success.
Attribute | Description |
---|---|
Hours_Studied | Number of hours spent studying per week. |
Attendance | Percentage of classes attended. |
Parental_Involvement | Level of parental involvement in the student's education (Low, Medium, High). |
Access_to_Resources | Availability of educational resources (Low, Medium, High). |
Extracurricular_Activities | Participation in extracurricular activities (Yes, No). |
Sleep_Hours | Average number of hours of sleep per night. |
Previous_Scores | Scores from previous exams. |
Motivation_Level | Student's level of motivation (Low, Medium, High). |
Internet_Access | Availability of internet access (Yes, No). |
Tutoring_Sessions | Number of tutoring sessions attended per month. |
Family_Income | Family income level (Low, Medium, High). |
Teacher_Quality | Quality of the teachers (Low, Medium, High). |
School_Type | Type of school attended (Public, Private). |
Peer_Influence | Influence of peers on academic performance (Positive, Neutral, Negative). |
Physical_Activity | Average number of hours of physical activity per week. |
Learning_Disabilities | Presence of learning disabilities (Yes, No). |
Parental_Education_Level | Highest education level of parents (High School, College, Postgraduate). |
Distance_from_Home | Distance from home to school (Near, Moderate, Far). |
Gender | Gender of the student (Male, Female). |
Exam_Score | Final exam score. |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aggregated data, by campus and permanent home address zipcode. This dataset will allow users to see which zipcodes students are commuting (if applicable) from to their campuses.