This file includes enrollment data from the 2012-13 school year. Data are disaggregated at the school, district, and state levels and include counts of students by the following groups: grade level, gender, race/ethnicity, student programs, and special characteristics. Please review the notes below for more information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Student Enrollment reports the number of enrolled students per year, per grade.
By Harish Kumar Garg [source]
This dataset covers the number of Indian students studying abroad and the countries where they are enrolled. The data was compiled by the Ministry of External Affairs to answer a question from a Member of Parliament about how many students from India are studying in foreign countries, and in which countries. Its core fields are Country Name and Number of Indians Studying Abroad as of Mar 2017, which make it possible to examine student mobility across nations around the world. With this data, we can gain insights into the educational opportunities Indian students pursue abroad and look at trends in international education across different regions. From comparing countries with similar academic opportunities to tracking the regional popularity of study destinations, this dataset provides important context for studying student migration patterns. We invite everyone to explore this data further and use it to draw meaningful conclusions!
How to use this dataset?
The data has two main columns, Country Name and Number of Indians studying there as of March 2017. It also includes a third column, Percentage, which gives the proportion of Indian students enrolled in each country relative to the total number enrolled abroad globally.
To get started, you can visualize the data against parameters such as geographical region or spoken language, which may clarify the reasons behind students' choices. You can also group countries by available research opportunities, cost, and similar factors to better understand what motivates Indians to pursue further studies outside India.
You can also use this dataset for benchmarking against regional or international peer groups, or combine it with regional or global reports, to inform decisions and policies on outreach and support. It can also help when planning educational promotion activities with foreign universities and colleges that aim to attract more prospective students from India.
- Educational institutions in India can use this dataset to set up international exchange programs with universities in other countries to facilitate and support Indian students studying abroad.
- Higher education institutions can also study the current trend of Indian students seeking opportunities to study abroad and use this data to build specialized short-term courses in collaboration with universities in different countries, catering to students interested in moving abroad permanently or temporarily for higher studies.
- Policy makers could use this data to assess current trends and develop policies that incentivize international exposure among young professionals, for example by commissioning fellowships or scholarships that expose them to different problem sets around the world, making their profiles more attractive as they look for better job opportunities globally.
If you use this dataset in your research, please credit the original authors.
Unknown License - Please check the dataset description for more information.
File: final_data.csv

| Column name | Description |
|:---|:---|
| Country | Name of the country where Indian students are studying. (String) |
| No of Indian Students | Number of Indian students studying in the country. (Integer) |
| Percentage | Percentage of Indian students studying in the country compared to the total number of Indian students studying abroad. (Float) |
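The Percentage column described above can be recomputed from the student counts. The following is a minimal sketch, assuming the percentage is each country's count divided by the sum of all counts; the function name and the sample counts are hypothetical and are not taken from final_data.csv.

```python
def add_percentage(counts):
    """Return {country: share of the total, in percent} for a {country: count} mapping."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total number of students is zero")
    return {country: 100.0 * n / total for country, n in counts.items()}


# Hypothetical counts, for illustration only.
sample = {"Country A": 600, "Country B": 300, "Country C": 100}
shares = add_percentage(sample)

for country, share in shares.items():
    print(f"{country}: {share:.1f}%")
```

Recomputing the shares this way is also a quick consistency check on the published Percentage values.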
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
A realistic, large-scale synthetic dataset of 10,000 students designed to analyze factors affecting college placements.
This dataset simulates the academic and professional profiles of 10,000 college students, focusing on factors that influence placement outcomes. It includes features like IQ, academic performance, CGPA, internships, communication skills, and more.
The dataset is ideal for:
Column Name | Description |
---|---|
College_ID | Unique ID of the college (e.g., CLG0001 to CLG0100) |
IQ | Student’s IQ score (normally distributed around 100) |
Prev_Sem_Result | GPA from the previous semester (range: 5.0 to 10.0) |
CGPA | Cumulative Grade Point Average (range: ~5.0 to 10.0) |
Academic_Performance | Annual academic rating (scale: 1 to 10) |
Internship_Experience | Whether the student has completed any internship (Yes/No) |
Extra_Curricular_Score | Involvement in extracurriculars (score from 0 to 10) |
Communication_Skills | Soft skill rating (scale: 1 to 10) |
Projects_Completed | Number of academic/technical projects completed (0 to 5) |
Placement | Final placement result (Yes = Placed, No = Not Placed) |
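The documented ranges in the table above lend themselves to a quick sanity check before modeling. The sketch below assumes each record is a plain dictionary keyed by the column names in the table; the helper name `check_row` and the sample record are hypothetical. CGPA is left out because its range is only given approximately.

```python
# Inclusive ranges copied from the column table.
RANGES = {
    "Prev_Sem_Result": (5.0, 10.0),
    "Academic_Performance": (1, 10),
    "Extra_Curricular_Score": (0, 10),
    "Communication_Skills": (1, 10),
    "Projects_Completed": (0, 5),
}
YES_NO_COLUMNS = ("Internship_Experience", "Placement")


def check_row(row):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for column, (low, high) in RANGES.items():
        value = float(row[column])
        if not low <= value <= high:
            problems.append(f"{column}={value} outside [{low}, {high}]")
    for column in YES_NO_COLUMNS:
        if row[column] not in ("Yes", "No"):
            problems.append(f"{column}={row[column]!r} is not Yes/No")
    return problems


# Hypothetical record, for illustration only.
record = {
    "Prev_Sem_Result": 7.2,
    "Academic_Performance": 8,
    "Extra_Curricular_Score": 4,
    "Communication_Skills": 6,
    "Projects_Completed": 9,  # deliberately out of range
    "Internship_Experience": "Yes",
    "Placement": "No",
}
print(check_row(record))
```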
This dataset was generated to resemble real-world data in academic institutions for research and machine learning use. While it is synthetic, the variables and relationships are crafted to mimic authentic trends observed in student placements.
MIT
Created using Python (NumPy, Pandas) with data logic designed for educational and ML experimentation purposes.
The data here is from the report entitled Trends in Enrollment, Credit Attainment, and Remediation at Connecticut Public Universities and Community Colleges: Results from P20WIN for the High School Graduating Classes of 2010 through 2016. The report answers three questions:
1. Enrollment: What percentage of the graduating class enrolled in a Connecticut public university or community college (UCONN, the four Connecticut State Universities, and 12 Connecticut community colleges) within 16 months of graduation?
2. Credit Attainment: What percentage of those who enrolled in a Connecticut public university or community college within 16 months of graduation earned at least one year’s worth of credits (24 or more) within two years of enrollment?
3. Remediation: What percentage of those who enrolled in one of the four Connecticut State Universities or one of the 12 community colleges within 16 months of graduation took a remedial course within two years of enrollment?
Notes on the data:
- School Credit: % Earning 24 Credits is a subset of the % Enrolled in 16 Months.
- School Remediation: % Enrolled in Remediation is a subset of the % Enrolled in 16 Months.
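The three percentages use different denominators, which is easy to overlook. Here is a rough sketch, assuming raw counts are available; the function names and the counts are hypothetical, and the report itself publishes only the percentages.

```python
def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * part / whole


def report_rates(graduates, enrolled, earned_24_credits, enrolled_csu_cc, took_remedial):
    """Compute the three report measures from raw counts.

    enrolled          : graduates who enrolled within 16 months
    earned_24_credits : of `enrolled`, those earning 24+ credits within two years
    enrolled_csu_cc   : of `enrolled`, those at a State University or community college
    took_remedial     : of `enrolled_csu_cc`, those who took a remedial course
    """
    return {
        "enrolled_in_16_months": percent(enrolled, graduates),
        "earning_24_credits": percent(earned_24_credits, enrolled),
        "enrolled_in_remediation": percent(took_remedial, enrolled_csu_cc),
    }


# Hypothetical counts for one graduating class.
example = report_rates(graduates=200, enrolled=80, earned_24_credits=40,
                       enrolled_csu_cc=60, took_remedial=15)
print(example)
```

Only the enrollment rate is relative to the whole graduating class; the other two are relative to students who enrolled.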
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset has been collected to support research on predicting the academic performance of Secondary School Certificate (SSC) and Higher Secondary Certificate (HSC) students in Bangladesh. It comprises responses from many students across various institutions in the country.
The dataset includes a diverse set of features that are believed to influence academic outcomes. These features cover a wide range of domains such as:
Demographic Information: Age, gender, parental education, and occupation.
Academic History: Previous grades, subject preferences, study time, tutoring, etc.
Socioeconomic Factors: Family income, number of siblings, living location (urban/rural).
Institutional Factors: Type of school/college (public/private), distance from home, teacher-student ratio, etc.
Lifestyle and Behavioral Aspects: Sleep habits, screen time, daily routines, mental health indicators, and parental support.
The dataset is labeled with the actual academic performance (grades or GPA) of students in SSC and HSC examinations. The goal is to facilitate the development of predictive models and interpretability studies, with a focus on early intervention and academic counseling.
The dataset is anonymized and free from personally identifiable information. It is intended for academic research, education policy analysis, and machine learning experimentation.
If you use the dataset, please cite: A. A. Maruf, R. Ara Rumy, R. I. Sony and Z. Aung, "Predictive Analysis of Bangladeshi Students’ Academic Performances Using Ensemble Machine Learning with Explainable AI Techniques," 2024 27th International Conference on Computer and Information Technology (ICCIT), Cox's Bazar, Bangladesh, 2024, pp. 1200-1205, doi: 10.1109/ICCIT64611.2024.11021990.
This dataset includes the attendance rate for public school students PK-12 by student group and by district during the 2021-2022 school year. Student groups include:
- Students experiencing homelessness
- Students with disabilities
- Students who qualify for free/reduced lunch
- English learners
- All high needs students
- Non-high needs students
- Students by race/ethnicity (Hispanic/Latino of any race, Black or African American, White, All other races)
Attendance rates are provided for each student group by district and for the state. Students who are considered high needs include students who are English language learners, who receive special education, or who qualify for free and reduced lunch. When no attendance data are displayed in a cell, data have been suppressed to safeguard student confidentiality, or to ensure that statistics based on a very small sample size are not interpreted as equally representative as those based on a sufficiently larger sample size. For more information on CSDE data suppression policies, please visit http://edsight.ct.gov/relatedreports/BDCRE%20Data%20Suppression%20Rules.pdf.
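Suppressed cells and the high needs definition both need care when the file is processed. The sketch below assumes suppressed cells arrive as empty strings and that rates are written as plain numbers, optionally with a percent sign; both are assumptions about the file layout, and the function names are hypothetical.

```python
def parse_rate(cell):
    """Return the attendance rate as a float, or None for a suppressed (blank) cell."""
    text = cell.strip().rstrip("%").strip()
    if text == "":
        return None  # suppressed: do not treat as zero
    return float(text)


def is_high_needs(english_learner, special_education, free_reduced_lunch):
    """A student is high needs if any one of the three conditions holds."""
    return english_learner or special_education or free_reduced_lunch


# Hypothetical cells, for illustration only.
cells = ["94.5", "", "91.0%"]
rates = [parse_rate(c) for c in cells]
reported = [r for r in rates if r is not None]
print(rates, len(reported))
```

Keeping suppressed values as `None` rather than 0 prevents them from dragging down any averages computed later.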
This dataset contains college enrollment information by Michigan House of Representatives district. College enrollment is defined as the number of public high school students who graduated in 2017 and enrolled in a college or university. This dataset includes enrollment in two-year and four-year institutions of higher education.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is novel in how it captures student grading across diverse educational cultures from multiple countries. Researchers and others in the education sector can use it to see the impact of having varied curricula in one country. The dataset compares different levelling cases that arise when students transfer from one curriculum to another, and it documents the unreliable levelling criteria currently set by international schools. The collected data can be used with intelligent algorithms, specifically machine learning and pattern analysis methods, to develop a framework for multicultural educational systems that aids the smooth transition (hereafter "levelling") of students who relocate from one curriculum to another and minimizes the impact of switching on their educational performance. The preliminary variables taken into consideration when deciding which data to collect depended on the variables. The UAE is a multicultural country with many expats relocating from regions such as Asia, Europe and America. To meet expats' needs, the UAE has established many international private schools; it was therefore chosen as the location of the study, based on the many levelling cases and struggles reported by the Ministry of Education and schools. For the first time, we present this dataset comprising students’ records for two academic years, covering math, English, and science across 3 terms. The selection of subject areas and number of terms was influenced by other researchers working on similar subject matter.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
A dataset created from a higher education institution (acquired from several disjoint databases) related to students enrolled in different undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social service, and technologies. The dataset includes information known at the time of student enrollment (academic path, demographics, and socio-economic factors) and the students' academic performance at the end of the first and second semesters. The data is used to build classification models to predict students' dropout and academic success. The problem is formulated as a three-category classification task, in which there is a strong imbalance towards one of the classes.
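Because the three classes are imbalanced, it helps to look at the class distribution before training. The sketch below computes inverse-frequency class weights, a common heuristic that the dataset description does not prescribe; the class names and counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label column with one dominant class.
labels = ["graduate"] * 6 + ["dropout"] * 3 + ["enrolled"] * 1
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{cls}: {weight:.3f}")
```

Rare classes receive larger weights, so a weighted loss does not simply favor the majority class.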
This dataset delves into the correlation between dropout rates and student success in various educational settings. It includes comprehensive information on student demographics, academic performance, and factors contributing to dropout incidents. The dataset aims to provide valuable insights for educators, policymakers, and researchers to enhance strategies for fostering student retention and academic achievement.
The dataset includes information known at the time of student enrollment – academic path, demographics, and social-economic factors.
- Marital status: Categorical variable indicating the marital status of the individual. (1 – single 2 – married 3 – widower 4 – divorced 5 – facto union 6 – legally separated)
- Application mode: Categorical variable indicating the mode of application. (1 - 1st phase - general contingent 2 - Ordinance No. 612/93 5 - 1st phase - special contingent (Azores Island) 7 - Holders of other higher courses 10 - Ordinance No. 854-B/99 15 - International student (bachelor) 16 - 1st phase - special contingent (Madeira Island) 17 - 2nd phase - general contingent 18 - 3rd phase - general contingent 26 - Ordinance No. 533-A/99, item b2) (Different Plan) 27 - Ordinance No. 533-A/99, item b3 (Other Institution) 39 - Over 23 years old 42 - Transfer 43 - Change of course 44 - Technological specialization diploma holders 51 - Change of institution/course 53 - Short cycle diploma holders 57 - Change of institution/course (International)).
- Application order: Numeric variable indicating the order of application. (From 0, first choice, to 9, last choice).
- Course: Categorical variable indicating the chosen course. (33 - Biofuel Production Technologies 171 - Animation and Multimedia Design 8014 - Social Service (evening attendance) 9003 - Agronomy 9070 - Communication Design 9085 - Veterinary Nursing 9119 - Informatics Engineering 9130 - Equinculture 9147 - Management 9238 - Social Service 9254 - Tourism 9500 - Nursing 9556 - Oral Hygiene 9670 - Advertising and Marketing Management 9773 - Journalism and Communication 9853 - Basic Education 9991 - Management (evening attendance)).
- Daytime/evening attendance: Binary variable indicating whether the individual attends classes during the daytime or evening. (1 for daytime, 0 for evening).
- Previous qualification: Numeric variable indicating the level of the previous qualification. (1 - Secondary education 2 - Higher education - bachelor's degree 3 - Higher education - degree 4 - Higher education - master's 5 - Higher education - doctorate 6 - Frequency of higher education 9 - 12th year of schooling - not completed 10 - 11th year of schooling - not completed 12 - Other - 11th year of schooling 14 - 10th year of schooling 15 - 10th year of schooling - not completed 19 - Basic education 3rd cycle (9th/10th/11th year) or equiv. 38 - Basic education 2nd cycle (6th/7th/8th year) or equiv. 39 - Technological specialization course 40 - Higher education - degree (1st cycle) 42 - Professional higher technical course 43 - Higher education - master (2nd cycle)).
- Nationality: Categorical variable indicating the nationality of the individual. (1 - Portuguese; 2 - German; 6 - Spanish; 11 - Italian; 13 - Dutch; 14 - English; 17 - Lithuanian; 21 - Angolan; 22 - Cape Verdean; 24 - Guinean; 25 - Mozambican; 26 - Santomean; 32 - Turkish; 41 - Brazilian; 62 - Romanian; 100 - Moldova (Republic of); 101 - Mexican; 103 - Ukrainian; 105 - Russian; 108 - Cuban; 109 - Colombian).
- Mother's qualification: Numeric variable indicating the level of the mother's qualification.
(1 - Secondary Education - 12th Year of Schooling or Eq. 2 - Higher Education - Bachelor's Degree 3 - Higher Education - Degree 4 - Higher Education - Master's 5 - Higher Education - Doctorate 6 - Frequency of Higher Education 9 - 12th Year of Schooling - Not Completed 10 - 11th Year of Schooling - Not Completed 11 - 7th Year (...
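The categorical variables above are stored as integer codes, so analyses usually start by decoding them. Here is a minimal sketch using the marital status and attendance codes exactly as listed; the `decode` helper is hypothetical, and the other code tables can be added in the same way.

```python
# Code tables copied from the variable descriptions above.
MARITAL_STATUS = {
    1: "single",
    2: "married",
    3: "widower",
    4: "divorced",
    5: "facto union",
    6: "legally separated",
}
ATTENDANCE = {1: "daytime", 0: "evening"}


def decode(code, table):
    """Map an integer code (or its string form) to its label."""
    return table.get(int(code), f"unknown code {code}")


print(decode(2, MARITAL_STATUS))
print(decode("0", ATTENDANCE))
print(decode(99, MARITAL_STATUS))
```

Returning an explicit "unknown code" label keeps unexpected values visible instead of silently dropping them.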
This file includes Report Card enrollment data from the 2015-16 school year. Data are disaggregated at the school, district, and state levels and include counts of students by the following groups: grade level, gender, race/ethnicity, student programs, and special characteristics. Please review the notes below for more information.
Student enrollment data disaggregated by students from low-income families, students from each racial and ethnic group, gender, English learners, children with disabilities, children experiencing homelessness, children in foster care, and migratory students for each mode of instruction.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Verbal and Quantitative Reasoning GRE scores and percentiles were collected by querying the student database for the appropriate information. Any student records that were missing data such as GRE scores or grade point average were removed from the study before the data were analyzed. The GRE Scores of entering doctoral students from 2007-2012 were collected and analyzed. A total of 528 student records were reviewed. Ninety-six records were removed from the data because of a lack of GRE scores. Thirty-nine of these records belonged to MD/PhD applicants who were not required to take the GRE to be reviewed for admission. Fifty-seven more records were removed because they did not have an admissions committee score in the database. After 2011, the GRE’s scoring system was changed from a scale of 200-800 points per section to 130-170 points per section. As a result, 12 more records were removed because their scores were representative of the new scoring system and therefore were not able to be compared to the older scores based on raw score. After removal of these 96 records from our analyses, a total of 420 student records remained which included students that were currently enrolled, left the doctoral program without a degree, or left the doctoral program with an MS degree. To maintain consistency in the participants, we removed 100 additional records so that our analyses only considered students that had graduated with a doctoral degree. In addition, thirty-nine admissions scores were identified as outliers by statistical analysis software and removed for a final data set of 286 (see Outliers below).
Outliers. We used the automated ROUT method included in the PRISM software to test the data for the presence of outliers which could skew our data. The false discovery rate for outlier detection (Q) was set to 1%. After removing the 96 students without a GRE score, 432 students were reviewed for the presence of outliers. ROUT detected 39 outliers that were removed before statistical analysis was performed.
Sample. See detailed description in the Participants section.
Linear regression analysis was used to examine potential trends between GRE scores, GRE percentiles, normalized admissions scores or GPA and outcomes between selected student groups. The D’Agostino & Pearson omnibus and Shapiro-Wilk normality tests were used to test for normality regarding outcomes in the sample. The Pearson correlation coefficient was calculated to determine the relationship between GRE scores, GRE percentiles, admissions scores or GPA (undergraduate and graduate) and time to degree. Candidacy exam results were divided into students who either passed or failed the exam. A Mann-Whitney test was then used to test for statistically significant differences between mean GRE scores, percentiles, and undergraduate GPA and candidacy exam results. Other variables were also observed such as gender, race, ethnicity, and citizenship status within the samples.
Predictive Metrics. The input variables used in this study were GPA and scores and percentiles of applicants on both the Quantitative and Verbal Reasoning GRE sections. GRE scores and percentiles were examined to normalize variances that could occur between tests.
Performance Metrics. The output variables used in the statistical analyses of each data set were either the amount of time it took for each student to earn their doctoral degree, or the student’s candidacy examination result.
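To illustrate the correlation step, the following sketch computes a Pearson correlation coefficient between a predictor and time to degree. It is a plain-Python illustration rather than the authors' PRISM workflow, and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical GRE Quantitative scores (old 200-800 scale) and years to degree.
gre_quant = [640, 700, 720, 760, 780]
years_to_degree = [6.0, 5.5, 5.8, 5.0, 5.2]
print(round(pearson_r(gre_quant, years_to_degree), 3))
```

A coefficient near zero would indicate no linear relationship between the predictor and time to degree.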
This file includes Report Card enrollment data from the 2023-24 school year. Data are disaggregated at the school, district, and state levels and include counts of students by the following groups: grade level, gender, race/ethnicity, student programs, and special characteristics. Please review the notes below for more information.
Technical Colleges Enrollment Data of Degree & Non-degree Students for the year 2015/2016.
Number of home institution students attending a SUNY campus by level (Undergraduate/Graduate) and load status (full-time, part-time). SUNY System combined annual enrollment since 1948.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper. The average SET scores were matched with the characteristics of the teacher for degree, seniority, gender, and SET scores in the past six semesters; the course characteristics for time of day, day of the week, course type, course breadth, class duration, and class size; the attributes of the SET survey responses as the percentage of students providing SET feedback; and the grades of the course for the mean, standard deviation, and percentage failed. Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section. The unit of observation or the single row in the data set is identified by three parameters: teacher unique id (j), course unique id (k) and the question number in the SET questionnaire (n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9} ). It means that for each pair (j,k), we have nine rows, one for each SET survey question, or sometimes less when students did not answer one of the SET questions at all. 
For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j=John Smith, k=Calculus, n=2) is calculated as the average of all Likert-scale answers to question no. 2 in the SET survey distributed to all students who took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached files section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9}.
Two attachments:
- Word file with the variable descriptions
- Rdata file with the data set (for the R language)
Appendix 1. The SET questionnaire used for this paper.
Evaluation survey of the teaching staff of [university name]
Please complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don’t agree; 1 - I strongly don’t agree.
Questions (each answered on the scale 1 2 3 4 5):
- I learnt a lot during the course.
- I think that the knowledge acquired during the course is very useful.
- The professor used activities to make the class more engaging.
- If it was possible, I would enroll for the course conducted by this lecturer again.
- The classes started on time.
- The lecturer always used time efficiently.
- The lecturer delivered the class content in an understandable and efficient way.
- The lecturer was available when we had doubts.
- The lecturer treated all students equally regardless of their race, background and ethnicity.
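The definition of SET_score_avg(j,k,n) given above can be sketched as a grouped average over individual survey answers. The input layout (one tuple per answer) and the identifiers are assumptions for illustration; the published Rdata file already contains the averaged values.

```python
from collections import defaultdict


def set_score_avg(responses):
    """Average Likert answers per (teacher id j, course id k, question number n).

    `responses` is an iterable of (j, k, n, answer) tuples, answer in 1..5.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of answers, number of answers]
    for j, k, n, answer in responses:
        if not 1 <= answer <= 5:
            raise ValueError(f"answer {answer} is outside the 1-5 Likert scale")
        acc = totals[(j, k, n)]
        acc[0] += answer
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical answers: teacher "T1", course "C1", questions 1 and 2.
responses = [
    ("T1", "C1", 2, 5),
    ("T1", "C1", 2, 3),
    ("T1", "C1", 1, 4),
]
averages = set_score_avg(responses)
print(averages)
```

A question that no student answered simply produces no key, which matches the note that some (j,k) pairs have fewer than nine rows.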
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description. This project contains the dataset from the Galatanet survey, conducted in 2009 and 2010 at Galatasaray University in Istanbul (Turkey). The goal of this survey was to retrieve information regarding the social relationships between students, their feelings about the university in general, and their purchase behavior. The survey was conducted in two phases: the first in 2009 and the second in 2010.
The dataset includes two kinds of data. First, the answers to most of the questions are contained in a large table, available in both CSV and MS Excel formats. A description file explains the meaning of each field appearing in the table. Note that the survey form is also contained in the archive, for reference (it is in French and Turkish only, though). Second, the social network of students is available in both Pajek and Graphml formats. Having both individual (nodal attributes) and relational (links) information in the same dataset is, to our knowledge, rare and difficult to find in public sources, and this makes (in our opinion) this dataset interesting and valuable.
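As a rough illustration of the relational part, the sketch below reads nodes and links from a GraphML document using only the Python standard library. The inline sample is hypothetical and stands in for the archive's network file, whose name and attributes are not described here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard GraphML XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical three-student network, for illustration only.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="n1"/>
    <node id="n2"/>
    <node id="n3"/>
    <edge source="n1" target="n2"/>
    <edge source="n3" target="n2"/>
  </graph>
</graphml>"""


def read_graphml(text):
    """Return (node ids, list of (source, target) links) from GraphML text."""
    root = ET.fromstring(text)
    nodes = [node.get("id") for node in root.findall(".//g:node", NS)]
    edges = [(e.get("source"), e.get("target")) for e in root.findall(".//g:edge", NS)]
    return nodes, edges


nodes, edges = read_graphml(SAMPLE)
# Count how many links touch each node, ignoring direction.
degree = Counter(endpoint for edge in edges for endpoint in edge)
print(nodes, edges, dict(degree))
```

The anonymized node ids can then be joined with the answer table to combine nodal attributes with link information.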
All data are completely anonymous: students' names have been replaced by random numbers. Note that the survey is not exactly the same between the two phases: some small adjustments were applied thanks to the feedback from the first phase (but the datasets have been normalized since then). Also, the electronic form was very much improved for the second phase, which explains why the answers are much more complete than in the first phase.
The data were used in our following publications:
Labatut, V. & Balasque, J.-M. (2010). Business-oriented Analysis of a Social Network of University Students. In: International Conference on Advances in Social Network Analysis and Mining, 25-32. Odense, DK : IEEE. ⟨hal-00633643⟩ - DOI: 10.1109/ASONAM.2010.15
An extended version of the original article: Labatut, V. & Balasque, J.-M. (2013). Informative Value of Individual and Relational Data Compared Through Business-Oriented Community Detection. Özyer, T.; Rokne, J.; Wagner, G. & Reuser, A. H. (Eds.), The Influence of Technology on Social Network Analysis and Mining, Springer, 2013, chap.6, 303-330. ⟨hal-00633650⟩ - DOI: 10.1007/978-3-7091-1346-2_13
A more didactic article using some of these data just for illustration purposes: Labatut, V. & Balasque, J.-M. (2012). Detection and Interpretation of Communities in Complex Networks: Methods and Practical Application. Abraham, A. & Hassanien, A.-E. (Eds.), Computational Social Networks: Tools, Perspectives and Applications, Springer, chap.4, 81-113. ⟨hal-00633653⟩ - DOI: 10.1007/978-1-4471-4048-1_4
Citation. If you use this data, please cite the first article above:
@InProceedings{Labatut2010,
  author = {Labatut, Vincent and Balasque, Jean-Michel},
  title = {Business-oriented Analysis of a Social Network of University Students},
  booktitle = {International Conference on Advances in Social Networks Analysis and Mining},
  year = {2010},
  pages = {25-32},
  address = {Odense, DK},
  publisher = {IEEE Publishing},
  doi = {10.1109/ASONAM.2010.15},
}
Contact. 2009-2010 by Jean-Michel Balasque (jmbalasque@gsu.edu.tr) & Vincent Labatut (vlabatut@gsu.edu.tr)
License. This dataset is open data: you can redistribute it and/or use it under the terms of the Creative Commons Zero license (see license.txt).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The master dataset contains comprehensive information for all government schools in NSW. Data items include school locations, latitude and longitude coordinates, school type, student enrolment numbers, electorate information, contact details and more.
This dataset is publicly available through the Data NSW website, and is used to support the School Finder tool.
Data Notes:
Data relating to healthy canteens is no longer current because the Department no longer updates it; this data can be sourced through NSW Health.
Student enrolment numbers are based on the census of government school students undertaken on the first Friday of August; and LBOTE numbers are based on data collected in March.
School information, such as addresses and contact details, are updated regularly as required, and are the most current source of information.
Data is suppressed for Indigenous and LBOTE percentages where student numbers are five or fewer; suppressed values are indicated by "np".
NSSC out of scope schools will not have an enrolment figure.
NSSC and LBOTE figures are updated annually in December.
ICSEA values are updated every February with the previous year's ICSEA values. Small schools, SSPs and Senior Secondary schools do not have their ICSEA values published by ACARA.
Family Occupation and Educational Index (FOEI) is a school-level index of educational disadvantage. Data is extracted in May and values are updated annually in December.
Following the introduction of part-time study in secondary schools in 1993, student enrolments are generally reported in full-time equivalent units (FTE). The FTE for students studying less than 10 units, the minimum workload, is determined by the formula: 0.1 x the number of units studied and represented as a proportion of the full-time enrolment of 1.0 FTE.
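The FTE rule in the last note can be sketched as a small function. It assumes that a student studying 10 or more units counts as exactly 1.0 FTE, which follows from the stated minimum workload; the function name is hypothetical.

```python
def full_time_equivalent(units_studied, minimum_workload=10):
    """FTE for one student: 0.1 x units below the minimum workload, else 1.0."""
    if units_studied < 0:
        raise ValueError("units studied cannot be negative")
    if units_studied >= minimum_workload:
        return 1.0
    return 0.1 * units_studied


# Hypothetical unit loads for four students.
loads = [12, 10, 6, 2]
school_fte = sum(full_time_equivalent(units) for units in loads)
print(round(school_fte, 1))
```

Summing per-student FTE values gives an enrolment figure that can be fractional, which is why reported enrolments are not always whole numbers.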
Data Source:
We know that students at elite universities tend to be from high-income families, and that graduates are more likely to end up in high-status or high-income jobs. But very little public data has been available on university admissions practices. This dataset, collected by Opportunity Insights, gives extensive detail on college application and admission rates for 139 colleges and universities across the United States, including data on the incomes of students. How do admissions practices vary by institution, and are wealthy students overrepresented?
Education equality is one of the most contested topics in society today. It can be defined and explored in many ways, from accessible education for disabled, low-income, or rural students to the cross-generational influence of doctorate degrees and tenure-track positions. One aspect of equality is the institutions students attend. Consider the “Ivy Plus” universities, which are all eight Ivy League schools plus MIT, Stanford, Duke, and Chicago. Although less than half of one percent of Americans attend Ivy-Plus colleges, they account for more than 10% of Fortune 500 CEOs, a quarter of U.S. Senators, half of all Rhodes scholars, and three-fourths of Supreme Court justices appointed in the last half-century.
A 2023 study (Chetty et al, 2023) tried to understand how these elite institutions affect educational equality:
Do highly selective private colleges amplify the persistence of privilege across generations by taking students from high-income families and helping them obtain high-status, high-paying leadership positions? Conversely, to what extent could such colleges diversify the socioeconomic backgrounds of society’s leaders by changing their admissions policies?
To answer these questions, they assembled a dataset documenting admission and attendance rates across 13 parental income bins for 139 selective universities around the country. They were able to access and link not only students’ SAT/ACT scores and high school grades, but also parents’ income (through tax records), students’ post-college graduate school enrollment or employment (including earnings, employers, and occupations), and, for some colleges, internal admission ratings for each student. The dataset covers students in the entering classes of 2010–2015, or roughly 2.4 million domestic students.
They found that children from families in the top 1% (by income) are more than twice as likely to attend an Ivy-Plus college as those from middle-class families with comparable SAT/ACT scores. Two-thirds of this gap can be attributed to higher admission rates at similar scores, with the remaining third due to differences in rates of application and matriculation (enrollment conditional on admission). This is not a shocking conclusion, but we can further explore elite college admissions by socioeconomic status to understand how admission practices differ between elite private colleges and public flagships, to reflect on the privilege we have here, and to envision what a fairer higher education system could look like.
The data has been aggregated by university and by parental income level, grouped into 13 income brackets. The brackets are defined by percentile relative to the US national income distribution, so for instance the 75.0 bin represents parents whose incomes are between the 70th and 80th percentiles. The top two bins overlap: the 99.4 bin represents parents between the 99th and 99.9th percentiles, while the 99.5 bin represents parents in the top 1%.
Each row represents students’ admission and matriculation outcomes from one income bracket at a given university. There are 139 colleges covered in this dataset.
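Because the 99.4 bin sits inside the 99.5 (top 1%) bin, adding up rows across all income bins for a college would count those families twice. The sketch below shows one way to guard against that. The key names `par_income_bin` and `attendees`, the college name, and all numbers are hypothetical placeholders rather than the dataset's documented schema, and the filter handles only the overlap described above; any other bins would need to be checked against the official codebook.

```python
# Minimal sketch: drop the 99.4 sub-bin before aggregating across income bins,
# since the description says it is contained in the 99.5 (top 1%) bin.
# All field names and values here are hypothetical.
NESTED_SUB_BIN = 99.4


def non_overlapping_rows(rows):
    """Return rows with the nested 99.4 bin removed."""
    return [row for row in rows if row["par_income_bin"] != NESTED_SUB_BIN]


# Hypothetical rows for a single made-up college.
rows = [
    {"name": "Example College", "par_income_bin": 75.0, "attendees": 120},
    {"name": "Example College", "par_income_bin": 99.4, "attendees": 40},
    {"name": "Example College", "par_income_bin": 99.5, "attendees": 55},
]

naive_total = sum(row["attendees"] for row in rows)
safe_total = sum(row["attendees"] for row in non_overlapping_rows(rows))
print(f"Naive total: {naive_total}, without the nested bin: {safe_total}")
```

The reverse choice (keeping 99.4 and dropping 99.5) would answer a different question, so it is worth stating which top bin an analysis uses.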
The variables include a range of college-level, income-binned estimates, including attendance rate (both raw and reweighted by SAT/ACT scores), application rate, and relative attendance rate conditional on application, some of which are also reported for specific test score bands at each college and split by in-state and out-of-state students. Colleges are categorized into six tiers: Ivy Plus, other elite schools (public and private), highly selective public/private, and selective public/private, with selectivity generally in descending order. The data also note whether a college is public and/or flagship, where “flagship” means public flagship universities. Furthermore, the authors report the relative application rate for each income bin within specific test bands, which are the 50-point bands that had the most attendees in each school tier/category.
Several values are reported in “test-score-reweighted” form. These values control for SAT score: they are calculated separately for each SAT score value, then averaged with weights based on the distribution of SAT scores at the institution.
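As a rough illustration of the reweighting idea, the sketch below averages score-specific rates using the institution's score distribution as weights. It is not the authors' actual procedure: the function name, the two score values, the rates, and the student counts are all made up for the example.

```python
# Minimal sketch of test-score reweighting: compute a rate separately at each
# score value, then average using the institution's score distribution.
# Function name and all numbers are hypothetical.


def reweighted_rate(rate_by_score, students_by_score):
    """Average score-specific rates, weighted by students at each score."""
    total_weight = sum(students_by_score[score] for score in rate_by_score)
    if total_weight == 0:
        raise ValueError("no students at the scores provided")
    weighted_sum = sum(
        rate_by_score[score] * students_by_score[score] for score in rate_by_score
    )
    return weighted_sum / total_weight


# Hypothetical score distribution of all attendees at one institution.
institution_scores = {1400: 300, 1500: 700}

# Hypothetical attendance rates for one income bin at each score value.
bin_rates = {1400: 0.10, 1500: 0.20}

rate = reweighted_rate(bin_rates, institution_scores)
print(f"Test-score-reweighted rate: {rate:.3f}")
```

Because every income bin is averaged with the same institution-level weights, differences between bins in the reweighted value reflect differences at a given score rather than differences in how scores are distributed across bins.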
Note that since private schools typically don’t differentiate between in-...