Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Presents statistics on student support paid to students in the form of loans and grants or to their University/College in the form of tuition fees. The students are English domiciles studying anywhere in the UK or EU students studying in England. Source agency: Student Loans Company. Designation: National Statistics. Language: English. Alternative title: Student Support for Higher Education in England. Data and Resources: 2012/13 (Final) and 2013/14 (Provisional), HTML.
This large, international dataset contains survey responses from N = 12,570 students from 100 universities in 35 countries, collected in 21 languages. We measured anxieties (statistics, mathematics, test, trait, social interaction, performance, creativity, intolerance of uncertainty, and fear of negative evaluation), self-efficacy, persistence, and the cognitive reflection test, and collected demographics, previous mathematics grades, self-reported and official statistics grades, and statistics module details. Data reuse potential is broad, including testing links between anxieties and statistics/mathematics education factors, and examining instruments’ psychometric properties across different languages and contexts. Note that the pre-registration can be found here: https://osf.io/xs5wf
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Presents statistics on student support paid to students in the form of loans and grants or to their University/College in the form of tuition fees. The students are Northern Ireland domiciles studying anywhere in the UK or ROI, or EU students studying in Northern Ireland. Source agency: Student Loans Company. Designation: Official Statistics not designated as National Statistics. Language: English. Alternative title: Student Support for Higher Education in Northern Ireland.
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
This dataset shows the location of Higher Education (HE) and Further Education (FE) institutes in Great Britain. It should cover Universities and Colleges. Many institutes have more than one campus, and where possible this is reflected in the data, so a University may have more than one entry. Postcodes have also been included for institutes where possible. The data was collected from various sources connected with HEFE in the UK, including JISC and EDINA, and represents the fullest list that the author could compile from them. If you spot a missing institution, please contact the author and they will add it to the dataset. GIS vector data. This dataset was first accessioned in the EDINA ShareGeo Open repository on 2011-02-01 and migrated to Edinburgh DataShare on 2017-02-21.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Presents statistics on student support paid to students in the form of loans and grants or to their University/College in the form of tuition fees. The students are Welsh domiciles studying anywhere in the UK or EU students studying in Wales. Source agency: Student Loans Company. Designation: Official Statistics not designated as National Statistics. Language: English. Alternative title: Student Support for Higher Education in Wales. Data and Resources: Academic Year 2012/13 (Final) and 2013/14 (Provisional), HTML.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset derived from the systematic review described at https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=330361
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about universities in the United Kingdom. It has 92 rows. It features 5 columns including country, total students, domain, and graduate students.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The information refers to NI domiciled students gaining higher education qualifications from UK higher education institutions. The dataset is collected annually and is based on students obtaining a qualification at UK higher education institutions. The dataset is collected by the Higher Education Statistics Agency from higher education institutions throughout the UK and provided to the Department for Employment and Learning, Northern Ireland, for analysis. For 2013/14, NI Domiciled enrolments and qualifications at Open University are available. In previous years, these figures were included in NI students studying in England, as the administrative centre of the Open University is located in England.
The Health Survey for England (HSE), 2002: Teaching Dataset has been prepared solely for the purpose of teaching and student use. The dataset will help class tutors to incorporate empirical data into their courses and thus to develop students’ skills in quantitative methods of analysis.
All the variables and value labels are those used in the original HSE files, with one exception (New-wt) which is a new weighting variable.
Users may be interested in the Guide to using SPSS for Windows, available from Online statistical guides, which explores this dataset.
The original HSE 2002 dataset is held at the UK Data Archive under SN 4912.
Data product is provided by ASL Marketing. It contains current college students who are attending colleges and universities nationwide. Connect with this market by: Class Year, Field of Study, Home/School address, College Attending, Ethnicity, School Type, Region, Sports Conference, Gender, eSports, Email.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The information refers to NI domiciled students enrolled at higher education institutions in the UK. The dataset is collected annually and is based on enrolments in higher education institutions in the UK on 1st December each year. The dataset is collected by the Higher Education Statistics Agency from higher education institutions throughout the UK and provided to the Department for Employment and Learning, Northern Ireland, for analysis. For 2013/14, NI Domiciled enrolments and qualifications at Open University are available. In previous years, these figures were included in NI students studying in England, as the administrative centre of the Open University is located in England. The specification of the HESA Standard Registration Population has changed for 2007/08 enrolments onwards. Writing up and sabbatical students are now excluded from this population where they were previously included in published enrolment data and therefore 2007/08 data onwards cannot be directly compared to previous years.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset provides Census 2021 estimates that classify schoolchildren and full-time students aged 5 years and over in England and Wales by student accommodation and by age. The estimates are as at Census Day, 21 March 2021.
Estimates for single year of age between ages 90 and 100+ are less reliable than other ages. Estimation and adjustment at these ages was based on the age range 90+ rather than five-year age bands.
Area type
Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.
For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.
Coverage
Census 2021 statistics are published for the whole of England and Wales. Data are also available in these geographic types:
Student accommodation type
Combines the living situation of students and school children in full-time education, whether they are living:
It also includes whether these households contain one or multiple families.
This variable is comparable with the student accommodation variable but splits the communal establishment type into “university” and “other” categories.
Age
A person’s age on Census Day, 21 March 2021 in England and Wales. Infants aged under 1 year are classified as 0 years of age.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about universities in the United Kingdom. It has 92 rows. It features 15 columns including country, city, total students, and domain.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
These are the grants that HEFCE will be providing to each university and college that we fund. It includes the student number control.
Abstract copyright UK Data Service and data collection copyright owner. The USR consists of records of undergraduate students on courses of one academic year or more; postgraduate students on courses of one academic year or more; academic and related staff holding regular salaried appointments; and finance data for all UK universities. The Finance dataset contains details of income and expenditure for all of the UK universities. These data are contained in a series of files for each year. For detailed information on the structure and content of these files, users should refer to the documentation that accompanies this dataset. Also included in the Finance dataset is the Student Load data. Student Load is, in the USR context, a reallocation of student head-count numbers, apportioning them as a percentage to the departmental cost centres where they are taught, thus enabling student load, staff and financial data to be brought together. Main Topics: Finance: income and expenditure; university; cost centre. Student load: undergraduate, postgraduate (taught course or research); cost centre. No information recorded. Annual returns from each university.
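The Student Load reallocation described above can be illustrated with a minimal sketch. The input shapes, course names, cost-centre names and figures below are all hypothetical and are not taken from the USR files; the accompanying documentation remains the authority on how the real files are laid out.

```python
from collections import defaultdict


def apportion_student_load(headcounts, teaching_shares):
    """Reallocate student head counts to cost centres by percentage share.

    headcounts: {course: number of students}            (hypothetical shape)
    teaching_shares: {course: {cost_centre: percent}}   (hypothetical shape)
    """
    load = defaultdict(float)
    for course, students in headcounts.items():
        shares = teaching_shares[course]
        total = sum(shares.values())
        # Each course's shares must account for all of its teaching.
        if abs(total - 100.0) > 1e-9:
            raise ValueError(f"shares for {course!r} sum to {total}, not 100")
        for cost_centre, pct in shares.items():
            load[cost_centre] += students * pct / 100.0
    return dict(load)


# Illustrative figures only.
headcounts = {"BSc Physics": 120, "BA Economics": 80}
teaching_shares = {
    "BSc Physics": {"Physics": 75.0, "Mathematics": 25.0},
    "BA Economics": {"Economics": 50.0, "Mathematics": 50.0},
}
student_load = apportion_student_load(headcounts, teaching_shares)
```

The total load across cost centres equals the total head count, which is what allows load to be set alongside staff and finance figures per cost centre.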
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data article presents the relationship between university league tables and teaching qualification in the UK. Data were collected from the university and subject league tables (Complete University Guide) and from teaching qualification records (The Higher Education Academy, HEA, and the Higher Education Funding Council for England, HEFCE), UK.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a compilation of processed data on citation and references for research papers including their author, institution and open access info for a selected sample of academics analysed using Microsoft Academic Graph (MAG) data and CORE. The data for this dataset was collected during December 2019 to January 2020.
Six countries (Austria, Brazil, Germany, India, Portugal, United Kingdom and United States) were the focus of the six questions which make up this dataset. There is one csv file per country and per question (36 files in total). More details about the creation of this dataset are available on the public ON-MERRIT D3.1 deliverable report.
The dataset is a combination of two different data sources: one part is a dataset created on analysing promotion policies across the target countries, while the second part is a set of data points available to understand the publishing behaviour. To facilitate the analysis the dataset is organised in the following seven folders:
PRT
The dataset with the file name "PRT_policies.csv" contains the related information as this was extracted from promotion, review and tenure (PRT) policies.
Q1: What % of papers coming from a university are Open Access?
- Dataset Name format: oa_status_countryname_papers.csv
- Dataset Contents: Open Access (OA) status of all papers of all the universities listed in Times Higher Education World University Rankings (THEWUR) for the given country. A paper is marked OA if there is at least an OA link available. OA links are collected using the CORE Discovery API.
- Important considerations about this dataset:
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to.
  - The service we used to recognise if a paper is OA, CORE Discovery, does not contain entries for all paperids in MAG. This implies that some of the records in the dataset extracted will not have either a true or false value for the is_OA field.
  - Only those records marked as true for the is_OA field can be said to be OA. Others with false or no value for the is_OA field are unknown status (i.e. not necessarily closed access).
Q2: How are papers, published by the selected universities, distributed across the three scientific disciplines of our choice?
- Dataset Name format: fsid_countryname_papers.csv
- Dataset Contents: For the given country, all papers for all the universities listed in THEWUR with the information of fieldofstudy they belong to.
- Important considerations about this dataset:
  - MAG can associate a paper to multiple fieldofstudyid. If a paper belongs to more than one of our fieldofstudyid, separate records were created for the paper with each of those fieldofstudyid values.
  - MAG assigns fieldofstudyid to every paper with a score. We preserve only those records whose score is more than 0.5 for any fieldofstudyid it belongs to.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.
Q3: What is the gender distribution in authorship of papers published by the universities?
- Dataset Name format: author_gender_countryname_papers.csv
- Dataset Contents: All papers with their author names for all the universities listed in THEWUR.
- Important considerations about this dataset:
  - When there are multiple collaborators (authors) for the same paper, this dataset makes sure that only the records for collaborators from within selected universities are preserved.
  - An external script was executed to determine the gender of the authors. The script is available here.
Q4: Distribution of staff seniority (= number of years from their first publication until the last publication) in the given university.
- Dataset Name format: author_ids_countryname_papers.csv
- Dataset Contents: For a given country, all papers for authors with their publication year for all the universities listed in THEWUR.
- Important considerations about this work:
  - When there are multiple collaborators (authors) for the same paper, this dataset makes sure that only the records for collaborators from within selected universities are preserved.
  - Calculating staff seniority can be achieved in various ways. The most straightforward option is to calculate it as academic_age = MAX(year) - MIN(year) for each authorid.
Q5: Citation counts (incoming) for OA vs Non-OA papers published by the university.
- Dataset Name format: cc_oa_countryname_papers.csv
- Dataset Contents: OA status and OA links for all papers of all the universities listed in THEWUR and for each of those papers, count of incoming citations available in MAG.
- Important considerations about this dataset:
  - CORE Discovery was used to establish the OA status of papers.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to.
  - Only those records marked as true for the is_OA field can be said to be OA. Others with false or no value for the is_OA field are unknown status (i.e. not necessarily closed access).
Q6: Count of OA vs Non-OA references (outgoing) for all papers published by universities.
- Dataset Name format: rc_oa_countryname_-papers.csv
- Dataset Contents: Counts of all OA and unknown papers referenced by all papers published by all the universities listed in THEWUR.
- Important considerations about this dataset:
  - CORE Discovery was used to establish the OA status of papers being referenced.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.
Additional files:
- _fieldsofstudy_mag_.csv: this file contains a dump of fieldsofstudy table of MAG mapping each of the ids to their actual field of study name.
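Two of the considerations above lend themselves to a short sketch: the seniority formula given for Q4 (academic_age = MAX(year) - MIN(year) per authorid) and the rule that only is_OA values of true count as OA. The in-memory record shapes, function names and sample values below are assumptions made for illustration; the actual CSV column layout should be checked against the files themselves.

```python
def academic_age(records):
    """Return {authorid: MAX(year) - MIN(year)} from (authorid, year) pairs."""
    first, last = {}, {}
    for authorid, year in records:
        first[authorid] = min(year, first.get(authorid, year))
        last[authorid] = max(year, last.get(authorid, year))
    return {authorid: last[authorid] - first[authorid] for authorid in first}


def oa_summary(is_oa_values):
    """Count papers as OA or unknown.

    Only True counts as OA; False and missing (None) are both treated as
    unknown status, not as closed access.
    """
    oa = sum(1 for value in is_oa_values if value is True)
    return {"oa": oa, "unknown": len(is_oa_values) - oa}


# Hypothetical sample values, not taken from the dataset.
records = [("a1", 2005), ("a1", 2019), ("a2", 2012), ("a1", 2010)]
ages = academic_age(records)
summary = oa_summary([True, False, None, True])
```

An author with a single publication year gets an academic age of 0 under this formula, which is worth keeping in mind when reading seniority distributions.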
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The early identification of students facing learning difficulties is one of the most critical challenges in modern education. Intervening effectively requires leveraging data to understand the complex interplay between student demographics, engagement patterns, and academic performance.
This dataset was created to serve as a high-quality, pre-processed resource for building machine learning models to tackle this very problem. It is a unique hybrid dataset, meticulously crafted by unifying three distinct sources:
The Open University Learning Analytics Dataset (OULAD): A rich dataset detailing student interactions with a Virtual Learning Environment (VLE). We have aggregated the raw, granular data (over 10 million interaction logs) into powerful features, such as total clicks, average assessment scores, and distinct days of activity for each student registration.
The UCI Student Performance Dataset: A classic educational dataset containing demographic information and final grades in Portuguese and Math subjects from two Portuguese schools.
A Synthetic Data Component: A synthetically generated portion of the data, created to balance the dataset or represent specific student profiles.
A direct merge of these sources was not possible as the student identifiers were not shared. Instead, a strategy of intelligent concatenation was employed. The final dataset has undergone a rigorous pre-processing pipeline to make it immediately usable for machine learning tasks:
Advanced Imputation: Missing values were handled using a sophisticated iterative imputation method powered by Gaussian Mixture Models (GMM), ensuring the dataset's integrity.
One-Hot Encoding: All categorical features have been converted to a numerical format.
Feature Scaling: All numerical features have been standardized (using StandardScaler) to have a mean of 0 and a standard deviation of 1, preventing model bias from features with different scales.
The result is a clean, comprehensive dataset ready for modeling.
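As a rough illustration of the one-hot encoding and scaling steps listed above, here is a minimal standard-library sketch. It is not the dataset authors' pipeline: the function names and sample values are hypothetical, and the GMM-based imputation step is not covered. The scaling uses the population standard deviation, which matches the behaviour of scikit-learn's StandardScaler.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Scale values to mean 0 and standard deviation 1 (population std)."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values, prefix):
    """Encode a categorical column as one 0/1 column per category."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]


# Hypothetical sample values.
clicks = [10, 20, 30, 40]
scaled = standardize(clicks)
encoded = one_hot(["F", "M", "F"], "gender")
```

In practice the scaling statistics should be computed on the training split only and then reused for validation and test data, so that information from held-out rows does not leak into the model.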
Each row represents a student profile, and the columns are the features and the target.
Features include aggregated online engagement metrics (e.g., clicks, distinct activities), academic performance (grades, scores), and student demographics (e.g., gender, age band). A key feature indicates the original data source (OULAD, UCI, Synthetic).
The dataset contains no Personally Identifiable Information (PII). Demographic information is presented in broad, anonymized categories.
Key Columns:
Target Variable:
had_difficulty: The primary target for classification. This binary variable has been engineered from the original final_result column of the OULAD dataset.
1: The student either failed (Fail) or withdrew (Withdrawn) from the course.
0: The student passed (Pass or Distinction).
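The mapping from final_result to had_difficulty described above can be written as a small function. This is a sketch of the stated rule, not the authors' code; the function name and the choice to raise an error on unexpected labels are assumptions.

```python
def had_difficulty(final_result):
    """Map an OULAD final_result label to the binary target.

    1 for Fail or Withdrawn, 0 for Pass or Distinction.
    """
    mapping = {"Fail": 1, "Withdrawn": 1, "Pass": 0, "Distinction": 0}
    if final_result not in mapping:
        # Assumption: unknown labels are rejected rather than guessed.
        raise ValueError(f"unexpected final_result: {final_result!r}")
    return mapping[final_result]


labels = [had_difficulty(r) for r in ["Pass", "Fail", "Withdrawn", "Distinction"]]
```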
Feature Groups:
OULAD Aggregated Features (e.g., oulad_total_cliques, oulad_media_notas): Quantitative metrics summarizing a student's engagement and performance within the VLE.
Academic Performance Features (e.g., nota_matematica_harmonizada): Harmonized grades from different data sources.
Demographic Features (e.g., gender_*, age_band_*): One-hot encoded columns representing student demographics.
Origin Features (e.g., origem_dado_OULAD, origem_dado_UCI): One-hot encoded columns indicating the original source of the data for each row. This allows for source-specific analysis.
(Note: All numerical feature names are post-scaling and may not directly reflect their original names. Please refer to the complete column list for details.)
This dataset would not be possible without the original data providers. Please acknowledge them in any work that uses this data:
OULAD Dataset: Kuzilek, J., Hlosta, M., and Zdrahal, Z. (2017). Open University Learning Analytics dataset. Scientific Data, 4. https://analyse.kmi.open.ac.uk/open_dataset
UCI Student Performance Dataset: P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS. https://archive.ics.uci.edu/ml/datasets/student+performance
This dataset is perfect for a variety of predictive modeling tasks. Here are a few ideas to get you started:
Can you build a classification model to predict had_difficulty with high recall? (Minimizing the number of at-risk students we fail to identify).
Which features are the most powerful predictors of student failure or withdrawal? (Feature Importance Analysis).
Can you build separate models for each data origin (origem_dado_*) and compare ...
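For the high-recall question above, recall is the share of students who truly had difficulty that the model flags. A minimal sketch of the metric follows; the function name and the sample labels are hypothetical.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) for the positive class (had_difficulty == 1)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    if true_pos + false_neg == 0:
        raise ValueError("recall is undefined without positive examples")
    return true_pos / (true_pos + false_neg)


# Hypothetical labels: four at-risk students, one of whom the model misses.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = recall(y_true, y_pred)
```

Recall can be pushed to 1.0 trivially by flagging every student, so it is usually reported alongside precision.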
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is Financing university education : a study of university fees and loans to students in Great Britain. It features 7 columns including author, publication date, language, and book publisher.
Understanding Society (UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on, and incorporates, the British Household Panel Survey (BHPS), which began in 1991.
The Understanding Society: Calendar Year Dataset, 2022, is designed for analysts to conduct cross-sectional analysis for the 2022 calendar year. The Calendar Year datasets combine data collected in a specific year from across multiple waves and these are released as separate calendar year studies, with appropriate analysis weights, starting with the 2020 Calendar Year dataset. Each subsequent year, an additional yearly study is released.
The Calendar Year data is designed to enable timely cross-sectional analysis of individuals and households in a calendar year. Such analysis can, however, only involve variables that are collected in every wave (excluding rotating content, which is only collected in some of the waves). Due to overlapping fieldwork, the data files combine data collected in the three waves that make up a calendar year. Analysis cannot be restricted to data collected in one wave during a calendar year, as this subset will not be representative of the population. Further details and guidance on this study can be found in the document 9333_main_survey_calendar_year_user_guide_2022.
These calendar year datasets should be used for cross-sectional analysis only. For those interested in longitudinal analyses using Understanding Society please access the main survey datasets: End User Licence version or Special Licence version.
Understanding Society: the UK Household Longitudinal Study started in 2009 with a general population sample (GPS) of around 26,000 households of UK residents living in private households and an ethnic minority boost sample (EMBS) of 4,000 households. All members of these responding households and their descendants became part of the core sample, who were eligible to be interviewed every year. Anyone who joined these households after this initial wave was also interviewed, as long as they lived with these core sample members, to provide the household context. At each annual interview, some basic demographic information is collected about every household member, information about the household is collected from one household member, all household members aged 16 and over are eligible for adult interviews, household members aged 10-15 are eligible for youth interviews, and some information is collected about 0-9 year-olds from their parents or guardians. From 1991 until 2008/9 a similar survey, the British Household Panel Survey (BHPS), was fielded. The surviving members of this survey sample were incorporated into Understanding Society in 2010. In 2015, an immigrant and ethnic minority boost sample (IEMBS) of around 2,500 households was added. In 2022, a GPS boost sample (GPS2) of around 5,700 households was added. To know more about the sample design, following rules, interview modes, incentives, consent, and questionnaire content, please see the study overview and user guide.
Co-funders
In addition to the Economic and Social Research Council, co-funders for the study included the Department of Work and Pensions, the Department for Education, the Department for Transport, the Department of Culture, Media and Sport, the Department for Community and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department of Environment and Rural Affairs, and the Food Standards Agency.
End User Licence and Special Licence versions:
There are two versions of the Calendar Year 2022 data. One is available under the standard End User Licence (EUL) agreement (SN 9333), and the other is a Special Licence (SL) version (SN 9334). The SL version contains month and year of birth variables instead of just age and more detailed country and occupation coding for a number of variables, and various income variables have not been top-coded (see document 9333_eul_vs_sl_variable_differences for more details). Users are advised first to obtain the standard EUL version of the data to see if they are sufficient for their research requirements. The SL data have more restrictive access conditions; prospective users of the SL version will need to complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables in order to get permission to use that version. The main longitudinal versions of the Understanding Society study may be found under SNs 6614 (EUL) and 6931 (SL).
Low- and Medium-level geographical identifiers produced for the mainstage longitudinal dataset can be used with this Calendar Year 2022 dataset, subject to SL access conditions. See the User Guide for further details.
Suitable data analysis software
These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata. Transfer to other formats may result in unforeseen issues. Stata SE or MP software is needed to analyse the larger files, which contain about 1,800 variables.