This large, international dataset contains survey responses from N = 12,570 students from 100 universities in 35 countries, collected in 21 languages. We measured anxieties (statistics, mathematics, test, trait, social interaction, performance, creativity, intolerance of uncertainty, and fear of negative evaluation), self-efficacy, persistence, and the cognitive reflection test, and collected demographics, previous mathematics grades, self-reported and official statistics grades, and statistics module details. Data reuse potential is broad, including testing links between anxieties and statistics/mathematics education factors, and examining instruments’ psychometric properties across different languages and contexts. Note that the pre-registration can be found here: https://osf.io/xs5wf
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
This dataset shows the location of Higher Education (HE) and Further Education (FE) institutes in Great Britain. This should cover Universities and Colleges. Many institutes have more than one campus and, where possible, this is reflected in the data, so a University may have more than one entry. Postcodes have also been included for institutes where possible. This data was collected from various sources connected with HEFE in the UK, including JISC and EDINA. This represents the fullest list that the author could compile from various sources. If you spot a missing institution, please contact the author and they will add it to the dataset. GIS vector data. This dataset was first accessioned in the EDINA ShareGeo Open repository on 2011-02-01 and migrated to Edinburgh DataShare on 2017-02-21.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The student sample for this research was selected from YouthSight’s Student Panel. Based on HESA statistics, the sample is nationally representative by gender, course year, and university type. The data is weighted on these factors. After fieldwork, the sample collected was checked for quality, and any ‘straight-liners’ were removed from the final total. The total student sample size is 2,153 respondents. Fieldwork was carried out between 29th July and 2nd August 2019. The survey instrument was developed by reviewing the limited number of studies and surveys on freedom of expression, consulting colleagues, and drawing on our own experience. This resulted in the inclusion of seven comparative statements that are routinely used in surveys on freedom of expression in US universities, and a 15-item Moral Foundations Questionnaire, which enables the data to be interrogated by underlying moral profile. The definition of freedom of expression uses the framing adopted by King’s College London, which was developed through extensive consultation with the Students’ Union.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset provides Census 2021 estimates that classify schoolchildren and full-time students aged 5 years and over in England and Wales by student accommodation and by age. The estimates are as at Census Day, 21 March 2021.
Estimates for single year of age between ages 90 and 100+ are less reliable than other ages. Estimation and adjustment at these ages were based on the age range 90+ rather than five-year age bands.
Area type
Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.
For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.
Coverage
Census 2021 statistics are published for the whole of England and Wales. Data are also available in these geographic types:
Student accommodation type
Combines the living situation of students and school children in full-time education, whether they are living:
It also includes whether these households contain one or multiple families.
This variable is comparable with the student accommodation variable but splits the communal establishment type into “university” and “other” categories.
Age
A person’s age on Census Day, 21 March 2021 in England and Wales. Infants aged under 1 year are classified as 0 years of age.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The information refers to NI domiciled students enrolled at higher education institutions in the UK. The dataset is collected annually and is based on enrolments in higher education institutions in the UK on 1st December each year. The dataset is collected by the Higher Education Statistics Agency from higher education institutions throughout the UK and provided to the Department for Employment and Learning, Northern Ireland, for analysis. For 2013/14, NI Domiciled enrolments and qualifications at Open University are available. In previous years, these figures were included in NI students studying in England, as the administrative centre of the Open University is located in England. The specification of the HESA Standard Registration Population has changed for 2007/08 enrolments onwards. Writing up and sabbatical students are now excluded from this population where they were previously included in published enrolment data and therefore 2007/08 data onwards cannot be directly compared to previous years.
Abstract copyright UK Data Service and data collection copyright owner. The USR consists of records of undergraduate students on courses of one academic year or more; postgraduate students on courses of one academic year or more; academic and related staff holding regular salaried appointments; and finance data for all UK universities. The Finance dataset contains details of income and expenditure for all of the UK universities. These data are contained in a series of files for each year. For detailed information on the structure and content of these files, users should refer to the documentation that accompanies this dataset. Also included in the Finance dataset is the Student Load data. Student Load is, in the USR context, a reallocation of student head-count numbers, apportioning them as a percentage to the departmental cost centres where they are taught, thus enabling student load, staff and financial data to be brought together. Main Topics: Finance: income and expenditure; university; cost centre. Student load: undergraduate, postgraduate (taught course or research); cost centre. No information recorded. Annual returns from each university.
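The apportionment idea behind Student Load can be illustrated with a small sketch. This is only an assumption about the arithmetic, not the USR's documented procedure: the function name `apportion_load`, the cost centre names and the percentages are all hypothetical.

```python
def apportion_load(headcount, percentages):
    """Split a student head count across cost centres by percentage share.

    `percentages` maps a (hypothetical) cost centre name to the share of
    teaching it delivers; the shares are expected to add up to 100.
    """
    total = sum(percentages.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"percentages sum to {total}, expected 100")
    return {centre: headcount * pct / 100.0 for centre, pct in percentages.items()}


# Illustrative figures only: 200 students taught 60/40 by two cost centres.
load = apportion_load(200, {"Mathematics": 60.0, "Physics": 40.0})
```

Once head counts are expressed per cost centre in this way, they can be lined up against staff and finance figures recorded for the same cost centres.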
Abstract copyright UK Data Service and data collection copyright owner. This study comprises the data collected for a wider project exploring the historical relationship between higher education and the UK economy. The project sought to provide a long-term explanation of the relationships between funding, widening access and socio-economic aspects of higher education. Three main areas were considered:
- The provision of an in-depth historical account and analysis of the numbers and extent of students and staff for the purposes of evaluating the main characteristics of UK higher education development back to the 1920s.
- The provision of an in-depth historical account and evaluation of levels and structures of income and expenditure in higher education.
- The interpretation of these data with reference to major socio-economic indicators.
Main Topics: This study is a collation and analysis of statistics on UK higher education which refers to pre-1992 universities and includes all institutions delivering degrees afterwards. The dataset, which gathers historical series on funding and development of universities from the early 1920s, is the result of research into primary and secondary governmental and institutional sources. Please note: this study does not include information on named individuals and would therefore not be useful for personal family history research. No sampling (total universe). Compilation or synthesis of existing material.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The information refers to NI domiciled students gaining higher education qualifications from UK higher education institutions. The dataset is collected annually and is based on students obtaining a qualification at UK higher education institutions. The dataset is collected by the Higher Education Statistics Agency from higher education institutions throughout the UK and provided to the Department for Employment and Learning, Northern Ireland, for analysis. For 2013/14, NI Domiciled enrolments and qualifications at Open University are available. In previous years, these figures were included in NI students studying in England, as the administrative centre of the Open University is located in England.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset derived from the systematic review described at https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=330361
The Health Survey for England (HSE), 2002: Teaching Dataset has been prepared solely for the purpose of teaching and student use. The dataset will help class tutors to incorporate empirical data into their courses and thus to develop students’ skills in quantitative methods of analysis.
All the variables and value labels are those used in the original HSE files, with one exception (New-wt) which is a new weighting variable.
Users may be interested in the Guide to using SPSS for Windows, available from Online statistical guides, which explores this dataset.
The original HSE 2002 dataset is held at the UK Data Archive under SN 4912.
This dataset presents a cluster analysis of UK universities based on four synthetic environments: social, cultural, physical and economic. These were developed from variables that represented an educational ecosystem of well-being. The cluster analysis was initially linked to the LSYPE-Secure dataset using UKPRNs (i.e. higher education institution numbers), and hence it used data from around 2009-2012 to represent Wave 6 and Wave 7 of the LSYPE-Secure dataset. The cluster analysis used a variety of variables available from HESA and the Office for Students (OfS) to represent these environments, for example:
- Social: demographics of students and staff, including ethnicity and sex
- Cultural: data on research and teaching scores
- Economic: data on student:staff ratio and expenditure
- Physical: data related to the built and natural environment, including residential sites and blue and green spaces
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data article presents the relationship between university league tables and teaching qualifications in the UK. Data were collected from the university and subject league tables (Complete University Guide) and from teaching qualification sources (the Higher Education Academy, HEA, and the Higher Education Funding Council for England, HEFCE), UK.
Abstract copyright UK Data Service and data collection copyright owner. The USR consists of records of undergraduate students on courses of one academic year or more; postgraduate students on courses of one academic year or more; academic and related staff holding regular salaried appointments; and finance data for all UK universities. Main Topics: (i) Personal information: date of birth; sex; marital status; country/county of domicile; country of birth; whether home or overseas student for fee purposes; occupation of parent or guardian. (ii) Academic history: last full-time school attended; other full-time/part-time post-secondary educational institution attended; GCE 'A' level or Scottish Certificate of Education higher grade results; other entrance qualifications; course for which admitted. (iii) Annual information: university; subject of course; normal duration of course; type of course; year of course; date of enrolment; method of study (full-time, part-time, sandwich, etc.); qualification aimed for; source of fees; accommodation (hall, lodgings, home, etc.). (iv) Leavers' details: qualification obtained; class of degree; date of leaving; reason for leaving; first destination. No information recorded. Annual returns from each university.
The GLA commissioned the Social Market Foundation to look at the reasons behind the non-continuation (drop-out) rate of undergraduates studying at London’s higher education institutions. This report seeks to understand the factors affecting non-continuation and transfers at London universities. London’s non-continuation rate is 7.7%, which is much higher than the English average of 6.3%, and students in London are more likely to transfer to another university than students in the rest of the country. We seek to build on previous SMF work by focusing on why students leave university in London, and the report looks in depth at the differences in retention by ethnicity and socio-economic status. This report draws on qualitative and quantitative evidence: interviews with 20 individuals from London who attended and withdrew from a London university, and quantitative analysis of HESA data on young students in London between 2013/14 and 2015/16.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The early identification of students facing learning difficulties is one of the most critical challenges in modern education. Intervening effectively requires leveraging data to understand the complex interplay between student demographics, engagement patterns, and academic performance.
This dataset was created to serve as a high-quality, pre-processed resource for building machine learning models to tackle this very problem. It is a unique hybrid dataset, meticulously crafted by unifying three distinct sources:
The Open University Learning Analytics Dataset (OULAD): A rich dataset detailing student interactions with a Virtual Learning Environment (VLE). We have aggregated the raw, granular data (over 10 million interaction logs) into powerful features, such as total clicks, average assessment scores, and distinct days of activity for each student registration.
The UCI Student Performance Dataset: A classic educational dataset containing demographic information and final grades in Portuguese and Math subjects from two Portuguese schools.
A Synthetic Data Component: A synthetically generated portion of the data, created to balance the dataset or represent specific student profiles.
A direct merge of these sources was not possible as the student identifiers were not shared. Instead, a strategy of intelligent concatenation was employed. The final dataset has undergone a rigorous pre-processing pipeline to make it immediately usable for machine learning tasks:
Advanced Imputation: Missing values were handled using a sophisticated iterative imputation method powered by Gaussian Mixture Models (GMM), ensuring the dataset's integrity.
One-Hot Encoding: All categorical features have been converted to a numerical format.
Feature Scaling: All numerical features have been standardized (using StandardScaler) to have a mean of 0 and a standard deviation of 1, preventing model bias from features with different scales.
The result is a clean, comprehensive dataset ready for modeling.
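As a rough illustration of the scaling step described above, the following sketch standardizes a single numeric column using only the Python standard library. It is not the authors' pipeline; it only mirrors the usual z-score convention (population standard deviation, and a constant column mapped to zeros). The `clicks` values are invented.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Z-score a column: subtract the mean, divide by the population std."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Invented example column, e.g. total clicks per student registration.
clicks = [10.0, 20.0, 30.0, 40.0]
scaled = standardize(clicks)
```

After this transformation the column has mean 0 and standard deviation 1, which is why the published values no longer look like raw counts or grades.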
Each row represents a student profile, and the columns are the features and the target.
Features include aggregated online engagement metrics (e.g., clicks, distinct activities), academic performance (grades, scores), and student demographics (e.g., gender, age band). A key feature indicates the original data source (OULAD, UCI, Synthetic).
The dataset contains no Personally Identifiable Information (PII). Demographic information is presented in broad, anonymized categories.
Key Columns:
Target Variable:
had_difficulty: The primary target for classification. This binary variable has been engineered from the original final_result column of the OULAD dataset.
1: The student either failed (Fail) or withdrew (Withdrawn) from the course.
0: The student passed (Pass or Distinction).
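The mapping from final_result to had_difficulty described above can be sketched as follows. The helper name and the choice to raise on unexpected labels are assumptions made for illustration; only the four label values and the 1/0 coding come from the description.

```python
# Label values as given in the description of the OULAD final_result column.
DIFFICULTY_LABELS = {"Fail", "Withdrawn"}
PASS_LABELS = {"Pass", "Distinction"}


def had_difficulty(final_result):
    """Return 1 for Fail/Withdrawn and 0 for Pass/Distinction."""
    if final_result in DIFFICULTY_LABELS:
        return 1
    if final_result in PASS_LABELS:
        return 0
    # Assumption: any other label is treated as a data error, not guessed at.
    raise ValueError(f"unexpected final_result: {final_result!r}")


labels = [had_difficulty(r) for r in ["Pass", "Withdrawn", "Distinction", "Fail"]]
```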
Feature Groups:
OULAD Aggregated Features (e.g., oulad_total_cliques, oulad_media_notas): Quantitative metrics summarizing a student's engagement and performance within the VLE.
Academic Performance Features (e.g., nota_matematica_harmonizada): Harmonized grades from different data sources.
Demographic Features (e.g., gender_*, age_band_*): One-hot encoded columns representing student demographics.
Origin Features (e.g., origem_dado_OULAD, origem_dado_UCI): One-hot encoded columns indicating the original source of the data for each row. This allows for source-specific analysis.
(Note: All numerical feature names are post-scaling and may not directly reflect their original names. Please refer to the complete column list for details.)
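For source-specific analysis, the one-hot origin columns can be turned back into a single label per row. The sketch below assumes that the origin columns are stored as plain 0/1 indicators (the description does not say whether they were scaled) and uses only the two column names mentioned above; `row_origin` is a hypothetical helper.

```python
def row_origin(row, prefix="origem_dado_"):
    """Return the source name whose one-hot origin column is set to 1."""
    active = [
        name[len(prefix):]
        for name, value in row.items()
        if name.startswith(prefix) and value == 1
    ]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active origin column, got {active}")
    return active[0]


# Invented row using the column names mentioned in the description.
example_row = {"origem_dado_OULAD": 1, "origem_dado_UCI": 0, "had_difficulty": 0}
origin = row_origin(example_row)
```

Grouping rows by this label is one way to prepare separate subsets per data source.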
This dataset would not be possible without the original data providers. Please acknowledge them in any work that uses this data:
OULAD Dataset: Kuzilek, J., Hlosta, M., and Zdrahal, Z. (2017). Open University Learning Analytics dataset. Scientific Data, 4. https://analyse.kmi.open.ac.uk/open_dataset
UCI Student Performance Dataset: P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS. https://archive.ics.uci.edu/ml/datasets/student+performance
This dataset is perfect for a variety of predictive modeling tasks. Here are a few ideas to get you started:
Can you build a classification model to predict had_difficulty with high recall? (Minimizing the number of at-risk students we fail to identify).
Which features are the most powerful predictors of student failure or withdrawal? (Feature Importance Analysis).
Can you build separate models for each data origin (origem_dado_*) and compare ...
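Since the first idea above asks for high recall on had_difficulty, here is a minimal sketch of how recall is computed for the positive class. The labels and predictions are invented; any real evaluation would use a held-out split of the dataset.

```python
def recall(y_true, y_pred, positive=1):
    """Share of truly positive cases that the model also flags as positive."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    if true_pos + false_neg == 0:
        raise ValueError("no positive cases in y_true")
    return true_pos / (true_pos + false_neg)


# Invented example: three at-risk students, two of them identified.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1]
score = recall(y_true, y_pred)
```

A missed at-risk student counts as a false negative, so raising recall directly reduces the number of students who go unidentified, usually at the cost of more false alarms.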
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a compilation of processed data on citations and references for research papers, including author, institution and open access information, for a selected sample of academics analysed using Microsoft Academic Graph (MAG) data and CORE. The data were collected between December 2019 and January 2020.
Six countries (Austria, Brazil, Germany, India, Portugal, United Kingdom and United States) were the focus of the six questions which make up this dataset. There is one csv file per country and per question (36 files in total). More details about the creation of this dataset are available on the public ON-MERRIT D3.1 deliverable report.
The dataset combines two data sources: one part was created by analysing promotion policies across the target countries, while the second part is a set of data points for understanding publishing behaviour. To facilitate the analysis, the dataset is organised in the following seven folders:
PRT
The file "PRT_policies.csv" contains the information extracted from promotion, review and tenure (PRT) policies.
Q1: What % of papers coming from a university are Open Access?
- Dataset Name format: oa_status_countryname_papers.csv
- Dataset Contents: Open Access (OA) status of all papers of all the universities listed in Times Higher Education World University Rankings (THEWUR) for the given country. A paper is marked OA if at least one OA link is available. OA links are collected using the CORE Discovery API.
- Important considerations about this dataset:
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to.
  - The service we used to recognise if a paper is OA, CORE Discovery, does not contain entries for all paperids in MAG. This implies that some of the records in the dataset extracted will not have either a true or false value for the is_OA field.
  - Only those records marked as true for the is_OA field can be said to be OA. Others with false or no value for the is_OA field are of unknown status (i.e. not necessarily closed access).
Q2: How are papers, published by the selected universities, distributed across the three scientific disciplines of our choice?
- Dataset Name format: fsid_countryname_papers.csv
- Dataset Contents: For the given country, all papers for all the universities listed in THEWUR with the information of the fieldofstudy they belong to.
- Important considerations about this dataset:
  - MAG can associate a paper with multiple fieldofstudyid values. If a paper belongs to more than one of our fieldofstudyid values, separate records were created for the paper with each of those fieldofstudyid values.
  - MAG assigns a fieldofstudyid to every paper with a score. We preserve only those records whose score is more than 0.5 for any fieldofstudyid it belongs to.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.
Q3: What is the gender distribution in authorship of papers published by the universities?
- Dataset Name format: author_gender_countryname_papers.csv
- Dataset Contents: All papers with their author names for all the universities listed in THEWUR.
- Important considerations about this dataset:
  - When there are multiple collaborators (authors) for the same paper, this dataset makes sure that only the records for collaborators from within selected universities are preserved.
  - An external script was executed to determine the gender of the authors. The script is available here.
Q4: Distribution of staff seniority (= number of years from their first publication until the last publication) in the given university.
- Dataset Name format: author_ids_countryname_papers.csv
- Dataset Contents: For a given country, all papers for authors with their publication year for all the universities listed in THEWUR.
- Important considerations about this work:
  - When there are multiple collaborators (authors) for the same paper, this dataset makes sure that only the records for collaborators from within selected universities are preserved.
  - Calculating staff seniority can be achieved in various ways. The most straightforward option is to calculate it as academic_age = MAX(year) - MIN(year) for each authorid.
Q5: Citation counts (incoming) for OA vs Non-OA papers published by the university.
- Dataset Name format: cc_oa_countryname_papers.csv
- Dataset Contents: OA status and OA links for all papers of all the universities listed in THEWUR and, for each of those papers, the count of incoming citations available in MAG.
- Important considerations about this dataset:
  - CORE Discovery was used to establish the OA status of papers.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to.
  - Only those records marked as true for the is_OA field can be said to be OA. Others with false or no value for the is_OA field are of unknown status (i.e. not necessarily closed access).
Q6: Count of OA vs Non-OA references (outgoing) for all papers published by universities.
- Dataset Name format: rc_oa_countryname_-papers.csv
- Dataset Contents: Counts of all OA and unknown papers referenced by all papers published by all the universities listed in THEWUR.
- Important considerations about this dataset:
  - CORE Discovery was used to establish the OA status of papers being referenced.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.
Additional files:
- _fieldsofstudy_mag_.csv: this file contains a dump of the fieldsofstudy table of MAG, mapping each of the ids to their actual field of study name.
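Two of the considerations above lend themselves to a short sketch: the academic_age formula from Q4, and the rule that only is_OA = true counts as OA while false or missing values are of unknown status. The function names and sample records below are hypothetical, and the sketch assumes the csv rows have already been parsed into Python values (years as integers, is_OA as True, False or None).

```python
from collections import defaultdict


def academic_age(records):
    """Compute MAX(year) - MIN(year) for each authorid.

    `records` is an iterable of (authorid, year) pairs.
    """
    years = defaultdict(list)
    for authorid, year in records:
        years[authorid].append(year)
    return {authorid: max(ys) - min(ys) for authorid, ys in years.items()}


def oa_summary(is_oa_values):
    """Count papers as OA only when is_OA is True; everything else is unknown."""
    values = list(is_oa_values)
    oa = sum(1 for v in values if v is True)
    return {"oa": oa, "unknown": len(values) - oa}


# Invented sample data for illustration.
ages = academic_age([("a1", 2005), ("a1", 2019), ("a2", 2012)])
summary = oa_summary([True, False, None, True])
```

Keeping false and missing values together under "unknown" avoids reporting a closed-access share that the data cannot support.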
Map showing existing student accommodation where known. The data is based on a number of sources including the University of Bristol, University of the West of England, Bristol City Council planning monitoring data and web research. The map shows all forms of residential accommodation built or converted for use by students. The map also shows sites with planning permission for student accommodation and sites with a planning application for student accommodation but where permission has not yet been granted.
Datasets provided by the Open University, a UK-based online learning university. More about the dataset: https://analyse.kmi.open.ac.uk/open_dataset
Abstract copyright UK Data Service and data collection copyright owner. The purpose of this study was the construction and analysis of a database of the records relating to students who attended the University of Aberdeen from 1860 to 1920, in order to create a comprehensive textual dossier of information about individual students which could be accessed easily by the University Archivist in answering frequent enquiries about past students; to facilitate the researches of scholars preparing publications on different aspects of University life for the institution's quincentenary in 1995; to demonstrate trends in the geographical and social mobility of the student population, as well as the impact on academic life of major changes in the curriculum. Main Topics: Three broad categories of data have been sought in constructing a database of students at the University of Aberdeen from 1860-1920. Out of a possible total of 52 fields, approximately 16 are devoted to biographical and background information, including dates of birth and death, place of origin, schools attended and father's occupation. A further 21 fields are concerned with the student's university education and experiences, including actual and total periods of study, classes attended, examinations passed (where applicable), degree(s) obtained, both at Aberdeen and at other universities, bursaries, prizes and medals awarded, the location of lodgings (where applicable), and membership of university societies. The third major area of investigation was the student's post-university life, incorporating information on locations, careers, and civil, military and academic honours and awards. Several further fields have been allocated to identifying sources of information. 
The database has been designed to allow not only minute documentation of the backgrounds and careers of individual students, but also global analysis of changing patterns of geographical and social mobility among the student population as a whole. Data on students' (and parents') occupations have been classified in conformity with the Registrar General's classification. No sampling (total universe)
Universities have played key roles in disseminating information on infectious diseases and the use of vaccines as protective measures. Maintenance of this information flow throughout the pandemic has helped universities with protecting their young adult populations against COVID-19. Universities are also substantial economic engines with home and international students being crucial funding sources. The dataset from a survey of University of Leicester undergraduate students in June 2021 is made available. The dataset contains 827 cases (questionnaires) and 78 variables.