This large, international dataset contains survey responses from N = 12,570 students from 100 universities in 35 countries, collected in 21 languages. We measured anxieties (statistics, mathematics, test, trait, social interaction, performance, creativity, intolerance of uncertainty, and fear of negative evaluation), self-efficacy, persistence, and the cognitive reflection test, and collected demographics, previous mathematics grades, self-reported and official statistics grades, and statistics module details. Data reuse potential is broad, including testing links between anxieties and statistics/mathematics education factors, and examining instruments’ psychometric properties across different languages and contexts. Note that the pre-registration can be found here: https://osf.io/xs5wf
The Labour Force Survey plays a vital role in providing reliable statistics on employment and economic issues, which makes it valuable for policy formulation, research and media.
The Quarterly Labour Force Survey, January-March 2023: Teaching Dataset (QLFS JM 23 teaching data) is a sub-sample from the main Quarterly Labour Force Survey, January - March 2023 (available from the UK Data Archive under SN 9097).
The QLFS JM 23 teaching dataset has been adapted for the purpose of teaching and learning.
The main differences are:
Further information is available in the study documentation which includes a dataset user guide.
The National Child Development Study (NCDS) is a continuing longitudinal study that seeks to follow the lives of all those living in Great Britain who were born in one particular week in 1958. The aim of the study is to improve understanding of the factors affecting human development over the whole lifespan.
The NCDS has its origins in the Perinatal Mortality Survey (PMS) (the original PMS study is held at the UK Data Archive under SN 2137). This study was sponsored by the National Birthday Trust Fund and designed to examine the social and obstetric factors associated with stillbirth and death in early infancy among the 17,000 children born in England, Scotland and Wales in that one week. Selected data from the PMS form NCDS sweep 0, held alongside NCDS sweeps 1-3, under SN 5565.
Survey and Biomeasures Data (GN 33004):
To date there have been ten attempts to trace all members of the birth cohort in order to monitor their physical, educational and social development. The first three sweeps were carried out by the National Children's Bureau, in 1965, when respondents were aged 7, in 1969, aged 11, and in 1974, aged 16 (these sweeps form NCDS1-3, held together with NCDS0 under SN 5565). The fourth sweep, also carried out by the National Children's Bureau, was conducted in 1981, when respondents were aged 23 (held under SN 5566). In 1985 the NCDS moved to the Social Statistics Research Unit (SSRU) - now known as the Centre for Longitudinal Studies (CLS). The fifth sweep was carried out in 1991, when respondents were aged 33 (held under SN 5567). For the sixth sweep, conducted in 1999-2000, when respondents were aged 42 (NCDS6, held under SN 5578), fieldwork was combined with the 1999-2000 wave of the 1970 Birth Cohort Study (BCS70), which was also conducted by CLS (and held under GN 33229). The seventh sweep was conducted in 2004-2005 when the respondents were aged 46 (held under SN 5579), the eighth sweep was conducted in 2008-2009 when respondents were aged 50 (held under SN 6137), the ninth sweep was conducted in 2013 when respondents were aged 55 (held under SN 7669), and the tenth sweep was conducted in 2020-24 when the respondents were aged 60-64 (held under SN 9412).
A Secure Access version of the NCDS is available under SN 9413, containing detailed sensitive variables not available under Safeguarded access (currently only sweep 10 data). Variables include uncommon health conditions (including age at diagnosis), full employment codes and income/finance details, and specific life circumstances (e.g. pregnancy details, year/age of emigration from GB).
Four separate datasets covering responses to NCDS over all sweeps are available. National Child Development Deaths Dataset: Special Licence Access (SN 7717) covers deaths; National Child Development Study Response and Outcomes Dataset (SN 5560) covers all other responses and outcomes; National Child Development Study: Partnership Histories (SN 6940) includes data on live-in relationships; and National Child Development Study: Activity Histories (SN 6942) covers work and non-work activities. Users are advised to order these studies alongside the other waves of NCDS.
From 2002-2004, a Biomedical Survey was completed and is available under End User Licence (EUL) (SN 8731) and Special Licence (SL) (SN 5594). Proteomics analyses of blood samples are available under SL SN 9254.
Linked Geographical Data (GN 33497):
A number of geographical variables are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies.
Linked Administrative Data (GN 33396):
A number of linked administrative datasets are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies. These include a Deaths dataset (SN 7717) available under SL and the Linked Health Administrative Datasets (SN 8697) available under Secure Access.
Multi-omics Data and Risk Scores Data (GN 33592)
Proteomics analyses were run on the blood samples collected from NCDS participants in 2002-2004 and are available under SL SN 9254. Metabolomics analyses were conducted on respondents of sweep 10 and are available under SL SN 9411.
Additional Sub-Studies (GN 33562):
In addition to the main NCDS sweeps, further studies have also been conducted on a range of subjects such as parent migration, unemployment, behavioural studies and respondent essays. The full list of NCDS studies available from the UK Data Service can be found on the NCDS series access data webpage.
How to access genetic and/or bio-medical sample data from a range of longitudinal surveys:
For information on how to access biomedical data from NCDS that are not held at the UKDS, see the CLS Genetic data and biological samples webpage.
Further information about the full NCDS series can be found on the Centre for Longitudinal Studies website.
The National Child Development Study: Linked Health Administrative Datasets (Hospital Episode Statistics), England, 1997-2023: Secure Access includes data files from the NHS Digital HES database for those cohort members who provided consent to health data linkage in the Age 50 sweep. The HES database contains information about all hospital admissions in England. The following linked HES data are available:
1) Accident and Emergency (A&E)
The A&E dataset details each attendance to an Accident and Emergency care facility in England, between 01-04-2007 and 31-03-2020 (inclusive). It includes major A&E departments, single speciality A&E departments, minor injury units and walk-in centres in England.
2) Admitted Patient Care (APC)
The APC data summarises episodes of care for admitted patients, where the episode occurred between 01-04-1997 and 31-03-2023 (inclusive).
3) Critical Care (CC)
The CC dataset covers records of critical care activity between 01-04-2009 and 31-03-2023 (inclusive).
4) Out Patient (OP)
The OP dataset lists the outpatient appointments between 01-04-2003 and 31-03-2023 (inclusive).
5) Emergency Care Dataset (ECDS)
The ECDS lists the emergency care appointments between 01-04-2020 and 31-03-2023 (inclusive).
6) Consent data
The consents dataset describes consent to linkage, and is current at the time of deposit.
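The coverage windows listed above differ by dataset, so it can help to check which linked HES files could contain a record for a given date before requesting variables. The following is a minimal sketch: the dictionary keys and the function name `datasets_covering` are illustrative assumptions, not identifiers from the NCDS/HES data dictionary, and the dates are simply those quoted above.

```python
from datetime import date

# Inclusive coverage windows, as quoted in the dataset descriptions above.
# The short keys are illustrative labels, not official file names.
HES_COVERAGE = {
    "A&E": (date(2007, 4, 1), date(2020, 3, 31)),
    "APC": (date(1997, 4, 1), date(2023, 3, 31)),
    "CC": (date(2009, 4, 1), date(2023, 3, 31)),
    "OP": (date(2003, 4, 1), date(2023, 3, 31)),
    "ECDS": (date(2020, 4, 1), date(2023, 3, 31)),
}


def datasets_covering(day: date) -> list[str]:
    """Return the labels of all datasets whose window includes `day`."""
    return [
        name
        for name, (start, end) in HES_COVERAGE.items()
        if start <= day <= end
    ]


if __name__ == "__main__":
    # A date in 2008 falls before Critical Care and ECDS coverage begins.
    print(datasets_covering(date(2008, 1, 1)))
```

Note that the A&E window ends where the ECDS window begins, so emergency attendances across the whole period need both files.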
CLS/ NHS Digital Sub-licence agreement
NHS Digital has given CLS permission for onward sharing of the NCDS/HES dataset via the UKDS Secure Lab. In order to ensure data minimisation, NHS Digital requires that researchers only access the HES variables needed for their approved research project. Therefore, the HES linked data provided by the UKDS to approved researchers will be subject to sub-setting of variables. The researcher will need to request a specific sub-set of variables from the NCDS/HES data dictionary, which will subsequently be made available within their UKDS Secure Account. Once the researcher has finished their research, the UKDS will delete the tailored dataset for that specific project. Any party wishing to access the data deposited at the UK Data Service will be required to enter into a Licence agreement with CLS (UCL), in addition to the agreements signed with the UKDS, provided in the application pack.
CLS Hospital Episode Statistics data access update July 2025
From March 2027, HES data linked to all four CLS studies will no longer be available via the UK Data Service. For projects ending before March 2027, users should continue to apply via UKDS. However, if access to a wider range of linked Longitudinal Population Studies data is needed, UKLLC might be more suitable. For projects ending after March 2027, users must apply via UKLLC.
Latest edition information
For the third edition (April 2025), the data have been updated to include linked data for the financial years 2017-2022. In addition, a new dataset for Emergency Care (ECDS) episodes has been added, along with a dataset detailing the consent for linkage. Furthermore, the study documentation has also been updated.
This Unrestricted Access Teaching Dataset is based on the Crime Survey for England and Wales (CSEW), 2013-2014. It has been prepared for teaching and student use only with the aim of helping class tutors incorporate empirical data into their courses and supporting students to develop skills in quantitative data analysis. It contains data for 8,843 cases from the CSEW 2013-14 (non-victim form dataset) for a small selection of variables.
Most variables come directly from the CSEW 2013-14. However, some variables have been recoded and additional scalar variables have been added to support teaching and learning.
This Teaching Dataset is available under the Open Government Licence.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Part of the dissertation Pitch of Voiced Speech in the Short-Time Fourier Transform: Algorithms, Ground Truths, and Evaluation Methods.
© 2020, Bastian Bechtold. All rights reserved.
Estimating the fundamental frequency of speech remains an active area of research, with varied applications in speech recognition, speaker identification, and speech compression. A vast number of algorithms for estimating this quantity have been proposed over the years, and a number of speech and noise corpora have been developed for evaluating their performance. The present dataset contains estimated fundamental frequency tracks of 25 algorithms, six speech corpora, two noise corpora, at nine signal-to-noise ratios between -20 and 20 dB SNR, as well as an additional evaluation of synthetic harmonic tone complexes in white noise.
The dataset also contains pre-calculated performance measures both novel and traditional, in reference to each speech corpus’ ground truth, the algorithms’ own clean-speech estimate, and our own consensus truth. It can thus serve as the basis for a comparison study, or to replicate existing studies from a larger dataset, or as a reference for developing new fundamental frequency estimation algorithms. All source code and data is available to download, and entirely reproducible, albeit requiring about one year of processor-time.
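To make the signal-to-noise ratios mentioned above concrete, here is a minimal sketch of mixing a speech signal with noise at a target SNR. It uses the generic power-ratio definition of SNR and is not taken from the dataset's own source code; the function names and the even 5 dB spacing of the nine SNR values are assumptions for illustration.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the given SNR in dB.

    SNR is defined here as 20 * log10(rms(speech) / rms(scaled noise)).
    """
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


# Nine SNRs between -20 and 20 dB; the even 5 dB spacing is an assumption.
SNRS_DB = list(range(-20, 21, 5))

if __name__ == "__main__":
    import random

    random.seed(0)
    # A 200 Hz tone at an assumed 16 kHz sample rate stands in for speech.
    speech = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
    noise = [random.gauss(0, 1) for _ in range(16000)]
    for snr in SNRS_DB:
        mixed = mix_at_snr(speech, noise, snr)
        print(snr, round(rms(mixed), 3))
```

At -20 dB the noise amplitude is ten times that of the speech, which is why estimation at the low end of this range is hard.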
Included Code and Data
ground truth data.zip
is a JBOF dataset of fundamental frequency estimates and ground truths of all speech files in the following corpora:
noisy speech data.zip
is a JBOF dataset of fundamental frequency estimates of speech files mixed with noise from the following corpora:
synthetic speech data.zip
is a JBOF dataset of fundamental frequency estimates of synthetic harmonic tone complexes in white noise.
noisy_speech.pkl
and synthetic_speech.pkl
are pickled Pandas dataframes of performance metrics derived from the above data for the following list of fundamental frequency estimation algorithms:
noisy speech evaluation.py
and synthetic speech evaluation.py
are Python programs to calculate the above Pandas dataframes from the above JBOF datasets. They calculate the following performance measures:
Pipfile
is a pipenv-compatible pipfile for installing all prerequisites necessary for running the above Python programs.
The Python programs take about an hour to compute on a fast 2019 computer, and require at least 32 Gb of memory.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set provides the datasets generated by the three creators (data challenge organisers) and subsequently provided to the participants of the EVA 2023 Data Challenge.
The dataset aims to capture the variety of contexts experienced in the analysis of environmental extremes data. This involves both univariate and multivariate problems. The univariate extremes problems involve inference for extreme quantiles when faced with additional complications such as covariates; data missing at random; and the need to convert the inference into design levels which account for different losses from over- and under-design.
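One way to see how different losses from over- and under-design turn an inference into a design level: if under-design costs `cost_under` per unit of shortfall and over-design costs `cost_over` per unit of excess, the level minimising expected loss is the `cost_under / (cost_under + cost_over)` quantile. The sketch below illustrates this on an empirical sample. The piecewise-linear loss and all names are assumptions chosen for illustration; the loss function actually used in the Data Challenge is not stated here.

```python
import math


def expected_loss(level, samples, cost_under, cost_over):
    """Average piecewise-linear loss of building to `level`."""
    total = 0.0
    for x in samples:
        if x > level:
            total += cost_under * (x - level)  # event exceeds the design
        else:
            total += cost_over * (level - x)   # design exceeds the event
    return total / len(samples)


def design_level(samples, cost_under, cost_over):
    """Empirical quantile that minimises `expected_loss` over the sample."""
    ordered = sorted(samples)
    n = len(ordered)
    # Rank of the cost_under / (cost_under + cost_over) quantile.
    rank = math.ceil(cost_under * n / (cost_under + cost_over))
    return ordered[min(n - 1, max(0, rank - 1))]


if __name__ == "__main__":
    observed = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    # Under-design is assumed nine times as costly as over-design.
    level = design_level(observed, cost_under=9, cost_over=1)
    print(level, expected_loss(level, observed, 9, 1))
```

The more costly under-design is relative to over-design, the further into the upper tail the design level moves, which is where extreme quantile inference becomes necessary.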
The data set consists of five data files:
1. Amaurot: Training data given to the participants for Tasks 1 and 2
2. AmaurotTestSet: Collection of test data points for which predictions had to be submitted
3. Coputopia: Data participants had to consider for Task 3
4. UtopulaU1 + UtopulaU2: Data participants had to consider for Task 4
The aim of this dataset, developed for the Data Challenge, is to assess performance in multivariate extremes in a way that is independent of marginal extremes abilities. Consequently, the multivariate problems relate to data where the univariate marginal distributions are all known.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study was conducted to explore the effects prostitution legislation has on sex trafficking rates. This issue holds paramount importance in the fields of legal studies and human rights. By leveraging advanced machine learning techniques to analyze data from the Counter-Trafficking Data Collaborative (CTDC), encompassing 180 countries, this study aims to uncover the relationship between various prostitution legislation types and sex trafficking occurrences. The exploration begins with extensive cleaning, merging, and filtering of the CTDC dataset, integrating it with prostitution legislation data from the World Population Review. This process ensures a harmonized dataset that accurately reflects the global landscape of sex trafficking in relation to legislative frameworks. The machine learning model initially concentrated on prostitution legislation as a key variable but evolved to include a broader range of factors like registration year, population, growth rate, gender, and citizenship. This expansion was crucial in developing a more accurate and holistic model.
This study offered a nuanced exploration of the impact of prostitution legislation on sex trafficking, employing sophisticated data analysis and machine learning models to parse through extensive data. The advanced RandomForestClassifier was key in the research, achieving an 87% accuracy rate for predicting instances of sex trafficking and demonstrating the need to incorporate diverse predictive features. Notably, the analysis emphasized the importance of the legislative feature in accurately predicting sex trafficking, despite the inclusion of other variables to improve overall model precision. These findings underscore the significance of a multifaceted approach, considering factors like demographics and socio-economic indicators, to gain a comprehensive understanding of sex trafficking trends.
Complementing the machine learning insights, a logistic regression model scrutinized the specific effects of different legislative approaches on sex trafficking. The analysis revealed that legislative frameworks such as legalization, abolitionism, decriminalization, and neo-abolitionism have a considerable influence on reducing sex trafficking rates, suggesting their potential as effective legal strategies. Alternatively, prohibition legislation is found to correlate with significantly higher sex trafficking rates. These results serve as a critical resource for policymakers and advocates engaged in the development of informed, evidence-based approaches to address the global challenge of sex trafficking.
Abstract copyright UK Data Service and data collection copyright owner.
The 1970 British Cohort Study (BCS70) is a longitudinal birth cohort study, following a nationally representative sample of over 17,000 people born in England, Scotland and Wales in a single week of 1970. Cohort members have been surveyed throughout their childhood and adult lives, mapping their individual trajectories and creating a unique resource for researchers. It is one of very few longitudinal studies following people of this generation anywhere in the world.
Since 1970, cohort members have been surveyed at ages 5, 10, 16, 26, 30, 34, 38, 42, 46, and 51. Featuring a range of objective measures and rich self-reported data, BCS70 covers an incredible amount of ground and can be used in research on many topics. Evidence from BCS70 has illuminated important issues for our society across five decades. Key findings include how reading for pleasure matters for children's cognitive development, why grammar schools have not reduced social inequalities, and how childhood experiences can impact on mental health in mid-life. Every day researchers from across the scientific community are using this important study to make new connections and discoveries.
BCS70 is run by the Centre for Longitudinal Studies (CLS), a research centre in the UCL Institute of Education, which is part of University College London. The content of BCS70 studies, including questions, topics and variables can be explored via the CLOSER Discovery website.
How to access genetic and/or bio-medical sample data from a range of longitudinal surveys:
For information on how to access biomedical data from BCS70 that are not held at the UKDS, see the CLS Genetic data and biological samples webpage.
Secure Access datasets
Secure Access versions of BCS70 have more restrictive access conditions than versions available under the standard End User Licence (EUL).
In 2012, consent was sought for data linkage of health administrative records from the Hospital Episode Statistics (HES) to survey data for cohort members in the 1970 British Cohort Study (BCS70). The main aim of this data linkage exercise is to enhance the research potential of the study, by combining administrative records with the rich information collected in the surveys. The 1970 British Cohort Study: Linked Health Administrative Datasets (Hospital Episode Statistics), England, 1997-2023: Secure Access contains information about all hospital admissions in England. The following linked HES data are available:
1) Accident and Emergency (A&E)
The A&E dataset details each attendance to an Accident and Emergency care facility in England, between 01-04-2007 and 31-03-2019 (inclusive). It includes major A&E departments, single speciality A&E departments, minor injury units and walk-in centres in England.
2) Admitted Patient Care (APC)
The APC data summarises episodes of care for admitted patients, where the episode occurred between 01-04-1997 and 31-03-2023 (inclusive).
3) Critical Care (CC)
The CC dataset covers records of critical care activity between 01-04-2009 and 31-03-2023 (inclusive).
4) Out Patient (OP)
The OP dataset lists the outpatient appointments between 01-04-2003 and 31-03-2023 (inclusive).
5) Emergency Care Dataset (ECDS)
The ECDS lists the emergency care appointments between 01-04-2020 and 31-03-2023 (inclusive).
6) Consent data
The consents dataset describes consent to linkage, and is current at the time of deposit.
CLS/ NHS Digital Sub-licence agreement
NHS Digital has given CLS permission for onward sharing of the Next Steps/HES dataset via the UKDS Secure Lab. In order to ensure data minimisation, NHS Digital requires that researchers only access the HES variables needed for their approved research project. Therefore, the HES linked data provided by the UKDS to approved researchers will be subject to sub-setting of variables. The researcher will need to request a specific sub-set of variables from the Next Steps HES data dictionary, which will subsequently be made available within their UKDS Secure Account. Once the researcher has finished their research, the UKDS will delete the tailored dataset for that specific project.
Any party wishing to access the data deposited at the UK Data Service will be required to enter into a Licence agreement with CLS (UCL), in addition to the agreements signed with the UKDS, provided in the application pack.
The Licensee shall acknowledge in any publication, whether printed, electronic or broadcast, based wholly or in part on such materials, both the source of the data and UCL. An example of an appropriate acknowledgement can be found here: https://cls.ucl.ac.uk/data-access-training/citing-our-data/.
Latest edition information
For the...
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset provides Census 2021 estimates that classify schoolchildren and full-time students aged 5 years and over in England and Wales by student accommodation and by age. The estimates are as at Census Day, 21 March 2021.
Estimates for single year of age between ages 90 and 100+ are less reliable than other ages. Estimation and adjustment at these ages were based on the age range 90+ rather than five-year age bands.
Area type
Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.
For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.
Coverage
Census 2021 statistics are published for the whole of England and Wales. Data are also available in these geographic types:
Student accommodation type
Combines the living situation of students and school children in full-time education, whether they are living:
It also includes whether these households contain one or multiple families.
This variable is comparable with the student accommodation variable but splits the communal establishment type into “university” and “other” categories.
Age
A person’s age on Census Day, 21 March 2021 in England and Wales. Infants aged under 1 year are classified as 0 years of age.
The Quarterly Labour Force Survey July - September 2018: Teaching Dataset is based on the Quarterly Labour Force Survey, July - September 2018 (QLFS JS18; available from the UK Data Archive under SN 8407) and constitutes real data which are used by the government and are behind many headlines. The teaching dataset contains fewer variables and has been subjected to certain simplifications and additions for the purpose of learning and teaching.
The main differences are:
Further information is available in the study documentation which includes a dataset user guide.
Abstract copyright UK Data Service and data collection copyright owner.
The Great Britain Historical Database has been assembled as part of the ongoing Great Britain Historical GIS Project. The project aims to trace the emergence of the north-south divide in Britain and to provide a synoptic view of the human geography of Britain at sub-county scales. Further information about the project is available on A Vision of Britain webpages, where users can browse the database's documentation system online.
The Great Britain Historical GIS Project has also produced digitised boundary data, which can be obtained from the UK Data Service Census Support service. Further information is available at census.ukdataservice.ac.uk
The Great Britain Historical Database is a large database of British nineteenth and twentieth-century statistics. Where practical, the referencing of spatial units has been integrated and data for different dates have been assembled into single tables.
The Great Britain Historical Database currently contains:
The Opinions and Lifestyle Survey, Well-Being Module, April-May 2015: Unrestricted Access Teaching Dataset is based on the Opinions and Lifestyle Survey, Well-Being Module, January, February, April and May, 2015 (available from UK Data Archive under SN 7815) and constitutes real data which are used by government, business and other organisations. The teaching dataset is a subset which has been subjected to certain simplifications and additions for the purpose of learning and teaching.
The main differences are: