The ReAding Comprehension dataset from Examinations (RACE) is a machine reading comprehension dataset consisting of 27,933 passages and 97,867 questions from English exams, targeting Chinese students aged 12-18. RACE consists of two subsets, RACE-M and RACE-H, from middle school and high school exams, respectively. RACE-M has 28,293 questions and RACE-H has 69,574. Each question is associated with 4 candidate answers, one of which is correct. The data generation process of RACE differs from most machine reading comprehension datasets: instead of generating questions and answers by heuristics or crowd-sourcing, questions in RACE are specifically designed for testing human reading skills, and are created by domain experts.
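The structure described above (a passage, a question, four candidate answers, one correct) can be sketched as a small record type. This is only an illustration: the class name `RaceItem` and its field names are assumptions made for this sketch, not the dataset's actual schema. The constants are the question counts quoted in the description, and the helper checks that the two subsets add up to the stated total.

```python
from dataclasses import dataclass
from typing import Tuple

# Question counts as quoted in the description above.
RACE_M_QUESTIONS = 28_293
RACE_H_QUESTIONS = 69_574
TOTAL_QUESTIONS = 97_867


@dataclass(frozen=True)
class RaceItem:
    """Hypothetical container for one RACE question (field names are assumptions)."""

    passage: str
    question: str
    options: Tuple[str, ...]  # the candidate answers; exactly 4 are expected
    answer_index: int         # position of the single correct option

    def __post_init__(self) -> None:
        # Enforce the "4 candidates, one correct" rule from the description.
        if len(self.options) != 4:
            raise ValueError("each question needs exactly 4 candidate answers")
        if not 0 <= self.answer_index < 4:
            raise ValueError("answer_index must point at one of the 4 options")

    def correct_option(self) -> str:
        """Return the text of the correct candidate answer."""
        return self.options[self.answer_index]


def subsets_are_consistent() -> bool:
    """Check that the RACE-M and RACE-H counts sum to the stated total."""
    return RACE_M_QUESTIONS + RACE_H_QUESTIONS == TOTAL_QUESTIONS
```

Rejecting malformed items at construction time keeps later evaluation code free of length checks; whether the real files need such validation is not stated in the description.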
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 8 rows and is filtered to the book A race of female patriots : women and public spirit on the British stage, 1688-1745. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the New Britain Hispanic or Latino population. It includes the distribution of the Hispanic or Latino population, of New Britain, by their ancestries, as identified by the Census Bureau. The dataset can be utilized to understand the origin of the Hispanic or Latino population of New Britain.
Key observations
Among the Hispanic population in New Britain, regardless of the race, the largest group is of Puerto Rican origin, with a population of 25,104 (76.28% of the total Hispanic population).
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Origin for Hispanic or Latino population include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to ask about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for New Britain Population by Race & Ethnicity.
Race distribution : Asians, Caucasians, black people
Gender distribution : gender balance
Age distribution : ranging from teenager to the elderly, the middle-aged and young people are the majorities
Collecting environment : including indoor and outdoor scenes
Data diversity : different shooting heights, different ages, different light conditions, different collecting environment, clothes in different seasons, multiple human poses
Device : cameras
Data format : the data format is .jpg/mp4, the annotation file format is .json, the camera parameter file format is .json, the point cloud file format is .pcd
Accuracy : the accuracy of the poses exceeds 97%; the accuracy of the labels for gender, race, age, collecting environment and clothes is more than 97%
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
National and subnational mid-year population estimates for the UK and its constituent countries by administrative area, age and sex (including components of population change, median age and population density).
Data size : 200,000 ID
Race distribution : black people, Caucasian people, brown(Mexican) people, Indian people and Asian people
Gender distribution : gender balance
Age distribution : young, midlife and senior
Collecting environment : including indoor and outdoor scenes
Data diversity : different face poses, races, ages, light conditions and scenes
Device : cellphone
Data format : .jpg/png
Accuracy : the accuracy of labels of face pose, race, gender and age are more than 97%
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Supporting dataset using data from Census, Pay As You Earn (PAYE) and National Benefits Database. Tables contain data on earnings progression and geographic mobility from tax year ending 2012 to tax year ending 2016, broken down by characteristics such as age, sex, ethnicity, qualification level and local authority. The dataset also includes regression model output tables.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Contains all the current domains and measures of national well-being for young people. As well as providing the latest data for each measure, where available a time series of data are also presented along with useful links to data sources and other websites which may be of interest.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Dataset population: Persons
Ethnic group
Ethnic group classifies people according to their own perceived ethnic group and cultural background.
This topic contains ethnic group write-in responses without reference to the five broad ethnic group categories, e.g. all Irish people, irrespective of whether they are White, Mixed/multiple ethnic groups, Asian/Asian British, Black/African/Caribbean/Black British or Other ethnic group, are in the "Irish" response category. This topic was created as part of the commissioned table processing.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Model estimates of deaths involving the coronavirus (COVID-19) by ethnic group for people in private households in England.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
How people feel about their neighbourhood across the UK. This dataset looks at 5 measures of social capital and shows the differences observed between regions, constituent countries, and urban and rural areas, by ethnicity.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The mid-year estimates refer to the population on 30 June of the reference year and are produced in line with the standard United Nations (UN) definition for population estimates. They are the official set of population estimates for the UK and its constituent countries, the regions and counties of England, and local authorities and their equivalents.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the English Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the English language.
Dataset Content & Diversity: Containing more than 2000 images, this English OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text on shopping lists, including sentences, individual item names, quantities, comments, etc. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.
To ensure diversity and robustness in training your OCR model, we allow limited (less than three) unique images in a single handwriting. This ensures we have diverse types of handwriting to train your OCR model on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of space contains visible English text.
The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.
All these shopping lists were written and images were captured by native English people to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata: In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.
This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of English text recognition models.
Update & Custom Collection: We are committed to continually expanding this dataset by adding more images with the help of our native English crowd community.
If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.
Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.
License: This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the English language. Your journey to improved language understanding and processing begins here.
List of the data tables as part of the Immigration System Statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.
If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.
The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk
Please tell us what format you need. It will help us if you say what assistive technology you use.
Immigration system statistics, year ending March 2025
Immigration system statistics quarterly release
Immigration system statistics user guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives
Passenger arrivals summary tables, year ending March 2025 (MS Excel Spreadsheet, 66.5 KB): https://assets.publishing.service.gov.uk/media/68258d71aa3556876875ec80/passenger-arrivals-summary-mar-2025-tables.xlsx
‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.
Electronic travel authorisation detailed datasets, year ending March 2025 (MS Excel Spreadsheet, 56.7 KB): https://assets.publishing.service.gov.uk/media/681e406753add7d476d8187f/electronic-travel-authorisation-datasets-mar-2025.xlsx
ETA_D01: Applications for electronic travel authorisations, by nationality
ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality
Entry clearance visas summary tables, year ending March 2025 (MS Excel Spreadsheet, 113 KB): https://assets.publishing.service.gov.uk/media/68247953b296b83ad5262ed7/visas-summary-mar-2025-tables.xlsx
Entry clearance visa applications and outcomes detailed datasets, year ending March 2025 (MS Excel Spreadsheet, 29.1 MB): https://assets.publishing.service.gov.uk/media/682c4241010c5c28d1c7e820/entry-clearance-visa-outcomes-datasets-mar-2025.xlsx
Vis_D01: Entry clearance visa applications, by nationality and visa type
Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome
Additional dat
https://creativecommons.org/publicdomain/zero/1.0/
Survey data on smoking habits from the United Kingdom. The data set can be used for analyzing the demographic characteristics of smokers and types of tobacco consumed. A data frame with 1691 observations on the following 12 variables.
Column | Description |
---|---|
gender | Gender with levels Female and Male. |
age | Age. |
marital_status | Marital status with levels Divorced, Married, Separated, Single and Widowed. |
highest_qualification | Highest education level with levels A Levels, Degree, GCSE/CSE, GCSE/O Level, Higher/Sub Degree, No Qualification, ONC/BTEC and Other/Sub Degree |
nationality | Nationality with levels British, English, Irish, Scottish, Welsh, Other, Refused and Unknown. |
ethnicity | Ethnicity with levels Asian, Black, Chinese, Mixed, White and Refused Unknown. |
gross_income | Gross income with levels Under 2,600, 2,600 to 5,200, 5,200 to 10,400, 10,400 to 15,600, 15,600 to 20,800, 20,800 to 28,600, 28,600 to 36,400, Above 36,400, Refused and Unknown. |
region | Region with levels London, Midlands And East Anglia, Scotland, South East, South West, The North and Wales |
smoke | Smoking status with levels No and Yes |
amt_weekends | Number of cigarettes smoked per day on weekends. |
amt_weekdays | Number of cigarettes smoked per day on weekdays. |
type | Type of cigarettes smoked with levels Packets, Hand-Rolled, Both/Mainly Packets and Both/Mainly Hand-Rolled |
National STEM Centre, Large Datasets from stats4schools, https://www.stem.org.uk/resources/elibrary/resource/28452/large-datasets-stats4schools.
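The categorical levels listed in the column table above lend themselves to a simple validity check when loading the survey rows. The following is a minimal sketch under stated assumptions: the names `ALLOWED_LEVELS` and `invalid_fields` are made up for illustration, each row is assumed to arrive as a plain dictionary keyed by column name, and only four of the twelve columns are checked.

```python
# Levels copied from the column table above. Only a subset of the
# categorical columns is covered; the names here are illustrative.
ALLOWED_LEVELS = {
    "gender": {"Female", "Male"},
    "marital_status": {"Divorced", "Married", "Separated", "Single", "Widowed"},
    "region": {
        "London",
        "Midlands And East Anglia",
        "Scotland",
        "South East",
        "South West",
        "The North",
        "Wales",
    },
    "smoke": {"No", "Yes"},
}


def invalid_fields(row: dict) -> list:
    """Return the sorted names of checked columns whose value is not a documented level.

    A column that is missing from the row is reported as invalid as well.
    """
    return sorted(
        name
        for name, levels in ALLOWED_LEVELS.items()
        if row.get(name) not in levels
    )
```

For example, a row with `region` set to a value outside the seven listed regions would be reported as `["region"]`, while a row that uses only documented levels yields an empty list.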
This data collection stems from work directly arising out of the project 'Unity out of Diversity? Perspectives on the adaptations of immigrants in Britain'. The main aim of the project was to examine perceptions of adaptation in academic, policy, and public spheres. The research generated new data in the form of: (1) focus groups conducted in Manchester and Glasgow between November 2014 and September 2015; (2) interviews with local and national 'policy stakeholders' conducted between January 2015 and September 2016. This data collection provides access to this new data and related documentation. The research also used existing data from various sources: (a) Existing surveys available via the UK Data Service such as: (1) Ethnic Minority British Election Study; (2) Citizenship Survey; and (3) Understanding Society. This data collection provides scripts that show how the data was transformed for analysis. (b) Textual data from journal article abstracts; newspaper articles; and Hansard debates. This data collection provides details of the methodology used to extract such data. (c) Online survey data from a related project funded by the British Academy, where Dr Lessard-Phillips was a co-applicant (PI: Dr Maria Sobolewska). This data collection provides a replication dataset and related documentation. The adaptation of immigrants (the immigrants' long-term integration into British society, and British society's response to it) has become an important topic of academic inquiry and debate among policy makers and the general public. Yet there is little systematic research or unified understanding of this process within and across these different arenas. This project aims to investigate the commonalities and differences in the various perceptions and understandings of adaptation and try to reconnect them.
This will be done by using an original research design that will examine the multidimensionality of immigrant adaptation in British academia (via a meta-analysis of the current literature and quantitative analysis of secondary data). This will be contrasted with the subjective understandings and perceptions of adaptation in Britain among policy makers and third-sector stakeholders (via an analysis of policy documents and interviews) and among minority and majority groups in the British population (via focus groups). This project will seek an active involvement by academic and non-academic audiences. It will provide a thorough and updated understanding of immigrant adaptation and its dimensionality in Britain, reaching beyond academic and policy circles, with the aim to build a solid evidence base for future research and policy. The qualitative data was collected via focus groups with members of the public in Manchester and Glasgow (using purposive samples provided by local community organisations), and interviews with policy stakeholders (people working in local and national government and third sector organisations) selected based on their expertise on the topic (either via direct solicitation or adverts about the project).
This page contains data for the immigration system statistics up to March 2023.
For current immigration system data, visit ‘Immigration system statistics data tables’.
Immigration detention (MS Excel Spreadsheet, 9.8 MB): https://assets.publishing.service.gov.uk/media/6462567294f6df000cf5ea90/detention-datasets-mar-2023.xlsx
Det_D01: Number of entries into immigration detention by nationality, age, sex and initial place of detention
Det_D02: Number of people in immigration detention at the end of each quarter by nationality, age, sex, current place of detention and length of detention
Det_D03: Number of occurrences of people leaving detention by nationality, age, sex, reason for leaving detention and length of detention
This is not the latest data
Returns (MS Excel Spreadsheet, 14.4 MB): https://assets.publishing.service.gov.uk/media/646357c494f6df0010f5eb0a/returns-datasets-mar-2023.xlsx
Ret_D01: Number of returns from the UK, by nationality, age, sex, type of return and return destination group
Ret_D02: Number of returns from the UK, by type of return and country of destination
Ret_D03: Number of foreign national offender returns from the UK, by nationality and return destination group
Ret_D04: Number of foreign national offender returns from the UK, by destination
This is not the latest data
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Dataset population: Persons
Country of birth
Country of birth is the country in which a person was born. This topic records whether the person was born in or if they were not born in a country.
For the full country of birth classification in England and Wales, please see the National Statistics Country Classification.
Ethnic group
Ethnic group classifies people according to their own perceived ethnic group and cultural background.
This topic contains ethnic group write-in responses without reference to the five broad ethnic group categories, e.g. all Irish people, irrespective of whether they are White, Mixed/multiple ethnic groups, Asian/Asian British, Black/African/Caribbean/Black British or Other ethnic group, are in the 'Irish' response category. This topic was created as part of the commissioned table processing.
These statistics include:
We are currently unable to provide figures on matches made against profiles on the National DNA Database.
Statistics from Q1 2013 to Q4 2022 to 2023 are available on the National Archives: https://webarchive.nationalarchives.gov.uk/ukgwa/20230502153339/https://www.gov.uk/government/statistics/national-dna-database-statistics
Figures for Q2 2014 to 2015 are unavailable. This is due to technical issues with the management information system.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the English Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.
Dataset Content:
This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the English language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.
Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native English speaking people, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.
Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.
Prompt Diversity:
To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.
These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.
Response Formats:
To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale helps the language model build a reasoning process for complex questions.
These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
Data Format and Annotation Details:
This fully labeled English Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
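The annotation details listed above suggest a fixed set of fields per record, which can be checked when reading the JSON export. The sketch below is hypothetical: the snake_case key names in `REQUIRED_FIELDS` are assumptions derived from the field list in the description, not the vendor's documented schema, and `missing_fields` is a helper invented for illustration.

```python
import json

# Assumed key names, derived from the annotation details listed above.
# The real export may spell or nest these differently.
REQUIRED_FIELDS = (
    "id",
    "prompt",
    "prompt_type",
    "prompt_complexity",
    "prompt_category",
    "domain",
    "response",
    "rationale",
    "response_type",
    "rich_text_presence",
)


def missing_fields(record_json: str) -> list:
    """Parse one JSON record and list the expected fields it lacks, in schema order."""
    record = json.loads(record_json)
    return [field for field in REQUIRED_FIELDS if field not in record]
```

Running such a check over every record before training is a cheap way to catch truncated or partially annotated entries; the actual key names should be confirmed against a sample file first.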
Quality and Accuracy:
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The English version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.
Continuous Updates and Customization:
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.
License:
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy English Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.