Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The census is undertaken by the Office for National Statistics every 10 years and gives us a picture of all the people and households in England and Wales. The most recent census took place in March of 2021.
The census asks every household questions about the people who live there and the type of home they live in. In doing so, it helps to build a detailed snapshot of society. Information from the census helps the government and local authorities to plan and fund local services, such as education, doctors' surgeries and roads.
Key census statistics for Leicester are published on the open data platform to make information accessible to local services, voluntary and community groups, and residents. There is also a dashboard published showcasing various datasets from the census allowing users to view data for all MSOAs and compare this with Leicester overall statistics.
Further information about the census and full datasets can be found on the ONS website - https://www.ons.gov.uk/census/aboutcensus/censusproducts
Proficiency in English
This dataset provides Census 2021 estimates that classify usual residents in England and Wales by their proficiency in English. The estimates are as at Census Day, 21 March 2021.
Definition: How well people whose main language is not English (English or Welsh in Wales) speak English.
This dataset provides details for the MSOAs of Leicester city.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This provides estimates of the percentage of usual residents aged 3 and over in England and Wales by their proficiency in English. The proficiency in English classification corresponds to the tick box response options on the census questionnaire. Estimates are used to help central government, local authorities and the NHS allocate resources and provide services for non-English speakers. They also help public service providers effectively target the delivery of their services, for example translation and interpretation services and material in alternative languages.
Statistical Disclosure Control: In order to protect against disclosure of personal information from the Census, there has been swapping of records in the Census database between different geographic areas, and so some counts will be affected. In the main, the greatest effects will be at the lowest geographies, since the record swapping is targeted towards those households with unusual characteristics in small areas.
Data is powered by LG Inform Plus and automatically checked for new data on the 3rd of each month.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the UK English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world UK English communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic British accents and dialects.
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of UK English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
The dataset comes with granular metadata for both speakers and recordings:
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
This dataset is a versatile resource for multiple English speech and language AI applications:
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The census is undertaken by the Office for National Statistics every 10 years and gives us a picture of all the people and households in England and Wales. The most recent census took place in March of 2021.
The census asks every household questions about the people who live there and the type of home they live in. In doing so, it helps to build a detailed snapshot of society. Information from the census helps the government and local authorities to plan and fund local services, such as education, doctors' surgeries and roads.
Key census statistics for Leicester are published on the open data platform to make information accessible to local services, voluntary and community groups, and residents. There is also a dashboard published showcasing various datasets from the census allowing users to view data for all wards and compare this with Leicester overall statistics.
Further information about the census and full datasets can be found on the ONS website - https://www.ons.gov.uk/census/aboutcensus/censusproducts
Proficiency in English
This dataset provides Census 2021 estimates that classify usual residents in England and Wales by their proficiency in English. The estimates are as at Census Day, 21 March 2021.
Definition: How well people whose main language is not English (English or Welsh in Wales) speak English.
This dataset provides details for the electoral wards of Leicester city.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset provides Census 2021 estimates that classify usual residents in Wales aged 3 years and over by ability to speak Welsh, by national identity, and by age. The estimates are as at Census Day, 21 March 2021.
The increase since the 2011 Census in people identifying as “British” and fall in people identifying as “English” may partly reflect true changes in self-perception. It is also likely to reflect that “British” replaced “English” as the first response option listed on the questionnaire in England. Read more about this quality notice.
Estimates for single year of age between ages 90 and 100+ are less reliable than other ages. Estimation and adjustment at these ages was based on the age range 90+ rather than five-year age bands. Read more about this quality notice.
Area type
Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.
For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.
Coverage
Census 2021 statistics are published for the whole of England and Wales. Data are also available in these geographic types:
Welsh speaking ability
This classifies a person as being able to "Speak Welsh". They may have also ticked one or more of the following:
In results that classify people by Welsh language skills, a person may appear in more than one category depending on which combination of skills they have.
National identity
Someone’s national identity is a self-determined assessment of their own identity; it could be the country or countries where they feel they belong or think of as home. It is not dependent on ethnic group or citizenship.
Respondents could select more than one national identity.
Age (B)
A person’s age on Census Day, 21 March 2021 in England and Wales. Infants aged under 1 year are classified as 0 years of age. Age is categorised as follows:
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the UK English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.
This visual speech dataset contains 1000 videos in UK English, each paired with a corresponding high-fidelity audio track. In each video, a participant gives an unscripted, spontaneous answer to a specific question.
Extensive guidelines were followed while recording each video to maintain quality and diversity.
The dataset provides comprehensive metadata for each video recording and participant:
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for "english_dialects"
Dataset Summary
This dataset consists of 31 hours of transcribed high-quality audio of English sentences recorded by 120 volunteers speaking with different accents of the British Isles. The dataset is intended for linguistic analysis as well as use for speech technologies. The speakers self-identified as native speakers of Southern England, Midlands, Northern England, Welsh, Scottish and Irish varieties of English. The recording scripts… See the full description on the dataset page: https://huggingface.co/datasets/ylacombe/english_dialects.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This UK English Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for English-speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
The dataset includes 30 hours of dual-channel audio recordings between native UK English speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
These scenarios help models understand and respond to diverse traveler needs in real-time.
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
Extensive metadata enriches each call and speaker for better filtering and AI training:
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset provides Census 2021 estimates that classify usual residents in England and Wales by their proficiency in English. The estimates are as at Census Day, 21 March 2021.
Area type
Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.
For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.
Coverage
Census 2021 statistics are published for the whole of England and Wales. Data are also available in these geographic types:
Proficiency in English language (6 categories)
How well people whose main language is not English (English or Welsh in Wales) speak English.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Dataset population: Persons aged 3 and over
Age
Age is derived from the date of birth question and is a person's age at their last birthday, at 27 March 2011. Dates of birth that imply an age over 115 are treated as invalid and the person's age is imputed. Infants less than one year old are classified as 0 years of age.
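The age rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method used in census processing: the function name is hypothetical, imputation itself is not modelled (the function only returns `None` to flag that a value would need imputing), and treating a date of birth after Census Day as invalid is an added assumption.

```python
from datetime import date
from typing import Optional

CENSUS_DAY_2011 = date(2011, 3, 27)  # reference date stated in the text
MAX_VALID_AGE = 115                  # ages over this are treated as invalid


def age_at_last_birthday(date_of_birth: date,
                         census_day: date = CENSUS_DAY_2011) -> Optional[int]:
    """Return age in whole years on census day, or None if it would be imputed."""
    if date_of_birth > census_day:
        # Assumption for this sketch: a birth date after census day is invalid.
        return None
    # Has this year's birthday already happened on or before census day?
    had_birthday = (census_day.month, census_day.day) >= (
        date_of_birth.month, date_of_birth.day)
    age = census_day.year - date_of_birth.year - (0 if had_birthday else 1)
    if age > MAX_VALID_AGE:
        # Implied age over 115: invalid, so the real process would impute it.
        return None
    return age
```

Infants under one year old come out as 0 because no birthday has yet been reached by Census Day.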
General health
General health is a self-assessment of a person's general state of health. People were asked to assess whether their health was very good, good, fair, bad or very bad.
For England and Wales, this assessment is not based on a person's health over any specified period of time.
Proficiency in English
Proficiency in English language classifies people whose main language is not English (or not English or Welsh in Wales) according to their ability to speak English. A person is classified in one of the categories:
This question was handled slightly differently in the England and Wales censuses.
In the English census a tick box was used in Question 18, asking 'What is your main language?', giving the option of 'English' or 'Other'.
In the Welsh census, a tick box was used in Question 18, asking 'What is your main language?', giving the option of 'English or Welsh' or 'Other'.
Those who ticked 'Other' would be asked about their ability to speak English.
A consequence of this is that a person who reports their main language to be Welsh and completed the Welsh census will not be asked about their ability to speak English, whereas a person who indicates that their main language is Welsh and lives in England would be asked about their ability to speak English.
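The routing difference between the two questionnaires can be sketched in Python as follows. This is an illustration only: the function name and the plain-string inputs are hypothetical, and the sketch models just the Question 18 tick-box logic described above.

```python
def is_asked_about_english_proficiency(main_language: str,
                                       census_country: str) -> bool:
    """Return True if the respondent would tick 'Other' for main language
    and so be asked about their ability to speak English."""
    language = main_language.strip().lower()
    country = census_country.strip().lower()
    if country == "england":
        # English form: the tick box covers 'English' only.
        covered_by_tick_box = {"english"}
    elif country == "wales":
        # Welsh form: the tick box covers 'English or Welsh'.
        covered_by_tick_box = {"english", "welsh"}
    else:
        raise ValueError("census_country must be 'England' or 'Wales'")
    return language not in covered_by_tick_box
```

For example, a Welsh speaker completing the form in Wales gives `False`, while the same main language recorded in England gives `True`.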
Copies of the census forms can be found here: UK census forms.
Sex
The classification of a person as either male or female.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Dataset population: Persons aged 3 and over
Age
Age is derived from the date of birth question and is a person's age at their last birthday, at 27 March 2011. Dates of birth that imply an age over 115 are treated as invalid and the person's age is imputed. Infants less than one year old are classified as 0 years of age.
Proficiency in English
Proficiency in English language classifies people whose main language is not English (or not English or Welsh in Wales) according to their ability to speak English. A person is classified in one of the categories:
This question was handled slightly differently in the England and Wales censuses.
In the English census a tick box was used in Question 18, asking "What is your main language?", giving the option of 'English' or 'Other'.
In the Welsh census, a tick box was used in Question 18, asking "What is your main language?", giving the option of 'English or Welsh' or 'Other'.
Those who ticked 'Other' would be asked about their ability to speak English.
A consequence of this is that a person who reports their main language to be Welsh and completed the Welsh census will not be asked about their ability to speak English, whereas a person who indicates that their main language is Welsh and lives in England would be asked about their ability to speak English.
Copies of the census forms can be found here: UK census forms.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Dataset population: Persons aged 3 and over
Age upon arrival in the UK
The age of arrival in the UK is derived from the date that a person last arrived to live in the UK and their age. Short visits away from the UK are not counted in determining the date that a person last arrived.
Age of arrival is only applicable to usual residents who were not born in the UK. It does not include usual residents born in the UK who have emigrated and since returned; these are recorded in the category 'Born in the UK'.
Proficiency in English
Proficiency in English language classifies people whose main language is not English (or not English or Welsh in Wales) according to their ability to speak English. A person is classified in one of the categories:
This question was handled slightly differently in the England and Wales censuses.
In the English census a tick box was used in Question 18, asking 'What is your main language?', giving the option of 'English' or 'Other'.
In the Welsh census, a tick box was used in Question 18, asking 'What is your main language?', giving the option of 'English or Welsh' or 'Other'.
Those who ticked 'Other' would be asked about their ability to speak English.
A consequence of this is that a person who reports their main language to be Welsh and completed the Welsh census will not be asked about their ability to speak English, whereas a person who indicates that their main language is Welsh and lives in England would be asked about their ability to speak English.
Copies of the census forms can be found here: UK census forms.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset provides Census 2021 estimates that classify usual residents aged 3 years and over in England and Wales by proficiency in English and by age. The estimates are as at Census Day, 21 March 2021.
Estimates for single year of age between ages 90 and 100+ are less reliable than other ages. Estimation and adjustment at these ages was based on the age range 90+ rather than five-year age bands. Read more about this quality notice.
Area type
Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.
For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.
Lower tier local authorities
Lower tier local authorities provide a range of local services. There are 309 lower tier local authorities in England made up of 181 non-metropolitan districts, 59 unitary authorities, 36 metropolitan districts and 33 London boroughs (including City of London). In Wales there are 22 local authorities made up of 22 unitary authorities.
Coverage
Census 2021 statistics are published for the whole of England and Wales. However, you can choose to filter areas by:
Proficiency in English language
How well people whose main language is not English (English or Welsh in Wales) speak English.
Age
A person’s age on Census Day, 21 March 2021 in England and Wales. Infants aged under 1 year are classified as 0 years of age.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This UK English Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.
The dataset contains 30 hours of dual-channel call center recordings between native UK English speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.
This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.
This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.
Rich metadata is available for each participant and conversation:
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Dataset population: Persons aged 16 and over
Age
Age is derived from the date of birth question and is a person's age at their last birthday, at 27 March 2011. Dates of birth that imply an age over 115 are treated as invalid and the person's age is imputed. Infants less than one year old are classified as 0 years of age.
Occupation
A person's occupation relates to their main job and is derived from either their job title or details of the activities involved in their job. This is used to assign responses to an occupation code based on the Standard Occupational Classification 2010 (SOC2010).
Proficiency in English
Proficiency in English language classifies people whose main language is not English (or not English or Welsh in Wales) according to their ability to speak English. A person is classified in one of the categories:
This question was handled slightly differently in the England and Wales censuses.
In the English census a tick box was used in Question 18, asking 'What is your main language?', giving the option of 'English' or 'Other'.
In the Welsh census, a tick box was used in Question 18, asking 'What is your main language?', giving the option of 'English or Welsh' or 'Other'.
Those who ticked 'Other' would be asked about their ability to speak English.
A consequence of this is that a person who reports their main language to be Welsh and completed the Welsh census will not be asked about their ability to speak English, whereas a person who indicates that their main language is Welsh and lives in England would be asked about their ability to speak English.
Copies of the census forms can be found here: UK census forms.
Sex
The classification of a person as either male or female.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset provides Census 2021 estimates that classify usual residents aged 16 years and over in England and Wales by proficiency in English and by economic activity status. The estimates are as at Census Day, 21 March 2021.
As Census 2021 was during a unique period of rapid change, take care when using this data for planning purposes. Read more about this quality notice.
Area type
Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.
For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.
Lower tier local authorities
Lower tier local authorities provide a range of local services. There are 309 lower tier local authorities in England made up of 181 non-metropolitan districts, 59 unitary authorities, 36 metropolitan districts and 33 London boroughs (including City of London). In Wales there are 22 local authorities made up of 22 unitary authorities.
Coverage
Census 2021 statistics are published for the whole of England and Wales. However, you can choose to filter areas by:
Proficiency in English language
How well people whose main language is not English (English or Welsh in Wales) speak English.
Economic activity status
People aged 16 years and over are economically active if, between 15 March and 21 March 2021, they were:
It is a measure of whether or not a person was an active participant in the labour market during this period. Economically inactive people are those aged 16 years and over who did not have a job between 15 March and 21 March 2021 and had not looked for work between 22 February and 21 March 2021 or could not start work within two weeks.
The census definition differs from the International Labour Organization definition used on the Labour Force Survey, so estimates are not directly comparable.
This classification splits out full-time students from those who are not full-time students when they are employed or unemployed. It is recommended to sum these together to look at all of those in employment or unemployed, or to use the four category labour market classification, if you want to look at all those with a particular labour market status.
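The recommendation to sum the student and non-student categories can be sketched in Python as follows. The category labels and the counts in the usage note are made up for illustration and are not taken from the published dataset; the function name is also hypothetical.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical detailed labels mapped to a combined labour market status.
COMBINED_STATUS = {
    "Employed (excluding full-time students)": "In employment",
    "Employed (full-time students)": "In employment",
    "Unemployed (excluding full-time students)": "Unemployed",
    "Unemployed (full-time students)": "Unemployed",
    "Economically inactive": "Economically inactive",
}


def combine_student_categories(counts: Dict[str, int]) -> Dict[str, int]:
    """Sum student and non-student counts into one total per status."""
    totals: Dict[str, int] = defaultdict(int)
    for category, count in counts.items():
        totals[COMBINED_STATUS[category]] += count
    return dict(totals)
```

With toy counts of 50 and 5 for the two employed categories, the combined "In employment" total is 55.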
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset provides Census 2022 estimates of English language skills for individuals in Scotland.
A classification of a person's skills in the English language. It breaks down into combinations of "Understand (spoken)", "Speak", "Read" and "Write".
Details of classification can be found here
The quality assurance report can be found here
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Census 2021 data on the international student population of England and Wales by country of birth, passport held, age, sex and other characteristics.
These datasets are part of the release: The changing picture of long-term international migration, England and Wales: Census 2021. Figures may differ slightly in future releases because of the impact of removing rounding and applying further statistical processes.
Figures are based on geography boundaries as of 1 April 2022.
This release includes comparisons to the following 2011 Census data:
Quality notes can be found here
Quality information about demography and migration can be found here
Quality information about labour market can be found here
Usual resident
A usual resident is anyone who on Census Day, 21 March 2021 was in the UK and had stayed or intended to stay in the UK for a period of 12 months or more, or had a permanent UK address and was outside the UK and intended to be outside the UK for less than 12 months.
International student
An international student is defined as someone who was a usual resident in England and Wales and meets all the following criteria:
Country of birth
The country in which a person was born. The following country of birth classifications are used in this dataset:
More information about country of birth classifications can be found here.
Passports held
The country or countries that a person holds, or is entitled to hold, a passport for. Where a person recorded having more than one passport, they were counted only once, categorised in the following priority order: 1. UK passport, 2. Irish passport, 3. Other passport. The following classifications were created for this dataset for comparability with other international migration releases:
More information can be found here
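The passport priority order described above (UK, then Irish, then Other) can be sketched in Python as follows. The function name and the string labels are hypothetical, and returning `None` when no passport is recorded is an assumption of this sketch, since the text does not say how that case is categorised.

```python
from typing import Iterable, Optional

# Priority order stated in the text: 1. UK, 2. Irish, 3. Other.
PASSPORT_PRIORITY = ("UK", "Irish", "Other")


def classify_passports(passports_held: Iterable[str]) -> Optional[str]:
    """Count a person once, under the highest-priority passport they hold."""
    held = set(passports_held)
    for category in PASSPORT_PRIORITY:
        if category in held:
            return category
    # Assumption: no recorded passport is left unclassified in this sketch.
    return None
```

A person recorded with both an Irish and another passport is counted once, under "Irish".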
Economic activity status
The economic activity status of a person on Census Day, 21 March 2021. The following classification is used in this dataset:
Industry
The industry worked in for those in current employment. The following classification was used for this dataset:
Student accommodation
Student accommodation breaks down household type by typical households used by students. This includes communal establishments, all student households, households containing a single family, households containing multiple families, living with parents and living alone.
More information can be found here
Second address indicator
The second address indicator is used to define an address (in or out of the UK) a person stays at for more than 30 days per year that is not their place of usual residence. Second addresses typically include: armed forces bases, addresses used by people working away from home, a student’s home address, the address of another parent or guardian, a partner’s address, a holiday home. There are 3 categories in this classification.
Detailed description can be found here
Main language (detailed)
This is used to define a person's first or preferred language. This breaks down the responses given in the write-in option "Other, write in (including British Sign Language)". There are 95 categories in the primary classification.
More details can be found here
Proficiency in English language
Proficiency in English language is used to determine how well a person whose main language is not English (English or Welsh in Wales) feels they can speak English. There are 6 categories in this classification.
More details can be found here