18 datasets found
  1. Census 21 - English proficiency MSOA

    • data.leicester.gov.uk
    csv, excel, geojson +1
    Updated Aug 22, 2023
    Cite
    (2023). Census 21 - English proficiency MSOA [Dataset]. https://data.leicester.gov.uk/explore/dataset/census-21-english-proficiency-msoa/
    Explore at:
    Available download formats: csv, geojson, excel, json
    Dataset updated
    Aug 22, 2023
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    The census is undertaken by the Office for National Statistics every 10 years and gives us a picture of all the people and households in England and Wales. The most recent census took place in March of 2021.

    The census asks every household questions about the people who live there and the type of home they live in. In doing so, it helps to build a detailed snapshot of society. Information from the census helps the government and local authorities to plan and fund local services, such as education, doctors' surgeries and roads.

    Key census statistics for Leicester are published on the open data platform to make information accessible to local services, voluntary and community groups, and residents. There is also a dashboard published showcasing various datasets from the census allowing users to view data for all MSOAs and compare this with Leicester overall statistics.

    Further information about the census and full datasets can be found on the ONS website - https://www.ons.gov.uk/census/aboutcensus/censusproducts

    Proficiency in English

    This dataset provides Census 2021 estimates that classify usual residents in England and Wales by their proficiency in English. The estimates are as at Census Day, 21 March 2021.

    Definition: How well people whose main language is not English (English or Welsh in Wales) speak English.

    This dataset provides details for the MSOAs of Leicester city.

  2. Percentage main language is not English: Cannot speak English - Birmingham...

    • cityobservatory.birmingham.gov.uk
    csv, excel, geojson +1
    Updated Sep 6, 2021
    Cite
    (2021). Percentage main language is not English: Cannot speak English - Birmingham Constituency [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/percentage-cannot-speak-english-birmingham-constituency/
    Explore at:
    Available download formats: excel, json, csv, geojson
    Dataset updated
    Sep 6, 2021
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    Birmingham
    Description

    This provides estimates of the percentage of usual residents aged 3 and over in England and Wales by their proficiency in English. The proficiency in English classification corresponds to the tick box response options on the census questionnaire. Estimates are used to help central government, local authorities and the NHS allocate resources and provide services for non-English speakers. They also help public service providers target the delivery of their services effectively, for example translation and interpretation services and material in alternative languages.

    Statistical Disclosure Control: To protect against disclosure of personal information from the Census, records in the Census database have been swapped between different geographic areas, so some counts will be affected. In the main, the greatest effects will be at the lowest geographies, since the record swapping is targeted towards those households with unusual characteristics in small areas.

    Data is powered by LG Inform Plus and automatically checked for new data on the 3rd of each month.

  3. British English General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). British English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-uk
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United Kingdom
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the UK English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world UK English communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic British accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of UK English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native UK English speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of the United Kingdom to ensure dialectal diversity and demographic balance.
    Demographics: A gender split of 60% male and 40% female, with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
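
    The audio format stated above (stereo, 16-bit, 16kHz WAV) can be checked on a downloaded file before training. The following is a minimal sketch using Python's standard `wave` module; the function names `wav_format` and `matches_listing` are illustrative and not part of any FutureBeeAI tooling.

```python
import wave


def wav_format(source):
    """Return (channels, bit_depth, sample_rate_hz) for a WAV file.

    `source` may be a file path or a binary file object opened for reading.
    """
    with wave.open(source, "rb") as wav:
        # getsampwidth() is in bytes per sample, so multiply by 8 for bits.
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def matches_listing(source, channels=2, bit_depth=16, sample_rate_hz=16000):
    """Check a file against the format given in the listing (the defaults)."""
    return wav_format(source) == (channels, bit_depth, sample_rate_hz)
```

    Calling `matches_listing` with the path of a downloaded recording returns `True` only when all three properties agree with the listing; files that differ can be flagged or resampled before use.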

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through a double QA pass, with an average word error rate (WER) below 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
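
    Word error rate is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below assumes whitespace tokenization and no text normalization; the listing does not say how FutureBeeAI computes its figure, so the function name and these choices are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

    A value below 0.05 corresponds to the "under 5%" figure quoted in the listing. In practice, casing and punctuation are usually normalized before scoring, which this sketch leaves to the caller.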

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple English speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for UK English.
    Voice Assistants: Build smart assistants capable of understanding natural British conversations.

  4. Census 21 - English proficiency ward

    • data.leicester.gov.uk
    csv, excel, geojson +1
    Updated Jun 26, 2023
    Cite
    (2023). Census 21 - English proficiency ward [Dataset]. https://data.leicester.gov.uk/explore/dataset/census-21-english-proficiency-ward/
    Explore at:
    Available download formats: json, geojson, excel, csv
    Dataset updated
    Jun 26, 2023
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    The census is undertaken by the Office for National Statistics every 10 years and gives us a picture of all the people and households in England and Wales. The most recent census took place in March of 2021.

    The census asks every household questions about the people who live there and the type of home they live in. In doing so, it helps to build a detailed snapshot of society. Information from the census helps the government and local authorities to plan and fund local services, such as education, doctors' surgeries and roads.

    Key census statistics for Leicester are published on the open data platform to make information accessible to local services, voluntary and community groups, and residents. There is also a dashboard published showcasing various datasets from the census allowing users to view data for all wards and compare this with Leicester overall statistics.

    Further information about the census and full datasets can be found on the ONS website - https://www.ons.gov.uk/census/aboutcensus/censusproducts

    Proficiency in English

    This dataset provides Census 2021 estimates that classify usual residents in England and Wales by their proficiency in English. The estimates are as at Census Day, 21 March 2021.

    Definition: How well people whose main language is not English (English or Welsh in Wales) speak English.

    This dataset provides details for the electoral wards of Leicester city.

  5. England and Wales Census 2021 - RM150: Ability to speak Welsh by national...

    • statistics.ukdataservice.ac.uk
    csv, json, xlsx
    Updated Jun 10, 2024
    Cite
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2024). England and Wales Census 2021 - RM150: Ability to speak Welsh by national identity by age [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/england-and-wales-census-2021-rm150-ability-to-speak-welsh-by-national-identity-by-age
    Explore at:
    Available download formats: xlsx, json, csv
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    Northern Ireland Statistics and Research Agency
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    England, Wales
    Description

    This dataset provides Census 2021 estimates that classify usual residents in Wales aged 3 years and over by ability to speak Welsh, by national identity, and by age. The estimates are as at Census Day, 21 March 2021.

    The increase since the 2011 Census in people identifying as “British” and fall in people identifying as “English” may partly reflect true changes in self-perception. It is also likely to reflect that “British” replaced “English” as the first response option listed on the questionnaire in England. Read more about this quality notice.

    Estimates for single year of age between ages 90 and 100+ are less reliable than other ages. Estimation and adjustment at these ages was based on the age range 90+ rather than five-year age bands. Read more about this quality notice.

    Area type

    Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.

    For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.

    Coverage

    Census 2021 statistics are published for the whole of England and Wales. Data are also available in these geographic types:

    • country - for example, Wales
    • region - for example, London
    • local authority - for example, Cornwall
    • health area – for example, Clinical Commissioning Group
    • statistical area - for example, MSOA or LSOA

    Welsh speaking ability

    This classifies a person as being able to "Speak Welsh". They may have also ticked one or more of the following:

    • understand spoken Welsh
    • read Welsh
    • write Welsh

    In results that classify people by Welsh language skills, a person may appear in more than one category depending on which combination of skills they have.

    National identity

    Someone’s national identity is a self-determined assessment of their own identity; it could be the country or countries where they feel they belong or think of as home. It is not dependent on ethnic group or citizenship.

    Respondents could select more than one national identity.

    Age (B)

    A person’s age on Census Day, 21 March 2021 in England and Wales. Infants aged under 1 year are classified as 0 years of age. Age is categorised as follows:

    • Aged 15 years and under
    • Aged 16 to 49 years
    • Aged 50 years and over
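
    The three bands of the Age (B) classification above can be expressed as a small lookup. This is a sketch only: the function name `age_band` is hypothetical, and the labels are copied from the list above.

```python
def age_band(age):
    """Map an age in whole years on Census Day to its Age (B) band."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age <= 15:
        # Infants under 1 year are classified as 0, so they fall here.
        return "Aged 15 years and under"
    if age <= 49:
        return "Aged 16 to 49 years"
    return "Aged 50 years and over"
```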

  6. Audio Visual Speech Dataset: British English

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Audio Visual Speech Dataset: British English [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/british-english-visual-speech-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the UK English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.

    Dataset Content

    This visual speech dataset contains 1000 videos in UK English, each paired with a corresponding high-fidelity audio track. In each video, the participant answers a specific question in an unscripted, spontaneous manner.

    Participant Diversity:
    Speakers: The dataset includes visual speech data from more than 200 participants from different regions of the United Kingdom.
    Regions: Ensures a balanced representation of accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

    Video Data

    While recording each video, extensive guidelines are followed to maintain quality and diversity.

    Recording Details:
    File Duration: Average duration of 30 seconds to 3 minutes per video.
    Formats: Videos are available in MP4 or MOV format.
    Resolution: Videos are recorded in ultra-high-definition resolution with 30 fps or above.
    Device: Both the latest Android and iOS devices are used in this collection.
    Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
    Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
    Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
    Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
    Face Orientation: Contains straight face and tilted face angles.
    Participant Positions: Records participants in both standing and seated positions.
    Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
    Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
    Focus: In each video, the participant's face remains in focus throughout the video duration, ensuring the face stays within the video frame.
    Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions of participants. The dataset contains videos expressing the following human emotions:
    Happy
    Sad
    Excited
    Angry
    Annoyed
    Normal
    Question Diversity: For each human emotion, participants answered a specific question expressing that particular emotion.

    Metadata

    The dataset provides comprehensive metadata for each video recording and participant:

  7. english_dialects

    • huggingface.co
    Cite
    Yoach Lacombe, english_dialects [Dataset]. https://huggingface.co/datasets/ylacombe/english_dialects
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Yoach Lacombe
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Dataset Card for "english_dialects"

    Dataset Summary

    This dataset consists of 31 hours of transcribed high-quality audio of English sentences recorded by 120 volunteers speaking with different accents of the British Isles. The dataset is intended for linguistic analysis as well as use for speech technologies. The speakers self-identified as native speakers of Southern England, Midlands, Northern England, Welsh, Scottish and Irish varieties of English. The recording scripts… See the full description on the dataset page: https://huggingface.co/datasets/ylacombe/english_dialects.

  8. British English Call Center Data for Travel AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). British English Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-english-uk
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United Kingdom
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This UK English Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for English-speaking travelers.

    Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.

    Speech Data

    The dataset includes 30 hours of dual-channel audio recordings between native UK English speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.

    Participant Diversity:
    Speakers: 60 native UK English contributors from our verified pool.
    Regions: Covering multiple United Kingdom provinces to capture accent and dialectal variation.
    Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).
    Recording Details:
    Conversation Nature: Naturally flowing, spontaneous customer-agent calls.
    Call Duration: Between 5 and 15 minutes per session.
    Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.
    Recording Environment: Captured in controlled, noise-free, echo-free settings.

    Topic Diversity

    Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).

    Inbound Calls:
    Booking Assistance
    Destination Information
    Flight Delays or Cancellations
    Support for Disabled Passengers
    Health and Safety Travel Inquiries
    Lost or Delayed Luggage, and more
    Outbound Calls:
    Promotional Travel Offers
    Customer Feedback Surveys
    Booking Confirmations
    Flight Rescheduling Alerts
    Visa Expiry Notifications, and others

    These scenarios help models understand and respond to diverse traveler needs in real-time.

    Transcription

    Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-Stamped Segments
    Non-speech Markers (e.g., pauses, coughs)
    High transcription accuracy: a dual-layered transcription review keeps the word error rate under 5%.

    Metadata

    Extensive metadata enriches each call and speaker for better filtering and AI training:

    Participant Metadata: ID, age, gender, region, accent, and dialect.
    Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

    Usage and Applications

    This dataset is ideal for a variety of AI use cases in the travel and tourism space:

    ASR Systems: Train English speech-to-text engines for travel platforms.

  9. England and Wales Census 2021 - TS029: Proficiency in English

    • statistics.ukdataservice.ac.uk
    csv, json, xlsx
    Updated Jun 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2024). England and Wales Census 2021 - TS029: Proficiency in English [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/england-and-wales-census-2021-ts029-proficiency-in-english
    Explore at:
    Available download formats: xlsx, csv, json
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    Northern Ireland Statistics and Research Agency
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    England, Wales
    Description

    This dataset provides Census 2021 estimates that classify usual residents in England and Wales by their proficiency in English. The estimates are as at Census Day, 21 March 2021.

    Area type

    Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.

    For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.

    Coverage

    Census 2021 statistics are published for the whole of England and Wales. Data are also available in these geographic types:

    • country - for example, Wales
    • region - for example, London
    • local authority - for example, Cornwall
    • health area – for example, Clinical Commissioning Group
    • statistical area - for example, MSOA or LSOA

    Proficiency in English language (6 categories)

    How well people whose main language is not English (English or Welsh in Wales) speak English.

  10. Age by General health by Proficiency in English by Sex (England and Wales)...

    • statistics.ukdataservice.ac.uk
    csv, zip
    Updated Sep 20, 2022
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2022). Age by General health by Proficiency in English by Sex (England and Wales) 2011 [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/age-general-health-proficiency-english-sex-england-and-wales-2011
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    England, Wales
    Description

    Dataset population: Persons aged 3 and over

    Age

    Age is derived from the date of birth question and is a person's age at their last birthday, at 27 March 2011. Dates of birth that imply an age over 115 are treated as invalid and the person's age is imputed. Infants less than one year old are classified as 0 years of age.
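
    The age derivation described above (age at last birthday on 27 March 2011, with implied ages over 115 treated as invalid) can be sketched as follows. The function name is illustrative, and returning `None` to mark "needs imputation" is an assumption: the source does not describe how imputation is carried out, nor how a birth date after census day is handled.

```python
from datetime import date

CENSUS_DAY_2011 = date(2011, 3, 27)
MAX_VALID_AGE = 115


def age_at_census(date_of_birth, census_day=CENSUS_DAY_2011):
    """Return age at last birthday on census day, or None if invalid."""
    if date_of_birth > census_day:
        # Assumption: a birth date after census day is treated as invalid.
        return None
    # Subtract one year if this year's birthday has not yet been reached.
    birthday_pending = (census_day.month, census_day.day) < (
        date_of_birth.month,
        date_of_birth.day,
    )
    age = census_day.year - date_of_birth.year - int(birthday_pending)
    # Implied ages over 115 are invalid; the real process imputes an age.
    return age if age <= MAX_VALID_AGE else None
```

    Infants under one year old come out as 0, matching the classification rule stated above.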

    General health

    General health is a self-assessment of a person's general state of health. People were asked to assess whether their health was very good, good, fair, bad or very bad.

    For England and Wales, this assessment is not based on a person's health over any specified period of time.

    Proficiency in English

    Proficiency in English language classifies people whose main language is not English (or not English or Welsh in Wales) according to their ability to speak English. A person is classified in one of the categories:

    • Can speak English very well
    • Can speak English well
    • Cannot speak English well
    • Cannot speak English

    This question was handled slightly differently in the England and Wales censuses.

    In the English census a tick box was used in Question 18, asking 'What is your main language?', giving the option of 'English' or 'Other'.

    In the Welsh census, a tick box was used in Question 18, asking 'What is your main language?', giving the option of 'English or Welsh' or 'Other'.

    Those who ticked 'Other' would be asked about their ability to speak English.

    A consequence of this is that a person who reports their main language to be Welsh and completed the Welsh census will not be asked about their ability to speak English, whereas a person who indicates that their main language is Welsh and lives in England would be asked about 'their ability to speak English'.
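
    The routing difference between the two questionnaires can be made concrete with a small function. This is a sketch of the rule as described above, not ONS code; the function name and the `"England"` / `"Wales"` form labels are assumptions chosen for illustration.

```python
# Main-language answers that skip the English proficiency question,
# per questionnaire, following the Question 18 tick boxes described above.
EXEMPT_MAIN_LANGUAGES = {
    "England": {"English"},
    "Wales": {"English", "Welsh"},
}


def asked_about_english(main_language, form):
    """Return True if the respondent is asked how well they speak English."""
    if form not in EXEMPT_MAIN_LANGUAGES:
        raise ValueError(f"unknown census form: {form!r}")
    # Anyone outside the exempt set ticked 'Other' and gets the follow-up.
    return main_language not in EXEMPT_MAIN_LANGUAGES[form]
```

    Under this rule a Welsh main-language speaker is asked the proficiency question on the England form but not on the Wales form, which is the asymmetry noted in the text.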

    Copies of the census forms can be found here: UK census forms.

    Sex

    The classification of a person as either male or female.

  11. Age by Proficiency in English 2011

    • statistics.ukdataservice.ac.uk
    csv, zip
    Updated Sep 20, 2022
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2022). Age by Proficiency in English 2011 [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/age-proficiency-english-2011
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Dataset population: Persons aged 3 and over

    Age

    Age is derived from the date of birth question and is a person's age at their last birthday, at 27 March 2011. Dates of birth that imply an age over 115 are treated as invalid and the person's age is imputed. Infants less than one year old are classified as 0 years of age.

    Proficiency in English

    Proficiency in English language classifies people whose main language is not English (or not English or Welsh in Wales) according to their ability to speak English. A person is classified in one of the categories:

    • Can speak English very well
    • Can speak English well
    • Cannot speak English well
    • Cannot speak English

    This question was handled slightly differently in the England and Wales censuses.

    In the English census a tick box was used in Question 18, asking "What is your main language?", giving the option of 'English' or 'Other'.

    In the Welsh census, a tick box was used in Question 18, asking "What is your main language?", giving the option of 'English or Welsh' or 'Other'.

    Those who ticked 'Other' would be asked about their ability to speak English.

    A consequence of this is that a person who reports their main language to be Welsh and completed the Welsh census will not be asked about their ability to speak English, whereas a person who indicates that their main language is Welsh and lives in England would be asked about 'their ability to speak English'.

    Copies of the census forms can be found here: UK census forms.

  12. Age upon arrival in the UK by Proficiency in English (Great Britain) 2011

    • statistics.ukdataservice.ac.uk
    csv, zip
    Updated Sep 20, 2022
    Cite
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2022). Age upon arrival in the UK by Proficiency in English (Great Britain) 2011 [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/age-upon-arrival-uk-proficiency-english-great-britain-2011
    Available download formats: csv, zip
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    Great Britain, United Kingdom
    Description

    Dataset population: Persons aged 3 and over

    Age upon arrival in the UK

    The age of arrival in the UK is derived from the date that a person last arrived to live in the UK and their age. Short visits away from the UK are not counted in determining the date that a person last arrived.

    Age of arrival is only applicable to usual residents who were not born in the UK. It does not include usual residents born in the UK who have emigrated and since returned; these are recorded in the category 'Born in the UK'.

    Proficiency in English

    Proficiency in English language classifies people whose main language is not English (or not English or Welsh in Wales) according to their ability to speak English. A person is classified in one of the categories:

    • Can speak English very well
    • Can speak English well
    • Cannot speak English well
    • Cannot speak English

    This question was handled slightly differently in the England and Wales censuses.

    In the English census a tick box was used in Question 18, asking 'What is your main language?', giving the option of 'English' or 'Other'.

    In the Welsh census, a tick box was used in Question 18, asking 'What is your main language?', giving the option of 'English or Welsh' or 'Other'.

    Those who ticked 'Other' would be asked about their ability to speak English.

    A consequence of this is that a person who reports their main language to be Welsh and completed the Welsh census would not be asked about their ability to speak English, whereas a person who indicates that their main language is Welsh and lives in England would be asked about their ability to speak English.

    Copies of the census forms can be found here: UK census forms.

  13. England and Wales Census 2021 - RM111: Proficiency in English by age

    • statistics.ukdataservice.ac.uk
    csv, json, xlsx
    Updated Jun 10, 2024
    Cite
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2024). England and Wales Census 2021 - RM111: Proficiency in English by age [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/england-and-wales-census-2021-rm111-proficiency-in-english-by-age
    Available download formats: xlsx, csv, json
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    Northern Ireland Statistics and Research Agency
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    England, Wales
    Description

    This dataset provides Census 2021 estimates that classify usual residents aged 3 years and over in England and Wales by proficiency in English and by age. The estimates are as at Census Day, 21 March 2021.

    Estimates for single year of age between ages 90 and 100+ are less reliable than other ages. Estimation and adjustment at these ages was based on the age range 90+ rather than five-year age bands. Read more about this quality notice.

    Area type

    Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.

    For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.

    Lower tier local authorities

    Lower tier local authorities provide a range of local services. There are 309 lower tier local authorities in England made up of 181 non-metropolitan districts, 59 unitary authorities, 36 metropolitan districts and 33 London boroughs (including City of London). In Wales there are 22 local authorities made up of 22 unitary authorities.

    Coverage

    Census 2021 statistics are published for the whole of England and Wales. However, you can choose to filter areas by:

    • country - for example, Wales
    • region - for example, London
    • local authority - for example, Cornwall
    • health area - for example, Clinical Commissioning Group
    • statistical area - for example, MSOA or LSOA

    Proficiency in English language

    How well people whose main language is not English (English or Welsh in Wales) speak English.

    Age

    A person’s age on Census Day, 21 March 2021 in England and Wales. Infants aged under 1 year are classified as 0 years of age.

  14. British English Call Center Data for Telecom AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). British English Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-english-uk
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This UK English Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native UK English speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.

    Participant Diversity:
    • Speakers: 60 native UK English speakers from our verified contributor pool.
    • Regions: Representing multiple provinces across the United Kingdom to ensure coverage of various accents and dialects.
    • Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.

    Recording Details:
    • Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    • Call Duration: Ranges from 5 to 15 minutes.
    • Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    • Recording Environment: Captured in clean conditions with no echo or background noise.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.

    Inbound Calls:
    • Phone Number Porting
    • Network Connectivity Issues
    • Billing and Payments
    • Technical Support
    • Service Activation
    • International Roaming Enquiry
    • Refund Requests and Billing Adjustments
    • Emergency Service Access, and others

    Outbound Calls:
    • Welcome Calls & Onboarding
    • Payment Reminders
    • Customer Satisfaction Surveys
    • Technical Updates
    • Service Usage Reviews
    • Network Complaint Status Calls, and more

    This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    • Speaker-Segmented Dialogues
    • Time-coded Segments
    • Non-speech Tags (e.g., pauses, coughs)
    • High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.

    These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and location.

  15. Age by Occupation by Proficiency in English by Sex (Middle Super Output...

    • statistics.ukdataservice.ac.uk
    csv, zip
    Updated Sep 20, 2022
    Cite
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2022). Age by Occupation by Proficiency in English by Sex (Middle Super Output Areas in England and Wales) 2011 [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/age-occupation-proficiency-english-sex-middle-super-output-areas-england-and-wales-2011
    Available download formats: zip, csv
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    England, Wales
    Description

    Dataset population: Persons aged 16 and over

    Age

    Age is derived from the date of birth question and is a person's age at their last birthday, at 27 March 2011. Dates of birth that imply an age over 115 are treated as invalid and the person's age is imputed. Infants less than one year old are classified as 0 years of age.
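As a rough illustration of the age rule described above, the following Python sketch computes age at last birthday on 27 March 2011 and flags implied ages over 115 as invalid. The function names are hypothetical, and because the source does not describe how imputation works, the sketch only signals that imputation would be needed by returning None.

```python
from datetime import date
from typing import Optional

# Census Day for the 2011 Census, as stated in the description above.
CENSUS_DAY_2011 = date(2011, 3, 27)


def age_at_last_birthday(date_of_birth: date, on: date = CENSUS_DAY_2011) -> int:
    """Whole years completed between date_of_birth and the reference date."""
    years = on.year - date_of_birth.year
    # Subtract one if this year's birthday has not yet been reached.
    if (on.month, on.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def census_age(date_of_birth: date) -> Optional[int]:
    """Return the derived age, or None when the implied age exceeds 115.

    None stands in for "age must be imputed"; the imputation method itself
    is not given in the source and is not sketched here.
    """
    age = age_at_last_birthday(date_of_birth)
    if age > 115:
        return None
    return age
```

Under these assumptions an infant born a few weeks before Census Day is assigned age 0, matching the rule that infants under one year old are classified as 0 years of age.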

    Occupation

    A person's occupation relates to their main job and is derived from either their job title or details of the activities involved in their job. This is used to assign responses to an occupation code based on the Standard Occupational Classification 2010 (SOC2010).

    Proficiency in English

    Proficiency in English language classifies people whose main language is not English (or not English or Welsh in Wales) according to their ability to speak English. A person is classified in one of the categories:

    • Can speak English very well
    • Can speak English well
    • Cannot speak English well
    • Cannot speak English

    This question was handled slightly differently in the England and Wales censuses.

    In the English census a tick box was used in Question 18, asking 'What is your main language?', giving the option of 'English' or 'Other'.

    In the Welsh census, a tick box was used in Question 18, asking 'What is your main language?', giving the option of 'English or Welsh' or 'Other'.

    Those who ticked 'Other' would be asked about their ability to speak English.

    A consequence of this is that a person who reports their main language to be Welsh and completed the Welsh census would not be asked about their ability to speak English, whereas a person who indicates that their main language is Welsh and lives in England would be asked about their ability to speak English.

    Copies of the census forms can be found here: UK census forms.

    Sex

    The classification of a person as either male or female.

  16. England and Wales Census 2021 - RM112: Proficiency in English by economic...

    • statistics.ukdataservice.ac.uk
    csv, json, xlsx
    Updated Jun 10, 2024
    Cite
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2024). England and Wales Census 2021 - RM112: Proficiency in English by economic activity status [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/england-and-wales-census-2021-rm112-proficiency-in-english-by-economic-activity-status
    Available download formats: xlsx, csv, json
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    Northern Ireland Statistics and Research Agency
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    England, Wales
    Description

    This dataset provides Census 2021 estimates that classify usual residents aged 16 years and over in England and Wales by proficiency in English and by economic activity status. The estimates are as at Census Day, 21 March 2021.

    As Census 2021 was during a unique period of rapid change, take care when using this data for planning purposes. Read more about this quality notice.

    Area type

    Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.

    For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.

    Lower tier local authorities

    Lower tier local authorities provide a range of local services. There are 309 lower tier local authorities in England made up of 181 non-metropolitan districts, 59 unitary authorities, 36 metropolitan districts and 33 London boroughs (including City of London). In Wales there are 22 local authorities made up of 22 unitary authorities.

    Coverage

    Census 2021 statistics are published for the whole of England and Wales. However, you can choose to filter areas by:

    • country - for example, Wales
    • region - for example, London
    • local authority - for example, Cornwall
    • health area - for example, Clinical Commissioning Group
    • statistical area - for example, MSOA or LSOA

    Proficiency in English language

    How well people whose main language is not English (English or Welsh in Wales) speak English.

    Economic activity status

    People aged 16 years and over are economically active if, between 15 March and 21 March 2021, they were:

    • in employment (an employee or self-employed)
    • unemployed, but looking for work and could start within two weeks
    • unemployed, but waiting to start a job that had been offered and accepted

    It is a measure of whether or not a person was an active participant in the labour market during this period. Economically inactive people are those aged 16 years and over who did not have a job between 15 March and 21 March 2021 and had not looked for work between 22 February and 21 March 2021 or could not start work within two weeks.
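The active/inactive rule above can be read as a small decision procedure. The Python sketch below is only one interpretation of that text: the function and parameter names are hypothetical, and it ignores the full-time student split and the detailed categories used in the published classification.

```python
def economic_activity_status(
    in_employment: bool,
    looked_for_work: bool,
    could_start_within_two_weeks: bool,
    waiting_to_start_accepted_job: bool,
) -> str:
    """Classify a person aged 16 or over as 'active' or 'inactive'.

    The inputs are assumed to refer to the reference periods given in
    the definition (15 to 21 March 2021 for employment, 22 February to
    21 March 2021 for looking for work).
    """
    if in_employment:
        return "active"
    if waiting_to_start_accepted_job:
        # Unemployed, but waiting to start a job already offered and accepted.
        return "active"
    if looked_for_work and could_start_within_two_weeks:
        # Unemployed, looking for work and available to start.
        return "active"
    # No job, and either not looking or not able to start within two weeks.
    return "inactive"
```

For example, someone without a job who looked for work but could not start within two weeks would fall under "inactive" in this reading.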

    The census definition differs from the International Labour Organization definition used on the Labour Force Survey, so estimates are not directly comparable.

    This classification splits out full-time students from those who are not full-time students when they are employed or unemployed. It is recommended to sum these together to look at all of those in employment or unemployed, or to use the four category labour market classification, if you want to look at all those with a particular labour market status.

  17. Scotland's Census 2022 - UV210 - English language skills

    • statistics.ukdataservice.ac.uk
    csv
    Updated Jun 7, 2024
    + more versions
    Cite
    National Records of Scotland (2024). Scotland's Census 2022 - UV210 - English language skills [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/scotland-s-census-2022-uv210-english-language-skills
    Available download formats: csv
    Dataset updated
    Jun 7, 2024
    Dataset authored and provided by
    National Records of Scotland
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    Scotland
    Description

    This dataset provides Census 2022 estimates of the English language skills of individuals in Scotland.

    English language skills

    A classification of a person's skills in the English language. It breaks down into combinations of "Understand (spoken)", "Speak", "Read" and "Write".

    Details of classification can be found here

    The quality assurance report can be found here

  18. England and Wales Census 2021 - The international student population

    • statistics.ukdataservice.ac.uk
    xlsx
    Updated May 10, 2023
    Cite
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2023). England and Wales Census 2021 - The international student population [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/england-and-wales-census-2021-the-international-student-population
    Available download formats: xlsx
    Dataset updated
    May 10, 2023
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    Northern Ireland Statistics and Research Agency
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    England, Wales
    Description

    Census 2021 data on international student population of England and Wales by country of birth, passport held, age, sex and other characteristics.

    These datasets are part of the release: The changing picture of long-term international migration, England and Wales: Census 2021. Figures may differ slightly in future releases because of the impact of removing rounding and applying further statistical processes.

    Figures are based on geography boundaries as of 1 April 2022.

    This release includes comparisons to the following 2011 Census data:

    Quality notes can be found here

    Quality information about demography and migration can be found here

    Quality information about labour market can be found here

    Usual resident

    A usual resident is anyone who on Census Day, 21 March 2021 was in the UK and had stayed or intended to stay in the UK for a period of 12 months or more, or had a permanent UK address and was outside the UK and intended to be outside the UK for less than 12 months.

    International student

    An international student is defined as someone who was a usual resident in England and Wales and meets all the following criteria:

    • in full-time education
    • non-UK-born
    • non-UK passport holder
    • aged 17 years or over upon most recent arrival in the UK
    • aged 18 years or over on Census Day.
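The criteria above combine with a logical AND, which the following Python sketch makes explicit. The record type and its field names are hypothetical and do not correspond to any published file layout, and "non-UK passport holder" is interpreted here as "does not hold a UK passport", which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class UsualResident:
    # Hypothetical fields describing a usual resident of England and Wales.
    in_full_time_education: bool
    uk_born: bool
    holds_uk_passport: bool
    age_at_most_recent_arrival: int
    age_on_census_day: int


def is_international_student(person: UsualResident) -> bool:
    """True only when every criterion in the definition is met."""
    return (
        person.in_full_time_education
        and not person.uk_born
        and not person.holds_uk_passport  # assumed reading of "non-UK passport holder"
        and person.age_at_most_recent_arrival >= 17
        and person.age_on_census_day >= 18
    )
```

Failing any single criterion, for instance arriving in the UK at age 16, is enough to exclude a person under this reading.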

    Country of birth

    The country in which a person was born. The following country of birth classifications are used in this dataset:

    • Country of birth 12a: Political groupings of countries by EU membership and geographical location for non-EU countries.
    • Country of birth 190a: Individual countries. This classification includes geographical groupings for low volume countries.
    • Country of birth (3 categories): These categories have been derived from country of birth 12a and include all UK countries in "Europe: United Kingdom", all EU countries in "Europe: EU countries" and all remaining countries including British Overseas territories in "Non-EU countries (including British Overseas)".

    More information about country of birth classifications can be found here.

    Passports held

    The country or countries that a person holds, or is entitled to hold, a passport for. Where a person recorded having more than one passport, they were counted only once, categorised in the following priority order: 1. UK passport, 2. Irish passport, 3. Other passport. The following classifications were created for this dataset for comparability with other international migration releases:

    • Passports held (4 categories): High level political groupings of passport held by EU membership and geographical location for non-EU countries.
    • Passports held (12 categories): Political groupings of passport held by EU membership and geographical location for non-EU countries.
    • Passports held (150 categories): Individual countries for passport held. This classification includes geographical groupings for low volume countries.
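The priority order described for people with more than one passport (UK first, then Irish, then other) can be sketched as below. This is an illustration only: the function name and the input strings "United Kingdom" and "Ireland" are assumptions, and returning None for a person with no passport recorded is not something the source specifies.

```python
from typing import Iterable, Optional


def passport_priority_category(passports_held: Iterable[str]) -> Optional[str]:
    """Assign one category to a person who may hold several passports."""
    held = {name.strip().lower() for name in passports_held}
    if "united kingdom" in held:
        return "UK passport"      # priority 1
    if "ireland" in held:
        return "Irish passport"   # priority 2
    if held:
        return "Other passport"   # priority 3
    return None                   # assumption: no passport recorded
```

Because each person is assigned exactly one category, category counts sum to the number of people rather than the number of passports.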

    More information can be found here

    Economic activity status

    The economic activity status of a person on Census Day, 21 March 2021. The following classification is used in this dataset:

    Industry

    The industry worked in for those in current employment. The following classification was used for this dataset:

    Student accommodation

    Student accommodation breaks down household type by typical households used by students. This includes communal establishments, all student households, households containing a single family, households containing multiple families, living with parents and living alone.

    More information can be found here

    Second address indicator

    The second address indicator is used to define an address (in or out of the UK) a person stays at for more than 30 days per year that is not their place of usual residence. Second addresses typically include: armed forces bases, addresses used by people working away from home, a student’s home address, the address of another parent or guardian, a partner’s address, a holiday home. There are 3 categories in this classification.

    Detailed description can be found here

    Main language (detailed)

    This is used to define a person's first or preferred language. This breaks down the responses given in the write-in option "Other, write in (including British Sign Language)". There are 95 categories in the primary classification.

    More details can be found here

    Proficiency in English language

    Proficiency in English language is used to determine how well a person whose main language is not English (English or Welsh in Wales) feels they can speak English. There are 6 categories in this classification.

    More details can be found here


