60 datasets found
  1. Most common non-English languages spoken in England and Wales 2021

    • statista.com
    Updated Jun 13, 2025
    Cite
    Statista (2025). Most common non-English languages spoken in England and Wales 2021 [Dataset]. https://www.statista.com/statistics/284010/most-common-non-english-languages-spoken-in-england-and-wales/
    Dataset updated
    Jun 13, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2021
    Area covered
    United Kingdom
    Description

    In 2021, there were 611,845 people who spoke Polish as a main language in England and Wales, making it the most common non-English language among the population. It was followed by Romanian and Panjabi, with 471,945 and 290,745 speakers respectively.

  2. Top Languages Spoken in London Boroughs and MSOAs

    • data.europa.eu
    unknown
    Updated Jul 19, 2021
    Cite
    census2011@london.gov.uk (2021). Top Languages Spoken in London Boroughs and MSOAs [Dataset]. https://data.europa.eu/data/datasets/top-languages-spoken-in-london-boroughs-and-msoas?locale=ga
    Available download formats: unknown
    Dataset updated
    Jul 19, 2021
    Dataset authored and provided by
    census2011@london.gov.uk
    Area covered
    London
    Description

    This dataset shows the most spoken languages by borough and MSOAs in London. It provides numbers of the population aged 3+ who speak specified languages as their main language.

    Main language is from 2011 Census (detailed) - Census table QS204EW.

    This data is presented alongside Annual Population Survey (APS) data showing the top nationalities of residents in January - December 2019 by borough. The top 3 non-British nationalities are at the far right of the table. This is to highlight areas which may now have other common non-British languages spoken compared to 2011 (the year in which the Census information was gathered). The top non-British nationalities in 2019, which did not feature in 2011 as one of the most spoken non-British languages, are highlighted in column AD.

    The APS has a sample of around 320,000 people in the UK (around 28,000 in London). As such, all figures must be treated with some caution. Estimates for non-British nationalities at borough level that are below 10,000 are considered too small to be reliable and should be treated with additional caution.

    MSOA codes have now been linked to House of Commons MSOA names.
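    The borough-level caution rule above can be applied mechanically when filtering the APS nationality columns. The sketch below is a minimal illustration, assuming the estimates have already been loaded into a Python dict mapping borough name to estimated count; the function name and the borough figures are hypothetical placeholders, not values from the dataset.

```python
# Threshold taken from the dataset notes: borough-level estimates for
# non-British nationalities below 10,000 are too small to be reliable.
APS_RELIABILITY_THRESHOLD = 10_000


def split_by_reliability(
    estimates: dict[str, int], threshold: int = APS_RELIABILITY_THRESHOLD
) -> tuple[dict[str, int], dict[str, int]]:
    """Split borough estimates into (reliable, needs_extra_caution)."""
    reliable = {name: n for name, n in estimates.items() if n >= threshold}
    caution = {name: n for name, n in estimates.items() if n < threshold}
    return reliable, caution


if __name__ == "__main__":
    # Hypothetical placeholder figures, not taken from the dataset.
    sample = {"Borough A": 15_000, "Borough B": 9_000, "Borough C": 10_000}
    ok, flagged = split_by_reliability(sample)
    print(sorted(flagged))  # ['Borough B']
```

    An estimate of exactly 10,000 is kept as reliable here, since the notes only flag figures below 10,000.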

  3. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, ahead of the 1.18 billion Mandarin Chinese speakers at the time of the survey. Hindi and Spanish were the third and fourth most widespread languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of its multicultural population. The second most common language in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage is California, where about 45 percent of the population spoke a language other than English at home in 2023.

  4. Common languages used for web content 2025, by share of websites

    • statista.com
    Updated Feb 11, 2025
    Cite
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, English was the most popular language for web content, used by 49.4 percent of websites. Spanish ranked second with six percent, followed by German with 5.6 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. As of January 2023, the combined internet user base of the two countries exceeded one billion people. As a result, most online information is created in English, and even non-native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, there were 5.52 billion internet users worldwide. In the same period, Northern Europe and North America led the world in internet penetration, with around 97 percent of their populations online.

  5. Census 2021 - Main language

    • data.leicester.gov.uk
    csv, excel, json
    Updated Apr 25, 2023
    Cite
    (2023). Census 2021 - Main language [Dataset]. https://data.leicester.gov.uk/explore/dataset/census-2021-leicester-main-language-detailed/
    Available download formats: excel, json, csv
    Dataset updated
    Apr 25, 2023
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    The census is undertaken by the Office for National Statistics every 10 years and gives us a picture of all the people and households in England and Wales. The most recent census took place in March 2021. The census asks every household questions about the people who live there and the type of home they live in. In doing so, it helps to build a detailed snapshot of society. Information from the census helps the government and local authorities to plan and fund local services, such as education, doctors' surgeries and roads.

    Key census statistics for Leicester are published on the open data platform to make information accessible to local services, voluntary and community groups, and residents. A dashboard has also been published showcasing various census datasets, allowing users to view data for Leicester and compare it with national statistics.

    Further information about the census and full datasets can be found on the ONS website: https://www.ons.gov.uk/census/aboutcensus/censusproducts

    Main language

    This dataset provides Census 2021 estimates that classify usual residents in England and Wales by their main language. The estimates are as at Census Day, 21 March 2021.

    Main language is a person's first or preferred language. They may speak other languages as well. A main language is provided only for residents aged 3 and above; residents under 3 appear as 'Does not apply'. Please note that some organisations exclude those under 3 when calculating percentages for this variable.

    This dataset contains information for Leicester City and England overall.
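    Because under-3s are recorded as 'Does not apply', the denominator, and therefore every percentage, changes depending on whether they are counted. A minimal sketch, assuming the counts have been read into a dict keyed by category label; the function name, the labels other than 'Does not apply', and the figures are hypothetical, not taken from the Leicester export.

```python
DOES_NOT_APPLY = "Does not apply"  # label the census uses for residents under 3


def main_language_shares(
    counts: dict[str, int], exclude_under_3: bool = True
) -> dict[str, float]:
    """Return each category's share of the denominator, in percent.

    With exclude_under_3=True the 'Does not apply' row is dropped before
    totalling, matching the practice some organisations follow.
    """
    if exclude_under_3:
        counts = {k: v for k, v in counts.items() if k != DOES_NOT_APPLY}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("denominator is zero")
    return {k: 100 * v / total for k, v in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    counts = {"English": 700, "Polish": 100, DOES_NOT_APPLY: 200}
    print(main_language_shares(counts))         # {'English': 87.5, 'Polish': 12.5}
    print(main_language_shares(counts, False))  # includes under-3s in the total
```

    The same 700 English speakers come out at 87.5 percent or 70 percent depending on the denominator, which is why it is worth stating which convention a published figure uses.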

  6. England and Wales Census 2021 - RM080: Multi-language households by ethnic...

    • statistics.ukdataservice.ac.uk
    csv, json, xlsx
    Updated Jun 10, 2024
    Cite
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service (2024). England and Wales Census 2021 - RM080: Multi-language households by ethnic group of Household Reference Person [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/england-and-wales-census-2021-rm080-multi-language-households-by-ethnic-group-of-hrp
    Available download formats: xlsx, csv, json
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    Northern Ireland Statistics and Research Agency
    Office for National Statistics (http://www.ons.gov.uk/)
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    England, Wales
    Description

    This dataset provides Census 2021 estimates that classify Household Reference Persons in England and Wales by whether one or multiple languages are spoken, and by ethnic group. The estimates are as at Census Day, 21 March 2021.

    Area type

    Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.

    For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.

    Lower tier local authorities

    Lower tier local authorities provide a range of local services. There are 309 lower tier local authorities in England, made up of 181 non-metropolitan districts, 59 unitary authorities, 36 metropolitan districts and 33 London boroughs (including the City of London). In Wales, all 22 local authorities are unitary authorities.
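    As a quick consistency check, the four English categories add up to the stated total of 309, and adding the 22 Welsh unitary authorities gives 331 lower tier local authorities across England and Wales. The variable names below are illustrative only.

```python
# Composition of lower tier local authorities as stated in the dataset notes.
ENGLAND_LTLA = {
    "non-metropolitan districts": 181,
    "unitary authorities": 59,
    "metropolitan districts": 36,
    "London boroughs (incl. City of London)": 33,
}
WALES_LTLA = {"unitary authorities": 22}

england_total = sum(ENGLAND_LTLA.values())
wales_total = sum(WALES_LTLA.values())

# The category counts should reproduce the published England total.
assert england_total == 309, england_total

print(england_total, wales_total, england_total + wales_total)  # 309 22 331
```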

    Coverage

    Census 2021 statistics are published for the whole of England and Wales. However, you can choose to filter areas by:

    • country - for example, Wales
    • region - for example, London
    • local authority - for example, Cornwall
    • health area – for example, Clinical Commissioning Group
    • statistical area - for example, MSOA or LSOA

    Multiple main languages in household

    Classifies households by whether members speak the same or different main language. If multiple main languages are spoken, this identifies whether they differ between generations or partnerships within the household.

    Ethnic group

    The ethnic group that the person completing the census feels they belong to. This could be based on their culture, family background, identity or physical appearance.

    Respondents could choose one out of 19 tick-box response categories, including write-in response options.

  7. Percentage main language is not English: Cannot speak English - Birmingham...

    • cityobservatory.birmingham.gov.uk
    csv, excel, geojson, json
    Updated Sep 6, 2021
    Cite
    (2021). Percentage main language is not English: Cannot speak English - Birmingham Wards [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/percentage-cannot-speak-english-birmingham-wards/
    Available download formats: excel, csv, json, geojson
    Dataset updated
    Sep 6, 2021
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    Birmingham
    Description

    This provides estimates of the percentage of usual residents aged 3 and over in England and Wales by their proficiency in English. The proficiency in English classification corresponds to the tick-box response options on the census questionnaire. The estimates help central government, local authorities and the NHS allocate resources and provide services for non-English speakers. They also help public service providers target the delivery of their services effectively, for example translation and interpretation services and material in alternative languages.

    Statistical Disclosure Control: to protect against disclosure of personal information from the Census, records in the Census database have been swapped between different geographic areas, so some counts will be affected. In the main, the greatest effects will be at the lowest geographies, since record swapping is targeted at households with unusual characteristics in small areas.

    Data is powered by LG Inform Plus and is automatically checked for new data on the 3rd of each month.

  8. British English General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). British English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-uk
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United Kingdom
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the UK English General Conversation Speech Dataset, a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world UK English communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic British accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of UK English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native UK English speakers from FutureBeeAI's contributor community.
    Regions: Representing various regions of the United Kingdom to ensure dialectal diversity and demographic balance.
    Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
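    The recording parameters above determine the storage footprint. The sketch below estimates the raw audio size for the full 30 hours, assuming uncompressed PCM samples (the usual WAV encoding, though the listing does not state it) and ignoring the small per-file WAV header; the function name is hypothetical.

```python
def pcm_size_bytes(
    duration_s: int,
    sample_rate_hz: int = 16_000,  # 16 kHz, as listed
    bit_depth: int = 16,           # 16-bit samples, as listed
    channels: int = 2,             # stereo, as listed
) -> int:
    """Raw PCM payload size in bytes (WAV header not included)."""
    bytes_per_sample = bit_depth // 8
    return duration_s * sample_rate_hz * bytes_per_sample * channels


if __name__ == "__main__":
    thirty_hours = 30 * 60 * 60
    size = pcm_size_bytes(thirty_hours)
    print(size, f"{size / 1e9:.2f} GB")  # 6912000000 6.91 GB
```

    At these settings each second of audio takes 64,000 bytes, so the 30-hour corpus comes to roughly 6.9 GB before any compression.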

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through a double QA pass (average WER below 5%)
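    Word error rate (WER), the metric behind the accuracy figure above, is conventionally the number of word substitutions, deletions and insertions needed to turn a hypothesis into the reference, divided by the number of reference words. The listing does not say how FutureBeeAI measures it or which transcripts are compared, so the sketch below shows only the standard word-level Levenshtein definition, not the vendor's procedure; the function name is hypothetical, and it splits on whitespace without normalising case or punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein (edit) distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on a mat"
    print(f"{word_error_rate(ref, hyp):.3f}")  # one substitution in six words
```

    A single substituted word in a six-word reference gives a WER of about 16.7 percent, so an average below 5 percent implies fewer than one error per twenty reference words.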

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple English speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for UK English.
    Voice Assistants: Build smart assistants capable of understanding natural British conversations.

  9. Leading language learning apps by aided brand awareness in the UK in 2023

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Leading language learning apps by aided brand awareness in the UK in 2023 [Dataset]. https://www.statista.com/statistics/1490113/language-learning-app-awareness-uk/
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United Kingdom
    Description

    In 2023, Duolingo led the language learning app market in the United Kingdom, with an aided brand awareness of 38.23 percent among consumers. Rosetta Stone followed with 29.84 percent, and Babbel with 28.99 percent.

  10. Most common languages spoken in India 2011

    • statista.com
    Updated Jun 24, 2025
    Cite
    Statista (2025). Most common languages spoken in India 2011 [Dataset]. https://www.statista.com/statistics/616508/most-common-languages-india/
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2011
    Area covered
    India
    Description

    Hindi, with over *** million native speakers, was the most spoken language in Indian homes, followed by Bengali with ** million speakers, according to 2011 census data. Native English speakers numbered about *** thousand during the measured time period.

    The colonial rule in India

    One of the most remarkable and widespread legacies of British colonial rule in India was the English language. Before independence, English was used exclusively for higher education and in government and administrative processes. After independence, however, Hindi was claimed as the language with official government patronage, as it remains today. This led to resistance from the southern states of India, where Hindi did not have prominence. Consequently, Parliament enacted the Official Languages Act of 1963, which ensured the continued use of English for official purposes alongside Hindi.

    Multi-linguistic cultures

    India has approximately ** major languages, written in about ** different scripts. While the country's official languages are English and Hindi, Hindi remains the most preferred language online, especially in northern rural areas, and English is becoming increasingly popular in urban areas. In addition, almost every state in India has its own official language, which is studied in primary and secondary school as a compulsory second language. Among the most prominent are Bengali, Marathi, and Telugu.

  11. Percentage main language is not English: Can speak English well - WMCA

    • cityobservatory.birmingham.gov.uk
    csv, excel, geojson, json
    Updated Sep 6, 2021
    Cite
    (2021). Percentage main language is not English: Can speak English well - WMCA [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/percentage-can-speak-english-well-wmca/
    Available download formats: json, csv, excel, geojson
    Dataset updated
    Sep 6, 2021
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    This provides estimates of the percentage of usual residents aged 3 and over in England and Wales by their proficiency in English. The proficiency in English classification corresponds to the tick-box response options on the census questionnaire. The estimates help central government, local authorities and the NHS allocate resources and provide services for non-English speakers. They also help public service providers target the delivery of their services effectively, for example translation and interpretation services and material in alternative languages.

    Statistical Disclosure Control: to protect against disclosure of personal information from the Census, records in the Census database have been swapped between different geographic areas, so some counts will be affected. In the main, the greatest effects will be at the lowest geographies, since record swapping is targeted at households with unusual characteristics in small areas.

    Data is powered by LG Inform Plus and is automatically checked for new data on the 3rd of each month.

  12. Top Languages Spoken in London Boroughs and MSOAs

    • data.europa.eu
    unknown
    Cite
    census@london.gov.uk, Top Languages Spoken in London Boroughs and MSOAs [Dataset]. https://data.europa.eu/data/datasets/top-languages-spoken-in-london-boroughs-and-msoas?locale=lt
    Available download formats: unknown
    Dataset authored and provided by
    census@london.gov.uk
    Area covered
    London
    Description

    This dataset shows the most spoken languages by borough and MSOA in London. It provides numbers of the population aged 3+ who speak specified languages as their main language.

    Main language is from the 2011 Census (detailed) - Census table QS204EW.

    This data is presented alongside Annual Population Survey (APS) data showing the top nationalities of residents in January - December 2019 by borough. The top three non-British nationalities are at the far right of the table. This is to highlight areas which may now have other common non-British languages spoken compared to 2011 (the year in which the Census information was gathered). The top non-British nationalities in 2019, which did not feature in 2011 as one of the most spoken non-British languages, are highlighted in column AD.

    The APS has a sample of around 320,000 people in the UK (around 28,000 in London). As such, all figures must be treated with some caution. Borough-level estimates below 10,000 are considered too small to be reliable and should be treated with additional caution.

    MSOA codes have now been linked to House of Commons MSOA names.

  13. British English Language Datasets | 150+ Years of Research | Natural...

    • datarade.ai
    Updated Jul 30, 2025
    Cite
    Oxford Languages (2025). British English Language Datasets | 150+ Years of Research | Natural Language Processing (NLP) Data | LLMs | TTS | Dictionary Display | EU Coverage [Dataset]. https://datarade.ai/data-products/british-english-language-datasets-150-years-of-research-oxford-languages
    Available download formats: .csv, .json, .mp3, .wav, .xls, .xml
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    Oxford Languages (https://www.lexico.com/)
    Area covered
    United Kingdom
    Description

    Our British English language datasets are meticulously curated and annotated by experienced linguists and language experts, ensuring exceptional accuracy, consistency, and linguistic depth. The following British English datasets are available for licensing:

    1. British English Monolingual Dictionary Data
    2. British English Synonyms and Antonyms Data
    3. British English Pronunciations with Audio

    Key Features (approximate numbers):

    1. British English Monolingual Dictionary Data

    Our British English monolingual dataset delivers clear, reliable definitions and authentic usage examples, featuring a high volume of headwords and in-depth coverage of the British English variant of English. As one of the world’s most authoritative lexical resources, it’s trusted by leading academic, AI, and language technology organizations.

    • Headwords: 146,000
    • Senses: 230,000
    • Sentence examples: 149,000
    • Format: XML and JSON
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: twice a year

    2. British English Synonyms and Antonyms Data

    This British English language dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for NLP tasks such as semantic search, word sense disambiguation, and language generation.

    • Synonyms: 600,000
    • Antonyms: 22,000
    • Usage examples: 39,000
    • Format: XML and JSON
    • Delivery: Email (link-based file sharing)
    • Update frequency: annually

    3. British English Pronunciations with Audio (word-level)

    This dataset provides IPA transcriptions and mapped audio files for words in contemporary British English, with a focus on UK speaker usage. It includes syllabified transcriptions, variant spellings, part-of-speech tags, and pronunciation group identifiers. Audio files are supplied separately and linked where available – ideal for TTS, ASR, and pronunciation modeling.

    • Transcriptions (IPA): 250,000
    • Audio files: 180,000
    • Format: XLSX (for transcriptions), MP3 and WAV (audio files)
    • Update frequency: annually

    Use Cases:

    We consistently work with our clients on new use cases as language technology continues to evolve. These include Natural Language Processing (NLP) applications, TTS, dictionary display tools, games, translations, word embedding, and word sense disambiguation (WSD).

    If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Growth.OL@oup.com to start the conversation.

    Pricing:

    Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.

    Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.

  14. Percentage main language is not English: Cannot speak English - WMCA Wards...

    • cityobservatory.birmingham.gov.uk
    csv, excel, geojson, json
    Updated Sep 6, 2021
    Cite
    (2021). Percentage main language is not English: Cannot speak English - WMCA Wards (2025) [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/percentage-main-language-is-not-english-cannot-speak-english-wmca-wards-2025/
    Available download formats: json, geojson, csv, excel
    Dataset updated
    Sep 6, 2021
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    This provides estimates of the percentage of usual residents aged 3 and over in England and Wales by their proficiency in English. The proficiency in English classification corresponds to the tick-box response options on the census questionnaire. The estimates help central government, local authorities and the NHS allocate resources and provide services for non-English speakers. They also help public service providers target the delivery of their services effectively, for example translation and interpretation services and material in alternative languages.

    Statistical Disclosure Control: to protect against disclosure of personal information from the Census, records in the Census database have been swapped between different geographic areas, so some counts will be affected. In the main, the greatest effects will be at the lowest geographies, since record swapping is targeted at households with unusual characteristics in small areas.

    Data is powered by LG Inform Plus and is automatically checked for new data on the 4th of each month.

  15. Language of films released in the United Kingdom (UK) and Republic of...

    • statista.com
    Updated Jul 11, 2025
    Cite
    Statista (2025). Language of films released in the United Kingdom (UK) and Republic of Ireland in 2019 [Dataset]. https://www.statista.com/statistics/296835/number-of-films-released-in-the-uk-and-ireland-by-language/
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2019
    Area covered
    Ireland, United Kingdom
    Description

    This statistic displays the languages of films released in the United Kingdom and Republic of Ireland in 2019. After English-language films and films that combined English with extensive use of another language, Hindi-language films were the next most common, followed by Spanish and Polish. In 2019, ** Hindi films and ** Spanish films were released.

  16. Percentage main language is not English: Can speak English well - WMCA Wards...

    • cityobservatory.birmingham.gov.uk
    csv, excel, geojson, json
    Updated Sep 6, 2021
    Cite
    (2021). Percentage main language is not English: Can speak English well - WMCA Wards (2025) [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/percentage-main-language-is-not-english-can-speak-english-well-wmca-wards-2025/
    Available download formats: excel, geojson, json, csv
    Dataset updated
    Sep 6, 2021
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    This provides estimates of the percentage of usual residents aged 3 and over in England and Wales by their proficiency in English. The proficiency in English classification corresponds to the tick-box response options on the census questionnaire. The estimates help central government, local authorities and the NHS allocate resources and provide services for non-English speakers. They also help public service providers target the delivery of their services, for example translation and interpretation services and material in alternative languages. Statistical disclosure control: to protect against disclosure of personal information from the Census, records in the Census database have been swapped between different geographic areas, so some counts will be affected. In general, the greatest effects will be at the lowest geographies, since record swapping is targeted towards households with unusual characteristics in small areas.

    Data is powered by LG Inform Plus and automatically checked for new data on the 4th of each month.

  17. Top languages spoken in London boroughs and MSOAs

    • data.europa.eu
    unknown
    Updated Jun 10, 2025
    census@london.gov.uk (2025). Top languages spoken in London boroughs and MSOAs [Dataset]. https://data.europa.eu/data/datasets/top-languages-spoken-in-london-boroughs-and-msoas?locale=cs
    Available download formats: unknown
    Dataset updated
    Jun 10, 2025
    Dataset authored and provided by
    census@london.gov.uk
    Description

    This dataset shows the most spoken languages by borough and MSOA in London. It provides the number of residents aged 3+ who speak specified languages as their main language. Main language is from the 2011 Census (detailed), Census table QS204EW. These data are presented alongside Annual Population Survey (APS) data showing the top nationalities of residents in January to December 2019 by borough. The top 3 non-British nationalities are at the far right of the table. This is to highlight areas which may now have other common non-British languages spoken compared to 2011 (the year in which the Census information was gathered). The top non-British nationalities in 2019, which did not feature in 2011 as one of the most spoken non-British languages, are highlighted in column AD. The APS samples approximately 320,000 people in the UK (approximately 28,000 in London). As such, all data should be treated with some caution. Borough-level estimates of non-British nationalities below 10,000 are considered too small to be reliable and should be treated with greater caution. MSOA codes are now linked to House of Commons MSOA names.

  18. England and Wales Census 2021 - The international student population

    • statistics.ukdataservice.ac.uk
    xlsx
    Updated May 10, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2023). England and Wales Census 2021 - The international student population [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/england-and-wales-census-2021-the-international-student-population
    Available download formats: xlsx
    Dataset updated
    May 10, 2023
    Dataset provided by
    Northern Ireland Statistics and Research Agency
    Office for National Statistics (http://www.ons.gov.uk/)
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    England, Wales
    Description

    Census 2021 data on the international student population of England and Wales by country of birth, passport held, age, sex and other characteristics.

    These datasets are part of the release: The changing picture of long-term international migration, England and Wales: Census 2021. Figures may differ slightly in future releases because of the impact of removing rounding and applying further statistical processes.

    Figures are based on geography boundaries as of 1 April 2022.

    This release includes comparisons to the following 2011 Census data:

    Quality notes can be found here

    Quality information about demography and migration can be found here

    Quality information about labour market can be found here

    Usual resident

    A usual resident is anyone who, on Census Day (21 March 2021), was in the UK and had stayed or intended to stay in the UK for a period of 12 months or more, or had a permanent UK address and was outside the UK and intended to be outside the UK for less than 12 months.

    International student

    An international student is defined as someone who was a usual resident in England and Wales and meets all the following criteria:

    • in full-time education
    • non-UK-born
    • non-UK passport holder
    • aged 17 years or over upon most recent arrival in the UK
    • aged 18 years or over on Census Day.
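    The two definitions above combine into a simple conjunction of tests. The sketch below encodes them as Python predicates; the `Person` record and its field names are hypothetical, chosen only to mirror each clause, and are not how the census data is actually stored.

```python
from dataclasses import dataclass, replace

@dataclass
class Person:
    # Hypothetical fields, one per clause of the census definitions above.
    in_uk_on_census_day: bool
    months_in_uk_actual_or_intended: int  # length of stay in the UK, actual or intended
    has_permanent_uk_address: bool
    months_abroad_intended: int           # intended absence from the UK
    resident_in_england_or_wales: bool
    in_full_time_education: bool
    uk_born: bool
    holds_uk_passport: bool
    age_at_most_recent_arrival: int
    age_on_census_day: int

def is_usual_resident(p: Person) -> bool:
    # In the UK on Census Day and staying (or intending to stay) 12 months or more...
    present = p.in_uk_on_census_day and p.months_in_uk_actual_or_intended >= 12
    # ...or temporarily abroad with a permanent UK address, for less than 12 months.
    away = (not p.in_uk_on_census_day and p.has_permanent_uk_address
            and p.months_abroad_intended < 12)
    return present or away

def is_international_student(p: Person) -> bool:
    # Every criterion must hold at once.
    return (
        is_usual_resident(p)
        and p.resident_in_england_or_wales
        and p.in_full_time_education
        and not p.uk_born
        and not p.holds_uk_passport
        and p.age_at_most_recent_arrival >= 17
        and p.age_on_census_day >= 18
    )

# Hypothetical example person.
student = Person(
    in_uk_on_census_day=True, months_in_uk_actual_or_intended=24,
    has_permanent_uk_address=False, months_abroad_intended=0,
    resident_in_england_or_wales=True, in_full_time_education=True,
    uk_born=False, holds_uk_passport=False,
    age_at_most_recent_arrival=19, age_on_census_day=21,
)
print(is_international_student(student))                                  # True
print(is_international_student(replace(student, holds_uk_passport=True)))  # False
```

    Because the test is a conjunction, failing any single criterion (for example, a UK passport, or arriving aged 16) excludes the person.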

    Country of birth

    The country in which a person was born. The following country of birth classifications are used in this dataset:

    • Country of birth 12a: Political groupings of countries by EU membership and geographical location for non-EU countries.
    • Country of birth 190a: Individual countries. This classification includes geographical groupings for low volume countries.
    • Country of birth (3 categories): These categories have been derived from country of birth 12a and include all UK countries in "Europe: United Kingdom", all EU countries in "Europe: EU countries" and all remaining countries including British Overseas territories in "Non-EU countries (including British Overseas)".

    More information about country of birth classifications can be found here.

    Passports held

    The country or countries that a person holds, or is entitled to hold, a passport for. Where a person recorded having more than one passport, they were counted only once, categorised in the following priority order: 1. UK passport, 2. Irish passport, 3. Other passport. The following classifications were created for this dataset for comparability with other international migration releases:

    • Passports held (4 categories): High level political groupings of passport held by EU membership and geographical location for non-EU countries.
    • Passports held (12 categories): Political groupings of passport held by EU membership and geographical location for non-EU countries.
    • Passports held (150 categories): Individual countries for passport held. This classification includes geographical groupings for low volume countries.

    More information can be found here
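    The priority rule for multiple passports (UK, then Irish, then other) can be sketched as below. Representing passports as a list of country names is an assumption for illustration, and the handling of people with no recorded passport (returning `None`) is also an assumption, since the rule quoted above does not cover that case.

```python
from collections import Counter

def passport_category(passports):
    """Count a person once, using the priority UK > Irish > Other."""
    held = set(passports)
    if "United Kingdom" in held:
        return "UK passport"
    if "Ireland" in held:
        return "Irish passport"
    if held:
        return "Other passport"
    return None  # assumption: no passport recorded is outside the priority rule

def tally(people):
    """One category per person, so the counts sum to the number of people."""
    return Counter(passport_category(p) for p in people)

# Hypothetical example records.
people = [
    ["Poland", "United Kingdom"],  # dual UK/Polish: counted as UK
    ["Ireland", "France"],         # dual Irish/French: counted as Irish
    ["India"],
    ["United Kingdom"],
]
print(tally(people))
```

    Collapsing each person to a single category before counting is what guarantees that dual nationals are not double-counted.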

    Economic activity status

    The economic activity status of a person on Census Day, 21 March 2021. The following classification is used in this dataset:

    Industry

    The industry worked in for those in current employment. The following classification was used for this dataset:

    Student accommodation

    Student accommodation breaks down household type by typical households used by students. This includes communal establishments, all student households, households containing a single family, households containing multiple families, living with parents and living alone.

    More information can be found here

    Second address indicator

    The second address indicator is used to define an address (in or out of the UK) a person stays at for more than 30 days per year that is not their place of usual residence. Second addresses typically include: armed forces bases, addresses used by people working away from home, a student’s home address, the address of another parent or guardian, a partner’s address, a holiday home. There are 3 categories in this classification.

    Detailed description can be found here

    Main language (detailed)

    This is used to define a person's first or preferred language. This breaks down the responses given in the write-in option "Other, write in (including British Sign Language)". There are 95 categories in the primary classification.

    More details can be found here

    Proficiency in English language

    Proficiency in English language is used to determine how well a person whose main language is not English (English or Welsh in Wales) feels they can speak English. There are 6 categories in this classification.

    More details can be found here

  19. Local areas with a non-English language as main language England and Wales...

    • statista.com
    Updated Apr 9, 2025
    Statista (2025). Local areas with a non-English language as main language England and Wales 2021 [Dataset]. https://www.statista.com/statistics/329633/england-and-wales-local-areas-with-non-english-as-a-main-language/
    Dataset updated
    Apr 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2021
    Area covered
    England, Wales, United Kingdom
    Description

    In 2021, the London borough of Newham had the highest share of residents that spoke a language other than English as their main language. Brent had the second-highest share of residents that had a different main language, followed by Ealing and Harrow, all also London boroughs. Outside of London, Leicester had the highest share of people who reported a language other than English as their main one, at 30 percent.

  20. Ukrainian web corpus MaCoCu-uk 1.0 - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Feb 13, 2024
    (2024). Ukrainian web corpus MaCoCu-uk 1.0 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/d7e90175-ec31-52de-94aa-b62ecac58611
    Dataset updated
    Feb 13, 2024
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Ukraine, United Kingdom
    Description

    The Ukrainian web corpus MaCoCu-uk 1.0 was built by crawling the ".ua" and ".укр" internet top-level domains in 2022, extending the crawl dynamically to other domains as well. The crawler is available at https://github.com/macocu/MaCoCu-crawler.

    Considerable effort was devoted to cleaning the extracted text to provide a high-quality web corpus. This was achieved by removing boilerplate (https://corpus.tools/wiki/Justext) and near-duplicated paragraphs (https://corpus.tools/wiki/Onion), and by discarding very short texts as well as texts that are not in the target language. The dataset is characterized by extensive metadata which allows filtering the dataset based on text quality and other criteria (https://github.com/bitextor/monotextor), making the corpus highly useful for corpus linguistics studies, as well as for training language models and other language technologies.

    In XML format, each document is accompanied by the following metadata: title, crawl date, URL, domain, file type of the original document, distribution of languages inside the document, and a fluency score based on a language model. The text of each document is divided into paragraphs that are accompanied by metadata on whether a paragraph is a heading, on paragraph quality (labels such as “short” or “good”, assigned based on paragraph length, URL and stopword density via the jusText tool, https://corpus.tools/wiki/Justext) and fluency (a score between 0 and 1, assigned with the Monocleaner tool, https://github.com/bitextor/monocleaner), the automatically identified language of the text in the paragraph, and whether the paragraph contains sensitive information (identified via the Biroamer tool, https://github.com/bitextor/biroamer). The corpus can be easily read with the prevert parser (https://pypi.org/project/prevert/).
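    The paragraph-level metadata described above lends itself to simple filtering. The sketch below assumes the paragraphs have already been parsed into a hypothetical `Paragraph` record (it does not use the prevert API, and the real XML attribute names may differ); the fluency threshold of 0.8 is an arbitrary illustrative choice, not a value recommended by the corpus authors.

```python
from dataclasses import dataclass

@dataclass
class Paragraph:
    # Hypothetical in-memory form of the per-paragraph metadata described above;
    # the attribute names in the actual XML files are not specified here.
    text: str
    quality: str     # jusText-style label such as "good" or "short"
    fluency: float   # Monocleaner score between 0 and 1
    lang: str        # automatically identified language code
    sensitive: bool  # True if flagged (e.g. by Biroamer) as containing sensitive information
    heading: bool    # True if the paragraph is a heading

def select_paragraphs(paragraphs, lang="uk", min_fluency=0.8, keep_headings=False):
    """Keep clean, fluent, non-sensitive paragraphs in the target language."""
    return [
        p for p in paragraphs
        if p.quality == "good"
        and p.fluency >= min_fluency
        and p.lang == lang
        and not p.sensitive
        and (keep_headings or not p.heading)
    ]

# Hypothetical example paragraphs.
paragraphs = [
    Paragraph("Київ є столицею України.", "good", 0.93, "uk", False, False),
    Paragraph("Новини", "short", 0.40, "uk", False, True),
    Paragraph("Contact us", "good", 0.90, "en", False, False),
]
for p in select_paragraphs(paragraphs):
    print(p.text)
```

    Only the first paragraph passes: the second is labelled "short" and is a heading, and the third is identified as English rather than Ukrainian.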
Notice and take down: Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please: (1) Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted. (2) Clearly identify the copyrighted work claimed to be infringed. (3) Clearly identify the material that is claimed to be infringing and information reasonably sufficient in order to allow us to locate the material. (4) Please write to the contact person for this resource whose email is available in the full item record. We will comply with legitimate requests by removing the affected sources from the next release of the corpus. This action has received funding from the European Union's Connecting Europe Facility 2014-2020 - CEF Telecom, under Grant Agreement No. INEA/CEF/ICT/A2020/2278341. This communication reflects only the author’s view. The Agency is not responsible for any use that may be made of the information it contains. A newer version of the corpus is available as part of the MaCoCu-Genre corpora collection at http://hdl.handle.net/11356/1969. The main novelty of the MaCoCu-Genre version is that the texts have been automatically annotated with genre categories. Additionally, the corpus underwent additional post-processing and has been transformed to the JSONL format.
