60 datasets found

Most common non-English languages spoken in England and Wales 2021
statista.com
Updated Jun 13, 2025
Cite
Statista (2025). Most common non-English languages spoken in England and Wales 2021 [Dataset]. https://www.statista.com/statistics/284010/most-common-non-english-languages-spoken-in-england-and-wales/
Explore at:
Dataset updated
Jun 13, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2021
Area covered
United Kingdom
Description
In 2021, 611,845 people in England and Wales spoke Polish as a main language, making it the most common non-English language among the population. Polish was followed by Romanian and Panjabi, with 471,945 and 290,745 speakers respectively.
Top Languages Spoken in London Boroughs and MSOAs
data.europa.eu
unknown
Updated Jul 19, 2021
Cite
census2011@london.gov.uk (2021). Top Languages Spoken in London Boroughs and MSOAs [Dataset]. https://data.europa.eu/data/datasets/top-languages-spoken-in-london-boroughs-and-msoas?locale=ga
Explore at:
Available download formats: unknown
Dataset updated
Jul 19, 2021
Dataset authored and provided by
census2011@london.gov.uk
Area covered
London
Description
This dataset shows the most spoken languages by borough and MSOAs in London. It provides numbers of the population aged 3+ who speak specified languages as their main language.

Main language is from 2011 Census (detailed) - Census table QS204EW.

This data is presented alongside Annual Population Survey (APS) data showing the top nationalities of residents in January - December 2019 by borough. The top 3 non-British nationalities are at the far right of the table. This is to highlight areas which may now have other common non-British languages spoken compared to 2011 (the year in which the Census information was gathered). The top non-British nationalities in 2019, which did not feature in 2011 as one of the most spoken non-British languages, are highlighted in column AD.

The APS has a sample of around 320,000 people in the UK (around 28,000 in London). As such all figures must be treated with some caution. Estimates for non-British nationalities at borough level that are below 10,000 are considered too small to be reliable and should be treated with additional caution.

MSOA codes have now been linked to House of Commons MSOA names
The most spoken languages worldwide 2025
statista.com
Updated Apr 14, 2025
Cite
Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
Explore at:
Dataset updated
Apr 14, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2025
Area covered
World
Description
In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, slightly more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish accounted for the third and fourth most widespread languages that year.

Languages in the United States

The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the United States vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

Different languages at home

The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of population speaking a language other than English is California. About 45 percent of its population was speaking a language other than English at home in 2023.
Common languages used for web content 2025, by share of websites
statista.com
Updated Feb 11, 2025
Cite
Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
Explore at:
Dataset updated
Feb 11, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Feb 2025
Area covered
Worldwide
Description
As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, while content in the German language followed, with 5.6 percent.

English as the leading online language

The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The internet user base in both countries combined, as of January 2023, was over a billion individuals. This has led to most of the online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

Global internet usage by regions

As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America were leading in terms of internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.
Census 2021 - Main language
data.leicester.gov.uk
csv, excel, json
Updated Apr 25, 2023
+ more versions
Cite
(2023). Census 2021 - Main language [Dataset]. https://data.leicester.gov.uk/explore/dataset/census-2021-leicester-main-language-detailed/
Explore at:
Available download formats: excel, json, csv
Dataset updated
Apr 25, 2023
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
The census is undertaken by the Office for National Statistics every 10 years and gives us a picture of all the people and households in England and Wales. The most recent census took place in March of 2021.

The census asks every household questions about the people who live there and the type of home they live in. In doing so, it helps to build a detailed snapshot of society. Information from the census helps the government and local authorities to plan and fund local services, such as education, doctors' surgeries and roads.

Key census statistics for Leicester are published on the open data platform to make information accessible to local services, voluntary and community groups, and residents. There is also a dashboard published showcasing various datasets from the census, allowing users to view data for Leicester and compare this with national statistics.

Further information about the census and full datasets can be found on the ONS website - https://www.ons.gov.uk/census/aboutcensus/censusproducts

Main language

This dataset provides Census 2021 estimates that classify usual residents in England and Wales by their main language. The estimates are as at Census Day, 21 March 2021. Main language is a person's first or preferred language. They may speak other languages as well. A main language is provided only for residents aged 3 and above. Residents aged below 3 years will appear as ‘Does not apply’. Please note that some organisations exclude those below 3 years when calculating percentages for this variable. This dataset contains information for Leicester City and England overall.
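The note about excluding under-3s changes the denominator of any percentage, so the two conventions can give noticeably different shares. The following is a minimal sketch of both calculations. The function name `main_language_share` and the counts are hypothetical and purely illustrative; they are not taken from the Leicester dataset, and only the `Does not apply` label comes from the description above.

```python
def main_language_share(counts, language, exclude_under_3=True):
    """Return the percentage of residents whose main language is `language`.

    `counts` maps each main-language category to a resident count and may
    include a "Does not apply" entry for residents aged below 3.
    """
    total = sum(counts.values())
    if exclude_under_3:
        # Drop residents aged below 3 from the denominator.
        total -= counts.get("Does not apply", 0)
    if total <= 0:
        raise ValueError("no residents left in the denominator")
    return 100 * counts[language] / total


# Hypothetical counts, for illustration only.
example_counts = {"English": 700, "Gujarati": 200, "Does not apply": 100}

share_excluding = main_language_share(example_counts, "Gujarati")
share_including = main_language_share(example_counts, "Gujarati", exclude_under_3=False)
print(round(share_excluding, 1), round(share_including, 1))
```

When comparing figures from different publishers, it is worth checking which of the two denominators each one used.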
England and Wales Census 2021 - RM080: Multi-language households by ethnic...
statistics.ukdataservice.ac.uk
csv, json, xlsx
Updated Jun 10, 2024
+ more versions
Cite
Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2024). England and Wales Census 2021 - RM080: Multi-language households by ethnic group of Household Reference Person [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/england-and-wales-census-2021-rm080-multi-language-households-by-ethnic-group-of-hrp
Explore at:
Available download formats: xlsx, csv, json
Dataset updated
Jun 10, 2024
Dataset provided by
Northern Ireland Statistics and Research Agency
Office for National Statistics (http://www.ons.gov.uk/)
UK Data Service (https://ukdataservice.ac.uk/)
Authors
Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Area covered
England, Wales
Description
This dataset provides Census 2021 estimates that classify Household Reference Persons in England and Wales by whether one or multiple languages are spoken, and by ethnic group. The estimates are as at Census Day, 21 March 2021.

Area type

Census 2021 statistics are published for a number of different geographies. These can be large, for example the whole of England, or small, for example an output area (OA), the lowest level of geography for which statistics are produced.

For higher levels of geography, more detailed statistics can be produced. When a lower level of geography is used, such as output areas (which have a minimum of 100 persons), the statistics produced have less detail. This is to protect the confidentiality of people and ensure that individuals or their characteristics cannot be identified.

Lower tier local authorities

Lower tier local authorities provide a range of local services. There are 309 lower tier local authorities in England made up of 181 non-metropolitan districts, 59 unitary authorities, 36 metropolitan districts and 33 London boroughs (including City of London). In Wales there are 22 local authorities made up of 22 unitary authorities.

Coverage

Census 2021 statistics are published for the whole of England and Wales. However, you can choose to filter areas by:

country - for example, Wales

region - for example, London

local authority - for example, Cornwall

health area – for example, Clinical Commissioning Group

statistical area - for example, MSOA or LSOA

Multiple main languages in household

Classifies households by whether members speak the same or different main language. If multiple main languages are spoken, this identifies whether they differ between generations or partnerships within the household.

Ethnic group

The ethnic group that the person completing the census feels they belong to. This could be based on their culture, family background, identity or physical appearance.

Respondents could choose one out of 19 tick-box response categories, including write-in response options.
Percentage main language is not English: Cannot speak English - Birmingham...
cityobservatory.birmingham.gov.uk
csv, excel, geojson +1
Updated Sep 6, 2021
+ more versions
Cite
(2021). Percentage main language is not English: Cannot speak English - Birmingham Wards [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/percentage-cannot-speak-english-birmingham-wards/
Explore at:
Available download formats: excel, csv, json, geojson
Dataset updated
Sep 6, 2021
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Area covered
Birmingham
Description
This provides estimates of the percentage of usual residents aged 3 and over in England and Wales by their proficiency in English. The proficiency in English classification corresponds to the tick box response options on the census questionnaire. Estimates are used to help central government, local authorities and the NHS allocate resources and provide services for non-English speakers. They also help public service providers effectively target the delivery of their services, for example translation and interpretation services and material in alternative languages. Statistical Disclosure Control - In order to protect against disclosure of personal information from the Census, there has been swapping of records in the Census database between different geographic areas, and so some counts will be affected. In the main, the greatest effects will be at the lowest geographies, since the record swapping is targeted towards those households with unusual characteristics in small areas. Data is powered by LG Inform Plus and automatically checked for new data on the 3rd of each month.
British English General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). British English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-uk
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United Kingdom
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the UK English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world UK English communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic British accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of UK English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
  •Speakers: 60 verified native UK English speakers from FutureBeeAI’s contributor community.
  •Regions: Representing various provinces of the United Kingdom to ensure dialectal diversity and demographic balance.
  •Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
•Recording Details:
  •Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
  •Duration: Each conversation ranges from 15 to 60 minutes.
  •Audio Format: Stereo WAV files, 16-bit depth, recorded at 16 kHz sample rate.
  •Environment: Quiet, echo-free settings with no background noise.

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through double QA pass, average WER < 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
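The "average WER < 5%" figure refers to word error rate. The dataset page does not say how FutureBeeAI measured it, so the following is only a minimal sketch of the commonly used definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))  # prints 0.25
```

Real evaluations usually normalise case and punctuation before scoring; this sketch compares whitespace-separated tokens exactly as given.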
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
•Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple English speech and language AI applications:
•ASR Development: Train accurate speech-to-text systems for UK English.
•Voice Assistants: Build smart assistants capable of understanding natural British conversations.
Leading language learning apps by aided brand awareness in the UK in 2023
statista.com
Updated Jun 30, 2025
Cite
Statista (2025). Leading language learning apps by aided brand awareness in the UK in 2023 [Dataset]. https://www.statista.com/statistics/1490113/language-learning-app-awareness-uk/
Explore at:
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
United Kingdom
Description
In 2023, Duolingo led the language learning app market in the United Kingdom, achieving an aided brand awareness of 38.23 percent among consumers. Rosetta Stone followed with 29.84 percent, and Babbel reported 28.99 percent.
Most common languages spoken in India 2011
statista.com
Updated Jun 24, 2025
Cite
Statista (2025). Most common languages spoken in India 2011 [Dataset]. https://www.statista.com/statistics/616508/most-common-languages-india/
Explore at:
Dataset updated
Jun 24, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2011
Area covered
India
Description
Hindi, with over *** million native speakers, was the most spoken language across Indian homes, followed by Bengali with ** million speakers, as of 2011 census data. English native speakers accounted for about *** thousand during the measured time period.

The colonial rule in India

One of the most remarkable and widespread legacies that the British colonial rule left behind was the English language. Before independence, English was used solely for higher education and in government and administrative processes. Post-independence, however, and to this day, Hindi has been claimed as the language with official government patronage. This led to resistance from the southern states of India, where Hindi did not have prominence. Consequently, the Official Languages Act of 1963 was enacted by the parliament, which ensured the continued use of English for official purposes in conjunction with Hindi.

Multi-linguistic cultures

India has approximately ** major languages that are written in about ** different scripts. While the country’s official languages are both English and Hindi, Hindi remains the most preferred language used online, especially in the northern rural areas. The use of English is becoming increasingly popular in the urban areas. In addition, almost every state in India has its own official language that is studied in primary and secondary school as an obligatory second language. Among the most prominent are Bengali, Marathi, and Telugu.
Percentage main language is not English: Can speak English well - WMCA
cityobservatory.birmingham.gov.uk
csv, excel, geojson +1
Updated Sep 6, 2021
+ more versions
Cite
(2021). Percentage main language is not English: Can speak English well - WMCA [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/percentage-can-speak-english-well-wmca/
Explore at:
Available download formats: json, csv, excel, geojson
Dataset updated
Sep 6, 2021
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
This provides estimates of the percentage of usual residents aged 3 and over in England and Wales by their proficiency in English. The proficiency in English classification corresponds to the tick box response options on the census questionnaire. Estimates are used to help central government, local authorities and the NHS allocate resources and provide services for non-English speakers. They also help public service providers effectively target the delivery of their services, for example translation and interpretation services and material in alternative languages. Statistical Disclosure Control - In order to protect against disclosure of personal information from the Census, there has been swapping of records in the Census database between different geographic areas, and so some counts will be affected. In the main, the greatest effects will be at the lowest geographies, since the record swapping is targeted towards those households with unusual characteristics in small areas. Data is powered by LG Inform Plus and automatically checked for new data on the 3rd of each month.
Top Languages Spoken in London Boroughs and MSOAs
data.europa.eu
unknown
Cite
census@london.gov.uk, Top Languages Spoken in London Boroughs and MSOAs [Dataset]. https://data.europa.eu/data/datasets/top-languages-spoken-in-london-boroughs-and-msoas?locale=lt
Explore at:
Available download formats: unknown
Dataset authored and provided by
census@london.gov.uk
Area covered
London
Description
This dataset shows the most spoken languages by borough and MSOA in London. It provides numbers of the population aged 3+ who speak specified languages as their main language. Main language is from the 2011 Census (detailed) - Census table QS204EW.

This data is presented alongside Annual Population Survey (APS) data showing the top nationalities of residents in January - December 2019 by borough. The top 3 non-British nationalities are at the far right of the table. This is to highlight areas which may now have other common non-British languages spoken compared to 2011 (the year in which the Census information was gathered). The top non-British nationalities in 2019, which did not feature in 2011 as one of the most spoken non-British languages, are highlighted in column AD.

The APS has a sample of around 320,000 people in the UK (around 28,000 in London). As such, all figures must be treated with caution. Borough-level estimates that are below 10,000 are considered too small to be reliable and should be treated with additional caution.

MSOA codes have now been linked to House of Commons MSOA names
British English Language Datasets | 150+ Years of Research | Natural...
datarade.ai
Updated Jul 30, 2025
Cite
Oxford Languages (2025). British English Language Datasets | 150+ Years of Research | Natural Language Processing (NLP) Data | LLMs | TTS | Dictionary Display | EU Coverage [Dataset]. https://datarade.ai/data-products/british-english-language-datasets-150-years-of-research-oxford-languages
Explore at:
Available download formats: .csv, .json, .mp3, .wav, .xls, .xml
Dataset updated
Jul 30, 2025
Dataset authored and provided by
Oxford Languages (https://www.lexico.com/)
Area covered
United Kingdom
Description
Our British English language datasets are meticulously curated and annotated by experienced linguists and language experts, ensuring exceptional accuracy, consistency, and linguistic depth. The following datasets in British English are available for license:

British English Monolingual Dictionary Data

British English Synonyms and Antonyms Data

British English Pronunciations with Audio

Key Features (approximate numbers):

British English Monolingual Dictionary Data

Our British English monolingual dataset delivers clear, reliable definitions and authentic usage examples, featuring a high volume of headwords and in-depth coverage of the British English variant of English. As one of the world’s most authoritative lexical resources, it’s trusted by leading academic, AI, and language technology organizations.

Headwords: 146,000

Senses: 230,000

Sentence examples: 149,000

Format: XML and JSON

Delivery: Email (link-based file sharing) and REST API

Update frequency: twice a year

British English Synonyms and Antonyms Data

This British English language dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for NLP tasks such as semantic search, word sense disambiguation, and language generation.

Synonyms: 600,000

Antonyms: 22,000

Usage Examples: 39,000

Format: XML and JSON

Delivery: Email (link-based file sharing)

Update frequency: annually

British English Pronunciations with audio (word-level)

This dataset provides IPA transcriptions and mapped audio files for words in contemporary British English, with a focus on UK speaker usage. It includes syllabified transcriptions, variant spellings, part-of-speech tags, and pronunciation group identifiers. Audio files are supplied separately and linked where available – ideal for TTS, ASR, and pronunciation modeling.

Transcriptions (IPA): 250,000

Audio files: 180,000

Format: XLSX (for transcriptions), MP3 and WAV (audio files)

Update frequency: annually

Use Cases:

We consistently work with our clients on new use cases as language technology continues to evolve. These include Natural Language Processing (NLP) applications, TTS, dictionary display tools, games, translations, word embedding, and word sense disambiguation (WSD).

If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Growth.OL@oup.com to start the conversation.

Pricing:

Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.

Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.
Percentage main language is not English: Cannot speak English - WMCA Wards...
cityobservatory.birmingham.gov.uk
csv, excel, geojson +1
Updated Sep 6, 2021
Cite
(2021). Percentage main language is not English: Cannot speak English - WMCA Wards (2025) [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/percentage-main-language-is-not-english-cannot-speak-english-wmca-wards-2025/
Explore at:
Available download formats: json, geojson, csv, excel
Dataset updated
Sep 6, 2021
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
This provides estimates of the percentage of usual residents aged 3 and over in England and Wales by their proficiency in English. The proficiency in English classification corresponds to the tick box response options on the census questionnaire. Estimates are used to help central government, local authorities and the NHS allocate resources and provide services for non-English speakers. They also help public service providers effectively target the delivery of their services, for example translation and interpretation services and material in alternative languages. Statistical Disclosure Control - In order to protect against disclosure of personal information from the Census, there has been swapping of records in the Census database between different geographic areas, and so some counts will be affected. In the main, the greatest effects will be at the lowest geographies, since the record swapping is targeted towards those households with unusual characteristics in small areas.

Data is powered by LG Inform Plus and automatically checked for new data on the 4th of each month.
Language of films released in the United Kingdom (UK) and Republic of...
statista.com
Updated Jul 11, 2025
Cite
Statista (2025). Language of films released in the United Kingdom (UK) and Republic of Ireland in 2019 [Dataset]. https://www.statista.com/statistics/296835/number-of-films-released-in-the-uk-and-ireland-by-language/
Explore at:
Dataset updated
Jul 11, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2019
Area covered
Ireland, United Kingdom
Description
This statistic displays the languages of films released in the United Kingdom and Republic of Ireland in 2019. Following English language movies and movies that featured English alongside extensive use of another language, Hindi language movies were second most common, followed by Spanish and Polish. In 2019, ** Hindi movies and ** Spanish movies were released.
Percentage main language is not English: Can speak English well - WMCA Wards...
cityobservatory.birmingham.gov.uk
csv, excel, geojson +1
Updated Sep 6, 2021
Cite
(2021). Percentage main language is not English: Can speak English well - WMCA Wards (2025) [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/percentage-main-language-is-not-english-can-speak-english-well-wmca-wards-2025/
Explore at:
Available download formats: excel, geojson, json, csv
Dataset updated
Sep 6, 2021
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
This provides estimates of the percentage of usual residents aged 3 and over in England and Wales by their proficiency in English. The proficiency in English classification corresponds to the tick box response options on the census questionnaire. Estimates are used to help central government, local authorities and the NHS allocate resources and provide services for non-English speakers. They also help public service providers effectively target the delivery of their services, for example translation and interpretation services and material in alternative languages. Statistical Disclosure Control - In order to protect against disclosure of personal information from the Census, there has been swapping of records in the Census database between different geographic areas, and so some counts will be affected. In the main, the greatest effects will be at the lowest geographies, since the record swapping is targeted towards those households with unusual characteristics in small areas.

Data is powered by LG Inform Plus and automatically checked for new data on the 4th of each month.
Top Languages Spoken in London Boroughs and MSOAs
data.europa.eu
unknown
Updated Jun 10, 2025
Cite
census@london.gov.uk (2025). Top Languages Spoken in London Boroughs and MSOAs [Dataset]. https://data.europa.eu/data/datasets/top-languages-spoken-in-london-boroughs-and-msoas?locale=cs
Explore at:
Available download formats: unknown
Dataset updated
Jun 10, 2025
Dataset authored and provided by
census@london.gov.uk
Description
This dataset shows the most spoken languages by borough and MSOA in London. It provides numbers of the population aged 3+ who speak specified languages as their main language. Main language is from the 2011 Census (detailed) - Census table QS204EW. This data is presented alongside Annual Population Survey (APS) data showing the top nationalities of residents in January - December 2019 by borough. The top 3 non-British nationalities are at the far right of the table. This is to highlight areas where other common non-British languages may now be spoken compared to 2011 (the year in which the Census information was gathered). The top non-British nationalities in 2019, which did not feature in 2011 as one of the most spoken non-British languages, are highlighted in column AD. The APS covers around 320,000 people in the UK (around 28,000 in London). As such, all figures must be treated with some caution. Estimates for non-British nationalities at borough level that are below 10,000 are considered too small to be reliable and should be treated with additional caution. MSOA codes have now been linked to House of Commons MSOA names
England and Wales Census 2021 - The international student population
statistics.ukdataservice.ac.uk
xlsx
Updated May 10, 2023
Cite
Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2023). England and Wales Census 2021 - The international student population [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/england-and-wales-census-2021-the-international-student-population
Explore at:
Available download formats: xlsx
Dataset updated
May 10, 2023
Dataset provided by
Northern Ireland Statistics and Research Agency
Office for National Statistics (http://www.ons.gov.uk/)
UK Data Service (https://ukdataservice.ac.uk/)
Authors
Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Area covered
England, Wales
Description
Census 2021 data on the international student population of England and Wales by country of birth, passport held, age, sex and other characteristics.

These datasets are part of the release: The changing picture of long-term international migration, England and Wales: Census 2021. Figures may differ slightly in future releases because of the impact of removing rounding and applying further statistical processes.

Figures are based on geography boundaries as of 1 April 2022.

This release includes comparisons to the following 2011 Census data:

Students in full-time education

International students (by Region)

International students by country of birth

International students by Top 100 countries of birth

Quality notes can be found here

Quality information about demography and migration can be found here

Quality information about labour market can be found here

Usual resident

A usual resident is anyone who on Census Day, 21 March 2021 was in the UK and had stayed or intended to stay in the UK for a period of 12 months or more, or had a permanent UK address and was outside the UK and intended to be outside the UK for less than 12 months.

International student

An international student is defined as someone who was a usual resident in England and Wales and meets all the following criteria:

in full-time education

non-UK-born

non-UK passport holder

aged 17 years or over upon most recent arrival in the UK

aged 18 years or over on Census Day.
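
The criteria above can be read as a single yes/no check on a census record. The following is a minimal sketch only: the `Person` record and its field names are hypothetical and are not the ONS variable names, and "non-UK passport holder" is assumed here to mean that the person does not hold a UK passport.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record layout for illustration; not the ONS schema.
    usual_resident_england_wales: bool
    in_full_time_education: bool
    uk_born: bool
    holds_uk_passport: bool
    age_on_most_recent_arrival: int
    age_on_census_day: int


def is_international_student(p: Person) -> bool:
    """Return True only if every criterion in the definition is met."""
    return (
        p.usual_resident_england_wales
        and p.in_full_time_education
        and not p.uk_born
        and not p.holds_uk_passport  # assumed reading of "non-UK passport holder"
        and p.age_on_most_recent_arrival >= 17
        and p.age_on_census_day >= 18
    )
```

Under these assumptions, a non-UK-born student who arrived aged 16 would not be counted, even if they were 18 or over on Census Day.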

Country of birth

The country in which a person was born. The following country of birth classifications are used in this dataset:

Country of birth 12a: Political groupings of countries by EU membership and geographical location for non-EU countries.

Country of birth 190a: Individual countries. This classification includes geographical groupings for low volume countries.

Country of birth (3 categories): These categories have been derived from country of birth 12a and include all UK countries in "Europe: United Kingdom", all EU countries in "Europe: EU countries" and all remaining countries including British Overseas territories in "Non-EU countries (including British Overseas)".

More information about country of birth classifications can be found here.

Passports held

The country or countries that a person holds, or is entitled to hold, a passport for. Where a person recorded having more than one passport, they were counted only once, categorised in the following priority order: 1. UK passport, 2. Irish passport, 3. Other passport. The following classifications were created for this dataset for comparability with other international migration releases:

Passports held (4 categories): High level political groupings of passport held by EU membership and geographical location for non-EU countries.

Passports held (12 categories): Political groupings of passport held by EU membership and geographical location for non-EU countries.

Passports held (150 categories): Individual countries for passport held. This classification includes geographical groupings for low volume countries.

More information can be found here
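
The priority order for people who recorded more than one passport (UK first, then Irish, then other) can be sketched as a small function. This is an illustrative sketch, not the ONS processing code: the country labels `"UK"` and `"Ireland"` and the `"No passport"` fallback are assumptions introduced for the example.

```python
from typing import Iterable


def passport_category(passports: Iterable[str]) -> str:
    """Assign one category per person using the stated priority order."""
    held = set(passports)
    if "UK" in held:        # 1. UK passport takes priority
        return "UK passport"
    if "Ireland" in held:   # 2. then Irish passport
        return "Irish passport"
    if held:                # 3. any other passport
        return "Other passport"
    return "No passport"    # assumed fallback, not stated in the description
```

For example, someone recording both an Irish and a Polish passport would be counted once, under "Irish passport".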

Economic activity status

The economic activity status of a person on Census Day, 21 March 2021. The following classification is used in this dataset:

Economic activity status classification status_12a

Industry

The industry worked in for those in current employment. The following classification was used for this dataset:

Industry (current) classification 22a

Student accommodation

Student accommodation breaks down household type into the kinds of household typically used by students. This includes communal establishments, all-student households, households containing a single family, households containing multiple families, living with parents and living alone.

More information can be found here

Second address indicator

The second address indicator is used to define an address (in or out of the UK) a person stays at for more than 30 days per year that is not their place of usual residence. Second addresses typically include: armed forces bases, addresses used by people working away from home, a student’s home address, the address of another parent or guardian, a partner’s address, a holiday home. There are 3 categories in this classification.

Detailed description can be found here

Main language (detailed)

This is used to define a person's first or preferred language. This breaks down the responses given in the write-in option "Other, write in (including British Sign Language)". There are 95 categories in the primary classification.

More details can be found here

Proficiency in English language

Proficiency in English language is used to determine how well a person whose main language is not English (English or Welsh in Wales) feels they can speak English. There are 6 categories in this classification.

More details can be found here
Local areas with a non-English language as main language England and Wales...
statista.com
Updated Apr 9, 2025
Cite
Statista (2025). Local areas with a non-English language as main language England and Wales 2021 [Dataset]. https://www.statista.com/statistics/329633/england-and-wales-local-areas-with-non-english-as-a-main-language/
Explore at:
Dataset updated
Apr 9, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2021
Area covered
England, Wales, United Kingdom
Description
In 2021, the London borough of Newham had the highest share of residents that spoke a language other than English as their main language. Brent had the second-highest share of residents that had a different main language, followed by Ealing and Harrow, all also London boroughs. Outside of London, Leicester had the highest share of people who reported a language other than English as their main one, at 30 percent.
Ukrainian web corpus MaCoCu-uk 1.0 - Dataset - B2FIND
b2find.eudat.eu
Updated Feb 13, 2024
+ more versions
Cite
(2024). Ukrainian web corpus MaCoCu-uk 1.0 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/d7e90175-ec31-52de-94aa-b62ecac58611
Explore at:
Dataset updated
Feb 13, 2024
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Ukraine, United Kingdom
Description
The Ukrainian web corpus MaCoCu-uk 1.0 was built by crawling the ".ua" and ".укр" internet top-level domains in 2022, extending the crawl dynamically to other domains as well. The crawler is available at https://github.com/macocu/MaCoCu-crawler. Considerable effort was devoted to cleaning the extracted text to provide a high-quality web corpus. This was achieved by removing boilerplate (https://corpus.tools/wiki/Justext) and near-duplicated paragraphs (https://corpus.tools/wiki/Onion), discarding very short texts as well as texts that are not in the target language. The dataset is characterized by extensive metadata which allows filtering the dataset based on text quality and other criteria (https://github.com/bitextor/monotextor), making the corpus highly useful for corpus linguistics studies, as well as for training language models and other language technologies. In XML format, each document is accompanied by the following metadata: title, crawl date, url, domain, file type of the original document, distribution of languages inside the document, and a fluency score based on a language model. The text of each document is divided into paragraphs, each accompanied by metadata indicating whether the paragraph is a heading, metadata on the paragraph quality (labels, such as “short” or “good”, assigned based on paragraph length, URL and stopword density via the jusText tool - https://corpus.tools/wiki/Justext) and fluency (score between 0 and 1, assigned with the Monocleaner tool - https://github.com/bitextor/monocleaner), the automatically identified language of the text in the paragraph, and whether the paragraph contains sensitive information (identified via the Biroamer tool - https://github.com/bitextor/biroamer). The corpus can be easily read with the prevert parser (https://pypi.org/project/prevert/).
Notice and take down: Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please: (1) Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted. (2) Clearly identify the copyrighted work claimed to be infringed. (3) Clearly identify the material that is claimed to be infringing and information reasonably sufficient in order to allow us to locate the material. (4) Please write to the contact person for this resource whose email is available in the full item record. We will comply with legitimate requests by removing the affected sources from the next release of the corpus. This action has received funding from the European Union's Connecting Europe Facility 2014-2020 - CEF Telecom, under Grant Agreement No. INEA/CEF/ICT/A2020/2278341. This communication reflects only the author’s view. The Agency is not responsible for any use that may be made of the information it contains. A newer version of the corpus is available as part of the MaCoCu-Genre corpora collection at http://hdl.handle.net/11356/1969. The main novelty of the MaCoCu-Genre version is that the texts have been automatically annotated with genre categories. Additionally, the corpus underwent additional post-processing and has been transformed to the JSONL format.


