List of the data tables as part of the Immigration system statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.
If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.
The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk.
Please tell us what format you need. It will help us if you say what assistive technology you use.
Immigration system statistics, year ending September 2025
Immigration system statistics quarterly release
Immigration system statistics user guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives
Passenger arrivals summary tables, year ending September 2025 (ODS, 31.5 KB): https://assets.publishing.service.gov.uk/media/691afc82e39a085bda43edd8/passenger-arrivals-summary-sep-2025-tables.ods
‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.
Electronic travel authorisation detailed datasets, year ending September 2025 (MS Excel Spreadsheet, 58.6 KB): https://assets.publishing.service.gov.uk/media/691b03595a253e2c40d705b9/electronic-travel-authorisation-datasets-sep-2025.xlsx
ETA_D01: Applications for electronic travel authorisations, by nationality
ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality
Entry clearance visas summary tables, year ending September 2025 (ODS, 53.3 KB): https://assets.publishing.service.gov.uk/media/6924812a367485ea116a56bd/visas-summary-sep-2025-tables.ods
Entry clearance visa applications and outcomes detailed datasets, year ending September 2025 (MS Excel Spreadsheet, 30.2 MB): https://assets.publishing.service.gov.uk/media/691aebbf5a253e2c40d70598/entry-clearance-visa-outcomes-datasets-sep-2025.xlsx
Vis_D01: Entry clearance visa applications, by nationality and visa type
Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome
Additional data relating to in country and overse
Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
This NBA dataset, covering 1996 to 2024, contains players' physical attributes, biographical information, (advanced) stats, and positions.
There are no missing values, but some preprocessing will be needed depending on the task.
Data was gathered from nba.com and Basketball Reference, starting with the 1996/97 season and running up to the latest season, 2023/24.
There are many options for EDA and ML: analyzing how physical attributes change by position, how the number of 3-point shots has changed over the years, and how the number of foreign players has increased; or using machine learning to predict a player's points, rebounds, and assists, predict a player's position, cluster players, etc.
One issue was that player height and weight were recorded in imperial units, so a scatterplot of heights against weights looked poor: there were only about 20 distinct height values and about 150 distinct weight values, which is very coarse for a dataset of 13,000 players. I wrote a script that assigns each player a random height between his listed height and the next inch (for example, between 200.66 cm and 203.2 cm, which are 6-7 and 6-8 in imperial units), weighted so that 80% of the values fall between 5% and 35% of the way into that interval. This still keeps the integrity of the data: the average height of the whole dataset increased by less than 1 cm. I did the same for weight: since consecutive whole-pound values are about 0.45 kg apart, I assigned each player a random weight within +/- 0.22 kg of his original weight. This changed the average weight of the whole dataset by about 0.09 kg, which is insignificant.
Unfortunately, the NBA doesn't provide the data in cm and kg. Although this approach is not perfect in terms of accuracy, it is still much better than having only 20 distinct heights for a dataset of 13,000 players.
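A minimal Python sketch of the jittering described above (not the author's script, which is not published here). It assumes heights are stored as "feet-inches" strings such as "6-7" and weights as pounds, and it interprets "80% of values fall in the range of 5 to 35%" as: 80% of draws land between 5% and 35% of the way up the one-inch interval, with the other 20% spread uniformly over the rest of it. All function names are hypothetical.

```python
import random

CM_PER_INCH = 2.54
KG_PER_LB = 0.45359237  # exact definition of the international pound


def parse_height_inches(ft_in: str) -> int:
    """Convert an imperial height string such as '6-7' to total inches."""
    feet, inches = ft_in.split("-")
    return int(feet) * 12 + int(inches)


def jitter_height_cm(ft_in: str, rng: random.Random) -> float:
    """Return a random height between the listed height and the next inch, in cm.

    Assumption: 80% of draws fall between 5% and 35% of the way through the
    one-inch interval; the remaining 20% are uniform over the rest of it.
    """
    low_cm = parse_height_inches(ft_in) * CM_PER_INCH
    if rng.random() < 0.8:
        frac = rng.uniform(0.05, 0.35)
    else:
        # The rest of the interval is [0, 0.05) plus (0.35, 1]: total length 0.70.
        u = rng.uniform(0.0, 0.70)
        frac = u if u < 0.05 else u + 0.30
    return low_cm + frac * CM_PER_INCH


def jitter_weight_kg(pounds: float, rng: random.Random) -> float:
    """Convert pounds to kg and add uniform noise of at most +/-0.22 kg."""
    return pounds * KG_PER_LB + rng.uniform(-0.22, 0.22)


if __name__ == "__main__":
    rng = random.Random(42)
    base_cm = parse_height_inches("6-7") * CM_PER_INCH  # about 200.66 cm
    samples = [jitter_height_cm("6-7", rng) for _ in range(10_000)]
    print(f"mean height shift: {sum(samples) / len(samples) - base_cm:.2f} cm")
    print(f"220 lb -> {jitter_weight_kg(220, rng):.2f} kg")
```

Under this interpretation the expected upward shift is about 0.29 of an inch (roughly 0.73 cm), which is consistent with the reported rise of less than 1 cm in average height; the symmetric weight noise has an expected shift of zero.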
Officer Involved Shooting (OIS) Database and Statistical Analysis. Data is updated after each officer involved shooting.
PIU#/Incident # - the number associated with the incident, or used as a reference for storing items in our evidence rooms
Date of Occurrence Month - the month the incident occurred (note: the year is labeled on the spreadsheet tab)
Date of Occurrence Day - the day of the month the incident occurred (note: the year is labeled on the spreadsheet tab)
Time of Occurrence - the time the incident occurred
Address of incident - the location where the incident occurred
Division - the LMPD division in which the incident actually occurred
Beat - the LMPD beat in which the incident actually occurred
Investigation Type - the type of investigation (shooting or death)
Case Status - the status of the case (open or closed)
Suspect Name - the name of the suspect involved in the incident
Suspect Race - the race of the suspect involved in the incident (W-White, B-Black)
Suspect Sex - the gender of the suspect involved in the incident
Suspect Age - the age of the suspect involved in the incident
Suspect Ethnicity - the ethnicity of the suspect involved in the incident (H-Hispanic, N-Not Hispanic)
Suspect Weapon - the type of weapon the suspect used in the incident
Officer Name - the name of the officer involved in the incident
Officer Race - the race of the officer involved in the incident (W-White, B-Black, A-Asian)
Officer Sex - the gender of the officer involved in the incident
Officer Age - the age of the officer involved in the incident
Officer Ethnicity - the ethnicity of the officer involved in the incident (H-Hispanic, N-Not Hispanic)
Officer Years of Service - the number of years the officer had served at the time of the incident
Lethal Y/N - whether or not the incident involved a death (Y-Yes, N-No, continued-pending)
Narrative - a description of what was determined from the investigation
Contact: Carol Boyle, carol.boyle@louisvilleky.gov
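Because the month and day are stored in separate columns and the year appears only on the spreadsheet tab, a full incident date has to be rebuilt. The standard-library sketch below assumes each tab is named with a four-digit year (e.g. "2019") and that the month column holds either a number or an English month name; the column headers are taken from the field list above, but their exact spelling in the file is an assumption. With pandas, `pd.read_excel(path, sheet_name=None)` returns a dict of DataFrames keyed by tab name, and `df.to_dict("records")` turns each one into the list of row dicts used here.

```python
import calendar
from datetime import date

MONTH_COL = "Date of Occurrence Month"
DAY_COL = "Date of Occurrence Day"

# Map 'january' / 'jan' etc. to month numbers 1..12 (index 0 is an empty string).
_MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}
_MONTHS.update({abbr.lower(): i for i, abbr in enumerate(calendar.month_abbr) if abbr})


def parse_month(value) -> int:
    """Accept 7, 7.0, '7', 'July' or 'Jul' and return the month number 1..12."""
    if isinstance(value, (int, float)):
        return int(value)
    text = str(value).strip()
    if text.isdigit():
        return int(text)
    return _MONTHS[text.lower()]


def add_occurrence_dates(sheets: dict) -> list:
    """Flatten {tab name: [row dicts]} into one list with 'Year' and 'Occurrence Date'.

    Assumes every tab name is the four-digit year of the incidents on that tab.
    """
    rows = []
    for tab_name, tab_rows in sheets.items():
        year = int(str(tab_name).strip())
        for row in tab_rows:
            month = parse_month(row[MONTH_COL])
            day = int(row[DAY_COL])
            rows.append({**row, "Year": year, "Occurrence Date": date(year, month, day)})
    return rows


if __name__ == "__main__":
    sheets = {"2019": [{MONTH_COL: "March", DAY_COL: 4, "Lethal Y/N": "N"}]}
    print(add_occurrence_dates(sheets)[0]["Occurrence Date"])  # 2019-03-04
```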
Data files containing detailed information about vehicles in the UK are also available, including make and model data.
Some tables have been withdrawn and replaced. The table index for this statistical series has been updated to provide a full map between the old and new numbering systems used on this page.
The Department for Transport is committed to continuously improving the quality and transparency of our outputs, in line with the Code of Practice for Statistics. As part of this, we recently concluded a planned review of the processes and methodologies used to produce the Vehicle licensing statistics data. The review aimed to identify and introduce further improvements and efficiencies in the coding technologies we use to produce our data. As part of that work, we identified several historical errors across the published data tables, affecting different historical periods. These errors resulted from mistakes in past production processes, which we have now identified and corrected, and we have taken steps to prevent them in future.
Most of the revisions to our published figures are small, typically changing values by less than 1% to 3%. The key revisions are:
Licensed Vehicles (2014 Q3 to 2016 Q3)
We found that some unlicensed vehicles during this period were mistakenly counted as licensed. This caused a slight overstatement, about 0.54% on average, in the number of licensed vehicles during this period.
3.5 - 4.25 tonnes Zero Emission Vehicles (ZEVs) Classification
Since 2023, ZEVs weighing between 3.5 and 4.25 tonnes have been classified as light goods vehicles (LGVs) instead of heavy goods vehicles (HGVs). We have now applied this change to earlier data and corrected an error in table VEH0150. As a result, the number of newly registered HGVs has been reduced by:
3.1% in 2024
2.3% in 2023
1.4% in 2022
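The reclassification can be expressed as a small rule. This is a hypothetical sketch, not DfT's production code: it assumes goods vehicles are split at 3.5 tonnes gross vehicle weight, that zero emission goods vehicles up to and including 4.25 tonnes count as LGVs, and that the rule is applied to every year, as the revision describes. The function names are illustrative only.

```python
def goods_vehicle_class(gross_weight_tonnes: float, is_zero_emission: bool,
                        apply_zev_rule: bool = True) -> str:
    """Classify a goods vehicle as 'LGV' or 'HGV' (hypothetical rule).

    Assumptions: the standard LGV limit is 3.5 tonnes; when apply_zev_rule is
    True, zero emission vehicles up to and including 4.25 tonnes are also LGVs.
    """
    limit = 4.25 if (is_zero_emission and apply_zev_rule) else 3.5
    return "LGV" if gross_weight_tonnes <= limit else "HGV"


def hgv_reduction_pct(registrations) -> float:
    """Percentage fall in the HGV count when the ZEV rule is applied retrospectively.

    registrations: iterable of (gross_weight_tonnes, is_zero_emission) pairs.
    """
    regs = list(registrations)
    before = sum(goods_vehicle_class(w, z, apply_zev_rule=False) == "HGV" for w, z in regs)
    after = sum(goods_vehicle_class(w, z) == "HGV" for w, z in regs)
    return 100 * (before - after) / before if before else 0.0


if __name__ == "__main__":
    # Made-up example: one 4.0 t ZEV among ten vehicles over 3.5 t.
    regs = [(4.0, True)] + [(10.0, False)] * 9
    print(hgv_reduction_pct(regs))  # 10.0
```

The percentages listed above are DfT's published figures; the function only shows how such a percentage would follow from registration-level data.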
Table VEH0156 (2018 to 2023)
Table VEH0156, which reports average CO₂ emissions for newly registered vehicles, has been updated for the years 2018 to 2023. Most changes are minor (under 3%), but the e-NEDC measure saw a larger correction, of up to 15.8%, due to a calculation error. Changes to the other measures (WLTP and Reported) were smaller, except for April 2020, when COVID-19 led to very few new registrations and therefore greater volatility in the resulting percentages.
Neither these specific revisions, nor any of the others introduced, have had a material impact on the statistics overall, the direction of trends nor the key messages that they previously conveyed.
Specific details of each revision have been included in the relevant data table notes to ensure transparency and clarity. Users are advised to review these notes as part of their regular use of the data so that their analysis accounts for these changes.
If you have questions regarding any of these changes, please contact the Vehicle statistics team.
Overview
VEH0101: Vehicles at the end of the quarter by licence status and body type: Great Britain and United Kingdom (ODS, 99.7 KB): https://assets.publishing.service.gov.uk/media/68ecf5acf159f887526bbd7c/veh0101.ods
Detailed breakdowns
VEH0103: Licensed vehicles at the end of the year by tax class: Great Britain and United Kingdom (ODS, 23.8 KB): https://assets.publishing.service.gov.uk/media/68ecf5abf159f887526bbd7b/veh0103.ods
VEH0105: https://assets.publishing.service.gov.uk/media/68ecf5ac2adc28a81b4acfc8/veh0105.ods Licensed vehicles at
The subject matter in the five individual files which comprise the total data package is similar. SA1 presents detailed kind-of-business statistics (two-, three-, and four-digit industry levels) on number of establishments and receipts (total and with payroll), number of proprietorships and partnerships, annual and first quarter payroll, and number of paid employees. SA2 contains the same data items as above for selected services total, in addition to the number of establishments and receipts for five major kind-of-business groups. SA3 contains number of establishments and receipts for selected services total and for 130 kind-of-business classifications. SA4 presents receipts and rank by volume of receipts. SA5 statistics are given by city size for number of incorporated cities, total population, number of establishments, receipts, yearly payroll, and the percent of total by population and sales.
Each of the files has slightly different geography for which summaries are presented. SA1 has summaries for the United States, divisions, States, SCAs and SMSAs, and counties and cities with over 300 service establishments. SA2 presents summary counts for each city of 2,500 inhabitants or more and for the remainder of the county. SA3 has summaries for the United States, regions, divisions, and States. SA4 presents summaries for the 250 largest counties and cities. SA5 presents the United States total.
Data pertain to the date of the census, 1972. The first major enumeration of Selected Service establishments covered 1933. Censuses were also taken in 1939, 1948, and at 5-year intervals since
* denotes a significant deviation from Hardy-Weinberg equilibrium (P < 0.05).
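For a biallelic marker, one common way to flag such a deviation is a chi-square goodness-of-fit test with one degree of freedom. The source does not say which test was used (an exact test is another common choice), so this is only an illustrative sketch; genotype counts are passed as AA, Aa, aa, and the function name is hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic locus.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("locus is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p_value = hwe_chi_square(50, 0, 50)  # made-up counts with no heterozygotes
    marker = "*" if p_value < 0.05 else ""
    print(f"chi2={chi2:.1f}, P={p_value:.2g}{marker}")
```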
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains statistics for all completed competitions organized on Kaggle. It contains 15 columns:
1. Comp_name - name of the competition
2. comp_ Reward - type of reward
3. comp_link - link to the competition
4. teams - number of participating teams
5. Entries - number of entries
6. Competitors - number of competitors
7. start_date - starting date
8. start_month - starting month
9. start_year - starting year
10. Final_date - ending date
11. Final_month - ending month
12. Final_year - ending year
13. code_link - link to one notebook for each competition
14. Desc - description of the competition
This dataset was scraped from link
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
A translated, tidy dataset of official crime statistics for Mexico.
Source: Official Mexican Government Website
Description: This dataset is a compilation of criminal incidents reported across Mexico. It includes detailed records of various criminal activities, offering insights into crime patterns and trends in different regions. The dataset is ideal for analysis in criminology, public policy, and data science.
year: The year when the crime was reported. This is a numeric field representing the calendar year (e.g., 2015).
entity_code: A numeric code representing a specific entity (state or region) within Mexico. Each number corresponds to a unique entity.
entity: The name of the Mexican state or region where the crime occurred. This is a textual field (e.g., Aguascalientes).
affected_legal_good: A categorical field describing the broad category of the legal good (i.e., personal or societal interest) affected by the crime. Examples include 'Personal freedom' and 'Sexual freedom and security'.
type_of_crime: A categorical field indicating the general type of crime. This field is more specific than 'affected_legal_good' but less specific than 'subtype_of_crime'. Examples include 'Abduction', 'Sexual abuse', and 'Robbery'.
subtype_of_crime: A further categorization of the type of crime. This field provides more specific details within the general type of crime. Examples include 'Sexual Harassment', 'Simple Rape', and 'Home Burglary'.
modality: The specific nature or method of the crime. This field details how the crime was committed or any specific characteristic that differentiates it within its subtype. Examples include 'With violence', 'Without violence', 'Sexual Bullying'.
month: The month when the crime was reported. This is a textual field representing the month (e.g., January).
count: The number of reported incidents for the specific crime type, subtype, and modality in the given entity and month. This is a numeric field.
Type of Data: Structured data, CSV format
Number of Records: Shape (332416, 9), i.e. 332,416 rows and 9 columns
Date Range: 2015 to 2023 (up to October); November and December 2023 not yet released
Intended Use: Research in criminology, public policy analysis, crime trend analysis
Example Analyses: Crime rate trends over time, regional crime analysis, type of crime frequency analysis
Collection Process: Data aggregated from official crime reports and records maintained by the Mexican government
Data Authenticity: Sourced from Gobierno de Mexico
Accuracy: Official, part of the Mexican Government's push for openness
Completeness: Comprehensive coverage of reported incidents within the specified period
Limitations: Possible underreporting or inconsistencies in crime reporting across regions; November and December 2023 not yet released
Update Frequency: Quarterly (or as new data becomes available)
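As an example of the crime-trend analyses listed above, the sketch below totals the `count` column by year for one `type_of_crime`, using only the standard library. The column names come from the field descriptions; the function names and the file name in the comment are placeholders. Since 2023 covers January to October only, its total is not directly comparable with full years.

```python
import csv
from collections import defaultdict
from typing import Iterable


def load_rows(path: str) -> list[dict]:
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def yearly_totals(rows: Iterable[dict], type_of_crime: str) -> dict[int, int]:
    """Sum reported incidents per year for one type_of_crime, sorted by year."""
    totals: dict[int, int] = defaultdict(int)
    for row in rows:
        if row["type_of_crime"] == type_of_crime:
            # Treat a blank count as 0; accept '12' or '12.0'.
            totals[int(row["year"])] += int(float(row["count"] or 0))
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # rows = load_rows("mexico_crime.csv")  # hypothetical file name
    rows = [
        {"year": "2015", "type_of_crime": "Robbery", "count": "3"},
        {"year": "2015", "type_of_crime": "Robbery", "count": "2"},
        {"year": "2016", "type_of_crime": "Abduction", "count": "1"},
        {"year": "2016", "type_of_crime": "Robbery", "count": "4"},
    ]
    print(yearly_totals(rows, "Robbery"))  # {2015: 5, 2016: 4}
```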
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was scraped from tables found at https://www.nfl.com/standings/league/2021/REG (for each year). This dataset contains each individual year's LEAGUE data by team. There is also a master file which has compiled all the data into one csv for analysis and comparison by year. The column description can be found at the bottom of this section. If you are interested in the code used to scrape the data, you can view the full project details at https://github.com/cvandergeugten/NFL-LEAGUE-DATA/blob/main/nfl_league_data_scraper.py
This dataset replicates the table found on the NFL's website exactly. There are some columns that can be cleaned up, renamed, or altered to allow use for analysis. There are also columns that can be used to create new features to be used in analysis. For those that want some practice on tidying up datasets and using them for predictive modeling or exploratory analysis, here is a list of objectives you can try to accomplish with this data:
Extract information from the 'record' columns (Home, Road, Division). These columns are not formatted to be used directly for analysis, so you can create new columns that hold each statistic individually. For example, you can create a new column called "Home Wins" and then write some code to extract the number of wins from the 'Home' column. Repeat with 'Home Losses' and 'Home Ties'. If you do this for each record column, you will have transformed all that information into usable data for modeling and analysis.
Create a feature called 'Undefeated', a binary categorical variable: input 1 if the team never lost a game in that particular record column, and 0 if the team had any losses within that record. Repeat for all the different record columns (you might want to name the record in the variable, like this: 'Undefeated Home').
Create new columns for the winning and losing streak values. You can name the two columns 'Win Streak #' and 'Lose Streak #' and then write some code to extract that information from the 'Strk' column. If a team was on a winning streak, then its 'Lose Streak #' value should be 0.
Create new columns that indicate which division a team is in!
Have some fun and engineer some of your own features!!
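A sketch of the first three objectives, using only the standard library. It assumes the record columns are named 'Home', 'Road', and 'Division' and hold "W-L-T" strings such as "6-2-0" (a missing ties figure, e.g. "6-2", is read as 0), and that 'Strk' holds values like "3W" or "W3"; check these formats against the actual CSV before relying on it. The helper names are hypothetical.

```python
import re


def parse_record(record: str) -> tuple[int, int, int]:
    """Split a 'W-L-T' (or 'W-L') record string into (wins, losses, ties)."""
    parts = [int(p) for p in record.strip().split("-")]
    if len(parts) == 2:
        parts.append(0)  # no ties recorded
    wins, losses, ties = parts
    return wins, losses, ties


def parse_streak(strk) -> tuple[int, int]:
    """Return (win streak, lose streak) from values like '3W', 'W3', or '2L'."""
    text = str(strk or "").upper()
    match = re.fullmatch(r"\s*(?:(\d+)([WLT])|([WLT])(\d+))\s*", text)
    if not match:
        return 0, 0  # empty cell or unknown format
    length = int(match.group(1) or match.group(4))
    kind = match.group(2) or match.group(3)
    if kind == "W":
        return length, 0
    if kind == "L":
        return 0, length
    return 0, 0  # a tie streak counts as neither


def add_features(row: dict, record_cols=("Home", "Road", "Division")) -> dict:
    """Return a copy of row with win/loss/tie, 'Undefeated <col>' and streak columns."""
    out = dict(row)
    for col in record_cols:
        wins, losses, ties = parse_record(row[col])
        out[f"{col} Wins"], out[f"{col} Losses"], out[f"{col} Ties"] = wins, losses, ties
        out[f"Undefeated {col}"] = 1 if losses == 0 else 0
    out["Win Streak #"], out["Lose Streak #"] = parse_streak(row.get("Strk"))
    return out


if __name__ == "__main__":
    row = {"Team": "Example", "Home": "8-0-0", "Road": "5-3-0", "Division": "4-2", "Strk": "3W"}
    print(add_features(row))
```

Applying `add_features` to every row of the master file (for example via `csv.DictReader`) yields the new columns side by side with the original ones.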
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Arabic Sign Language (ASL) 20-Words Dataset v1 was carefully designed to reflect natural conditions, aiming to capture realistic signing environments and circumstances. Recognizing that nearly everyone has access to a smartphone with a camera as of 2020, the dataset was specifically recorded using mobile phones, aligning with how people commonly record videos in daily life. This approach ensures the dataset is grounded in real-world conditions, enhancing its applicability for practical use cases.
Each video in this dataset was recorded directly on the authors' smartphones, without any form of stabilization, either hardware or software. As a result, the videos vary in resolution and were captured across diverse locations and backgrounds. This variability introduces natural noise and conditions, supporting the development of robust deep learning models capable of generalizing across environments.
In total, the dataset comprises 8,467 videos of 20 sign language words, contributed by 72 volunteers aged between 20 and 24. Each volunteer performed each sign a minimum of five times, resulting in approximately 100 videos per participant. This repetition standardizes the data and ensures each sign is adequately represented across different performers. The dataset’s mean video count per sign is 423.35, with a standard deviation of 18.58, highlighting the balance and consistency achieved across the signs.
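Since every volunteer signed every word, one way to test the cross-signer generalization this variability is meant to support is a signer-independent split, in which no volunteer appears in both the training and test sets. The sketch below assumes each video can be mapped to a volunteer ID and a sign label; the `(video_path, volunteer_id, sign)` tuples and file paths are hypothetical, as the dataset's file layout is not described here, and this is not necessarily the split used in the research article.

```python
import random


def signer_independent_split(videos, test_fraction=0.2, seed=0):
    """Split (video_path, volunteer_id, sign) tuples by volunteer.

    Every video from a given volunteer lands in exactly one of the two sets,
    so the test set measures performance on unseen signers.
    """
    volunteers = sorted({volunteer for _, volunteer, _ in videos})
    rng = random.Random(seed)
    rng.shuffle(volunteers)
    n_test = max(1, round(len(volunteers) * test_fraction))
    test_ids = set(volunteers[:n_test])
    train = [v for v in videos if v[1] not in test_ids]
    test = [v for v in videos if v[1] in test_ids]
    return train, test


if __name__ == "__main__":
    # Hypothetical index: 72 volunteers x 20 signs x 5 repetitions (the real
    # dataset has 8,467 videos because some signs were repeated more often).
    videos = [(f"vol{v:02d}/sign{s:02d}_{r}.mp4", v, s)
              for v in range(72) for s in range(20) for r in range(5)]
    train, test = signer_independent_split(videos)
    print(len(train), len(test))  # 5800 1400
```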
For reference, Table 2 (in the research article) provides the count of videos for each sign, while Figure 2 (in the research article) offers a visual summary of the statistics for each word in the dataset. Additionally, sample frames from each word are displayed in Figure 3 (in the research article), giving a glimpse of the visual content captured.
For in-depth insights into the methodology and the dataset's creation, see the research paper: Balaha, M.M., El-Kady, S., Balaha, H.M., et al. (2023). "A vision-based deep learning approach for independent-users Arabic sign language interpretation". Multimedia Tools and Applications, 82, 6807–6826. https://doi.org/10.1007/s11042-022-13423-9
Please consider citing the following if you use this dataset:
@misc{balaha_asl_2024_db,
title={ASL 20-Words Dataset v1},
url={https://www.kaggle.com/dsv/9783691},
DOI={10.34740/KAGGLE/DSV/9783691},
publisher={Kaggle},
author={Mostafa Magdy Balaha and Sara El-Kady and Hossam Magdy Balaha and Mohamed Salama and Eslam Emad and Muhammed Hassan and Mahmoud M. Saafan},
year={2024}
}
@article{balaha2023vision,
title={A vision-based deep learning approach for independent-users Arabic sign language interpretation},
author={Balaha, Mostafa Magdy and El-Kady, Sara and Balaha, Hossam Magdy and Salama, Mohamed and Emad, Eslam and Hassan, Muhammed and Saafan, Mahmoud M},
journal={Multimedia Tools and Applications},
volume={82},
number={5},
pages={6807--6826},
year={2023},
publisher={Springer}
}
This dataset is available under the CC BY-NC-SA 4.0 license, which allows for sharing and adaptation under conditions of non-commercial use, proper attribution, and distribution under the same license.
For further inquiries or information: https://hossambalaha.github.io/.