List of the data tables as part of the Immigration system statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.
If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.
The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk.
Please tell us what format you need. It will help us if you say what assistive technology you use.
Immigration system statistics, year ending June 2025
Immigration system statistics quarterly release
Immigration system statistics user guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives
Passenger arrivals summary tables, year ending June 2025 (ODS, 31.3 KB): https://assets.publishing.service.gov.uk/media/689efececc5ef8b4c5fc448c/passenger-arrivals-summary-jun-2025-tables.ods
‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.
Electronic travel authorisation detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 57.1 KB): https://assets.publishing.service.gov.uk/media/689efd8307f2cc15c93572d8/electronic-travel-authorisation-datasets-jun-2025.xlsx
ETA_D01: Applications for electronic travel authorisations, by nationality
ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality
Entry clearance visas summary tables, year ending June 2025 (ODS, 56.1 KB): https://assets.publishing.service.gov.uk/media/68b08043b430435c669c17a2/visas-summary-jun-2025-tables.ods
Entry clearance visa applications and outcomes detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 29.6 MB): https://assets.publishing.service.gov.uk/media/689efda51fedc616bb133a38/entry-clearance-visa-outcomes-datasets-jun-2025.xlsx
Vis_D01: Entry clearance visa applications, by nationality and visa type
Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome
Additional data relating to in country and overseas Visa applications can be fo
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
SELECTED SOCIAL CHARACTERISTICS IN THE UNITED STATES CITIZENSHIP STATUS - DP02 Universe - Population in households Survey-Program - American Community Survey 5-year estimates Years - 2020, 2021, 2022 Respondents were asked to select one of five categories: (1) born in the United States; (2) born in Puerto Rico, Guam, the U.S. Virgin Islands, or Northern Marianas; (3) born abroad of U.S. citizen parent or parents; (4) U.S. citizen by naturalization; or (5) not a U.S. citizen. Respondents indicating they were a U.S. citizen by naturalization were asked to print their year of naturalization.
The layer was derived and compiled from the U.S. Census Bureau’s 2013 – 2017 American Community Survey (ACS) 5-Year Estimates to support 2020 Census planning.
Source: U.S. Census Bureau, Table B05002 PLACE OF BIRTH BY NATIVITY AND CITIZENSHIP STATUS, 2013 – 2017 ACS 5-Year Estimates
Effective Date: December 2018
Last Update: December 2019
Update Cycle: ACS 5-Year Estimates update annually each December. Vintage used for 2020 Census planning purposes by Broward County.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset contains five Excel files with monthly and quarterly longitudinal data on the naturalization of foreigners and passport issuance collected from the Russian Ministry of Internal Affairs (MVD), with a specific focus on Ukrainians.
The goal of the data collection is to contribute to our understanding of the extent of Russia's passportization and forced naturalization of Ukrainians in the wake of the occupation and annexation of Ukrainian territory since 2014.
The first Excel file contains the absolute number of naturalizations in Russia (in total and of Ukrainians) in the post-Soviet period from 1992 to 2022 and six months in 2023. In addition to the MVD data covering the period 2016 to 2023, the first file has been enriched with other secondary data sources taken from an academic paper from 2014 by Olga Chudinovskikh on amendments to Russia's citizenship laws and the effect on the naturalization of foreigners, and a media report from DW on the easing of naturalization rules for Ukrainians adopted in 2017. The other four files cover the period from 2016 to 2023 with data solely extracted from the MVD files. The second Excel file contains the total number of naturalizations per month in Russia between May 2016 and June 2023. The third Excel file contains the share of naturalized Ukrainians of the total number of naturalized persons in Russia between the third quarter in 2016 and the second quarter in 2023 (structured as quarterly aggregated data). The fourth Excel file contains the absolute number of naturalizations in selected Russian regions, and annexed Crimea and Sevastopol, between the fourth quarter in 2016 and the second quarter in 2023 (structured as quarterly aggregated data). The fifth Excel file contains the absolute number of issued passports in Russia between May 2016 and June 2023 (both domestic and foreign travel IDs, monthly data). Apart from the country of origin, the publicly available MVD statistics do not provide any further information about naturalized individuals or about those receiving a passport. The data compilation contained in the Excel files primarily focuses on Ukrainians. The entire dataset is described in detail in a separate Word file. Since only a small part of the data contained in the spreadsheets downloaded from the MVD website was extracted for this collection, the original files are also included in a zip file.
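The relationship between the monthly counts and the quarterly share described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes monthly counts are available both in total and for one origin group, the function names `quarter_of` and `quarterly_share` are hypothetical, and the numbers are placeholders rather than MVD figures.

```python
from collections import defaultdict


def quarter_of(month: str) -> str:
    """Map a 'YYYY-MM' label to a 'YYYY-Qn' label."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"


def quarterly_share(monthly_total, monthly_group):
    """Return {quarter: share of the group in all naturalizations, in percent}."""
    totals = defaultdict(int)
    group = defaultdict(int)
    for month, count in monthly_total.items():
        totals[quarter_of(month)] += count
    for month, count in monthly_group.items():
        group[quarter_of(month)] += count
    # Skip quarters without any recorded naturalizations to avoid dividing by zero.
    return {q: 100.0 * group[q] / totals[q] for q in sorted(totals) if totals[q] > 0}


# Placeholder counts for illustration only (NOT taken from the MVD files).
total = {"2016-07": 100, "2016-08": 120, "2016-09": 80, "2016-10": 200}
group = {"2016-07": 40, "2016-08": 60, "2016-09": 50, "2016-10": 50}
shares = quarterly_share(total, group)
```

Summing counts before dividing (rather than averaging monthly percentages) keeps months with many naturalizations from being under-weighted in the quarterly figure.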
https://www.gesis.org/en/institute/data-usage-terms
The project aims at providing the data required to study the descriptive representation of citizens of immigrant origin (CIOs). The main aim is to provide an overview of the social and political profile of Members of Parliament (MPs), with a particular focus on identifying MPs of immigrant origin. In addition to the national level dataset described below, a corresponding regional level dataset is available.
Identification variables: Political level (regional, national); country-ID (NUTS); name of region; region-id (NUTS); date of relevant election; full name of district in which elected; level of electoral tier (first / Lower (or single tier); identifier for tier 1 to 3 districts at national level; number of legislatures in the country, as recorded by the parliament itself; date in which the legislature begins and ends; first name, first (second) surname of MP; MP-ID; national MP is also simultaneously a regional MP; which regional MP.
Demography: sex of MP; year of birth of MP; highest level of education (ISCED 1997); last occupation /profession of the MP before first ever becoming an MP (ISCO 2008); occupation sector when first elected; current occupation/ profession of the MP (ISCO 2008); current occupation sector.
Electoral and parliamentary tenure variables: number of times the MP has been previously elected to parliament in this district; type of electoral district; number of times the MP has been previously elected to parliament in this tier; Rookie: MP elected for the first time in this term; number of times the MP has been elected to parliament; number of times the MP has taken up the seat in parliament once elected; year when the MP was first elected to national/regional parliament; total number of years spent in national/regional parliament as MP, prior to this legislature (seniority); when was the MP elected for the last time prior to this legislature (continuity); MP was elected to chamber from inauguration; MP stayed continuously with no interruptions from the moment of taking up the seat until the end of the legislative term; number of months the MP did serve (if he did not serve a full legislative term); MP came back to reclaim the seat if MP left seat at some point; position in party list; rank position in which the MP was elected in district; double candidacy in another tier; MP won seat as incumbent, or as contender; parliamentary group the MP joined at the beginning and at the end of his/her term; full name and acronym of party or list in which elected; party code according to the CMP (Comparative Manifesto Project) dataset; party-ID.
Immigrant origin variables (corresponding coding for MP's mother and father): MP was born in the country of parliament; country (ISO 3166-1), world region (UN Classification for ‘Composition of macro geographical regions’), and country region (NUTS) in which the MP was born; data sources for country of birth (e.g. official parliamentary source, personal blogs, etc.); specific sources for country of birth; reliability of the data regarding the country of birth of the MP (as judged by the coder); year of immigration; born as a national citizen of the country of parliament; country of nationality at birth; data sources country of nationality at birth; specific sources for country of citizenship at birth; reliability of the data regarding citizenship at birth; year in which naturalized as a citizen; data sources year of naturalization; specific sources for date of naturalization; reliability of the data regarding naturalization.
Variables relating to aspects potentially related to discrimination: the MP is a native speaker of an official country language and data sources; specific sources for native language of MP; MP can be perceived by voters as a member of an ‘identifiable’ minority; source where picture found; specific sources for picture of MP; does the MP self-identify as a member of an ethnic minority; ethnicity; sources and specific sources for information on ethnic self-identification of MP; self-identification as a member of a certain religion; religion the MP identifies with.
Party career and committee membership variables: year in which the MP joined the party for which she/he was elected in this legislative term; highest position within the party; MP changed party affiliation during the legislative term; date of change; full name and party acronym of the new party joined, CMP code of the new party and Pathways identifier for party; (corresponding co...
By Amber Thomas
This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.
According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This dataset aims to address the gap between estimated availability and actual use by providing more accurate usage numbers, downscaled to county and zip code levels. Who gets counted as having access is vastly important: it determines who is included in public funding opportunities dedicated to closing the digital divide. The implications can be huge: millions of people around the country could remain invisible if these numbers aren't accurately reported or properly used in decision-making.
This dataset includes aggregated information about locations with fewer than 20 devices, for increased accuracy when estimating broadband usage in the United States. Others can use it to develop solutions that improve internet access, or to identify problem areas where no real or reliable connectivity exists in communities large and small throughout the US mainland. Please review the license terms before using these data, so that you comply with Microsoft's Open Use of Data Agreement v1.0, whether for professional or educational purposes.
How to Use the US Broadband Usage Dataset
This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.
The dataset contains six columns:
- County – The name of the county for which usage statistics are provided.
- Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected within that county or metropolitan area/micro area/divisions within states, as reported by the US Census Bureau in 2018[2].
- Population (Households) – Estimated number of households defined according to [3] based on data from the US Census Bureau American Community Survey's 5 Year Estimates[4].
- Average Throughput (Mbps) – Average Mbps download speed derived from a combination of data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, Xbox Live Core Services, etc.[5]
- Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps calculated using [6].
- Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps calculated using [7].
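As a rough sketch of how the columns listed above might be combined, the snippet below rolls zip-level average throughput up to county level. It makes several assumptions: the header spellings are guesses based on the column descriptions and may not match the actual file, the sample rows are invented placeholders, and weighting by households is one possible design choice rather than the dataset's documented method.

```python
import csv
import io

# Invented sample rows; header names are assumed, not confirmed against the file.
SAMPLE = """County,Zip Code,Population (Households),Average Throughput (Mbps),Percent Fast,Percent Slow
Example County A,00001,1000,30.0,60,10
Example County A,00002,3000,10.0,20,40
Example County B,00003,2000,50.0,90,5
"""


def weighted_throughput_by_county(csv_text):
    """Household-weighted mean of zip-level average throughput, per county."""
    weighted_sum = {}
    households = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        county = row["County"]
        hh = int(row["Population (Households)"])
        mbps = float(row["Average Throughput (Mbps)"])
        weighted_sum[county] = weighted_sum.get(county, 0.0) + hh * mbps
        households[county] = households.get(county, 0) + hh
    # Counties with zero recorded households are dropped rather than divided by zero.
    return {c: weighted_sum[c] / households[c] for c in weighted_sum if households[c] > 0}


result = weighted_throughput_by_county(SAMPLE)
```

To run this against a real file, the `io.StringIO` wrapper would be replaced with an opened file handle, and the header names adjusted to whatever the CSV actually uses.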
- Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.
- Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.
- Establishing public-private partnerships. Local governments and telecom providers need better data about gaps in service coverage and usage levels in order to decide on investments in new infrastructure buildouts that improve connectivity for rural communities.
If you use this dataset in your research, please credit the original authors.
See the dataset description for more information.
File: broadband_data_2020October.csv
A broad and generalized selection of race, ethnicity and citizenship estimates from the US Census Bureau's 2013-2017 American Community Survey (ACS) 5-year data, obtained via the Census API and joined to the appropriate geometry (in this case, New Mexico Census tracts). The selection is not comprehensive, but allows a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users.
The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. While the ACS contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by Census tract boundaries in New Mexico.
Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.
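The reserved 2010 code ranges described above can be expressed as a small lookup. This is an illustrative sketch only: the function name `classify_tract_code` and the category labels are hypothetical, and the input is assumed to be the four-digit basic tract code as an integer.

```python
def classify_tract_code(basic_code: int) -> str:
    """Classify a four-digit basic census tract code by its reserved 2010 range."""
    if 9400 <= basic_code <= 9499:
        # Majority American Indian population and/or reservation or trust land.
        return "american_indian_area"
    if 9800 <= basic_code <= 9899:
        # Little or no population; large special land use area.
        return "special_land_use"
    if 9900 <= basic_code <= 9998:
        # Water area only, no land area.
        return "water_only"
    return "standard"


labels = [classify_tract_code(c) for c in (1, 9400, 9850, 9998, 9999)]
```

Tracts in the special land use and water-only ranges typically carry little or no population, so a user summarizing tract-level estimates may want to filter on these labels first.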
https://creativecommons.org/publicdomain/zero/1.0/
NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/
Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:
Over 8 million 311 service requests from 2012-2016
More than 1 million motor vehicle collisions 2012-present
Citi Bike stations and 30 million Citi Bike trips 2013-present
Over 1 billion Yellow and Green Taxi rides from 2009-present
Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015
This dataset is deprecated and not being updated.
Fork this kernel to get started with this dataset.
https://opendata.cityofnewyork.us/
This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.
The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.
Banner Photo by @bicadmedia from Unsplash.
On which New York City streets are you most likely to find a loud party?
Can you find the Virginia Pines in New York City?
Where was the only collision caused by an animal that injured a cyclist?
What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?
https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png
VITAL SIGNS INDICATOR Migration (EQ4)
FULL MEASURE NAME Migration flows
LAST UPDATED December 2018
DESCRIPTION Migration refers to the movement of people from one location to another, typically crossing a county or regional boundary. Migration captures both voluntary relocation – for example, moving to another region for a better job or lower home prices – and involuntary relocation as a result of displacement. The dataset includes metropolitan area, regional, and county tables.
DATA SOURCE American Community Survey County-to-County Migration Flows 2012-2015 5-year rolling average http://www.census.gov/topics/population/migration/data/tables.All.html
CONTACT INFORMATION vitalsigns.info@bayareametro.gov
METHODOLOGY NOTES (across all datasets for this indicator) Data for migration comes from the American Community Survey; county-to-county flow datasets experience a longer lag time than other standard datasets available in FactFinder. 5-year rolling average data was used for migration for all geographies, as the Census Bureau does not release 1-year annual data. Data is not available at any geography below the county level; note that flows that are relatively small on the county level are often within the margin of error. The metropolitan area comparison was performed for the nine-county San Francisco Bay Area, in addition to the primary MSAs for the nine other major metropolitan areas, by aggregating county data based on current metropolitan area boundaries. Data prior to 2011 is not available on Vital Signs due to inconsistent Census formats and a lack of net migration statistics for prior years. Only counties with a non-negligible flow are shown in the data; all other pairs can be assumed to have zero migration.
Given that the vast majority of migration out of the region was to other counties in California, California counties were bundled into the following regions for simplicity:
- Bay Area: Alameda, Contra Costa, Marin, Napa, San Francisco, San Mateo, Santa Clara, Solano, Sonoma
- Central Coast: Monterey, San Benito, San Luis Obispo, Santa Barbara, Santa Cruz
- Central Valley: Fresno, Kern, Kings, Madera, Merced, Tulare
- Los Angeles + Inland Empire: Imperial, Los Angeles, Orange, Riverside, San Bernardino, Ventura
- Sacramento: El Dorado, Placer, Sacramento, Sutter, Yolo, Yuba
- San Diego: San Diego
- San Joaquin Valley: San Joaquin, Stanislaus
- Rural: all other counties (23)
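The regional bundling above, combined with the rule that unlisted county pairs are treated as zero migration, can be sketched as a lookup plus a net-flow aggregation. This is an illustration under stated assumptions, not the Vital Signs implementation: the helper names are hypothetical, the flow counts are invented placeholders, and every input is assumed to be a California county name, so any county not listed falls into "Rural".

```python
from collections import defaultdict

REGIONS = {
    "Bay Area": ["Alameda", "Contra Costa", "Marin", "Napa", "San Francisco",
                 "San Mateo", "Santa Clara", "Solano", "Sonoma"],
    "Central Coast": ["Monterey", "San Benito", "San Luis Obispo",
                      "Santa Barbara", "Santa Cruz"],
    "Central Valley": ["Fresno", "Kern", "Kings", "Madera", "Merced", "Tulare"],
    "Los Angeles + Inland Empire": ["Imperial", "Los Angeles", "Orange",
                                    "Riverside", "San Bernardino", "Ventura"],
    "Sacramento": ["El Dorado", "Placer", "Sacramento", "Sutter", "Yolo", "Yuba"],
    "San Diego": ["San Diego"],
    "San Joaquin Valley": ["San Joaquin", "Stanislaus"],
}
COUNTY_TO_REGION = {c: r for r, counties in REGIONS.items() for c in counties}


def region_of(county: str) -> str:
    """Return the bundled region; unlisted California counties count as Rural."""
    return COUNTY_TO_REGION.get(county, "Rural")


def net_flows_to(region, flows):
    """Net migration into `region` from each other region (in-movers minus out-movers)."""
    net = defaultdict(int)
    for origin, destination, movers in flows:
        o, d = region_of(origin), region_of(destination)
        if o == d:
            continue  # moves within one region do not change its net total
        if d == region:
            net[o] += movers
        elif o == region:
            net[d] -= movers
    return dict(net)


# Invented placeholder flows: (origin county, destination county, movers).
flows = [
    ("Alameda", "Sacramento", 500),
    ("Sacramento", "Alameda", 200),
    ("Fresno", "San Francisco", 100),
    ("Marin", "Sonoma", 50),
    ("Napa", "Shasta", 30),
]
net = net_flows_to("Bay Area", flows)
```

Region pairs that never appear in `flows` are simply absent from the result, which matches the convention of treating unlisted pairs as zero.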
One key limitation of the American Community Survey migration data is that it is not able to track emigration (movement of current U.S. residents to other countries). This is despite the fact that it is able to quantify immigration (movement of foreign residents to the U.S.), generally by continent of origin. Thus the Vital Signs analysis focuses primarily on net domestic migration, while still specifically citing in-migration flows from countries abroad based on data availability.
https://spdx.org/licenses/CC0-1.0.html
Understanding what drives non-native species naturalization (the establishment of a self-sustainable population outside its native range) is a central question in invasion science. Plants’ capacity for long distance dispersal (LDD) is likely to influence the spread and naturalization of non-native species differently according to their introduction pathways. These pathways include intentional introductions (for economic use, e.g. for agriculture), unintentional introductions (e.g. seed contaminants), plant dispersal via human infrastructures (e.g. roads), and plant spread from an adjacent region where the species was previously introduced. Herein, we tested the relationship between sets of LDD traits (syndromes) of 10,308 European plant species and their global naturalization incidence (i.e. whether a species has become naturalized or not) and extent (i.e. the number of regions where a species has become naturalized) using the most comprehensive database of naturalized plants worldwide (GloNAF). Diaspore traits allowed the identification of four traditional LDD syndromes, namely those with specializations for dispersal by: wind (anemochorous), animal ingestion (endozoochorous), attached to animals (epizoochorous), and sea currents (thalassochorous). These evolutionary specializations have been historically interpreted by biologists even though actual dispersal is not always related to diaspore syndromes. We found that while epizoochorous and thalassochorous traits are positively associated with global plant naturalization incidence, anemochorous and endozoochorous traits show a negative relationship. Species´ residence time outside their native range, their economic use and presence of epizoochorous traits (such as hooks, hairs and adhesive substances) are positively associated with global naturalization extent. Furthermore, we found that plants’ economic use reduces the influence of LDD syndromes on the naturalization incidence of intentionally introduced plants. 
While the success of non-native plants is influenced by a broad array of species- and context-specific factors, LDD syndromes play an important role in this context depending on the economic use of plants.
Methods
We used information on LDD syndromes from a comprehensive database of European Spermatophytes (Heleno & Vargas, 2015), which includes 10,308 species from 137 families native to Europe. Here, each species was assigned LDD syndromes based on diaspore (typically seeds and fruits) morphology, including five classes (Table 1). Diaspores with wings or pappus (plumose hairs), which facilitate dispersal by wind, were considered anemochorous. Diaspores with fleshy and nutritive tissues, which favor animal ingestion, were considered endozoochorous. Diaspores with hooks or sticky substances, which promote the external adhesion to animals, were considered epizoochorous. Diaspores with corky tissues or air chambers, which favor floatability and protection in saltwater, were considered thalassochorous. Finally, diaspores with no specialized dispersal structures for LDD were considered unspecialized (Heleno & Vargas, 2015). Since some plants show heterocarpy (production of different kinds of diaspores) and others show diaspore traits that facilitate LDD through more than one dispersal vector, we included both species with one and species with multiple LDD syndromes, following Heleno and Vargas (2015). In this way, we acknowledge the fact that some plants might take advantage of more than one LDD strategy. However, for comparison purposes we built a second LDD syndrome database where we considered only plants with a single LDD syndrome, discarding all plants with multiple LDD syndromes. Dispersal syndromes that favor short distance (local) dispersal, such as myrmecochory and autochory, were assigned to the unspecialized category because they are not relevant for LDD (Heleno & Vargas, 2015).
We obtained distributional data of naturalized vascular plants from the GloNAF database v. 1.2, which includes information of 13,939 taxa and 1,029 regions, based on 210 data sources (van Kleunen et al., 2019). A region is defined here as the smallest geographic area for which a list of non-native plants is available (mostly countries, or distinct sub-national regions such as federal states or islands), including 648 mainland regions and 381 island regions. Species names in the GloNAF database have been standardized according to The Plant List (www.theplantlist.org). For merging the database on LDD syndromes with the GloNAF database, we also standardized species names to The Plant List by using the “TPL” function from the “taxonstand” package (Cayuela et al., 2012). We used two complementary proxies for naturalization success: naturalization incidence and naturalization extent (Razanajatovo et al., 2016). Naturalization incidence is a binary response variable (yes or no) that considers if a species has been recorded as naturalized outside its native range. For a given LDD syndrome the naturalization incidence is an indicator of the likelihood that a species with a particular LDD syndrome has naturalized outside its native range. From the 10,308 species in the LDD syndrome database, 2416 (23.44 %) were recorded as naturalized in at least one region according to the GloNAF database. For these 2416 species we estimated the naturalization extent: the number of GloNAF regions where each species is reported to be naturalized. This metric is an indicator of the capability to spread across large regions for a species with a particular LDD syndrome. A recent study (van Kleunen et al., 2020) has shown that economic use of plants increases their global naturalization success, likely because economic use of plants increases propagule pressure, particularly for intentional introductions (e.g. for horticulture). 
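The two proxies defined above can be illustrated with a short sketch. It is not the authors' analysis code (the text indicates they worked in R): the function name `naturalization_metrics`, the species labels and the region labels are hypothetical placeholders, and the input is assumed to be a list of (species, region) pairs recording where a species is naturalized.

```python
def naturalization_metrics(species_pool, records):
    """Return (incidence, extent) for a pool of species.

    incidence: {species: True if naturalized in at least one region}
    extent:    {species: number of distinct regions}, naturalized species only
    """
    pool = set(species_pool)
    regions_by_species = {}
    for species, region in records:
        if species in pool:  # ignore taxa outside the syndrome database
            regions_by_species.setdefault(species, set()).add(region)
    incidence = {sp: sp in regions_by_species for sp in pool}
    extent = {sp: len(regs) for sp, regs in regions_by_species.items()}
    return incidence, extent


# Placeholder records for illustration only (not GloNAF data).
pool = ["species_a", "species_b", "species_c"]
records = [
    ("species_a", "region_1"),
    ("species_a", "region_2"),
    ("species_a", "region_2"),  # duplicate record counts once
    ("species_b", "region_1"),
    ("species_x", "region_3"),  # not in the pool, so ignored
]
incidence, extent = naturalization_metrics(pool, records)

# Share of the 10,308 species recorded as naturalized, as reported in the text.
share_naturalized = round(100 * 2416 / 10308, 2)
```

Incidence is defined for every species in the pool, while extent is defined only for species with at least one naturalized record, mirroring the restriction of the extent analysis to naturalized species.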
We propose that species with economic use and species without economic use are mostly under the influence of different introduction pathways (as defined by Hulme et al. (2008) and CBD (2014)) (Figure 1). Species with economic use are mostly introduced intentionally, and then either intentionally released in nature (e.g. for erosion control or landscaping) or escape cultivation (mainly through seed dispersal) (Hulme et al., 2008; Harrower et al., 2018). Species without any economic use can be introduced accidentally by human vectors such as a contaminant of a commodity (e.g. as seed contaminant) or attached to (or within) a transport vector (e.g. in ships ballast water) (Hulme et al., 2008; Harrower et al., 2018). Alternatively, these species without economic use can disperse using human infrastructure that connects previously unconnected regions, or can colonize a region through unassisted dispersal from adjacent regions, where they were previously introduced (Hulme et al., 2008; Harrower et al., 2018). To test for the effect of the economic use of plants on naturalization incidence and extent, we used the dataset by van Kleunen et al. (2020) and assigned the 10,308 species from our LDD syndrome database, with which we estimated naturalization incidence and extent using GloNAF, into two groups – i.e. plants with and without economic use. We acknowledge that this is not a perfect proxy for introduction effort: some variation of introduction effort will not be explained by the economic use of plants. Further, the economic use of plants does not account for their accidental introductions by humans. In this regard, variations in propagule pressure (not explained by economic use) among species (Pyšek et al., 2015), for which we don’t have further data, may also influence naturalization patterns. To account for species’ residence time outside their native range, a key driver that favors plant naturalization (Pyšek et al., 2015; Fristoe et al. 
2021), we used data on the date of first record for each species. We expected that plant species with an earlier first record date would show higher naturalization extent because they had more residence time outside their native range. To estimate species' minimum residence time outside their native range we used a global database of first record dates for non-native species compiled by Seebens et al. (2017) and updated by Seebens et al. (2018). This database includes first record dates for 11,450 non-native vascular plants around the world. To test for the effect of the first record date on naturalization extent, we used the dataset by Seebens et al. (2017, 2018) and assigned the 2416 species from our LDD syndrome database, with which we estimated naturalization extent using GloNAF, their earliest first record date outside their native range. In other words, for each naturalized species we used its earliest record date anywhere in the world to account for its residence time outside its native range. We acknowledge that this approach has its limitations. First, data on first record are only available for a subset of regions around the world. Second, there may be a time lag between the introduction of a new species and its first record. Third, data on first record are only available for a subset of naturalized species around the world. Out of the 2416 naturalized species in our LDD syndrome database we obtained data on their first record for 1986 species (83.23 %), so we restricted our dataset for analyses of naturalization extent to these 1986 species.
References
Cayuela, L., Granzow-de la Cerda, Í., Albuquerque, F.S. & Golicher, D.J. (2012) taxonstand: An r package for species names standardisation in vegetation databases. Methods in Ecology and Evolution, 3, 1078-1083.
CBD, C.o.B.D. (2014) Pathways of introduction of invasive species, their prioritization and management. Eighteenth Meeting of the Subsidiary Body on Scientific, Technical and Technological Advice (SBSTTA). (ed by N.B.T.E. Secretary.). Montreal, 23–28 June 2014. Retrieved from
The data are based on a dataset from the USDA Economic Research Service (ERS) that shows where creative people are located in the U.S. It is an interpretation of Richard Florida's thesis that much of urban development is determined by people who work in the so-called ideas and knowledge industries. Workers in these industries are attracted to areas that offer jobs in them and that have desirable traits, such as quality-of-life indicators. For details see http://www.ers.usda.gov/data/creativeclasscodes/ and http://www.ers.usda.gov/Data/CreativeClassCodes/methods.htm
The naturalization of an introduced species is a key stage during the invasion process. Therefore, identifying the traits that favor the naturalization of non-native species can help understand why some species are more successful when introduced to new regions. The ability and the requirement of a plant species to form a mutualism with mycorrhizal fungi, together with the types of associations formed may play a central role in the naturalization success of different plant species. To test the relationship between plant naturalization success and their mycorrhizal associations we analysed a database composed of mycorrhizal status and type for 1981 species, covering 155 families and 822 genera of plants from Europe and Asia, and matched it with the most comprehensive database of naturalized alien species across the world (GloNAF). In mainland regions, we found that the number of naturalized regions was highest for facultative mycorrhizal, followed by obligate mycorrhizal and lowest for non-mycorrhizal plants, suggesting that the ability of forming mycorrhizas is an advantage for introduced plants. We considered the following mycorrhizal types: arbuscular, ectomycorrhizal, ericoid and orchid mycorrhizal plants. Further, dual mycorrhizal species were those that included observations of arbuscular mycorrhizas as well as observations of ectomycorrhizas. Naturalization success (based on the number of naturalized regions) was highest for arbuscular mycorrhizal and dual mycorrhizal plants, which may be related to the low host specificity of arbuscular mycorrhizal fungi and the consequent high availability of arbuscular mycorrhizal fungal partners. However, these patterns of naturalization success were erased in islands, suggesting that the ability to form mycorrhizas may not be an advantage for establishing self-sustaining populations in isolated regions. 
Taken together our results show that mycorrhizal status and type play a central role in the naturalization process of introduced plants in many regions, but that their effect is modulated by other factors.
https://creativecommons.org/publicdomain/zero/1.0/
How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov.
The Low- to Moderate-Income (LMI) New York State (NYS) Census Population Analysis dataset results from the LMI market database designed by APPRISE as part of the NYSERDA LMI Market Characterization Study (https://www.nyserda.ny.gov/lmi-tool). All data are derived from the U.S. Census Bureau’s American Community Survey (ACS) 1-year Public Use Microdata Sample (PUMS) files for 2013, 2014, and 2015.
Each row in the LMI dataset is an individual record for a household that responded to the survey and each column is a variable of interest for analyzing the low- to moderate-income population.
The LMI dataset includes: county/county group, households with elderly, households with children, economic development region, income groups, percent of poverty level, low- to moderate-income groups, household type, non-elderly disabled indicator, race/ethnicity, linguistic isolation, housing unit type, owner-renter status, main heating fuel type, home energy payment method, housing vintage, LMI study region, LMI population segment, mortgage indicator, time in home, head of household education level, head of household age, and household weight.
The LMI NYS Census Population Analysis dataset is intended for users who want to explore the underlying data that supports the LMI Analysis Tool. The majority of those interested in LMI statistics and generating custom charts should use the interactive LMI Analysis Tool at https://www.nyserda.ny.gov/lmi-tool. This underlying LMI dataset is intended for users with experience working with survey data files and producing weighted survey estimates using statistical software packages (such as SAS, SPSS, or Stata).
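Because each row is a survey household with a household weight, population estimates come from summing weights rather than counting rows. The sketch below shows a weighted share in plain Python. The column names `household_weight` and `main_heating_fuel` and the sample values are hypothetical and may not match the actual field names in the file.

```python
def weighted_share(rows, weight_key, predicate):
    """Estimate the population share of households matching `predicate`.

    rows: iterable of dict-like records, one per surveyed household.
    weight_key: name of the household weight column (assumed name).
    predicate: function returning True for households of interest.
    """
    total = 0.0
    matching = 0.0
    for row in rows:
        weight = float(row[weight_key])
        total += weight
        if predicate(row):
            matching += weight
    if total == 0:
        raise ValueError("sum of weights is zero")
    return matching / total


# Invented sample records for illustration only.
sample = [
    {"household_weight": 100, "main_heating_fuel": "Utility gas"},
    {"household_weight": 50, "main_heating_fuel": "Fuel oil"},
    {"household_weight": 50, "main_heating_fuel": "Utility gas"},
]
gas_share = weighted_share(
    sample, "household_weight", lambda r: r["main_heating_fuel"] == "Utility gas"
)
```

In this toy sample the weighted share is 0.75, while a plain row count would give 2 of 3. That gap is why unweighted tabulations of survey microdata can mislead.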
This is a dataset hosted by the State of New York. The state has an open data platform and updates its information according to the amount of data that is brought in. Explore New York State using Kaggle and all of the data sources available through the State of New York organization page!
This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
https://creativecommons.org/publicdomain/zero/1.0/
Every year the CDC releases the country’s most detailed report on death in the United States under the National Vital Statistics System. This mortality dataset is a record of every death in the country for 2005 through 2015, including detailed information about causes of death and the demographic background of the deceased.
It's been said that "statistics are human beings with the tears wiped off." This is especially true with this dataset. Each death record represents somebody's loved one, often connected with a lifetime of memories and sometimes tragically too short.
Putting the sensitive nature of the topic aside, analyzing mortality data is essential to understanding the complex circumstances of death across the country. The US Government uses this data to determine life expectancy and understand how death in the U.S. differs from the rest of the world. Whether you’re looking for macro trends or analyzing unique circumstances, we challenge you to use this dataset to find your own answers to one of life’s great mysteries.
This dataset is a collection of CSV files, each containing one year's worth of data, and paired JSON files containing the code mappings, plus an ICD-10 code set. The CSVs were reformatted from their original fixed-width file formats using information extracted from the CDC's PDF manuals using this script. Please note that this process may have introduced errors, as the text extracted from the PDFs is not a perfect match. If you have any questions or find errors in the preparation process, please leave a note in the forums. We hope to publish additional years of data using this method soon.
A more detailed overview of the data can be found here. You'll find that the fields are consistent within this time window, but some of the data codes change every few years. For example, the 113_cause_recode entry 069 only covers ICD codes (I10,I12) in 2005, but by 2015 it covers (I10,I12,I15). When I post data from years prior to 2005, expect some of the fields themselves to change as well.
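Since code meanings can differ between years, a coded value should be decoded with the mapping for its own year. The sketch below assumes a hypothetical JSON layout keyed by year, then field, then code. The real JSON files are one per year and their structure may differ. Only the 069 example values come from the description above.

```python
import json

# Hypothetical combined layout: year -> field -> code -> meaning.
CODE_MAPS = json.loads("""
{
  "2005": {"113_cause_recode": {"069": "I10,I12"}},
  "2015": {"113_cause_recode": {"069": "I10,I12,I15"}}
}
""")


def decode_field(year, field, code, code_maps=CODE_MAPS):
    """Look up `code` for `field` using the mapping of the given year.

    Falls back to returning the raw code when no mapping is available,
    so unknown values stay visible instead of being dropped.
    """
    field_map = code_maps.get(str(year), {}).get(field, {})
    return field_map.get(code, code)


meaning_2005 = decode_field(2005, "113_cause_recode", "069")
meaning_2015 = decode_field(2015, "113_cause_recode", "069")
```

Codes such as 069 carry leading zeros, so it is safer to read coded columns as strings rather than integers when loading the CSVs.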
All data come from the CDC’s National Vital Statistics System, with the exception of the Icd10Code values, which are sourced from the World Health Organization.
This dataset shows airports in the United States, Puerto Rico and the U.S. Virgin Islands. The data were derived from an extract of the Public-Use Airports database of the National Transportation Atlas Databases-2001 (NTAD-2001), published by the Bureau of Transportation Statistics, Department of Transportation. This dataset was released in October 2001 and was found online at the National Atlas, www.nationalatlas.gov, in shapefile format. These point data are intended for use within the United States, including Puerto Rico and the U.S. Virgin Islands. They may be used for geographic display and analysis at the national level, and for large regional areas. Metadata: http://www.nationalatlas.gov/metadata/airprtx020.faq.html Online: www.nationalatlas.gov
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The Global Internal Displacement Database (GIDD), maintained by the Internal Displacement Monitoring Centre (IDMC), provides comprehensive, validated annual estimates of internal displacement worldwide. It defines internally displaced persons (IDPs) in line with the 1998 Guiding Principles, as people or groups of people who have been forced or obliged to flee or to leave their homes or places of habitual residence, in particular as a result of armed conflict, or to avoid the effects of armed conflict, situations of generalized violence, violations of human rights, or natural or human-made disasters and who have not crossed an international border.
The GIDD tracks two primary metrics: "People Displaced" or population "Stock" figures, which represent the total number of people living in displacement at year-end, and "New Displacement," which counts new displacement incidents (population Flows) rather than individual people, accounting for potential multiple displacements by the same person. This dataset serves as a crucial resource for understanding long-term trends and validated displacement figures globally. For further detailed information and complete API specifications, users are encouraged to consult the official documentation at https://www.internal-displacement.org/database/api-documentation/.
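The difference between the two metrics can be shown with a toy example. The GIDD publishes aggregate estimates, so the person-level records below are purely hypothetical and serve only to show why flows can exceed the number of people involved.

```python
def displacement_summary(movements, displaced_at_year_end):
    """Return (new_displacements, idps) for one year of toy data.

    movements: list of (person_id, date) displacement events; every event
        counts, even if the same person appears several times.
    displaced_at_year_end: person ids still displaced on 31 December.
    """
    new_displacements = len(movements)      # flow: counts incidents
    idps = len(set(displaced_at_year_end))  # stock: counts people
    return new_displacements, idps


# Invented records: person p1 is displaced twice, p2 once and later returns.
movements = [("p1", "2023-02"), ("p1", "2023-09"), ("p2", "2023-05")]
still_displaced = ["p1"]
flows, stock = displacement_summary(movements, still_displaced)
```

Here the flow figure is 3 while the stock figure is 1, so the two metrics should not be added together or compared directly.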
"Internally displaced persons - IDPs" refers to the number of people living in displacement as of the end of each year.
"Internal displacements (New Displacements)" refers to the number of new cases or incidents of displacement recorded, rather than the number of people displaced. This is done because people may have been displaced more than once.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides comprehensive information about various Data Science and Analytics master's programs offered in the United States. It includes details such as the program name, university name, annual tuition fees, program duration, location of the university, and additional information about the programs.
Column Descriptions:
Subject Name:
The name or field of study of the master's program, such as Data Science, Data Analytics, or Applied Biostatistics.
University Name:
The name of the university offering the master's program.
Per Year Fees:
The tuition fees for the program, usually given in euros per year. For some programs, the fees may be listed as "full" or "full-time," indicating a lump sum for the entire program or for full-time enrollment, respectively.
About Program:
A brief description or overview of the master's program, providing insights into its curriculum, focus areas, and any unique features.
Program Duration:
The duration of the master's program, typically expressed in years or months.
University Location:
The location of the university where the program is offered, including the city and state.
Program Name:
The official name of the master's program, often indicating its degree type (e.g., M.Sc. for Master of Science) and format (e.g., full-time, part-time, online).
This dataset provides highly detailed (block-level) views of various demographics for Manhattan, New York City. It includes information on age, race, sex, income, housing, and various other attributes. The data come from the 2000 U.S. Census and were joined to the Census TIGER/Line files to create the output. Enjoy!
https://creativecommons.org/publicdomain/zero/1.0/
This folder contains data behind the story Higher Rates Of Hate Crimes Are Tied To Income Inequality.
Header | Definition | Source |
---|---|---|
state | State name | |
median_household_income | Median household income, 2016 | Kaiser Family Foundation |
share_unemployed_seasonal | Share of the population that is unemployed (seasonally adjusted), Sept. 2016 | Kaiser Family Foundation |
share_population_in_metro_areas | Share of the population that lives in metropolitan areas, 2015 | Kaiser Family Foundation |
share_population_with_high_school_degree | Share of adults 25 and older with a high-school degree, 2009 | Census Bureau |
share_non_citizen | Share of the population that are not U.S. citizens, 2015 | Kaiser Family Foundation |
share_white_poverty | Share of white residents who are living in poverty, 2015 | Kaiser Family Foundation |
gini_index | Gini Index, 2015 | Census Bureau |
share_non_white | Share of the population that is not white, 2015 | Kaiser Family Foundation |
share_voters_voted_trump | Share of 2016 U.S. presidential voters who voted for Donald Trump | United States Elections Project |
hate_crimes_per_100k_splc | Hate crimes per 100,000 population, Southern Poverty Law Center, Nov. 9-18, 2016 | Southern Poverty Law Center |
avg_hatecrimes_per_100k_fbi | Average annual hate crimes per 100,000 population, FBI, 2010-2015 | FBI |
Please see the following commit: https://github.com/fivethirtyeight/data/commit/fbc884a5c8d45a0636e1d6b000021632a0861986
This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!
This dataset is maintained using GitHub's API and Kaggle's API.
This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.
This dataset explores Residence and migration of all freshmen students in degree-granting institutions, by state: Fall 2004 NOTE: Includes all first-time postsecondary students enrolled at reporting institutions. Degree-granting institutions grant associate's or higher degrees and participate in Title IV federal financial aid programs. SOURCE: U.S. Department of Education, National Center for Education Statistics, Integrated Postsecondary Education Data System (IPEDS), Spring 2005. (This table was prepared September 2005.) http://nces.ed.gov/programs/digest/d06/tables/dt06_207.asp Accessed on 12 November 2007
List of the data tables as part of the Immigration system statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.
If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.
The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk
Please tell us what format you need. It will help us if you say what assistive technology you use.
Immigration system statistics, year ending June 2025
Immigration system statistics quarterly release
Immigration system statistics user guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives
Passenger arrivals summary tables, year ending June 2025 (ODS, 31.3 KB): https://assets.publishing.service.gov.uk/media/689efececc5ef8b4c5fc448c/passenger-arrivals-summary-jun-2025-tables.ods
‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.
Electronic travel authorisation detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 57.1 KB): https://assets.publishing.service.gov.uk/media/689efd8307f2cc15c93572d8/electronic-travel-authorisation-datasets-jun-2025.xlsx
ETA_D01: Applications for electronic travel authorisations, by nationality
ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality
Entry clearance visas summary tables, year ending June 2025 (ODS, 56.1 KB): https://assets.publishing.service.gov.uk/media/68b08043b430435c669c17a2/visas-summary-jun-2025-tables.ods
Entry clearance visa applications and outcomes detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 29.6 MB): https://assets.publishing.service.gov.uk/media/689efda51fedc616bb133a38/entry-clearance-visa-outcomes-datasets-jun-2025.xlsx
Vis_D01: Entry clearance visa applications, by nationality and visa type
Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome
Additional data relating to in country and overseas Visa applications can be fo