This table presents income shares, thresholds, tax shares, and total counts of individual Canadian tax filers, with a focus on high-income individuals (95% income threshold, 99% threshold, etc.). Income thresholds are geography-specific; for example, the number of Nova Scotians in the top 1% is calculated as the number of tax-filing Nova Scotians whose total income exceeded the 99% income threshold of Nova Scotian tax filers. Several definitions of income are available in the table, namely market, total, and after-tax income, each with and without capital gains.
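The geography-specific threshold described above can be illustrated with a minimal sketch. It assumes a nearest-rank percentile and a plain list of (region, income) pairs; the function names and the sample data are hypothetical and are not taken from the table's actual methodology.

```python
from collections import defaultdict


def percentile_threshold(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    # Integer ceiling division avoids floating-point rounding in the rank.
    rank = -(-pct * len(ordered) // 100)
    return ordered[max(rank, 1) - 1]


def count_above_local_threshold(filers, pct=99):
    """filers: iterable of (region, income) pairs.

    Returns {region: (threshold, number of filers strictly above it)},
    where each threshold is computed only from that region's filers.
    """
    by_region = defaultdict(list)
    for region, income in filers:
        by_region[region].append(income)

    result = {}
    for region, incomes in by_region.items():
        threshold = percentile_threshold(incomes, pct)
        result[region] = (threshold, sum(1 for x in incomes if x > threshold))
    return result


# Hypothetical data: 100 filers in one region, 200 in another.
filers = [("NS", i) for i in range(1, 101)] + [("ON", 10 * i) for i in range(1, 201)]
print(count_above_local_threshold(filers))
```

Because each region gets its own threshold, the count in the top 1% scales with the number of filers in that region rather than with a single national cutoff.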
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in La Plata County, CO, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
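As a rough illustration of what a mean income per quintile is, the sketch below sorts a list of household incomes, splits it into five equal-sized groups, and averages each group. It is an unweighted simplification with hypothetical input values; the published ACS figures are survey estimates and are not computed this way.

```python
def quintile_means(incomes):
    """Return the mean income of each of the five quintiles, lowest first."""
    ordered = sorted(incomes)
    n = len(ordered)
    if n < 5:
        raise ValueError("need at least five households to form quintiles")

    means = []
    for k in range(5):
        # Integer boundaries keep the groups as equal in size as possible.
        group = ordered[k * n // 5:(k + 1) * n // 5]
        means.append(sum(group) / len(group))
    return means


# Hypothetical household incomes.
print(quintile_means([10, 20, 30, 40, 50, 60, 70, 80, 90, 100]))
```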
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for La Plata County median household income.
FOCUS ON LONDON 2010: INCOME AND SPENDING AT HOME
Household income in London far exceeds that of any other region in the UK. At £900 per week, London’s gross weekly household income is 15 per cent higher than the next highest region. Despite this, the costs to each household are also higher in the capital. Londoners pay a greater amount of their income in tax and national insurance than the UK average as well as footing a higher bill for housing and everyday necessities. All of which leaves London households less well off than the headline figures suggest.
This chapter, authored by Richard Walker in the GLA Intelligence Unit, begins with an analysis of income at both individual and household level, before discussing the distribution and sources of income. This is followed by a look at wealth and borrowing, and finally a focus on expenditure, including an insight into the cost of housing in London compared with other regions in the UK.
See other reports from this Focus on London series.
REPORT:
To view the report online click on the image below. Income and Spending Report PDF
PRESENTATION:
This interactive presentation finds the answer to the question, who really is better off, an average London or UK household? This analysis takes into account available data from all types of income and expenditure. Click on the link to access.
The Prezi in plain text version
RANKINGS:
This interactive chart shows some key borough level income and expenditure data. This chart helps show the relationships between five datasets. Users can rank each of the indicators in turn.
Borough rankings Tableau Chart
MAP:
These interactive borough maps help to geographically present a range of income and expenditure data within London.
Interactive Maps - Instant Atlas
DATA:
All the data contained within the Income and Spending at Home report as well as the data used to create the charts and maps can be accessed in this spreadsheet.
FACTS:
Some interesting facts from the data…
● Five boroughs with the highest median gross weekly pay per person in 2009:
-1. Kensington & Chelsea - £809
-2. City of London - £767
-3. Westminster - £675
-4. Wandsworth - £636
-5. Richmond - £623
-32. Brent - £439
-33. Newham - £422
● Five boroughs with the highest median weekly rent for a 2 bedroom property in October 2010:
-1. Kensington & Chelsea - £550
-2. Westminster - £500
-3. City of London - £450
-4. Camden - £375
-5. Islington - £360
-32. Havering - £183
-33. Bexley - £173
● Five boroughs with the highest percentage of households that own their home outright in 2009:
-1. Bexley – 38 per cent
-2. Havering – 36 per cent
-3. Richmond – 32 per cent
-4. Bromley – 31 per cent
-5. Barnet – 28 per cent
-31. Tower Hamlets – 9 per cent
-32. Southwark – 9 per cent
U.S. citizens with a professional degree had the highest median household income in 2023, at 172,100 U.S. dollars. In comparison, those with less than a 9th grade education made significantly less money, at 35,690 U.S. dollars.
Household income
The median household income in the United States has fluctuated since 1990, but rose to around 70,000 U.S. dollars in 2021. Maryland had the highest median household income in the United States in 2021. Maryland’s high level of wealth has several causes, including the state's proximity to the nation's capital.
Household income and ethnicity
The median income of white non-Hispanic households in the United States had been on the rise since 1990, but has been declining since 2019. While it has also been on the rise, the median income of Hispanic households was much lower than that of white, non-Hispanic private households. However, the median income of Black households is even lower than that of Hispanic households. Income inequality is a problem without an easy solution in the United States, especially since ethnicity is a contributing factor. Systemic racism contributes to the non-White population suffering from income inequality, which causes the opportunity for growth to stagnate.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Austin County, TX, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
This dataset is a part of the main dataset for Austin County median household income.
This dataset and map service provides information on the U.S. Housing and Urban Development's (HUD) low to moderate income areas. The term Low to Moderate Income, often referred to as low-mod, has a specific programmatic context within the Community Development Block Grant (CDBG) program. Over a 1, 2, or 3-year period, as selected by the grantee, not less than 70 percent of CDBG funds must be used for activities that benefit low- and moderate-income persons. HUD uses special tabulations of Census data to determine areas where at least 51% of households have incomes at or below 80% of the area median income (AMI). This dataset and map service contains the following layer.
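The two percentages in the definition above (households at or below 80% of AMI, and areas where at least 51% of households meet that test) can be sketched as follows. This is only an illustration on hypothetical household incomes; the function names are invented here, and the sketch does not reproduce HUD's special Census tabulations.

```python
def is_low_mod_household(income, area_median_income, limit_pct=80):
    """True if income is at or below limit_pct percent of the area median income."""
    # Cross-multiplying keeps the comparison free of floating-point rounding.
    return income * 100 <= limit_pct * area_median_income


def is_low_mod_area(household_incomes, area_median_income, min_share_pct=51):
    """True if at least min_share_pct percent of households are low- to moderate-income."""
    if not household_incomes:
        raise ValueError("no households in area")
    low_mod = sum(
        1 for income in household_incomes
        if is_low_mod_household(income, area_median_income)
    )
    return low_mod * 100 >= min_share_pct * len(household_incomes)


# Hypothetical area with an AMI of 50,000: the 80% cutoff is 40,000.
print(is_low_mod_area([20000, 30000, 40000, 90000], 50000))  # 3 of 4 households qualify
print(is_low_mod_area([20000, 40000, 45000, 90000], 50000))  # 2 of 4 households qualify
```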
Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas, annual.
The housing affordability measure illustrates the relationship between income and housing costs. A household that spends 30% or more of its collective monthly income to cover housing costs is considered to be “housing cost-burden[ed].”[1] Those spending between 30% and 49.9% of their monthly income are categorized as “moderately housing cost-burden[ed],” while those spending more than 50% are categorized as “severely housing cost-burden[ed].”[2]
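The categories above can be written as a small classification function. This is a sketch, not the Census Bureau's procedure: the function name and labels are chosen for illustration, and because the text leaves a gap between 49.9% and "more than 50%", the sketch assumes that a share of exactly 50% counts as severe.

```python
def classify_cost_burden(monthly_housing_cost, monthly_income):
    """Classify a household by the share of monthly income spent on housing."""
    if monthly_income <= 0:
        raise ValueError("monthly income must be positive to compute a share")

    share_pct = 100 * monthly_housing_cost / monthly_income
    if share_pct < 30:
        return "not cost-burdened"
    if share_pct < 50:
        return "moderately cost-burdened"
    # Assumption: exactly 50% is treated as severe.
    return "severely cost-burdened"


# Hypothetical households with a monthly income of 3,000.
for cost in (600, 900, 1200, 1500):
    print(cost, classify_cost_burden(cost, 3000))
```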
How much a household spends on housing costs affects the household’s overall financial situation. More money spent on housing leaves less in the household budget for other needs, such as food, clothing, transportation, and medical care, as well as for incidental purchases and saving for the future.
The estimated housing costs as a percentage of household income are categorized by tenure: all households, those that own their housing unit, and those that rent their housing unit.
Throughout the period of analysis, the percentage of housing cost-burdened renter households in Champaign County was higher than the percentage of housing cost-burdened homeowner households in Champaign County. All three categories saw year-to-year fluctuations between 2005 and 2023, and none of the three show a consistent trend. However, all three categories were estimated to have a lower percentage of housing cost-burdened households in 2023 than in 2005.
Data on estimated housing costs as a percentage of monthly income was sourced from the U.S. Census Bureau’s American Community Survey (ACS) 1-Year Estimates, which are released annually.
As with any datasets that are estimates rather than exact counts, it is important to take into account the margins of error (listed in the column beside each figure) when drawing conclusions from the data.
Due to the impact of the COVID-19 pandemic, instead of providing the standard 1-year data products, the Census Bureau released experimental estimates from the 1-year data in 2020. This includes a limited number of data tables for the nation, states, and the District of Columbia. The Census Bureau states that the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. For these reasons, and because data is not available for Champaign County, no data for 2020 is included in this Indicator.
For interested data users, the 2020 ACS 1-Year Experimental data release includes a dataset on Housing Tenure.
[1] Schwarz, M. and E. Watson. (2008). Who can afford to live in a home?: A look at data from the 2006 American Community Survey. U.S. Census Bureau.
[2] Ibid.
Sources:
U.S. Census Bureau; American Community Survey, 2023 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using data.census.gov; (17 October 2024).
U.S. Census Bureau; American Community Survey, 2022 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using data.census.gov; (22 September 2023).
U.S. Census Bureau; American Community Survey, 2021 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using data.census.gov; (30 September 2022).
U.S. Census Bureau; American Community Survey, 2019 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using data.census.gov; (10 June 2021).
U.S. Census Bureau; American Community Survey, 2018 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using data.census.gov; (10 June 2021).
U.S. Census Bureau; American Community Survey, 2017 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (13 September 2018).
U.S. Census Bureau; American Community Survey, 2016 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (14 September 2017).
U.S. Census Bureau; American Community Survey, 2015 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (19 September 2016).
U.S. Census Bureau; American Community Survey, 2014 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2013 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2012 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2011 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2010 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2009 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2008 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2007 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2006 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2005 American Community Survey 1-Year Estimates, Table B25106; generated by CCRPC staff; using American FactFinder; (16 March 2016).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Saddle River, NJ, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
This dataset is a part of the main dataset for Saddle River median household income.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Santa Barbara, CA, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
This dataset is a part of the main dataset for Santa Barbara median household income.
The Kingshill aquifer lies under St. Croix, an island in the U.S. Virgin Islands. St. Croix is mountainous in its northwestern and eastern regions, while the central and southwest regions contain rolling hills and plains. The Kingshill aquifer underlies the plains of St. Croix. The aquifer is composed primarily of limestone and marl and has a maximum saturated thickness of 200 feet. The aquifer doesn't produce large quantities of water and much of the groundwater is suboptimal for human consumption, but it is the primary source of water in the Virgin Islands (HA 730-N).
This product provides source data for the U.S. Virgin Islands, Island of St. Croix, Kingshill aquifer framework, including:
Georeferenced image:
1. i_56KNGSHL_bot.tif: Digitized figure of altitude contour lines representing the bottom of the Kingshill aquifer. This figure also includes the Kingshill aquifer extent. The original figure was from the Groundwater Atlas (HA 730-N) figure 114.
Extent shapefiles:
1. p_56KNGSHL.shp: Polygon shapefile containing the areal extent of the Kingshill aquifer (HA 730-N). The original figure was from the Groundwater Atlas (HA 730-N) figure 114.
Contour line shapefile:
1. c_56KNGSHL_bot.shp: Contour line dataset containing altitude values, in feet referenced to the National Geodetic Vertical Datum of 1929 (NGVD29), across the bottom of the Kingshill aquifer. These data were sourced from HA 730-N and were used to create the ra_56KNGSHL_bot.tif raster dataset.
Altitude raster files:
1. ra_56KNGSHL_top.tif: Altitude raster dataset of the top of the Kingshill aquifer. The top of the aquifer was assumed to be equal with land surface, but it should be noted that HA 730-N indicates about 25 percent of the aquifer is overlain with a blanket of alluvium, alluvial fan, debris flow, and slope wash deposits as much as 80 feet thick. This raster was created using the Digital Elevation Model (DEM) dataset (NED, 100-meter) and the altitude values are in meters referenced to the North American Vertical Datum of 1988 (NAVD88).
2. ra_56KNGSHL_bot.tif: Altitude raster dataset of the bottom of the Kingshill aquifer. This raster was interpolated from the c_56KNGSHL_bot.shp file and the altitude values are in meters referenced to NAVD88.
Depth raster files:
1. rd_56KNGSHL_top.tif: Depth raster dataset of the top of the Kingshill aquifer. The depth values are in meters below land surface (NED, 100-meter). All values in this raster are “0” because it was assumed the top of the aquifer was equal with land surface, but it should be noted that HA 730-N indicates about 25 percent of the aquifer is overlain with a blanket of alluvium, alluvial fan, debris flow, and slope wash deposits as much as 80 feet thick.
2. rd_56KNGSHL_bot.tif: Depth raster dataset of the bottom of the Kingshill aquifer. The depth values are in meters below land surface (NED, 100-meter).
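The relationship between the altitude and depth rasters described above can be sketched on small in-memory grids. This is an illustration only: the grids are hypothetical nested lists rather than the published .tif files, the helper names are invented here, and the feet-to-meters conversion covers units only, not the NGVD29 to NAVD88 datum shift.

```python
FEET_TO_METERS = 0.3048  # exact international foot


def feet_to_meters(value_ft):
    """Convert a length in feet to meters (unit change only, no datum shift)."""
    return value_ft * FEET_TO_METERS


def depth_below_surface(land_surface_m, altitude_m):
    """Cell-by-cell depth in meters below land surface.

    Both grids are lists of rows with altitudes in meters on the same datum.
    """
    return [
        [surface - altitude for surface, altitude in zip(surface_row, altitude_row)]
        for surface_row, altitude_row in zip(land_surface_m, altitude_m)
    ]


# Hypothetical 1 x 2 grids: land surface and aquifer bottom altitudes in meters.
surface = [[10.0, 12.0]]
bottom = [[-50.0, -40.0]]
print(depth_below_surface(surface, bottom))   # depth of the aquifer bottom
print(depth_below_surface(surface, surface))  # top assumed at land surface, so all zeros
```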
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents a breakdown of households across various income brackets in Chapel Hill, NC, as reported by the U.S. Census Bureau. The Census Bureau classifies households into different categories, including total households, family households, and non-family households. Our analysis of U.S. Census Bureau American Community Survey data for Chapel Hill, NC reveals how household income distribution varies among these categories. The dataset highlights the variation in number of households with income, offering valuable insights into the distribution of Chapel Hill households based on income levels.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
This dataset is a part of the main dataset for Chapel Hill median household income.
Series Name: Countries with birth registration data that are at least 90 percent complete (1 = YES; 0 = NO)
Series Code: SG_REG_BRTH90N
Release Version: 2020.Q2.G.03
This dataset is part of the Global SDG Indicator Database compiled through the UN System in preparation for the Secretary-General's annual report on Progress towards the Sustainable Development Goals.
Indicator 17.19.2: Proportion of countries that (a) have conducted at least one population and housing census in the last 10 years; and (b) have achieved 100 per cent birth registration and 80 per cent death registration
Target 17.19: By 2030, build on existing initiatives to develop measurements of progress on sustainable development that complement gross domestic product, and support statistical capacity-building in developing countries
Goal 17: Strengthen the means of implementation and revitalize the Global Partnership for Sustainable Development
For more information on the compilation methodology of this dataset, see https://unstats.un.org/sdgs/metadata/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average Hourly Earnings in the United States increased 0.30 percent in February of 2025 over the previous month. This dataset provides the latest reported value for - United States Average Hourly Earnings - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
This layer provides an estimate of flood frequency as one of seven classes:
None: No reasonable possibility of flooding; one chance out of 500 of flooding in any year or less than 1 time in 500 years.
Very Rare: Flooding is very unlikely but is possible under extremely unusual weather conditions; less than 1 percent chance of flooding in any year or less than 1 time in 100 years but more than 1 time in 500 years.
Rare: Flooding is unlikely but is possible under unusual weather conditions; 1 to 5 percent chance of flooding in any year or nearly 1 to 5 times in 100 years.
Occasional: Flooding is expected infrequently under usual weather conditions; 5 to 50 percent chance of flooding in any year or 5 to 50 times in 100 years.
Common: (Obsolete Class) Combination of Occasional and Frequent
Frequent: Flooding is likely to occur often under usual weather conditions; more than a 50 percent chance of flooding in any year (i.e., 50 times in 100 years), but less than a 50 percent chance of flooding in all months in any year.
Very Frequent: Flooding is likely to occur very often under usual weather conditions; more than a 50 percent chance of flooding in all months of any year.
Dataset Summary
Phenomenon Mapped: Flooding frequency
Units: Classes
Cell Size: 30 meters
Source Type: Discrete
Pixel Type: Unsigned integer
Data Coordinate System: WKID 5070 USA Contiguous Albers Equal Area Conic USGS version (contiguous US, Puerto Rico, US Virgin Islands), WKID 3338 WGS 1984 Albers (Alaska), WKID 4326 WGS 1984 Decimal Degrees (Guam, Republic of the Marshall Islands, Northern Mariana Islands, Republic of Palau, Federated States of Micronesia, American Samoa, and Hawaii).
Mosaic Projection: Web Mercator Auxiliary Sphere
Extent: Contiguous United States, Alaska, Hawaii, Puerto Rico, Guam, US Virgin Islands, Northern Mariana Islands, Republic of Palau, Republic of the Marshall Islands, Federated States of Micronesia, and American Samoa.
Source: Natural Resources Conservation Service
Publication Date: November 2023
ArcGIS Server URL: https://landscape11.arcgis.com/arcgis/
Data from the gNATSGO database was used to create the layer for the contiguous United States and Alaska. The remaining areas were created with the gSSURGO database (Hawaii, Guam, Puerto Rico, the U.S. Virgin Islands, Northern Marianas Islands, Palau, Federated States of Micronesia, Republic of the Marshall Islands, and American Samoa).
This layer is derived from the 30m (contiguous U.S.) and 10m rasters (all other regions) produced by the Natural Resources Conservation Service (NRCS). The value for flooding frequency is derived from the gSSURGO map unit aggregated attribute table field Flooding Frequency - Dominant Condition (flodfreqdcd).
What can you do with this Layer?
This layer is suitable for both visualization and analysis across the ArcGIS system. This layer can be combined with your data and other layers from the ArcGIS Living Atlas of the World in ArcGIS Online and ArcGIS Pro to create powerful web maps that can be used alone or in a story map or other application.
Because this layer is part of the ArcGIS Living Atlas of the World it is easy to add to your map:
In ArcGIS Online, you can add this layer to a map by selecting Add then Browse Living Atlas Layers. A window will open. Type "flooding frequency" in the search box and browse to the layer. Select the layer then click Add to Map.
In ArcGIS Pro, open a map and select Add Data from the Map Tab. Select Data at the top of the drop down menu. The Add Data dialog box will open on the left side of the box, expand Portal if necessary, then select Living Atlas. Type "flooding frequency" in the search box, browse to the layer then click OK.
In ArcGIS Pro you can use the built-in raster functions or create your own to create custom extracts of the data. Imagery layers provide fast, powerful inputs to geoprocessing tools, models, or Python scripts in Pro.
The ArcGIS Living Atlas of the World provides an easy way to explore many other beautiful and authoritative maps on hundreds of topics like this one.
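The class definitions given earlier in this layer description can be expressed as a lookup from an annual flooding chance to a class name. The sketch below is illustrative only: the function name and the boolean flag are invented here, the obsolete Common class is left out, and because the text lists 5 percent under both Rare and Occasional, the sketch assumes boundary values fall in the lower class.

```python
def flood_frequency_class(annual_chance_pct, over_half_chance_in_all_months=False):
    """Map an annual chance of flooding (percent) to a flooding frequency class.

    over_half_chance_in_all_months separates Very Frequent from Frequent
    when the annual chance is above 50 percent.
    """
    if not 0 <= annual_chance_pct <= 100:
        raise ValueError("annual chance must be between 0 and 100 percent")

    if annual_chance_pct <= 0.2:   # one chance in 500 or less
        return "None"
    if annual_chance_pct < 1:
        return "Very Rare"
    if annual_chance_pct <= 5:     # assumption: exactly 5 percent is Rare
        return "Rare"
    if annual_chance_pct <= 50:    # assumption: exactly 50 percent is Occasional
        return "Occasional"
    return "Very Frequent" if over_half_chance_in_all_months else "Frequent"


for chance in (0.1, 0.5, 3, 20, 60):
    print(chance, flood_frequency_class(chance))
```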
SUMMARYThis analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of cancer (in persons of all ages). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.ANALYSIS METHODOLOGYThe analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to cancer (in persons of all ages).This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.The percentage of each MSOA’s population (all ages) with cancer was estimated. This was achieved by calculating a weighted average based on:The percentage of the MSOA area that was covered by each GP practice’s catchment areaOf the GPs that covered part of that MSOA: the percentage of registered patients that have that illness The estimated percentage of each MSOA’s population with cancer was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with cancer, within the relevant age range.Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:A) the PERCENTAGE of the population within that MSOA who are estimated to have cancerB) the NUMBER of people within that MSOA who are estimated to have cancerAn average of scores A & B was taken, and converted to a relative score between 1 and 0 (1= worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have cancer, compared to other MSOAs. 
In other words, those are areas where it’s estimated a large number of people suffer from cancer, and where those people make up a large percentage of the population, indicating there is a real issue with cancer within the population and the investment of resources to address that issue could have the greatest benefits.
LIMITATIONS
1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of cancer, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of cancer.
TO BE VIEWED IN COMBINATION WITH:
This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
Levels of obesity, inactivity and associated illnesses (England): Missing data
DOWNLOADING THIS DATA
To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.
DATA SOURCES
This dataset was produced using:
Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.
MSOA boundaries: © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021.
Population data: Mid-2019 (June 30) Population Estimates for Middle Layer Super Output Areas in England and Wales. © Office for National Statistics licensed under the Open Government Licence v3.0. © Crown Copyright 2020.
COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital; © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021. © Crown Copyright 2020.
CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
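The discrepancy rule mentioned under LIMITATIONS (differences between the 2018/19 and 2019/20 statistics that were > mean +/- 1 St.Dev.) could be sketched roughly as follows. This is only an illustration: it assumes the rule means "the year-to-year difference falls outside the interval mean ± 1 sample standard deviation", and the function name and the example values are hypothetical, not taken from the dataset.

```python
from statistics import mean, stdev


def flag_discrepancies(values_2018_19, values_2019_20):
    """Return indices of GPs whose year-to-year difference lies
    outside mean +/- 1 standard deviation of all differences.

    Assumption: this is one reading of the "> mean +/- 1 St.Dev." rule.
    """
    diffs = [later - earlier for earlier, later in zip(values_2018_19, values_2019_20)]
    centre = mean(diffs)
    spread = stdev(diffs)  # sample standard deviation
    lower, upper = centre - spread, centre + spread
    return [i for i, d in enumerate(diffs) if d < lower or d > upper]


# Hypothetical prevalence values (per cent) for five GPs in each year.
prevalence_2018_19 = [2.0, 2.1, 1.9, 2.0, 2.2]
prevalence_2019_20 = [2.1, 2.0, 2.0, 2.1, 5.0]
flagged = flag_discrepancies(prevalence_2018_19, prevalence_2019_20)
```

With a threshold of only one standard deviation, a sizeable share of GPs can be flagged in real data, which is consistent with treating flagged areas as "interpret with caution" rather than as confirmed errors.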
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”
A peer-reviewed data paper for this dataset is in review to be published in NECSUS_European Journal of Media Studies - an open access journal aiming at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org
Please cite this when using the dataset.
Detailed description of the dataset:
1 Film Dataset: Festival Programs
The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.
The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.
The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.
The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For ease of analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information on whether festival run information is available through the IMDb data.
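The relationship between the long and the wide file can be illustrated with a small sketch. It is not the authors' procedure: apart from fest, which the description names, the column names (film_id, title, fest_month) and the three example rows are hypothetical stand-ins, and the codebook remains the reference for the real variables.

```python
import csv
import io

# Hypothetical long-format excerpt: one row per film and sample festival.
LONG_CSV = """film_id,title,fest,fest_month
f001,Example Film A,Berlinale,2
f001,Example Film A,Frameline,6
f002,Example Film B,Frameline,6
"""


def long_to_wide(rows):
    """Keep one row per film: the first sample festival where it appeared."""
    first_seen = {}
    # sorted() is stable, so ties keep their original file order.
    for row in sorted(rows, key=lambda r: int(r["fest_month"])):
        first_seen.setdefault(row["film_id"], row)
    return list(first_seen.values())


long_rows = list(csv.DictReader(io.StringIO(LONG_CSV)))
wide_rows = long_to_wide(long_rows)
```

Here the film that screened at both festivals keeps “Berlinale” as its sample festival, mirroring the example given in the description.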
2 Survey Dataset
The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.
The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.
The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.
The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.
3 IMDb & Scripts
The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.
The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.
The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.
The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.
The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.
The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to each crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.
The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.
The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.
The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.
The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.
The dataset includes 8 text files containing the scripts for web scraping. They were written using R version 3.6.3 for Windows.
The R script “r_1_unite_data” demonstrates the structure of the dataset that we use in the following steps to identify, scrape, and match the film data.
The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in the “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then potentially using an alternative title and a basic search if no matches are found in the advanced search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records from the IMDb website. The script then defines a loop that matches (including matching scores) each film in the core dataset with suggested films on the IMDb search page. Matching was done on directors, production year (+/- one year), and title, using a fuzzy matching approach with two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA algorithm is used to match titles that may have typos or minor variations.
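To make the two fuzzy matching methods more concrete, here is a minimal Python sketch of both measures and of a candidate filter that combines them with the +/- one year rule. It is an illustration only, not a port of the R script: the use of character bigrams for the cosine measure, the thresholds cos_min and osa_max, and the function names are all assumptions.

```python
from collections import Counter
from math import sqrt


def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def cosine_similarity(a, b, q=2):
    """Cosine similarity between character q-gram count vectors (assumed q=2)."""
    def profile(s):
        return Counter(s[i:i + q] for i in range(len(s) - q + 1))
    pa, pb = profile(a), profile(b)
    dot = sum(pa[g] * pb[g] for g in pa)
    norm = sqrt(sum(v * v for v in pa.values())) * sqrt(sum(v * v for v in pb.values()))
    return dot / norm if norm else 0.0


def is_candidate(title, imdb_title, year, imdb_year, cos_min=0.8, osa_max=2):
    """Hypothetical filter: year within +/- 1, and the titles close on either measure."""
    if abs(year - imdb_year) > 1:
        return False
    t1, t2 = title.lower(), imdb_title.lower()
    return cosine_similarity(t1, t2) >= cos_min or osa_distance(t1, t2) <= osa_max
```

A swapped pair of letters, as in “Amelie” versus “Ameile”, costs a single OSA edit, which is why that measure suits typos, while the q-gram cosine tolerates longer titles that share most of their substrings.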
The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of the following five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible duplicates in the dataset and identifies them for a manual check.
The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.
The script “r_5a_extracting_info_sample” uses the function defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. It does so for the first 100 films only, to check that everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.
The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.
The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.
The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It gives information on the amount of missing values and errors.
4 Festival Library Dataset
The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.
The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories,
[Source Data]
The Digital Divide Index or DDI ranges in value from 0 to 100, where 100 indicates the highest digital divide. It is composed of two scores, also ranging from 0 to 100: the infrastructure/adoption (INFA) score and the socioeconomic (SE) score.
The INFA score groups five variables related to broadband infrastructure and adoption: (1) percentage of total 2020 population without access to fixed broadband of at least 100 Mbps download and 20 Mbps upload as of 2020 based on Ookla Speedtest® open dataset; (2) percent of homes without a computing device (desktops, laptops, smartphones, tablets, etc.); (3) percent of homes with no internet access (have no internet subscription, including cellular data plans or dial-up); (4) median maximum advertised download speeds; and (5) median maximum advertised upload speeds.
The SE score groups five variables known to impact technology adoption: (1) percent population ages 65 and over; (2) percent population 25 and over with less than high school; (3) individual poverty rate; (4) percent of noninstitutionalized civilian population with a disability; and (5) a brand new digital inequality or internet income ratio measure (IIR). In other words, these variables indirectly measure adoption since they are potential predictors of lagging technology adoption or reinforcing existing inequalities that also affect adoption.
These two scores are combined to calculate the overall DDI score. If a particular county or census tract has a higher INFA score versus a SE score, efforts should be made to improve broadband infrastructure. If on the other hand, a particular geography has a higher SE score versus an INFA score, efforts should be made to increase digital literacy and exposure to the technology’s benefits.
The DDI measures primarily physical access/adoption and socioeconomic characteristics that may limit motivation, skills, and usage. Due to data limitations it was designed as a descriptive and pragmatic tool and is not intended to be comprehensive. Rather it should help initiate important discussions among community leaders and residents.
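The guidance on reading the INFA and SE scores against each other can be written down as a small decision rule. This sketch covers only that comparison; it does not reproduce how the two scores are combined into the overall DDI, which the description does not specify. The function name and the handling of equal scores are assumptions.

```python
def ddi_priority(infa_score, se_score):
    """Suggest where to focus, based on which component score is higher."""
    for name, value in (("INFA", infa_score), ("SE", se_score)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be between 0 and 100, got {value}")
    if infa_score > se_score:
        return "improve broadband infrastructure"
    if se_score > infa_score:
        return "increase digital literacy and exposure to the technology's benefits"
    # Equal scores are not covered by the description; this label is an assumption.
    return "no single priority indicated"


# Hypothetical scores for one census tract.
suggestion = ddi_priority(infa_score=62.5, se_score=41.0)
```

In keeping with the descriptive purpose of the index, the result is best read as a prompt for discussion rather than a prescription.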
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The USDA Forest Service (USFS) builds multiple versions of percent tree canopy cover data, in order to serve needs of multiple user communities. These datasets encompass CONUS, Coastal Alaska, Hawaii, U.S. Virgin Islands and Puerto Rico. There are three versions of data within the 2016 TCC Product Suite, which include: the initial model outputs, referred to as the Analytical data; a masked version of the initial output, referred to as Cartographic data; and a modified version built for the National Land Cover Database, referred to as NLCD data, which includes a canopy cover change dataset derived from subtraction of datasets for the nominal years of 2011 and 2016.
The Analytical data are the initial model outputs generated in the production workflow. These data are best suited for users who will carry out their own detailed statistical and uncertainty analyses on the dataset and place lower priority on the visual appearance of the dataset for cartographic purposes. Datasets for the nominal years of 2011 and 2016 are available.
The Cartographic products mask the initial model outputs to improve the visual appearance of the datasets. These data are best suited for users who prioritize visual appearance of the data for cartographic and illustrative purposes. Datasets for the nominal years of 2011 and 2016 are available.
The NLCD data are the result of further processing of the masked data. The goal was to generate three coordinated components. The components are (1) a dataset for the nominal year of 2011, (2) a dataset for the nominal year of 2016, and (3) a dataset that captures the change in canopy cover between the two nominal years of 2011 and 2016. For the NLCD data, the three components meet the criterion of 2011 TCC + change in TCC = 2016 TCC. These NLCD data are best suited for users who require a coordinated three-component data stack where each pixel's values meet the criterion of 2011 TCC + change in TCC = 2016 TCC. Datasets for the nominal years of 2011 and 2016 are available, as well as a dataset that captures the change (loss or gain) in canopy cover between those two nominal years of 2011 and 2016, in areas where change was identified.
These tree canopy cover data are accessible for multiple user communities, through multiple channels and platforms, as listed below:
Analytical: USFS Tree Canopy Cover Datasets; USFS Enterprise Data Warehouse
Cartographic: USFS Tree Canopy Cover Datasets
NLCD: Multi-Resolution Land Characteristics (MRLC) Consortium; USFS Enterprise Data Warehouse
The Coastal Alaska TCC 2016 NLCD dataset is comprised of a single layer. The pixel values range from 0 to 91 percent. The background is represented by the value 255. Data gaps (which are explained in more detail below) are represented by the value 127.
The NLCD data include three components: 2011 NLCD TCC, 2016 NLCD TCC, and 2011-to-2016 change in TCC. For nearly all pixels, the values meet the criterion of 2011 TCC + change in TCC = 2016 TCC. However, there are some pixels with no TCC values because of a lack of imagery in persistently cloudy areas. These data gaps were given a value 127. In summary, if a data gap was present in the original 2011 or 2016 data, that data gap was carried through to all three components of the NLCD data. Recall that the three NLCD components (2011 NLCD TCC, 2016 NLCD TCC, and change between the two nominal years) are intended to coordinate and line up. The USFS's GTAC also makes available the original 2011 and 2016 TCC datasets (prior to development of any integrated data stack for NLCD) that are output as part of the production workflows. If a user would like the original datasets for the nominal years of 2011 and 2016 (prior to integrating into a common data stack for NLCD), they should visit https://data.fs.usda.gov/geodata/rastergateway/treecanopycover/ and download the FS-Cartographic version of the 2011 and/or 2016 datasets for their cartographic applications.
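The criterion 2011 TCC + change in TCC = 2016 TCC, together with the special values 127 (data gap) and 255 (background), suggests a simple per-pixel consistency check. The sketch works on plain lists of pixel values and rests on two assumptions that should be verified against the product documentation: the change layer has already been decoded into signed integers, and gaps and background use the same codes in all three layers. The function name and the example pixels are hypothetical.

```python
DATA_GAP = 127    # no TCC value (e.g. persistently cloudy areas)
BACKGROUND = 255  # outside the mapped area


def check_stack(tcc_2011, change, tcc_2016):
    """Count valid pixels and those violating 2011 + change == 2016.

    Pixels flagged as data gap or background in any layer are skipped.
    """
    valid = violations = 0
    for p11, dp, p16 in zip(tcc_2011, change, tcc_2016):
        triple = (p11, dp, p16)
        if DATA_GAP in triple or BACKGROUND in triple:
            continue
        valid += 1
        if p11 + dp != p16:
            violations += 1
    return valid, violations


# Hypothetical pixel values: the last valid pixel breaks the criterion.
valid, violations = check_stack(
    tcc_2011=[10, 50, 127, 255, 30],
    change=[5, -20, 127, 255, 0],
    tcc_2016=[15, 30, 127, 255, 31],
)
```

For real rasters the same masks would normally be applied as array operations; the loop is used here only to keep the logic explicit.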
https://creativecommons.org/publicdomain/zero/1.0/
The AQS Data Mart is a database containing all of the information from AQS. It has every measured value the EPA has collected via the national ambient air monitoring program. It also includes the associated aggregate values calculated by EPA (8-hour, daily, annual, etc.). The AQS Data Mart is a copy of AQS made once per week and made accessible to the public through web-based applications. The intended users of the Data Mart are air quality data analysts in the regulatory, academic, and health research communities. It is intended for those who need to download large volumes of detailed technical data stored at EPA and does not provide any interactive analytical tools. It serves as the back-end database for several Agency interactive tools that could not fully function without it: AirData, AirCompare, The Remote Sensing Information Gateway, the Map Monitoring Sites KML page, etc.
AQS must maintain constant readiness to accept data and meet high data integrity requirements, thus is limited in the number of users and queries to which it can respond. The Data Mart, as a read only copy, can allow wider access.
The most commonly requested aggregation levels of data (and key metrics in each) are:
Sample Values (2.4 billion values back as far as 1957, national consistency begins in 1980, data for 500 substances routinely collected)
  The sample value converted to standard units of measure (generally 1-hour averages as reported to EPA, sometimes 24-hour averages)
  Local Standard Time (LST) and GMT timestamps
  Measurement method
  Measurement uncertainty, where known
  Any exceptional events affecting the data
NAAQS Averages
  NAAQS average values (8-hour averages for ozone and CO, 24-hour averages for PM2.5)
Daily Summary Values (each monitor has the following calculated each day)
  Observation count
  Observation per cent (of expected observations)
  Arithmetic mean of observations
  Max observation and time of max
  AQI (air quality index) where applicable
  Number of observations > Standard where applicable
Annual Summary Values (each monitor has the following calculated each year)
  Observation count and per cent
  Valid days
  Required observation count
  Null observation count
  Exceptional values count
  Arithmetic Mean and Standard Deviation
  1st - 4th maximum (highest) observations
  Percentiles (99, 98, 95, 90, 75, 50)
  Number of observations > Standard
Site and Monitor Information
  FIPS State Code (the first 5 items on this list make up the AQS Monitor Identifier)
  FIPS County Code
  Site Number (unique within the county)
  Parameter Code (what is measured)
  POC (Parameter Occurrence Code) to distinguish from different samplers at the same site
  Latitude
  Longitude
  Measurement method information
  Owner / operator / data-submitter information
  Monitoring Network to which the monitor belongs
  Exemptions from regulatory requirements
  Operational dates
  City and CBSA where the monitor is located
Quality Assurance Information
  Various data fields related to the 19 different QA assessments possible
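As an illustration of what the Daily Summary Values contain, the following sketch derives several of them from one day of sample values for a single monitor. It is not EPA's implementation: the field names, the default of 24 expected observations and the example readings are assumptions, and the AQI is left out because its calculation is pollutant-specific.

```python
def daily_summary(hourly, expected=24, standard=None):
    """Summarise one monitor-day.

    hourly:   list of (hour, value) pairs of sample values
    expected: number of observations expected that day (assumed 24)
    standard: optional threshold for counting exceedances
    """
    if not hourly:
        raise ValueError("no observations for this day")
    values = [value for _, value in hourly]
    max_hour, max_value = max(hourly, key=lambda pair: pair[1])
    summary = {
        "observation_count": len(values),
        "observation_percent": 100.0 * len(values) / expected,
        "arithmetic_mean": sum(values) / len(values),
        "max_value": max_value,
        "max_hour": max_hour,
    }
    if standard is not None:
        summary["count_above_standard"] = sum(v > standard for v in values)
    return summary


# Hypothetical readings at hours 0, 1 and 2.
summary = daily_summary([(0, 10.0), (1, 20.0), (2, 30.0)], standard=15.0)
```

With only 3 of 24 expected observations the day is 12.5 per cent complete, which shows why the observation per cent is reported alongside the mean.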
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.epa_historical_air_quality.[TABLENAME]. Fork this kernel to get started.
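A minimal sketch of such a query with the BigQuery Python client library might look like this. It assumes the google-cloud-bigquery package is installed and that credentials are configured in the environment; the table name air_quality_annual_summary is used purely as an example of a [TABLENAME] and should be checked against the dataset's table list, and the helper names are hypothetical.

```python
DATASET = "bigquery-public-data.epa_historical_air_quality"


def build_query(table_name, limit=10):
    """Build a small preview query for one table in the dataset."""
    # Guard against accidental injection through the table name.
    if not table_name.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table_name!r}")
    return f"SELECT * FROM `{DATASET}.{table_name}` LIMIT {int(limit)}"


def run_query(sql):
    """Run the query and return the rows as dictionaries.

    Imported lazily so the query builder works without the client library.
    """
    from google.cloud import bigquery  # requires google-cloud-bigquery

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]


# Example table name is an assumption; replace it with a real [TABLENAME].
sql = build_query("air_quality_annual_summary", limit=5)
# rows = run_query(sql)  # uncomment where credentials are available
```

Keeping the LIMIT small while exploring is a sensible habit, since some of the tables in the dataset are very large.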
Data provided by the US Environmental Protection Agency Air Quality System Data Mart.
This table presents income shares, thresholds, tax shares, and total counts of individual Canadian tax filers, with a focus on high income individuals (95% income threshold, 99% threshold, etc.). Income thresholds are geography-specific; for example, the number of Nova Scotians in the top 1% will be calculated as the number of tax-filing Nova Scotians whose total income exceeded the 99% income threshold of Nova Scotian tax filers. Different definitions of income are available in the table, namely market, total, and after-tax income, both with and without capital gains.