https://creativecommons.org/publicdomain/zero/1.0/
Income is one of the clearest indicators of a person’s position in society. It reflects financial stability and determines the quality of life that someone can achieve. Higher income often means better access to education, healthcare, safe housing, and opportunities for growth. Lower income, in contrast, is closely linked with social vulnerability, reduced chances of mobility, and cycles of poverty that can continue across generations. For this reason, income is not just a private measure but a key element that governments, economists, and social researchers study to understand inequality.
The ability to predict income class is crucial because it highlights the factors that drive differences in earnings. With accurate prediction, governments can identify which groups are most at risk of being left behind and take action before inequality grows wider. For instance, if the data shows that certain occupations or education levels are strongly tied to low earnings, policies can be designed to expand training programs or improve access to education. Prediction therefore makes it possible to respond proactively to social and economic challenges.
The urgency of income prediction is also seen in the education sector. Knowing how income relates to qualifications allows schools, universities, and training centers to design curricula that prepare students for careers with stronger earning potential. This is one way to reduce the mismatch between education and the labor market, while also giving individuals a better chance to move up economically. In healthcare and housing, income is a strong predictor of access and demand. By predicting income levels, governments and organizations can anticipate needs for insurance support, medical subsidies, or public housing and allocate resources more effectively.
Businesses depend heavily on income insights as well. Since income shapes purchasing power, predicting income groups helps companies design products, set fair pricing, and segment their markets. Accurate income prediction also plays a role in economic forecasting, where analysts try to understand how shifts such as recessions, automation, or tax reforms might affect demand in different sectors. In this sense, income prediction is not just useful for planning but vital for resilience in uncertain economies.
The dataset used here comes from surveys by the U.S. Census Bureau in 1994 that record demographic, educational, and occupational information together with reported income. Each row represents a real individual with their background, work situation, and income class. The central task is to predict whether someone earns more than $50,000 per year. While this may sound like a simple classification problem, the implications are much deeper. By examining how education, occupation, gender, race, or hours worked affect income, we can better understand the forces that shape economic outcomes for whole communities.
In short, predicting income class is not only a technical challenge but also a social necessity. It has implications for reducing inequality, guiding public policy, improving education and healthcare planning, supporting fairer labor markets, and informing business strategies. Income is tied to nearly every aspect of life, and the ability to predict it responsibly is essential for building a more just and prepared society.
This dataset contains the following columns:
| Column Name | Description |
| ---------------- | ------------------------------------------------------------------------- |
| id | A number that identifies each unique record |
| age | Age of the individual |
| workclass | Type of employment (e.g., Private, Self-emp, Government, etc.) |
| fnlwgt | Final weight, used by the Census Bureau to estimate population statistics |
| education | Highest education level achieved |
| education.num | Numerical representation of education level |
| marital.status | Marital status |
| occupation | Occupation type |
| relationship | Relationship status within the household |
| race | Race of the individual |
| sex | Gender of the individual |
| capital.gain | Capital gains reported |
| capital.loss | Capital losses reported |
| ...
Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas, annual.
Dataset used in World Bank Policy Research Working Paper #2876, published in World Bank Economic Review, No. 1, 2005, pp. 21-44.
The effects of globalization on income distribution in rich and poor countries are a matter of controversy. While international trade theory in its most abstract formulation implies that increased trade and foreign investment should make income distribution more equal in poor countries and less equal in rich countries, finding these effects has proved elusive. The author presents another attempt to discern the effects of globalization by using data from household budget surveys and looking at the impact of openness and foreign direct investment on relative income shares of low and high deciles. The author finds some evidence that at very low average income levels, it is the rich who benefit from openness. As income levels rise to those of countries such as Chile, Colombia, or the Czech Republic, for example, the situation changes, and it is the relative income of the poor and the middle class that rises compared with the rich. It seems that openness makes income distribution worse before making it better; put differently, the effect of openness on a country's income distribution depends on the country's initial income level.
Aggregate data [agg]
Add the following citation to any analysis shared or published:
Department for Work and Pensions (DWP), released 21 March 2024, GOV.UK website, statistical release, Households below average income: for financial years ending 1995 to 2023.
This Households Below Average Income (HBAI) report presents information on living standards in the United Kingdom year on year from financial year ending (FYE) 1995 to FYE 2023.
It provides estimates on the number and percentage of people living in low-income households based on their household disposable income. Figures are also provided for children, pensioners, working-age adults and individuals living in a family where someone is disabled.
Use our infographic to find out how low income is measured in HBAI.
The statistics in this report come from the Family Resources Survey, a representative survey of 25 thousand households in the UK in FYE 2023.
In the 2022 to 2023 HBAI release, one element of the low-income benefits and tax credits Cost of Living Payment was not included, which impacted on the Family Resources based publications and therefore HBAI income estimates for this year.
Revised 2022 to 2023 data has been included in the time series and trend tables in the 2023 to 2024 HBAI release. Stat-Xplore and the underlying dataset have also been updated to reflect the revised 2022 to 2023 data. Please use the data tables in the 2023 to 2024 HBAI release to ensure you have the revised data for 2022 to 2023.
Summary data tables are available on this page, with more detailed analysis available to download as a Zip file.
The directory of tables is a guide to the information in the data tables Zip file.
HBAI data is available from FYE 1995 to FYE 2023 on the Stat-Xplore online tool (https://stat-xplore.dwp.gov.uk/webapi/jsf/login.xhtml). You can use Stat-Xplore to create your own HBAI analysis. Please note that data for FYE 2021 is not available on Stat-Xplore.
HBAI information is available at an individual level, and uses the net, weekly income of their household. Breakdowns allow analysis of individual, family (benefit unit) and household characteristics of the individual.
Read the user guide to HBAI data on Stat-Xplore.
We are seeking feedback from users on the HBAI data in Stat-Xplore: email team.hbai@dwp.gov.uk with your comments.
NASA's Making Earth System Data Records for Use in Research Environments (MEaSUREs) Global Land Cover Mapping and Estimation (GLanCE) annual 30 meter (m) Version 1 data product provides global land cover and land cover change data derived from Landsat 5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+), and Landsat 8 Operational Land Imager (OLI). These maps provide the user community with land cover type, land cover change, metrics characterizing the magnitude and seasonality of greenness of each pixel, and the magnitude of change. GLanCE data products will be provided using a set of seven continental grids that use Lambert Azimuthal Equal Area projections parameterized to minimize distortion for each continent. Currently, North America, South America, Europe, and Oceania are available. This dataset is useful for a wide range of applications, including ecosystem, climate, and hydrologic modeling; monitoring the response of terrestrial ecosystems to climate change; carbon accounting; and land management. The GLanCE data product provides seven layers: the land cover class, the estimated day of year of change, integer identifier for class in previous year, median and amplitude of the Enhanced Vegetation Index (EVI2) in the year, rate of change in EVI2, and the change in EVI2 median from previous year to current year. A low-resolution browse image representing EVI2 amplitude is also available for each granule.

Known Issues
* Version 1.0 of the data set does not include Quality Assurance, Leaf Type or Leaf Phenology. These layers are populated with fill values. These layers will be included in future releases of the data product.
* Science Data Set (SDS) values may be missing, or of lower quality, at years when land cover change occurs. This issue is a by-product of the fact that Continuous Change Detection and Classification (CCDC) does not fit models or provide synthetic reflectance values during short periods of time between time segments.
* The accuracy of mapping results varies by land cover class and geography. Specifically, distinguishing between shrubs and herbaceous cover is challenging at high latitudes and in arid and semi-arid regions. Hence, the accuracy of shrub cover, herbaceous cover, and to some degree bare cover, is lower than for other classes.
* Due to the combined effects of large solar zenith angles, short growing seasons, and lower availability of high-resolution imagery to support training data, the representation of land cover at high latitudes in the GLanCE product is lower than in mid latitudes.
* Shadows and large variation in local zenith angles decrease the accuracy of the GLanCE product in regions with complex topography, especially at high latitudes.
* Mapping results may include artifacts from variation in data density in overlap zones between Landsat scenes relative to mapping results in non-overlap zones.
* Regions with low observation density due to cloud cover, especially in the tropics, and/or poor data density (e.g. Alaska, Siberia, West Africa) have lower map quality.
* Artifacts from the Landsat 7 Scan Line Corrector failure are occasionally evident in the GLanCE map product. High proportions of missing data in regions with snow and ice at high elevations result in missing data in the GLanCE SDSs.
* The GLanCE data product tends to modestly overpredict developed land cover in arid regions.
Low income cut-offs (LICOs) before and after tax by community size and family size, in current dollars, annual.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset comprises 10,000 artificially generated student essays using GPT4, accompanied by holistic scores ranging from 1 to 6. These essays were generated based on the data from the Automated Essay Scoring 2.0 competition.
My aim was to produce essays that closely resembled those in the original AES dataset, essentially creating paraphrases while ensuring they remained distinct compositions. Equally important was maintaining scores consistent with the original holistic scoring system used in the competition. To accomplish this, I followed the process outlined below:
The basic prompt template looks like this:
prompt_template = '''
You are a {AGE} year old German student writing an English test, but you're stuck! Luckily, your neighbour is doing well and so you take a glimpse at his sheet and you could catch the following text:
=========
"{TEXT}"
=========
But you cannot simply copy it, you need to change it a bit so the teacher doesn't notice that you copied it,
hence you copy it with the following rules:
- Paraphrase the text just a bit
- Adhere to the style and level of the original text
- Sprinkle some errors into the text, akin to the original
- Remember your age and incroporate that into the essay so it's feasible for a {AGE} year old student who writes not in his native language!
Output only the essay
'''
The produced essay is assigned the same score as the original essay passed into the {TEXT} variable.
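As a rough illustration of how the template could be filled, the sketch below combines the {AGE} and {TEXT} placeholders with the age formula the author gives in the notes that follow. The names `age_for_score`, `build_prompt`, and the shortened `PROMPT_TEMPLATE` are hypothetical and are not taken from the author's code; the shortened template only stands in for the full one shown above.

```python
def age_for_score(original_essay_score: int) -> int:
    """Age from the stated formula: 15 - (4 - (originalEssayScore // 2))."""
    if not 1 <= original_essay_score <= 6:
        raise ValueError("holistic scores range from 1 to 6")
    return 15 - (4 - (original_essay_score // 2))


# Hypothetical, abbreviated stand-in for the full prompt template.
PROMPT_TEMPLATE = (
    "You are a {AGE} year old German student writing an English test...\n"
    "=========\n"
    '"{TEXT}"\n'
    "=========\n"
    "Output only the essay\n"
)


def build_prompt(essay_text: str, original_essay_score: int) -> str:
    """Fill the placeholders; the essay text is passed as an argument,
    so braces inside it are not interpreted by str.format."""
    age = age_for_score(original_essay_score)
    return PROMPT_TEMPLATE.format(AGE=age, TEXT=essay_text)


if __name__ == "__main__":
    # Scores 1..6 map to ages 11, 12, 12, 13, 13, 14.
    for score in range(1, 7):
        print(score, age_for_score(score))
```

Under this formula the ages stay within the stated range of 11 to 14, and two adjacent scores can share the same age.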
This prompt tries to implement a couple of ideas:
- {AGE} variable: I tried to enforce the score of the original essay by giving essays with a lower score a lower age (minimum 11, maximum 14), thus also lowering the quality of the produced essay. The formula for the age is: \(age = 15 - (4 - (originalEssayScore // 2))\)
- spellchecker and added as many random mistakes as possible into the newly generated essays to, again, replicate the score as best as I can.

Here are some examples:
| New Essay | S |
|---|---|
| In the text "The Excitement of Discovering Mar&s," the writer delivers a strong and effective argument in favor of the idea that studying Mars is a valuable pursuit despite the risks involved. By using facts, data, and current plans in development, the author convinces the reader that exploring Mars is worth the potential dangers. The writer vividly portrays the immersive learning opportunities that could arise from studying the alisen planet, the safe travel c'onditions for humans, and various exploration options to ensure a smooth and secure journey to Mars. |
Initially, the author addresses the perception that Mars is tooy hazardous to explore. Many people are deterred by Mars' reputation as a dangerous and inhospitable planet. The author acknowledges these challenges but demonstrates how safe travel can still be achieved. By detailing Jthe plan proposed by the National Aeronautics and Space Administration (NASA) for astronauts to float above the dangerous conditions, the writer assures the audience of the safety measures in place. Specific aspects of the plan, such as Earth-like air pressure and abundant solar power, are highlighted to emphasize the feasibility of human survival. Drawing a comparison to a blimp-like vehicle, the author simplifies the concept for better understanding. By dispelling the notion of Mars being too perilous, the writer strengthens the argument for explorRing the planet.
Furthermore, the writer emphasizes the educational potential that studying Mars offers. Beyond simple facts about Mars' proximity in size and density to Earth, the author delves into the possibility of Mars once resembling Earth. Describing Mars' current environment as Earth-like with rocky surfaces, valleys, mountains, and craters, the author suggests that Mars may have supported life in the past, similar to Earth. This parallel betwveen the two planets Hcaptivates the audienc...
Data files containing detailed information about vehicles in the UK are also available, including make and model data.
Some tables have been withdrawn and replaced. The table index for this statistical series has been updated to provide a full map between the old and new numbering systems used in this page.
The Department for Transport is committed to continuously improving the quality and transparency of our outputs, in line with the Code of Practice for Statistics. In line with this, we have recently concluded a planned review of the processes and methodologies used in the production of Vehicle licensing statistics data. The review sought to seek out and introduce further improvements and efficiencies in the coding technologies we use to produce our data and as part of that, we have identified several historical errors across the published data tables affecting different historical periods. These errors are the result of mistakes in past production processes that we have now identified, corrected and taken steps to eliminate going forward.
Most of the revisions to our published figures are small, typically changing values by less than 1% to 3%. The key revisions are:
Licensed Vehicles (2014 Q3 to 2016 Q3)
We found that some unlicensed vehicles during this period were mistakenly counted as licensed. This caused a slight overstatement, about 0.54% on average, in the number of licensed vehicles during this period.
3.5 - 4.25 tonnes Zero Emission Vehicles (ZEVs) Classification
Since 2023, ZEVs weighing between 3.5 and 4.25 tonnes have been classified as light goods vehicles (LGVs) instead of heavy goods vehicles (HGVs). We have now applied this change to earlier data and corrected an error in table VEH0150. As a result, the number of newly registered HGVs has been reduced by:
3.1% in 2024
2.3% in 2023
1.4% in 2022
Table VEH0156 (2018 to 2023)
Table VEH0156, which reports average CO₂ emissions for newly registered vehicles, has been updated for the years 2018 to 2023. Most changes are minor (under 3%), but the e-NEDC measure saw a larger correction, up to 15.8%, due to a calculation error. Changes to the other measures (WLTP and Reported) were less notable, except for April 2020, when COVID-19 led to very few new registrations and therefore greater volatility in the resultant percentages.
Neither these specific revisions, nor any of the others introduced, have had a material impact on the statistics overall, the direction of trends nor the key messages that they previously conveyed.
Specific details of each revision have been included in the relevant data table notes to ensure transparency and clarity. Users are advised to review these notes as part of their regular use of the data to ensure their analysis accounts for these changes.
If you have questions regarding any of these changes, please contact the Vehicle statistics team.
Overview
VEH0101 (https://assets.publishing.service.gov.uk/media/68ecf5acf159f887526bbd7c/veh0101.ods): Vehicles at the end of the quarter by licence status and body type: Great Britain and United Kingdom (ODS, 99.7 KB)
Detailed breakdowns
VEH0103 (https://assets.publishing.service.gov.uk/media/68ecf5abf159f887526bbd7b/veh0103.ods): Licensed vehicles at the end of the year by tax class: Great Britain and United Kingdom (ODS, 23.8 KB)
VEH0105 (https://assets.publishing.service.gov.uk/media/68ecf5ac2adc28a81b4acfc8/veh0105.ods): Licensed vehicles at
https://creativecommons.org/publicdomain/zero/1.0/
Diabetes is among the most prevalent chronic diseases in the United States, impacting millions of Americans each year and exerting a significant financial burden on the economy. Diabetes is a serious chronic disease in which individuals lose the ability to effectively regulate levels of glucose in the blood, and can lead to reduced quality of life and life expectancy. After different foods are broken down into sugars during digestion, the sugars are then released into the bloodstream. This signals the pancreas to release insulin. Insulin helps enable cells within the body to use those sugars in the bloodstream for energy. Diabetes is generally characterized by either the body not making enough insulin or being unable to use the insulin that is made as effectively as needed.
Complications like heart disease, vision loss, lower-limb amputation, and kidney disease are associated with chronically high levels of sugar remaining in the bloodstream for those with diabetes. While there is no cure for diabetes, strategies like losing weight, eating healthily, being active, and receiving medical treatments can mitigate the harms of this disease in many patients. Early diagnosis can lead to lifestyle changes and more effective treatment, making predictive models for diabetes risk important tools for the public and public health officials.
The scale of this problem is also important to recognize. The Centers for Disease Control and Prevention has indicated that as of 2018, 34.2 million Americans have diabetes and 88 million have prediabetes. Furthermore, the CDC estimates that 1 in 5 diabetics and roughly 8 in 10 prediabetics are unaware of their risk. While there are different types of diabetes, type II diabetes is the most common form, and its prevalence varies by age, education, income, location, race, and other social determinants of health. Much of the burden of the disease falls on those of lower socioeconomic status as well. Diabetes also places a massive burden on the economy, with diagnosed diabetes costs of roughly $327 billion and total costs with undiagnosed diabetes and prediabetes approaching $400 billion annually.
The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey that is collected annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, a csv of the dataset available on Kaggle for the year 2015 was used. This original dataset contains responses from 441,455 individuals and has 330 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.
This dataset contains 3 files:
1. diabetes_012_health_indicators_BRFSS2015.csv is a clean dataset of 253,680 survey responses to the CDC's BRFSS2015. The target variable Diabetes_012 has 3 classes: 0 is for no diabetes or only during pregnancy, 1 is for prediabetes, and 2 is for diabetes. There is class imbalance in this dataset. This dataset has 21 feature variables.
2. diabetes_binary_5050split_health_indicators_BRFSS2015.csv is a clean dataset of 70,692 survey responses to the CDC's BRFSS2015. It has an equal 50-50 split of respondents with no diabetes and with either prediabetes or diabetes. The target variable Diabetes_binary has 2 classes: 0 is for no diabetes, and 1 is for prediabetes or diabetes. This dataset has 21 feature variables and is balanced.
3. diabetes_binary_health_indicators_BRFSS2015.csv is a clean dataset of 253,680 survey responses to the CDC's BRFSS2015. The target variable Diabetes_binary has 2 classes: 0 is for no diabetes, and 1 is for prediabetes or diabetes. This dataset has 21 feature variables and is not balanced.
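The relationship between the three-class and two-class targets described above can be sketched in a few lines. This is only an illustration under the assumption that the binary label groups prediabetes and diabetes together exactly as the class descriptions state; the helper names `to_binary` and `class_shares` are hypothetical, and the label list is made-up toy data, not values from the files.

```python
from collections import Counter


def to_binary(diabetes_012: int) -> int:
    """Collapse the 3-class label: 0 stays 0, prediabetes (1) or diabetes (2) becomes 1."""
    if diabetes_012 not in (0, 1, 2):
        raise ValueError("Diabetes_012 must be 0, 1, or 2")
    return 0 if diabetes_012 == 0 else 1


def class_shares(labels):
    """Return the share of each class, a quick way to see class imbalance."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}


if __name__ == "__main__":
    toy_labels = [0, 0, 0, 0, 0, 0, 1, 2, 2, 0]  # made-up example values
    binary = [to_binary(v) for v in toy_labels]
    print(class_shares(toy_labels))
    print(class_shares(binary))
```

Checking class shares before modeling is a simple way to decide whether the imbalanced files need resampling or class weighting, or whether the 50-50 split file is the better starting point.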
Explore some of the following research questions:
1. Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?
2. What risk factors are most predictive of diabetes risk?
3. Can we use a subset of the risk factors to accurately predict whether an individual has diabetes?
4. Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?
It is important to reiterate that I did not create this dataset; it is just a cleaned and consolidated dataset created from the BRFSS 2015 dataset already on Kaggle. That dataset can be found here and the notebook I used for the data cleaning can be found here.
Zidian Xie et al fo...
This dataset is real data of 5,000 records collected from a private learning provider. The dataset includes key attributes necessary for exploring patterns, correlations, and insights related to academic performance.
Columns:
01. Student_ID: Unique identifier for each student.
02. First_Name: Student’s first name.
03. Last_Name: Student’s last name.
04. Email: Contact email (can be anonymized).
05. Gender: Male, Female, Other.
06. Age: The age of the student.
07. Department: Student's department (e.g., CS, Engineering, Business).
08. Attendance (%): Attendance percentage (0-100%).
09. Midterm_Score: Midterm exam score (out of 100).
10. Final_Score: Final exam score (out of 100).
11. Assignments_Avg: Average score of all assignments (out of 100).
12. Quizzes_Avg: Average quiz scores (out of 100).
13. Participation_Score: Score based on class participation (0-10).
14. Projects_Score: Project evaluation score (out of 100).
15. Total_Score: Weighted sum of all grades.
16. Grade: Letter grade (A, B, C, D, F).
17. Study_Hours_per_Week: Average study hours per week.
18. Extracurricular_Activities: Whether the student participates in extracurriculars (Yes/No).
19. Internet_Access_at_Home: Does the student have access to the internet at home? (Yes/No).
20. Parent_Education_Level: Highest education level of parents (None, High School, Bachelor's, Master's, PhD).
21. Family_Income_Level: Low, Medium, High.
22. Stress_Level (1-10): Self-reported stress level (1: Low, 10: High).
23. Sleep_Hours_per_Night: Average hours of sleep per night.
The Attendance is not part of the Total_Score or has very minimal weight.
Calculating the weighted sum: Total Score = a⋅Midterm + b⋅Final + c⋅Assignments + d⋅Quizzes + e⋅Participation + f⋅Projects, where the weights a to f are the percentages listed in the following table.
| Component | Weight (%) |
|---|---|
| Midterm | 15% |
| Final | 25% |
| Assignments Avg | 15% |
| Quizzes Avg | 10% |
| Participation | 5% |
| Projects Score | 30% |
| Total | 100% |
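The weighted sum with the weights from the table can be sketched as follows. One point is an assumption rather than something the description states: Participation_Score is documented on a 0-10 scale while the other components are out of 100, so this sketch rescales it to 0-100 before weighting. The function name `total_score` and the `participation_max` parameter are hypothetical; pass `participation_max=100` to disable the rescaling.

```python
# Weights taken from the table above, expressed as fractions of 1.
WEIGHTS = {
    "Midterm_Score": 0.15,
    "Final_Score": 0.25,
    "Assignments_Avg": 0.15,
    "Quizzes_Avg": 0.10,
    "Participation_Score": 0.05,
    "Projects_Score": 0.30,
}


def total_score(record: dict, participation_max: float = 10.0) -> float:
    """Weighted sum of the six graded components.

    Assumption: participation is rescaled from 0..participation_max to 0..100
    so that every component is on the same scale before weighting.
    """
    scaled = dict(record)
    scaled["Participation_Score"] = (
        record["Participation_Score"] / participation_max * 100.0
    )
    return sum(weight * scaled[name] for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    student = {  # made-up example record
        "Midterm_Score": 80.0,
        "Final_Score": 80.0,
        "Assignments_Avg": 80.0,
        "Quizzes_Avg": 80.0,
        "Participation_Score": 8.0,
        "Projects_Score": 80.0,
    }
    print(round(total_score(student), 2))
```

Comparing a recomputed total against the Total_Score column is one way to check whether the published totals follow these weights or reflect the injected bias mentioned in the notes.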
Dataset contains:
- Missing values (nulls) in some records (e.g., Attendance, Assignments, or Parent Education Level).
- Bias in some data (e.g., grading: students with high attendance get slightly better grades).
- Imbalanced distributions (e.g., some departments having more students).

Note:
- The dataset is real, but I included some bias to create a greater challenge for my students.
- Some columns have been masked as the data owner requested.

"Students_Grading_Dataset_Biased.csv" contains the biased dataset.
"Students Performance Dataset" contains the masked dataset.
https://www.arcgis.com/home/item.html?id=806c857d504c476ba6477ac475c45bf5
Soil map units are the basic geographic unit of the Soil Survey Geographic Database (SSURGO). The SSURGO dataset is a compilation of soils information collected over the last century by the Natural Resources Conservation Service (NRCS). Map units delineate the extent of different soils. Data for each map unit contains descriptions of the soil’s components, productivity, unique properties, and suitability interpretations.Each soil type has a unique combination of physical, chemical, nutrient and moisture properties. Soil type has ramifications for engineering and construction activities, natural hazards such as landslides, agricultural productivity, the distribution of native plant and animal life and hydrologic and other physical processes. Soil types in the context of climate and terrain can be used as a general indicator of engineering constraints, agriculture suitability, biological productivity and the natural distribution of plants and animals.Dataset SummaryPhenomenon Mapped: Ready-to-use project packages with over 170 attributes derived from the SSURGO dataset, split up by HUC8s. Geographic Extent: The dataset covers the 48 contiguous United States plus Hawaii and portions of Alaska. Map packages are available for Puerto Rico and the US Virgin Islands. A project package for US Island Territories and associated states of the Pacific Ocean can be downloaded by clicking one of the included areas in the map. 
The Pacific Project Package includes: Guam, the Marshall Islands, the Northern Marianas Islands, Palau, the Federated States of Micronesia, and American Samoa.Source: Natural Resources Conservation ServiceUpdate Frequency: AnnualPublication Date: December 2024Link to source metadata*Not all areas within SSURGO have completed soil surveys and many attributes have areas with no data.The soil data in the packages is also available as a feature layer in the ArcGIS Living Atlas of the World.AttributesKey fields from nine commonly used SSURGO tables were compiled to create the 173 attribute fields in this layer. Some fields were joined directly to the SSURGO Map Unit polygon feature class while others required summarization and other processing to create a 1:1 relationship between the attributes and polygons prior to joining the tables. Attributes of this layer are listed below in their order of occurrence in the attribute table and are organized by the SSURGO table they originated from and the processing methods used on them.Map Unit Polygon Feature Class Attribute TableThe fields in this table are from the attribute table of the Map Unit polygon feature class which provides the geographic extent of the map units.Area SymbolSpatial VersionMap Unit SymbolMap Unit TableThe fields in this table have a 1:1 relationship with the map unit polygons and were joined to the table using the Map Unit Key field.Map Unit NameMap Unit KindFarmland ClassInterpretive FocusIntensity of MappingIowa Corn Suitability RatingLegend TableThis table has 1:1 relationship with the Map Unit table and was joined using the Legend Key field.Project ScaleSurvey Area Catalog TableThe fields in this table have a 1:1 relationship with the polygons and were joined to the Map Unit table using the Survey Area Catalog Key and Legend Key fields.Survey Area VersionTabular VersionMap Unit Aggregated Attribute TableThe fields in this table have a 1:1 relationship with the map unit polygons and were joined to the 
Map Unit attribute table using the Map Unit Key field.Slope Gradient - Dominant ComponentSlope Gradient - Weighted AverageBedrock Depth - MinimumWater Table Depth - Annual MinimumWater Table Depth - April to June MinimumFlooding Frequency - Dominant ConditionFlooding Frequency - MaximumPonding Frequency - PresenceAvailable Water Storage 0-25 cm - Weighted AverageAvailable Water Storage 0-50 cm - Weighted AverageAvailable Water Storage 0-100 cm - Weighted AverageAvailable Water Storage 0-150 cm - Weighted AverageDrainage Class - Dominant ConditionDrainage Class - WettestHydrologic Group - Dominant ConditionIrrigated Capability Class - Dominant ConditionIrrigated Capability Class - Proportion of Map Unit with Dominant ConditionNon-Irrigated Capability Class - Dominant ConditionNon-Irrigated Capability Class - Proportion of Map Unit with Dominant ConditionRating for Buildings without Basements - Dominant ConditionRating for Buildings with Basements - Dominant ConditionRating for Buildings with Basements - Least LimitingRating for Buildings with Basements - Most LimitingRating for Septic Tank Absorption Fields - Dominant ConditionRating for Septic Tank Absorption Fields - Least LimitingRating for Septic Tank Absorption Fields - Most LimitingRating for Sewage Lagoons - Dominant ConditionRating for Sewage Lagoons - Dominant ComponentRating for Roads and Streets - Dominant ConditionRating for Sand Source - Dominant ConditionRating for Sand Source - Most ProbableRating for Paths and Trails - Dominant ConditionRating for Paths and Trails - Weighted AverageErosion Hazard of Forest Roads and Trails - Dominant ComponentHydric Classification - PresenceRating for Manure and Food Processing Waste - Weighted AverageComponent Table – Dominant ComponentMap units have one or more components. To create a 1:1 join component data must be summarized by map unit. 
For these fields a custom script was used to select the component with the highest value for the Component Percentage Representative Value field (comppct_r). Ties were broken with the Slope Representative Value field (slope_r). Components with lower average slope were selected as dominant. If both soil order and slope were tied, the first value in the table was selected.Component Percentage - Low ValueComponent Percentage - Representative ValueComponent Percentage - High ValueComponent NameComponent KindOther Criteria Used to Identify ComponentsCriteria Used to Identify Components at the Local LevelRunoff ClassSoil loss tolerance factorWind Erodibility IndexWind Erodibility GroupErosion ClassEarth Cover 1Earth Cover 2Hydric ConditionHydric RatingAspect Range - Counter Clockwise LimitAspect - Representative ValueAspect Range - Clockwise LimitGeomorphic DescriptionNon-Irrigated Capability SubclassNon-Irrigated Unit Capability ClassIrrigated Capability SubclassIrrigated Unit Capability ClassConservation Tree Shrub GroupGrain Wildlife HabitatGrass Wildlife HabitatHerbaceous Wildlife HabitatShrub Wildlife HabitatConifer Wildlife HabitatHardwood Wildlife HabitatWetland Wildlife HabitatShallow Water Wildlife HabitatRangeland Wildlife HabitatOpenland Wildlife HabitatWoodland Wildlife HabitatWetland Wildlife HabitatSoil Slip PotentialSusceptibility to Frost HeavingConcrete CorrosionSteel CorrosionTaxonomic ClassTaxonomic OrderTaxonomic SuborderGreat GroupSubgroupParticle SizeParticle Size ModCation Exchange Activity ClassCarbonate ReactionTemperature ClassMoist SubclassSoil Temperature RegimeEdition of Keys to Soil Taxonomy Used to Classify SoilCalifornia Storie IndexComponent KeyComponent Table – Weighted AverageMap units may have one or more soil components. To create a 1:1 join, data from the Component table must be summarized by map unit. 
For these fields a custom script was used to calculate an average value for each map unit weighted by the Component Percentage Representative Value field (comppct_r).
Slope Gradient - Low Value; Slope Gradient - Representative Value; Slope Gradient - High Value; Slope Length USLE - Low Value; Slope Length USLE - Representative Value; Slope Length USLE - High Value; Elevation - Low Value; Elevation - Representative Value; Elevation - High Value; Albedo - Low Value; Albedo - Representative Value; Albedo - High Value; Mean Annual Air Temperature - Low Value; Mean Annual Air Temperature - Representative Value; Mean Annual Air Temperature - High Value; Mean Annual Precipitation - Low Value; Mean Annual Precipitation - Representative Value; Mean Annual Precipitation - High Value; Relative Effective Annual Precipitation - Low Value; Relative Effective Annual Precipitation - Representative Value; Relative Effective Annual Precipitation - High Value; Days between Last and First Frost - Low Value; Days between Last and First Frost - Representative Value; Days between Last and First Frost - High Value; Range Forage Annual Potential Production - Low Value; Range Forage Annual Potential Production - Representative Value; Range Forage Annual Potential Production - High Value; Initial Subsidence - Low Value; Initial Subsidence - Representative Value; Initial Subsidence - High Value; Total Subsidence - Low Value; Total Subsidence - Representative Value; Total Subsidence - High Value; Crop Productivity Index
Esri Symbology
This field was created to provide symbology based on the Taxonomic Order field (taxorder). Because some map units have a null value for soil order, a custom script was used to populate this field using the Component Name (compname) and Map Unit Name (muname) fields. This field was created using the dominant soil order of each map unit.
Esri Symbology
Horizon Table
Each map unit polygon has one or more components and each component has one or more layers known as horizons.
To incorporate this field from the Horizon table into the attributes for this layer, a custom script was used to first calculate the mean value weighted by thickness of the horizon for each component and then a mean value of components weighted by the Component Percentage Representative Value field for each map unit.
K-Factor Rock Free
Esri Soil Order
These fields were calculated from the Component table using a model that included the Pivot Table Tool, the Summarize Tool and a custom script. The first 11 fields provide the sum of Component Percentage Representative Value for each soil order for each map unit. The Soil Order Dominant Condition field was calculated by selecting the highest value in the preceding 11 soil order fields. In the case of tied values the component with the lowest average slope value (slope_r) was selected. If both soil order and slope were tied
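
The two map-unit summaries described for the Component table (selecting a dominant component, and averaging weighted by comppct_r) can be sketched roughly as follows. This is not the custom script referred to in the description, which is not published here; it is a minimal illustration under stated assumptions. The sample rows, the component names, the use of `mukey` as the grouping key, the `elev_r` column, and the handling of missing values are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical component rows; the values are invented for illustration only.
components = [
    {"mukey": "100", "compname": "Alpha", "comppct_r": 50, "slope_r": 8.0, "elev_r": 300.0},
    {"mukey": "100", "compname": "Beta", "comppct_r": 50, "slope_r": 3.0, "elev_r": 500.0},
    {"mukey": "200", "compname": "Gamma", "comppct_r": 70, "slope_r": 12.0, "elev_r": 1000.0},
    {"mukey": "200", "compname": "Delta", "comppct_r": 30, "slope_r": 2.0, "elev_r": 2000.0},
]


def group_by_map_unit(rows):
    """Collect component rows per map unit, preserving table order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["mukey"]].append(row)
    return groups


def dominant_component(rows):
    """Pick the row with the highest comppct_r; break ties with the lowest slope_r.

    min() returns the first minimal element, so rows that are still tied
    fall back to their order in the table.
    """
    return min(rows, key=lambda r: (-r["comppct_r"], r["slope_r"]))


def weighted_average(rows, field):
    """Average `field` across components, weighted by comppct_r.

    Assumption: rows with a missing value or zero percentage are skipped.
    """
    usable = [r for r in rows if r[field] is not None and r["comppct_r"]]
    total_weight = sum(r["comppct_r"] for r in usable)
    if total_weight == 0:
        return None
    return sum(r[field] * r["comppct_r"] for r in usable) / total_weight


if __name__ == "__main__":
    for mukey, rows in group_by_map_unit(components).items():
        dominant = dominant_component(rows)
        print(mukey, dominant["compname"], weighted_average(rows, "elev_r"))
```

In the sample data, map unit 100 has two components tied at 50 percent, so the one with the lower slope (Beta) is treated as dominant. Either summary yields exactly one row per map unit, which is what makes the 1:1 join possible.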
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains extensive health information for 2,149 patients, each uniquely identified with IDs ranging from 4751 to 6900. The dataset includes demographic details, lifestyle factors, medical history, clinical measurements, cognitive and functional assessments, symptoms, and a diagnosis of Alzheimer's Disease. The data is ideal for researchers and data scientists looking to explore factors associated with Alzheimer's, develop predictive models, and conduct statistical analyses.
This dataset offers extensive insights into the factors associated with Alzheimer's Disease, including demographic, lifestyle, medical, cognitive, and functional variables. It is ideal for developing predictive models, conducting statistical analyses, and exploring the complex interplay of factors contributing to Alzheimer's Disease.
If you use this dataset in your work, please cite it as follows:
@misc{rabie_el_kharoua_2024,
title={Alzheimer's Disease Dataset},
url={https://www.kaggle.com/dsv/8668279},
DOI={10.34740/KAGGLE/DSV/8668279},
publisher={Kaggle...
The Family Resources Survey (FRS) has been running continuously since 1992 to meet the information needs of the Department for Work and Pensions (DWP). It is almost wholly funded by DWP.
The FRS collects information from a large, representative sample of private households in the United Kingdom (prior to 2002, it covered Great Britain only). The interview year runs from April to March.
The focus of the survey is on income, and how much comes from the many possible sources (such as employee earnings, self-employed earnings or profits from businesses, and dividends; individual pensions; state benefits, including Universal Credit and the State Pension; and other sources such as savings and investments). Specific items of expenditure, such as rent or mortgage, Council Tax and water bills, are also covered.
Many other topics are covered and the dataset has a very wide range of personal characteristics, at the adult or child, family and then household levels. These include education, caring, childcare and disability. The dataset also captures material deprivation, household food security and (new for 2021/22) household food bank usage.
The FRS is a national statistic whose results are published on the gov.uk website. It is also possible to create your own tables from FRS data, using DWP’s Stat Xplore tool. Further information can be found on the gov.uk Family Resources Survey webpage.
Secure Access FRS data
In addition to the standard End User Licence (EUL) version, Secure Access datasets, containing unrounded data and additional variables, are also available for FRS from 2005/06 onwards - see SN 9256. Prospective users of the Secure Access version of the FRS will need to fulfil additional requirements beyond those associated with the EUL datasets. Full details of the application requirements are available from Guidance on applying for the Family Resources Survey: Secure Access (http://ukdataservice.ac.uk/media/178323/secure_frs_application_guidance.pdf).
FRS, HBAI and PI
The FRS underpins the related Households Below Average Income (HBAI) dataset, which focuses on poverty in the UK, and the related Pensioners' Incomes (PI) dataset. The EUL versions of HBAI and PI are held under SNs 5828 and 8503, respectively. The Secure Access versions are held under SN 7196 and 9257 (see above).
Reported DCMS Sector GVA is estimated to have fallen by 0.4% from Quarter 2 (April to June) to Quarter 3 2022 (July to September) in real terms. By comparison, the whole UK economy fell by 0.2% from Quarter 2 to Quarter 3 2022.
GVA of reported DCMS Sectors in September 2022 was 6% above February 2020 levels, which was the most recent month not significantly affected by the pandemic. By comparison, GVA for the whole UK economy was 0.2% lower than in February 2020.
16 November 2022
These Economic Estimates are Official Statistics used to provide an estimate of the economic contribution of DCMS Sectors in terms of gross value added (GVA), for the period January 2019 to September 2022. Provisional monthly GVA in 2019 and 2020 was first published in March 2021 as an ad hoc statistical release. This current release contains new figures for July to September 2022 and revised estimates for previous months, in line with the scheduled revisions that were made to the underlying ONS datasets in October 2022.
Estimates are in chained volume measures (i.e. have been adjusted for inflation), at 2019 prices, and are seasonally adjusted. These latest monthly estimates should only be used to illustrate general trends, not used as definitive figures.
You can use these estimates to:
You should not use these estimates to:
Estimates of annual GVA by DCMS Sectors, based on the monthly series, are included in this release for 2019 to 2021. These are calculated by summing the monthly estimates for the calendar year and were first published for 2019 and 2020 in DCMS Sector National Economic Estimates: 2011 - 2020.
Since August 2022, we have been publishing these estimates as part of the regular published series of GVA data, with data being revised in line with revisions to the underlying ONS datasets, as with the monthly GVA estimates. These estimates update what was first published last year, in order to meet growing demand for annual GVA figures beyond the 2019 estimates in our National Statistics GVA publication. The National Statistics GVA publication estimates remain the most robust for our sectors; however, estimates for years after 2019 have been delayed owing to the coronavirus (COVID-19) pandemic.
Consequently, these “summed monthly” annual estimate figures for GVA can be used but should not be seen as definitive.
The findings are calculated based on published ONS data sources including the Index of Services and Index of Production.
These data sources provide an estimate of the monthly change in GVA for all UK industries. However, the data is only available for broader industry groups, whereas DCMS sectors are defined at a more detailed industrial level. For example, GVA for ‘Cultural education’ is estimated based on the trend for all education. Sectors such as ‘Cultural education’ may have been affected differently by COVID-19 compared to education in general. These estimates are also based on the composition of the economy in 2019. Overall, this means the accuracy of monthly GVA for DCMS sectors is likely to be lower for months in 2020 and 2021.
The technical guidance contains further information about data sources, methodology, and the validation and accuracy of these estimates.
Figures are provisional and subject to revision on a monthly basis when the ONS Index of Services and Index of Production are updated. Figures for the latest month will be highly uncertain.
To illustrate the impact of these revisions: for the revisions applied in February 2022, the average change to DCMS sector monthly GVA was 0.6%, but there were larger differences for some sectors in some months. For example, the value of the Sport sector in May 2021 was revised from £1.27 billion to £1.45 billion, a 13.8% difference.
https://www.incomebyzipcode.com/terms#TERMS
A dataset listing the richest zip codes in Missouri per the most current US Census data, including information on rank and average income.
https://creativecommons.org/publicdomain/zero/1.0/
Income is one of the clearest indicators of a person’s position in society. It reflects financial stability and determines the quality of life that someone can achieve. Higher income often means better access to education, healthcare, safe housing, and opportunities for growth. Lower income, in contrast, is closely linked with social vulnerability, reduced chances of mobility, and cycles of poverty that can continue across generations. For this reason, income is not just a private measure but a key element that governments, economists, and social researchers study to understand inequality.
The ability to predict income class is crucial because it highlights the factors that drive differences in earnings. With accurate prediction, governments can identify which groups are most at risk of being left behind and take action before inequality grows wider. For instance, if the data shows that certain occupations or education levels are strongly tied to low earnings, policies can be designed to expand training programs or improve access to education. Prediction therefore makes it possible to respond proactively to social and economic challenges.
The urgency of income prediction is also seen in the education sector. Knowing how income relates to qualifications allows schools, universities, and training centers to design curricula that prepare students for careers with stronger earning potential. This is one way to reduce the mismatch between education and the labor market, while also giving individuals a better chance to move up economically. In healthcare and housing, income is a strong predictor of access and demand. By predicting income levels, governments and organizations can anticipate needs for insurance support, medical subsidies, or public housing and allocate resources more effectively.
Businesses depend heavily on income insights as well. Since income shapes purchasing power, predicting income groups helps companies design products, set fair pricing, and segment their markets. Accurate income prediction also plays a role in economic forecasting, where analysts try to understand how shifts such as recessions, automation, or tax reforms might affect demand in different sectors. In this sense, income prediction is not just useful for planning but vital for resilience in uncertain economies.
The dataset used here comes from surveys by the U.S. Census Bureau in 1994 that record demographic, educational, and occupational information together with reported income. Each row represents a real individual with their background, work situation, and income class. The central task is to predict whether someone earns more than $50,000 per year. While this may sound like a simple classification problem, the implications are much deeper. By examining how education, occupation, gender, race, or hours worked relate to income, we can better understand the forces that shape economic outcomes for whole communities.
In short, predicting income class is not only a technical challenge but also a social necessity. It has implications for reducing inequality, guiding public policy, improving education and healthcare planning, supporting fairer labor markets, and informing business strategies. Income is tied to nearly every aspect of life, and the ability to predict it responsibly is essential for building a more just and prepared society.
This dataset contains the following columns:
| Column Name | Description |
| ---------------- | ------------------------------------------------------------------------- |
| id | A number that identifies each unique record |
| age | Age of the individual |
| workclass | Type of employment (e.g., Private, Self-emp, Government, etc.) |
| fnlwgt | Final weight, used by the Census Bureau to estimate population statistics |
| education | Highest education level achieved |
| education.num | Numerical representation of education level |
| marital.status | Marital status |
| occupation | Occupation type |
| relationship | Relationship status within the household |
| race | Race of the individual |
| sex | Gender of the individual |
| capital.gain | Capital gains reported |
| capital.loss | Capital losses reported |
| ...