16 datasets found
  1. Current Population Survey August 2016 (Adult)

    • kaggle.com
    zip
    Updated May 9, 2019
    Cite
    Evan Reid (2019). Current Population Survey August 2016 (Adult) [Dataset]. https://www.kaggle.com/evanreid77/current-population-survey-august-2016-adult
    Available download formats: zip (439643 bytes)
    Dataset updated
    May 9, 2019
    Authors
    Evan Reid
    Description

    Find the column descriptions and valid entries here: https://github.com/EvanReid88/Current-Population-Survey-Data-Science-Project

    The Current Population Survey is one of the oldest, largest, and most recognized surveys in the United States. This survey provides information about individuals in society, such as their work, earnings, and education. The CPS is used to collect data for a variety of other studies that keep the nation informed of the economic and social well-being of its people. The August 2016 CPS dataset can be found in a raw .CSV format on Kaggle.com via the following link:

    https://www.kaggle.com/census/current-population-survey

    Kaggle does not provide an accurate description of the data (although somebody posted the correct data dictionary in the comments), so find the real data dictionary for the original dataset here

    My goal was to create a dataset that is much like the UCI Adult dataset (from 1995):

    http://mlr.cs.umass.edu/ml/datasets/Adult

    The 2016 CPS dataset has many columns and many incomplete survey answers, so I edited the data to capture the most relevant columns. It is important to note that I have removed all individuals holding multiple jobs, so the data only reflects individuals with a single occupation. This updated version of the 2016 CPS dataset is very similar to the UCI Adult dataset, with some added columns.

    The Current Population Survey (CPS) is administered, processed, researched and disseminated by the U.S. Census Bureau on behalf of the Bureau of Labor Statistics (BLS).

  2. Percentage of adults with learning disabilities in paid employment - WMCA

    • cityobservatory.birmingham.gov.uk
    csv, excel, geojson +1
    Updated Nov 3, 2025
    Cite
    (2025). Percentage of adults with learning disabilities in paid employment - WMCA [Dataset]. https://cityobservatory.birmingham.gov.uk/explore/dataset/percentage-of-adults-with-learning-disabilities-in-paid-employment-wmca/
    Available download formats: csv, geojson, excel, json
    Dataset updated
    Nov 3, 2025
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    The measure shows the proportion of all adults (aged 18-64) with a learning disability who are known to the council, who are recorded as being in paid employment. The definition of individuals 'known to the council' is restricted to those adults with a learning disability (with a primary client group of LD) who have been assessed or reviewed by the council during the year (irrespective of whether or not they receive a service) or who should have been reviewed but were not.

    The measure is focused on 'paid' employment, to be clear that voluntary work is to be excluded for the purposes of this measure. Paid employment is measured using the following two categories: working as a paid employee or self-employed (16 or more hours per week); and working as a paid employee or self-employed (up to 16 hours per week). A 'paid employee' is one who works for a company, community or voluntary organisation, council or other organisation and is earning at or above the National Minimum Wage. This includes those who are working in supported employment (i.e. those receiving support from a specialist agency to maintain their job) who are earning at or above the National Minimum Wage. 'Self-employed' is defined as those who work for themselves and generally pay their National Insurance themselves. This should also include those who are unpaid family workers (i.e. those who do unpaid work for a business they own or for a business a relative owns).

    In 2014/15 the change from ASC-CAR to SALT resulted in a change to who is included in the measure. Previously, this measure included 'all adults with a learning disability who are known to the council'. However, SALT table LTS001a only captures those clients who have received a long-term service in the reporting year. Furthermore, the measure now only draws on the subset of these clients who have a primary support reason of Learning Disability Support; those clients who may previously have been included in the client group Learning Disability in ASC-CAR might not have a primary support reason of Learning Disability Support, and are now excluded from the measure.

    The measure only covers people receiving partly or wholly supported care from their Local Authority, not wholly private, self-funded care. Data source: SALT. Data is powered by LG Inform Plus and automatically checked for new data on the 3rd of each month.

  3. Adult income is over $50,000 a year.

    • kaggle.com
    zip
    Updated Oct 16, 2024
    + more versions
    Cite
    M Atif Latif (2024). Adult income is over $50,000 a year. [Dataset]. https://www.kaggle.com/datasets/matiflatif/adult-income-is-over-50000-a-year
    Available download formats: zip (724624 bytes)
    Dataset updated
    Oct 16, 2024
    Authors
    M Atif Latif
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Context

    This dataset contains information about individuals' demographic and employment attributes to predict whether their income exceeds $50,000 per year. It originates from the 1994 U.S. Census database and has been widely used in classification problems, making it an excellent resource for machine learning, data analysis, and statistical modeling.

    Content

    The dataset includes various features related to personal and work-related attributes. The target variable is whether an individual's income exceeds $50,000 annually.

    Key features include:

    Age: Age of the individual.

    Workclass: Employment type (e.g., private, government, self-employed).

    Education: Highest level of education achieved.

    Education-Num: Number corresponding to the level of education.

    Marital Status: Marital status of the individual.

    Occupation: Profession or job role.

    Relationship: Family role (e.g., husband, wife, not in family).

    Race: Race of the individual.

    Sex: Gender of the individual.

    Capital Gain: Income from investment sources other than salary.

    Capital Loss: Losses from investment sources.

    Hours Per Week: Average number of hours worked per week.

    Native Country: Country of origin of the individual.

    Variables

    Age: Continuous variable representing the age of the individual.

    Workclass: Categorical variable indicating the type of employment (e.g., Private, Self-Employed, Government).

    Education: Categorical variable showing the highest level of education achieved (e.g., Bachelors, Masters).

    Education-Num: Numerical representation of the education level.

    Marital Status: Categorical variable representing marital status (e.g., Married, Never-Married).

    Occupation: Categorical variable indicating the job role or occupation.

    Relationship: Categorical variable describing the family relationship (e.g., Husband, Wife).

    Race: Categorical variable showing the race of the individual.

    Sex: Categorical variable indicating the gender of the individual.

    Capital Gain: Continuous variable representing income from capital gains.

    Capital Loss: Continuous variable representing losses from investments.

    Hours Per Week: Continuous variable showing the average working hours per week.

    Native Country: Categorical variable indicating the country of origin.

    Income: Target variable (binary), indicating whether the individual earns more than $50,000 (>50K) or not (<=50K).
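    As a minimal sketch of how the binary Income target described above might be encoded for a classifier, the snippet below maps the two label strings to 0 and 1. The helper name `encode_income` is hypothetical, and the handling of surrounding whitespace is an assumption about how the labels may appear in a raw file, not something stated in this listing.

```python
def encode_income(label: str) -> int:
    """Map an Income label to a binary target: 1 for '>50K', 0 for '<=50K'."""
    cleaned = label.strip()  # assumption: raw labels may carry stray whitespace
    if cleaned == ">50K":
        return 1
    if cleaned == "<=50K":
        return 0
    raise ValueError(f"unexpected income label: {label!r}")


# Illustrative labels only, not rows taken from the dataset.
labels = [">50K", "<=50K", " <=50K"]
targets = [encode_income(label) for label in labels]
print(targets)
```

    Raising on unknown labels, rather than silently defaulting to 0, makes it easier to notice label variants that the sketch does not anticipate.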

    Acknowledgements

    This dataset was derived from the 1994 U.S. Census database and has been made publicly available for research and educational purposes. It is not affiliated with any specific organization. Users are encouraged to comply with ethical data usage guidelines while working with this dataset.

  4. Sorting table of the Employment Development Agency

    • data.europa.eu
    unknown
    Updated Sep 9, 2024
    Cite
    Archives nationales de Luxembourg (2024). Sorting table of the Employment Development Agency [Dataset]. https://data.europa.eu/data/datasets/tableau-de-tri-de-lagence-pour-le-developpement-de-lemploi?locale=en
    Available download formats: unknown
    Dataset updated
    Sep 9, 2024
    Dataset authored and provided by
    Archives nationales de Luxembourg
    Description

    Sorting table based on the ArcategTM repository. Agreement drafted in French and signed by hand on 03/12/2019 by the Director of ADEM and the Director of ANLux.

    History of administration:

    The management of jobseekers in the Grand Duchy of Luxembourg dates back to the end of the 19th century, with the creation of labour exchanges in Luxembourg, Esch-sur-Alzette and Diekirch. There was no clear policy on the employment and management of the unemployed before the Act of 2 May 1913 governing the action of employment offices. After a short-lived Central Placement Office created under the German occupation on 13 July 1940, it was not until the end of the Second World War that the first lasting central state administration for employment management emerged. The National Labour Office, which took over the management of the labour exchanges, was entrusted with this task by the Grand-Ducal Decree of 30 June 1945. The Office was replaced in 1976 by the Employment Administration (ADEM), which was reformed in 2012 and renamed the Employment Development Agency.

    Principal missions:

    ADEM is the public employment service of the Grand Duchy of Luxembourg whose mission is to promote employment by strengthening the capacity to steer employment policy in coordination with economic and social policy. ADEM’s clients are jobseekers and employers. In the context of career guidance, ADEM also advises secondary school pupils.

    In order to carry out this task, the Agency shall have the following powers:

    • Accompanying, advising, guiding and helping people looking for a job
    • To contribute to the security of employees' career paths
    • Coordinating and organising the training of jobseekers with a view to increasing their professional skills in collaboration with bodies which have vocational training in their remit
    • Prospecting the labour market, collecting job vacancies, assisting and advising employers in their recruitment
    • To ensure that job vacancies and applications are matched
    • Ensure the enforcement of legislation concerning the prevention of unemployment, the reduction of unemployment, the granting of unemployment benefits and employment support
    • To intervene in the retraining and re-employment of the workforce
    • Contribute to the implementation of legislation on the restoration of full employment
    • Organise apprenticeship placements for young people and adults
    • Provide career guidance for the integration or reintegration of young people and adults into working life
    • Contribute to the development and management of youth employment measures
    • Promote female employment, in particular as regards access to employment
    • Provide guidance, training, rehabilitation, professional integration and reintegration and follow-up for employees with disabilities and employees with reduced working capacity
    • Monitor and analyse the situation and developments on the labour market
    • To ensure technical relations with similar foreign and international services

    Regulatory references:

    • Labour Code

    Versions and updates:

    The following shall be published in the dataset:

    • The first version signed on 03/12/2019

  5. Employment and Unemployment

    • data.ccrpc.org
    csv
    Updated Dec 9, 2024
    Cite
    Champaign County Regional Planning Commission (2024). Employment and Unemployment [Dataset]. https://data.ccrpc.org/dataset/employment-and-unemployment
    Available download formats: csv
    Dataset updated
    Dec 9, 2024
    Dataset authored and provided by
    Champaign County Regional Planning Commission
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    The employment and unemployment indicator shows several data points. The first figure is the number of people in the labor force, which includes the number of people who are either working or looking for work. The second two figures, the number of people who are employed and the number of people who are unemployed, are the two subcategories of the labor force. The unemployment rate is a calculation of the number of people who are in the labor force and unemployed as a percentage of the total number of people in the labor force.
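    The calculation described above can be sketched in a few lines. The function name and the sample figures are illustrative assumptions, not values from the Champaign County data; the only thing taken from the description is that the labor force is the sum of employed and unemployed people, and that the rate is the unemployed share of that total.

```python
def unemployment_rate(employed: int, unemployed: int) -> float:
    """Return unemployed people as a percentage of the labor force."""
    labor_force = employed + unemployed  # people working or looking for work
    if labor_force <= 0:
        raise ValueError("labor force must be positive")
    return 100.0 * unemployed / labor_force


# Hypothetical figures chosen only to make the arithmetic easy to follow.
rate = unemployment_rate(employed=95_000, unemployed=5_000)
print(f"{rate:.1f}%")
```

    People outside the labor force never enter the function at all, which mirrors how the indicator leaves them out of both the numerator and the denominator.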

    The unemployment rate does not include people who are not employed and not in the labor force. This includes adults who are neither working nor looking for work. For example, full-time students may choose not to seek any employment during their college career, and are thus not considered in the unemployment rate. Stay-at-home parents and other caregivers are also considered outside of the labor force, and therefore outside the scope of the unemployment rate.

    The unemployment rate is a key economic indicator, and is illustrative of economic conditions in the county at the individual scale.

    There are additional considerations to the unemployment rate. Because it does not count those who are outside the labor force, it can exclude individuals who were looking for a job previously, but have since given up. The impact of this on the overall unemployment rate is difficult to quantify, but it is important to note because it shows that no statistic is perfect.

    The unemployment rates for Champaign County, the City of Champaign, and the City of Urbana are extremely similar between 2000 and 2023.

    All three areas saw a dramatic increase in the unemployment rate between 2006 and 2009. The unemployment rates for all three areas decreased overall between 2010 and 2019. However, the unemployment rate in all three areas rose sharply in 2020 due to the effects of the COVID-19 pandemic. The unemployment rate in all three areas dropped again in 2021 as pandemic restrictions were removed, and was almost back to 2019 levels in 2022. However, the unemployment rate in all three areas rose slightly from 2022 to 2023.

    This data is sourced from the Illinois Department of Employment Security’s Local Area Unemployment Statistics (LAUS), and from the U.S. Bureau of Labor Statistics.

    Sources: Illinois Department of Employment Security, Local Area Unemployment Statistics (LAUS); U.S. Bureau of Labor Statistics.

  6. Tax Credits Recipients, Borough - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jun 9, 2025
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2025). Tax Credits Recipients, Borough - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/tax-credits-recipients-borough
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    CKAN (https://ckan.org/)
    Description

    Child Tax Credit (CTC) provides support to families for the children (up to the 31 August after their 16th birthdays) and the "qualifying" young people (those in full-time non-advanced education until their 20th birthdays) for whom they are responsible. It is paid in addition to Child Benefit. Some out of work families with children do not receive CTC but instead receive the equivalent amount via child and related allowances in Income Support or income-based Jobseeker's Allowance (IS/JSA). These families are included in the figures, generally together with out of work families receiving CTC. In due course, they will be "migrated" to tax credits.

    Working Tax Credit (WTC) tops up the earnings of families on low or moderate incomes. People working for at least 16 hours a week can claim it if they (a) are responsible for at least one child or qualifying young person, (b) have a disability which puts them at a disadvantage in getting a job or (c) are in the first year of work, having returned to work aged at least 50 after a period of at least six months receiving out-of-work benefits. Other adults qualify if they are aged at least 25 and work for at least 30 hours a week.

    Ward data available in the Ward profiles. https://www.gov.uk/government/collections/personal-tax-credits-statistics
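    The Working Tax Credit conditions summarised above can be read as a small decision rule. The sketch below is only a simplified reading of that summary, not the official eligibility test: the `Claimant` class, its field names and the function `may_claim_wtc` are all hypothetical, and income thresholds are ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Claimant:
    """Hypothetical record holding only the facts the description mentions."""
    age: int
    weekly_hours: float
    responsible_for_child: bool = False      # child or qualifying young person
    disadvantaging_disability: bool = False  # disability that hinders getting a job
    returning_worker_50_plus: bool = False   # first year back at work, aged 50+


def may_claim_wtc(c: Claimant) -> bool:
    """Apply the hours and age conditions from the description (simplified)."""
    meets_special_condition = (
        c.responsible_for_child
        or c.disadvantaging_disability
        or c.returning_worker_50_plus
    )
    if c.weekly_hours >= 16 and meets_special_condition:
        return True
    # Other adults: aged at least 25 and working at least 30 hours a week.
    return c.age >= 25 and c.weekly_hours >= 30


parent = Claimant(age=22, weekly_hours=16, responsible_for_child=True)
young_full_timer = Claimant(age=24, weekly_hours=35)
print(may_claim_wtc(parent), may_claim_wtc(young_full_timer))
```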

  7. Loss of Work Due to Illness from COVID-19

    • catalog.data.gov
    • data.virginia.gov
    • +3more
    Updated Apr 23, 2025
    + more versions
    Cite
    Centers for Disease Control and Prevention (2025). Loss of Work Due to Illness from COVID-19 [Dataset]. https://catalog.data.gov/dataset/loss-of-work-due-to-illness-from-covid-19
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Description

    The Research and Development Survey (RANDS) is a platform designed for conducting survey question evaluation and statistical research. RANDS is an ongoing series of surveys from probability-sampled commercial survey panels used for methodological research at the National Center for Health Statistics (NCHS). RANDS estimates are generated using an experimental approach that differs from the survey design approaches generally used by NCHS, including possible biases from different response patterns and sampling frames as well as increased variability from lower sample sizes. Use of the RANDS platform allows NCHS to produce more timely data than would be possible using traditional data collection methods. RANDS is not designed to replace NCHS’ higher quality, core data collections.

    Below are experimental estimates of loss of work due to illness with coronavirus for three rounds of RANDS during COVID-19. Data collection for the three rounds of RANDS during COVID-19 occurred between June 9, 2020 and July 6, 2020, August 3, 2020 and August 20, 2020, and May 17, 2021 and June 30, 2021. Information needed to interpret these estimates can be found in the Technical Notes.

    RANDS during COVID-19 included a question about the inability to work due to being sick or having a family member sick with COVID-19. The National Health Interview Survey, conducted by NCHS, is the source for high-quality data to monitor work-loss days and work limitations in the United States. For example, in 2018, 42.7% of adults aged 18 and over missed at least 1 day of work in the previous year due to illness or injury and 9.3% of adults aged 18 to 69 were limited in their ability to work or unable to work due to physical, mental, or emotional problems. The experimental estimates on this page are derived from RANDS during COVID-19 and show the percentage of U.S. adults who did not work for pay at a job or business, at any point, in the previous week because either they or someone in their family was sick with COVID-19.

    Technical Notes: https://www.cdc.gov/nchs/covid19/rands/work.htm#limitations

  8. Policy Radar - At Risk Jobs

    • data-insight-tfwm.hub.arcgis.com
    Updated Nov 24, 2021
    Cite
    Transport for West Midlands (2021). Policy Radar - At Risk Jobs [Dataset]. https://data-insight-tfwm.hub.arcgis.com/documents/245cd30e9f9148869c80eaa0db3c26be
    Dataset updated
    Nov 24, 2021
    Dataset authored and provided by
    Transport for West Midlands (http://www.tfwm.org.uk/)
    Description

    Using regression analysis, we created a correlation matrix from a number of demographic indicators on the Local Insight platform. This application shows the distribution of the datasets found to have the strongest relationships with the base comparison dataset: the proportion of jobs at risk following the outbreak of COVID-19. This app contains the following datasets: proportion of jobs in education, proportion of jobs in accommodation and food services, proportion of jobs in health, proportion of business enterprises with a turnover of £1M to £4.9M, proportion of business enterprises with a turnover of £5M or more, proportion of jobs in manufacturing, proportion of working age adults who are economically active, proportion of adults who are economically inactive, proportion of people who identify their ethnicity as white and proportion of the male population aged 15 to 19.

  9. BRFSS 2020 Heart Disease Dataset(Cleaned Version)

    • zenodo.org
    csv
    Updated May 4, 2025
    Cite
    Koushal Kumar; BP Pande (2025). BRFSS 2020 Heart Disease Dataset(Cleaned Version) [Dataset]. http://doi.org/10.5281/zenodo.15336526
    Available download formats: csv
    Dataset updated
    May 4, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Koushal Kumar; BP Pande
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The dataset originally comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. As the CDC describes: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset (as of February 15, 2022) includes data from 2020. It consists of 401,958 rows and 279 columns. The vast majority of columns are questions asked to respondents about their health status, such as "Do you have serious difficulty walking or climbing stairs?" or "Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]".

    To improve the efficiency and relevance of our analysis, we removed certain attributes from the original BRFSS dataset. Many of the 279 original attributes included administrative codes, metadata, or survey-specific variables that do not contribute meaningfully to heart disease prediction—such as respondent IDs, timestamps, state-level identifiers, and detailed lifestyle questions unrelated to cardiovascular health. By focusing on a carefully selected subset of 18 attributes directly linked to medical, behavioral, and demographic factors known to influence heart health, we streamlined the dataset. This not only reduced computational complexity but also improved model interpretability and performance by eliminating noise and irrelevant information. All predicting variables could be divided into 4 broad categories:

    1. Demographic factors: sex, age category (14 levels), race, BMI (Body Mass Index)

    2. Diseases: whether the respondent ever had such diseases as asthma, skin cancer, diabetes, stroke or kidney disease (not including kidney stones, bladder infection or incontinence)

    3. Unhealthy habits:

      • Smoking - respondents that smoked at least 100 cigarettes in their entire life (5 packs = 100 cigarettes)
      • Alcohol Drinking - heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week)
    4. General Health:

      • Difficulty Walking - whether the respondent has serious difficulty walking or climbing stairs
      • Physical Activity - adults who reported doing physical activity or exercise during the past 30 days other than their regular job
      • Sleep Time - respondent’s reported average hours of sleep in a 24-hour period
      • Physical Health - number of days being physically ill or injured (0-30 days)
      • Mental Health - number of days having bad mental health (0-30 days)
      • General Health - respondents declared their health as ’Excellent’, ’Very good’, ’Good’ ,’Fair’ or ’Poor’

    Below is a description of the features collected for each patient:

    # | Feature | Coded Variable Name | Description
    1 | HeartDisease | CVDINFR4 | Respondents that have ever reported having coronary heart disease (CHD) or myocardial infarction (MI)
    2 | BMI | _BMI5CAT | Body Mass Index (BMI)
    3 | Smoking | _SMOKER3 | Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]
    4 | AlcoholDrinking | _RFDRHV7 | Heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week)
    5 | Stroke | CVDSTRK3 | (Ever told) (you had) a stroke?
    6 | PhysicalHealth | PHYSHLTH | Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30
    7 | MentalHealth | MENTHLTH | Thinking about your mental health, for how many days during the past 30 days was your mental health not good?
    8 | DiffWalking | DIFFWALK | Do you have serious difficulty walking or climbing stairs?
    9 | Sex | SEXVAR | Are you male or female?
    10 | AgeCategory | _AGE_G | Fourteen-level age category
    11 | Race | _IMPRACE | Imputed race/ethnicity value
    12 | Diabetic | DIABETE4 | (Ever told) (you had) diabetes?
    13 | PhysicalActivity | EXERANY2 | Adults who reported doing physical activity or exercise during the past 30 days other than their regular job
    14 | GenHealth | GENHLTH | Would you say that in general your health is...
    15 | SleepTime | SLEPTIM1 | On average, how many hours of sleep do you get in a 24-hour period?
    16 | Asthma | CHASTHMA | (Ever told) (you had) asthma?
    17 | KidneyDisease | CHCKDNY2 | Not including kidney stones, bladder infection or incontinence, were you ever told you had kidney disease?
    18 | SkinCancer | CHCSCNCR | (Ever told) (you had) skin cancer?
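
    One way to picture the reduction from 279 raw columns to the 18 features in the table above is a lookup from coded variable name to feature name. This is a sketch under assumptions, not the authors' preprocessing: the mapping is copied from the table, while `select_features`, the `RESPONDENT_ID` column and the sample values are hypothetical.

```python
# Coded BRFSS variable name -> feature name, as listed in the table above.
FEATURE_BY_CODE = {
    "CVDINFR4": "HeartDisease",
    "_BMI5CAT": "BMI",
    "_SMOKER3": "Smoking",
    "_RFDRHV7": "AlcoholDrinking",
    "CVDSTRK3": "Stroke",
    "PHYSHLTH": "PhysicalHealth",
    "MENTHLTH": "MentalHealth",
    "DIFFWALK": "DiffWalking",
    "SEXVAR": "Sex",
    "_AGE_G": "AgeCategory",
    "_IMPRACE": "Race",
    "DIABETE4": "Diabetic",
    "EXERANY2": "PhysicalActivity",
    "GENHLTH": "GenHealth",
    "SLEPTIM1": "SleepTime",
    "CHASTHMA": "Asthma",
    "CHCKDNY2": "KidneyDisease",
    "CHCSCNCR": "SkinCancer",
}


def select_features(raw_row: dict) -> dict:
    """Keep only the mapped columns of one raw row and rename them."""
    return {
        FEATURE_BY_CODE[code]: value
        for code, value in raw_row.items()
        if code in FEATURE_BY_CODE
    }


# Hypothetical raw row: RESPONDENT_ID stands in for a dropped administrative column.
raw = {"RESPONDENT_ID": "0001", "SEXVAR": "1", "SLEPTIM1": "7"}
cleaned = select_features(raw)
print(cleaned)
```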

  10. Dataset for: "Big data suggest strong constraints of linguistic similarity...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Job Schepens; Roeland van Hout; T. Florian Jaeger (2020). Dataset for: "Big data suggest strong constraints of linguistic similarity on adult language learning" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_2863532
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    University of Rochester
    Freie Universitaet Berlin
    Radboud Universiteit
    Authors
    Job Schepens; Roeland van Hout; T. Florian Jaeger
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset is adapted from raw data with fully anonymized results on the State Examination of Dutch as a Second Language. This exam is officially administered by the Board of Tests and Examinations (College voor Toetsen en Examens, or CvTE). See cvte.nl/about-cvte. The Board of Tests and Examinations is mandated by the Dutch government.

    The article accompanying the dataset:

    Schepens, Job, Roeland van Hout, and T. Florian Jaeger. “Big Data Suggest Strong Constraints of Linguistic Similarity on Adult Language Learning.” Cognition 194 (January 1, 2020): 104056. https://doi.org/10.1016/j.cognition.2019.104056.

    Every row in the dataset represents the first official testing score of a unique learner. The columns contain the following information as based on questionnaires filled in at the time of the exam:

    • "L1" - The first language of the learner
    • "C" - The country of birth
    • "L1L2" - The combination of first and best additional language besides Dutch
    • "L2" - The best additional language besides Dutch
    • "AaA" - Age at Arrival in the Netherlands in years (starting date of residence)
    • "LoR" - Length of residence in the Netherlands in years
    • "Edu.day" - Duration of daily education (1 low, 2 middle, 3 high, 4 very high). From 1992 until 2006, learners' education has been measured by means of a side-by-side matrix question in a learner's questionnaire. Learners were asked to mark which type of education they have had (elementary, secondary, or tertiary schooling) by means of filling in for how many years they have been enrolled, in which country, and whether or not they have graduated. Based on this information we were able to estimate how many years learners have had education on a daily basis from six years of age onwards. Since 2006, the question about learners' education has been altered and it is asked directly how many years learners have had formal education on a daily basis from six years of age onwards. Possible answering categories are: 1) 0 thru 5 years; 2) 6 thru 10 years; 3) 11 thru 15 years; 4) 16 years or more. The answers have been merged into the categorical answer.
    • "Sex" - Gender
    • "Family" - Language Family
    • "ISO639.3" - Language ID code according to Ethnologue
    • "Enroll" - Proportion of school-aged youth enrolled in secondary education according to the World Bank. The World Bank reports on education data in a wide number of countries around the world on a regular basis. We took the gross enrollment rate in secondary schooling per country in the year the learner has arrived in the Netherlands as an indicator for a country's educational accessibility at the time learners have left their country of origin.
    • "STEX_speaking_score" - The STEX test score for speaking proficiency.
    • "Dissimilarity_morphological" - Morphological similarity
    • "Dissimilarity_lexical" - Lexical similarity
    • "Dissimilarity_phonological_new_features" - Phonological similarity (in terms of new features)
    • "Dissimilarity_phonological_new_categories" - Phonological similarity (in terms of new sounds)
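
    The four answering categories given for "Edu.day" amount to a simple binning of years of daily education. The sketch below shows that binning only; the function name `edu_day_category` is hypothetical, and it does not attempt to reproduce how the authors estimated years from the pre-2006 matrix question.

```python
def edu_day_category(years: int) -> int:
    """Bin years of daily education into the four Edu.day categories.

    1: 0 thru 5 years, 2: 6 thru 10 years, 3: 11 thru 15 years, 4: 16 years or more.
    """
    if years < 0:
        raise ValueError("years of education cannot be negative")
    if years <= 5:
        return 1
    if years <= 10:
        return 2
    if years <= 15:
        return 3
    return 4


# Boundary values on either side of each cut-off.
categories = [edu_day_category(y) for y in (0, 5, 6, 10, 11, 15, 16, 20)]
print(categories)
```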

    A few rows of the data:

    "L1","C","L1L2","L2","AaA","LoR","Edu.day","Sex","Family","ISO639.3","Enroll","STEX_speaking_score","Dissimilarity_morphological","Dissimilarity_lexical","Dissimilarity_phonological_new_features","Dissimilarity_phonological_new_categories"
    "English","UnitedStates","EnglishMonolingual","Monolingual",34,0,4,"Female","Indo-European","eng ",94,541,0.0094,0.083191,11,19
    "English","UnitedStates","EnglishGerman","German",25,16,3,"Female","Indo-European","eng ",94,603,0.0094,0.083191,11,19
    "English","UnitedStates","EnglishFrench","French",32,3,4,"Male","Indo-European","eng ",94,562,0.0094,0.083191,11,19
    "English","UnitedStates","EnglishSpanish","Spanish",27,8,4,"Male","Indo-European","eng ",94,537,0.0094,0.083191,11,19
    "English","UnitedStates","EnglishMonolingual","Monolingual",47,5,3,"Male","Indo-European","eng ",94,505,0.0094,0.083191,11,19
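
    As a rough illustration of reading rows in this layout, the sketch below parses the header and the first two sample rows shown above with Python's standard csv module. Embedding the rows in a string is only for demonstration; stripping the trailing space in the "ISO639.3" value and converting the score to an integer are assumptions about useful cleaning, not steps described by the authors.

```python
import csv
import io

# Header plus the first two sample rows, copied from the listing above.
SAMPLE = '''"L1","C","L1L2","L2","AaA","LoR","Edu.day","Sex","Family","ISO639.3","Enroll","STEX_speaking_score","Dissimilarity_morphological","Dissimilarity_lexical","Dissimilarity_phonological_new_features","Dissimilarity_phonological_new_categories"
"English","UnitedStates","EnglishMonolingual","Monolingual",34,0,4,"Female","Indo-European","eng ",94,541,0.0094,0.083191,11,19
"English","UnitedStates","EnglishGerman","German",25,16,3,"Female","Indo-European","eng ",94,603,0.0094,0.083191,11,19
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    # The sample shows "eng " with a trailing space, so strip it.
    row["ISO639.3"] = row["ISO639.3"].strip()
    # csv yields strings; convert the speaking score to an integer.
    row["STEX_speaking_score"] = int(row["STEX_speaking_score"])

scores = [row["STEX_speaking_score"] for row in rows]
print(scores)
```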

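The sample rows above are ordinary quoted CSV, so they can be read with Python's standard `csv` module. The sketch below embeds the header and the first two sample rows; the helper name `load_rows` and the choice of which columns to treat as integers or floats are assumptions based only on the values shown, not on any published schema. It also strips surrounding whitespace, since the language code appears as "eng " with a trailing space.

```python
import csv
import io

# Header plus the first two sample rows, copied from the listing above.
SAMPLE = '''"L1","C","L1L2","L2","AaA","LoR","Edu.day","Sex","Family","ISO639.3","Enroll","STEX_speaking_score","Dissimilarity_morphological","Dissimilarity_lexical","Dissimilarity_phonological_new_features","Dissimilarity_phonological_new_categories"
"English","UnitedStates","EnglishMonolingual","Monolingual",34,0,4,"Female","Indo-European","eng ",94,541,0.0094,0.083191,11,19
"English","UnitedStates","EnglishGerman","German",25,16,3,"Female","Indo-European","eng ",94,603,0.0094,0.083191,11,19
'''

# Assumed column types, inferred from the sample values only.
INT_COLS = ["AaA", "LoR", "Edu.day", "STEX_speaking_score"]
FLOAT_COLS = [
    "Enroll",
    "Dissimilarity_morphological",
    "Dissimilarity_lexical",
    "Dissimilarity_phonological_new_features",
    "Dissimilarity_phonological_new_categories",
]


def load_rows(text: str) -> list[dict]:
    """Parse CSV text into dicts, stripping whitespace and casting numbers."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {key: value.strip() for key, value in raw.items()}
        for col in INT_COLS:
            row[col] = int(row[col])
        for col in FLOAT_COLS:
            row[col] = float(row[col])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
```

To read the full file instead of the embedded sample, the same loop could be pointed at an opened file object in place of `io.StringIO(text)`.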
  11. CalWORKs Welfare-to-Work Monthly Activities

    • data.ca.gov
    • data.chhs.ca.gov
    • +3more
    csv, docx, zip
    Updated Nov 6, 2025
    + more versions
    Cite
    California Department of Social Services (2025). CalWORKs Welfare-to-Work Monthly Activities [Dataset]. https://data.ca.gov/dataset/calworks-welfare-to-work-monthly-activities
    Explore at:
    Available download formats: csv, docx, zip
    Dataset updated
    Nov 6, 2025
    Dataset authored and provided by
    California Department of Social Services (http://www.cdss.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    CalWORKs Welfare-to-Work (WTW) Monthly Activity Report. These data provide detailed counts of CalWORKs WTW enrollees, including exemptions and the type of activity individuals are enrolled in as part of their WTW plan requirements. In addition to activities, the data include information on clients' nonparticipation status, supportive service participation, and postemployment/job-retention service participation. These items are categorized by adults in two-parent families, adults in all other families, as well as a grand total. See the WTW 25 and WTW 25A reports posted on the California Department of Social Services' Research and Data Reports homepage (http://www.cdss.ca.gov/inforesources/Research-and-Data/CalWORKs-Data-Tables) for more information.

  12. Further education and skills - Further education and skills subject - free...

    • explore-education-statistics.service.gov.uk
    Updated Jan 27, 2022
    + more versions
    Cite
    Department for Education (2022). Further education and skills - Further education and skills subject - free courses for jobs summary [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/aa368e3a-22d4-4a98-ac0e-abcee1cfceb0
    Explore at:
    Dataset updated
    Jan 27, 2022
    Dataset authored and provided by
    Department for Education (https://gov.uk/dfe)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Free courses for jobs summary
    Academic years: 2020/21 to 2021/22 full academic years
    Indicators: Enrolments by adults taking up Free Courses for Jobs (under original offer), Enrolments by adults taking up Free Courses for Jobs (including under extended offer from Apr22), Enrolments by eligible adults on offer courses (under original offer), Enrolments by any adult on offer courses, Number of courses available during period under Free Courses for Jobs offer
    Filter: Start age, Start month

  13. income_dataset_ITU_project

    • kaggle.com
    zip
    Updated May 1, 2021
    Cite
    Cem Sari (2021). income_dataset_ITU_project [Dataset]. https://www.kaggle.com/datasets/cemsari/income-dataset-itu-project
    Explore at:
    Available download formats: zip (299731 bytes)
    Dataset updated
    May 1, 2021
    Authors
    Cem Sari
    License

    https://www.usa.gov/government-works/

    Description

    The dataset used in the project was obtained from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Adult). The dataset consists of people's demographic information, organized into 14 attributes.

    -age : Age.

    -workclass : Type of employment. Values: Federal-gov, Local-gov, Never-worked, Private, Self-emp-inc, Self-emp-not-inc, State-gov, and Without-pay.

    -education : Education level. Ordered education levels: Preschool < 1st-4th < 5th-6th < 7th-8th < 9th < 10th < 11th < 12th < HS-grad < Prof-school < Assoc-acdm < Assoc-voc < Some-college < Bachelors < Masters < Doctorate.

    -education_num : Numeric form of the education levels.

    -marital_status : Marital status. Values: Divorced, Married-AF-spouse, Married-civ-spouse, Married-spouse-absent, Never-married, Separated, and Widowed.

    -occupation : Occupation. Values: Adm-clerical, Armed-Forces, Craft-repair, Exec-managerial, Farming-fishing, Handlers-cleaners, Machine-op-inspct, Other-service, Priv-house-serv, Prof-specialty, Protective-serv, Sales, Tech-support, and Transport-moving.

    -relationship : Family status. Values: Husband, Not-in-family, Other-relative, Own-child, Unmarried, and Wife.

    -race : Race. Values: Amer-Indian-Eskimo, Asian-Pac-Islander, Black, Other, and White.

    -sex : Sex. Levels: Female and Male.

    -capital_gain : Capital gain.

    -capital_loss : Capital loss.

    -hours_per_week : Hours worked per week.

    -native_country : Country of origin. Values: Cambodia, Canada, China, Columbia, Cuba, Dominican-Republic, Ecuador, El-Salvador, England, France, Germany, Greece, Guatemala, Haiti, Holand-Netherlands, Honduras, Hong, Hungary, India, Iran, Ireland, Italy, Jamaica, Japan, Laos, Mexico, Nicaragua, Outlying-US(Guam-USVI-etc), Peru, Philippines, Poland, Portugal, Puerto-Rico, Scotland, South, Taiwan, Thailand, Trinadad&Tobago, United-States, Vietnam, Yugoslavia.

    -income : Whether yearly income exceeds $50,000. Values: <=50K (coded 0) and >50K (coded 1).
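
The 0/1 coding of the income attribute given above (<=50K as 0, >50K as 1) can be applied with a simple lookup. This is a minimal sketch rather than the dataset author's code: the names `INCOME_CODES` and `encode_income` are hypothetical, and it assumes the raw labels may carry surrounding whitespace.

```python
# Label-to-code mapping taken from the attribute description above.
INCOME_CODES = {"<=50K": 0, ">50K": 1}


def encode_income(label: str) -> int:
    """Return 0 for '<=50K' and 1 for '>50K'; raise on anything else."""
    cleaned = label.strip()  # assumed: labels may have stray spaces
    if cleaned not in INCOME_CODES:
        raise ValueError(f"unexpected income label: {label!r}")
    return INCOME_CODES[cleaned]
```

Raising on unknown labels, instead of silently defaulting to one class, makes malformed rows visible early.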

  14. HackStory Problem 1

    • kaggle.com
    zip
    Updated Jun 3, 2023
    Cite
    archit lahiri (2023). HackStory Problem 1 [Dataset]. https://www.kaggle.com/datasets/architlahiri/hackstory-problem-1
    Explore at:
    Available download formats: zip (767848 bytes)
    Dataset updated
    Jun 3, 2023
    Authors
    archit lahiri
    Description

    Problem Statement 1: Design an ML-based Income Range Predictor with Bias-Aware Integration:

    Develop a machine learning solution to predict job applicants' income range based on their demographic and professional information. The solution should include a user-friendly interface, either as a website or SDK. Address potential biases to ensure fair and unbiased predictions. Conduct exploratory data analysis to understand the data, identify biases, and analyze variable relationships. Mitigate bias during model training and evaluation.

    Dataset info: This dataset relates to the income ranges of adults. It is intended for cases where companies may need an algorithm to classify job applicants into income ranges. In this dataset the income column (>50k or <=50k) holds the y-labels for prediction. (Note that the data is raw and rife with issues, just as it is in real life; try your best to work through it!)

  15. Work Progress Program (WPP) for NYCHA Residents - Local Law 163 | gimi9.com

    • gimi9.com
    Updated Mar 2, 2023
    + more versions
    Cite
    (2023). Work Progress Program (WPP) for NYCHA Residents - Local Law 163 | gimi9.com [Dataset]. https://gimi9.com/dataset/ny_ftyx-fhnc
    Explore at:
    Dataset updated
    Mar 2, 2023
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains information about NYCHA residents who were employed by the Work Progress Program (WPP). WPP is a subsidized wage program designed to complement existing youth services by providing participating low-income young adults with work experience. Through WPP, HRA reimburses providers for wages paid to low-income young adults (aged 16-24) who have been placed in short-term jobs that typically last 12 weeks, with a special emphasis on serving opportunity or at-risk youth. WPP is not a stand-alone program. WPP provides wage reimbursements for nonprofits to provide subsidized jobs to their existing program participants. WPP supports providers that prioritize recruitment of NYCHA youth. NYCHA residency is self-reported in each provider's intake and enrollment processes. Providers then report to the Human Resources Administration (HRA) if someone is a NYC MAP resident or non-NYCHA MAP resident. The NYCHA MAP initiative includes 15 developments that were identified under the Mayoral Action Plan for Neighborhood Safety. For datasets related to other services provided to NYCHA residents, view the data collection “Services available to NYCHA Residents - Local Law 163”.

  16. Skills Bootcamps for Londoners - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Dec 22, 2024
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2024). Skills Bootcamps for Londoners - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/skills-bootcamps-for-londoners
    Explore at:
    Dataset updated
    Dec 22, 2024
    Dataset provided by
    CKAN (https://ckan.org/)
    Area covered
    London
    Description

    Skills Bootcamps for Londoners aim to help Londoners aged 19+ to enter employment, upskill or change career and are open to adults who are full-time or part-time employed, self-employed or unemployed, as well as adults returning to work after a break. Bootcamp training courses provide access to in-demand sector specific skills training and provide a guaranteed job interview on completion. In addition to technical training, learners will also receive guidance on entering professional working environments to fully prepare them for new roles. More information on the programme can be found here. The Skills Bootcamp for Londoners data is a summary of provider-reported Skills Bootcamps starts, completions and outcomes from courses funded by the Greater London Authority. Wave 3 data includes Bootcamps started between April 2022 and March 2023. Completions and outcomes can occur and be reported in the 2022-23 financial year and in a defined period after that year. Wave 3 was the first wave of Skills Bootcamps that were delegated to the Greater London Authority.


Current Population Survey August 2016 (Adult)

Updated version of the CPS August 2016 dataset to resemble the UCI Adult dataset

