5 datasets found
  1. Finding Donors

    • kaggle.com
    zip
    Updated Jan 14, 2025
    Cite
    Giraldo Stevanus Nainggolan (2025). Finding Donors [Dataset]. https://www.kaggle.com/giraldosn/finding-donors
    Available download formats: zip (447160 bytes)
    Dataset updated
    Jan 14, 2025
    Authors
    Giraldo Stevanus Nainggolan
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset is a modified version of the UCI Census Income dataset, originally published in the paper "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid" by Ron Kohavi. It consists of approximately 32,000 data points, each with 13 features, aimed at predicting whether an individual's income exceeds $50,000 per year.

    Features
    The dataset includes the following features:

    Demographics:
    - age: Age of the individual.
    - sex: Gender (Male, Female).
    - race: Race (e.g., White, Black, Asian-Pac-Islander).
    - native-country: Country of origin.

    Education:
    - education_level: Level of education (e.g., Bachelors, HS-grad, Masters).
    - education-num: Number of years of education completed.

    Employment:
    - workclass: Employment type (e.g., Private, Self-emp, State-gov).
    - occupation: Job role (e.g., Sales, Tech-support, Exec-managerial).
    - hours-per-week: Average working hours per week.

    Economic:
    - capital-gain: Capital gains recorded.
    - capital-loss: Capital losses recorded.

    Social:
    - marital-status: Marital status (e.g., Married, Never-married, Divorced).
    - relationship: Relationship status (e.g., Husband, Wife, Not-in-family).

    Target Variable
    income: A binary variable indicating whether the individual's income is <=50K or >50K.
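
    The binary target described above is usually turned into 0/1 before training. The following is a minimal sketch, assuming rows are available as dicts keyed by the feature names listed above; the sample rows are hypothetical and the helper names are illustrative only.

```python
# Hypothetical sample rows; keys follow the feature list above.
rows = [
    {"age": "39", "education_level": "Bachelors", "hours-per-week": "40", "income": "<=50K"},
    {"age": "52", "education_level": "Masters", "hours-per-week": "45", "income": ">50K"},
]


def encode_income(label: str) -> int:
    """Map the income label to 1 (>50K) or 0 (<=50K)."""
    label = label.strip()
    if label == ">50K":
        return 1
    if label == "<=50K":
        return 0
    raise ValueError(f"unexpected income label: {label!r}")


def split_features_target(row: dict):
    """Separate the predictor columns from the encoded target."""
    features = {key: value for key, value in row.items() if key != "income"}
    return features, encode_income(row["income"])


pairs = [split_features_target(row) for row in rows]
targets = [target for _, target in pairs]
```

    Raising on an unexpected label, rather than silently mapping it to 0, makes formatting differences between copies of the data visible early.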

    Source
    The original dataset can be found on the UCI Machine Learning Repository.

    Applications
    This dataset is widely used in machine learning for classification tasks, especially in supervised learning. It is an excellent resource for exploring algorithms such as decision trees, logistic regression, and support vector machines.

    Licensing
    The dataset is public and free to use for educational purposes.

  2. adult

    • huggingface.co
    Updated Nov 2, 2023
    Cite
    Mattia (2023). adult [Dataset]. https://huggingface.co/datasets/mstz/adult
    Dataset updated
    Nov 2, 2023
    Authors
    Mattia
    License

    https://choosealicense.com/licenses/cc/

    Description

    Adult

    The Adult dataset from the UCI ML repository. A census dataset containing the personal characteristics of individuals and their income threshold.

    Configurations and tasks

    Configuration    Task                        Description
    income           Binary classification       Classify the person's income as over or under the threshold.
    income-no race   Binary classification       As income, but the race feature is removed.
    race             Multiclass classification   Predict the race of the individual.
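
    The three configurations differ only in which column is the label and which columns remain as features. The sketch below illustrates that idea on a plain dict; it is not the dataset loader's actual code. The `make_example` helper, the ">50K" threshold label, the race class list, and the choice to keep income as a feature in the race configuration are all assumptions.

```python
# Assumed class order for the multiclass task (hypothetical).
RACE_CLASSES = ["White", "Black", "Asian-Pac-Islander", "Amer-Indian-Eskimo", "Other"]


def make_example(record: dict, config: str):
    """Return (features, label) for one of the three configurations."""
    if config == "income":
        features = {k: v for k, v in record.items() if k != "income"}
        return features, int(record["income"] == ">50K")
    if config == "income-no race":
        # Same label as "income", with the race feature removed.
        features = {k: v for k, v in record.items() if k not in ("income", "race")}
        return features, int(record["income"] == ">50K")
    if config == "race":
        # Race becomes the label; income stays as a feature (an assumption).
        features = {k: v for k, v in record.items() if k != "race"}
        return features, RACE_CLASSES.index(record["race"])
    raise ValueError(f"unknown configuration: {config!r}")


record = {"age": 39, "race": "Black", "income": ">50K"}  # hypothetical record
income_example = make_example(record, "income")
no_race_example = make_example(record, "income-no race")
race_example = make_example(record, "race")
```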

      Usage… See the full description on the dataset page: https://huggingface.co/datasets/mstz/adult.
    
  3. Census Income Data Set

    • kaggle.com
    zip
    Updated Dec 18, 2019
    Cite
    Victor Ivamoto (2019). Census Income Data Set [Dataset]. https://www.kaggle.com/vivamoto/us-adult-income-update
    Available download formats: zip (1334930 bytes)
    Dataset updated
    Dec 18, 2019
    Authors
    Victor Ivamoto
    License

    https://www.usa.gov/government-works/

    Description

    Context

    This data set comes from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/census+income

    Data Set Information:

    Prediction task is to determine whether a person makes over 50K a year from the analysis of 13 predictors.

    Content

    age: continuous.
    workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
    fnlwgt: continuous.
    education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
    education-num: continuous.
    marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
    occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
    relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
    race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
    sex: Female, Male.
    capital-gain: continuous.
    capital-loss: continuous.
    hours-per-week: continuous.
    native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
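
    The attribute list above maps naturally onto a typed record. A minimal parsing sketch follows, assuming comma-separated lines with the attributes in the listed order, the income label as a final extra field, and "?" as the missing-value marker; these layout details are assumptions to check against the actual file.

```python
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
]
# Attributes described as "continuous" in the list above.
CONTINUOUS = {"age", "fnlwgt", "education-num", "capital-gain",
              "capital-loss", "hours-per-week"}


def parse_record(line: str, missing: str = "?") -> dict:
    """Parse one comma-separated line into a dict with typed values."""
    values = [value.strip() for value in line.split(",")]
    if len(values) != len(COLUMNS) + 1:
        raise ValueError(f"expected {len(COLUMNS) + 1} fields, got {len(values)}")
    record = {}
    for name, value in zip(COLUMNS, values):
        if value == missing:
            record[name] = None          # keep missing values explicit
        elif name in CONTINUOUS:
            record[name] = int(value)
        else:
            record[name] = value
    record["income"] = values[-1]        # assumed to be the last field
    return record


# Sample lines in the assumed layout.
sample = ("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
          "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K")
with_gap = sample.replace("State-gov", "?")
parsed = parse_record(sample)
parsed_gap = parse_record(with_gap)
```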

    Acknowledgements

    Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1)&& (HRSWK>0))
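
    The extraction conditions quoted above can be read as a row filter. A minimal sketch in Python, assuming each raw record is a dict keyed by the variable names used in the quoted expression; the sample records are hypothetical.

```python
def is_clean(record: dict) -> bool:
    """Apply ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0))."""
    return (
        record["AAGE"] > 16
        and record["AGI"] > 100
        and record["AFNLWGT"] > 1
        and record["HRSWK"] > 0
    )


# Hypothetical raw records.
raw = [
    {"AAGE": 39, "AGI": 25000, "AFNLWGT": 77516, "HRSWK": 40},
    {"AAGE": 16, "AGI": 5000, "AFNLWGT": 120000, "HRSWK": 20},   # fails AAGE>16
    {"AAGE": 45, "AGI": 30000, "AFNLWGT": 90000, "HRSWK": 0},    # fails HRSWK>0
]
clean = [record for record in raw if is_clean(record)]
```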

    Description of fnlwgt (final weight)

    The weights on the CPS files are controlled to independent estimates of the civilian non-institutional population of the US. These are prepared monthly for us by Population Division here at the Census Bureau. We use 3 sets of controls.

    These are:

    1. A single cell estimate of the population 16+ for each state.
    2. Controls for Hispanic Origin by age and sex.
    3. Controls by Race, age and sex.

    We use all three sets of controls in our weighting program and "rake" through them 6 times so that by the end we come back to all the controls we used.
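
    Raking of this kind is commonly implemented as iterative proportional fitting: weights are rescaled to match each set of control totals in turn, and the cycle is repeated. The sketch below is an illustration only, not the Census Bureau's weighting program. It uses two hypothetical control sets (sex and age group) rather than the three described above, with toy totals.

```python
def rake(records, controls, passes=6):
    """Rescale weights so weighted totals match each control set in turn.

    records:  list of dicts holding a "weight" plus one key per dimension
    controls: {dimension: {category: target_total}}
    """
    weights = [record["weight"] for record in records]
    for _ in range(passes):
        for dimension, targets in controls.items():
            # Current weighted total per category of this dimension.
            totals = {}
            for record, weight in zip(records, weights):
                category = record[dimension]
                totals[category] = totals.get(category, 0.0) + weight
            # Scale every weight by target / current for its category.
            weights = [
                weight * targets[record[dimension]] / totals[record[dimension]]
                for record, weight in zip(records, weights)
            ]
    return weights


# Toy sample: one record per cell, all starting with weight 1.
sample = [
    {"sex": "M", "age_group": "young", "weight": 1.0},
    {"sex": "M", "age_group": "old", "weight": 1.0},
    {"sex": "F", "age_group": "young", "weight": 1.0},
    {"sex": "F", "age_group": "old", "weight": 1.0},
]
controls = {
    "sex": {"M": 40.0, "F": 60.0},
    "age_group": {"young": 30.0, "old": 70.0},
}
raked = rake(sample, controls)
```

    In this toy case the weights settle after the first pass; with real, unbalanced samples the repeated passes are what bring all control sets into agreement at once.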

    The term estimate refers to population totals derived from CPS by creating "weighted tallies" of any specified socio-economic characteristics of the population.
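
    A "weighted tally" in this sense is a sum of fnlwgt over the records that share a characteristic. A minimal sketch, assuming records are dicts with an fnlwgt field; the sample values and the `weighted_tally` helper are hypothetical.

```python
def weighted_tally(records, predicate):
    """Estimate a population total by summing fnlwgt over matching records."""
    return sum(record["fnlwgt"] for record in records if predicate(record))


# Hypothetical records.
people = [
    {"education": "Bachelors", "fnlwgt": 77516},
    {"education": "Bachelors", "fnlwgt": 83311},
    {"education": "HS-grad", "fnlwgt": 215646},
]
bachelors_estimate = weighted_tally(people, lambda r: r["education"] == "Bachelors")
```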

    People with similar demographic characteristics should have similar weights. There is one important caveat to remember about this statement. That is that since the CPS sample is actually a collection of 51 state samples, each with its own probability of selection, the statement only applies within state.

    Summary

    Data Set Characteristics: Multivariate
    Area: Social
    Attribute Characteristics: Categorical, Integer
    Number of Attributes: 14
    Date Donated: 1996-05-01
    Associated Tasks: Classification
    Missing Values? Yes

  4. Special Survey of Orange County 2004

    • data-staging.niaid.nih.gov
    • dataverse.harvard.edu
    • +2 more
    zip
    Updated Oct 31, 2014
    Cite
    Mark Baldassare (2014). Special Survey of Orange County 2004 [Dataset]. http://doi.org/10.7280/D1MW2M
    Available download formats: zip
    Dataset updated
    Oct 31, 2014
    Authors
    Mark Baldassare
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Orange County, California
    Description

    This survey of 1,008 adult residents includes questions from earlier Orange County Annual Surveys. It also includes key indicators from the PPIC Statewide Survey for comparisons with the state and regions of California. It also considers racial/ethnic, income, and political differences. The following issues are explored in this Orange County Survey: Orange County Issues, Housing Issues, and State and National Issues.

    Orange County Issues include such questions as: What are the trends over time in consumer confidence and the public's ratings of the quality of life and the economy in Orange County? Do residents recall the Orange County government bankruptcy in 1994, how do they perceive its impacts today, and have attitudes toward the county government recovered in the past 10 years? How satisfied are residents with their local public services and city governments? What are the most important issues facing the county and how do residents rate the problems in their regions? What are their perceptions of commuting and transportation plans and preferences for local transportation taxes?

    Housing Issues include such questions as: How satisfied are residents with their homes and neighborhoods and how do they perceive their opportunities for buying a home in Orange County? How many residents feel the financial strain of housing costs, perceive the benefits of rising home values, or are seriously considering moving? What housing and neighborhood options are they willing to consider?

    Online data analysis and additional documentation are available via the link below.

    Methods

    The Orange County Survey, a collaborative effort of the Public Policy Institute of California and the School of Social Ecology at the University of California, Irvine, is a special edition of the PPIC Statewide Survey. This is the fourth in an annual series of PPIC surveys of Orange County. Mark Baldassare, director of the PPIC Statewide Survey, is the founder and director of the Orange County Annual Survey at UCI and a former UCI professor. The UCI survey was conducted 19 times from 1982 to 2000; thus, the Orange County Survey collaboration between PPIC and UCI that began in 2001 is an extension of earlier survey efforts. The special survey of Orange County is co-sponsored by UCI with local support received for this four-year series from Deloitte and Touche, Pacific Life Foundation, Disneyland, Los Angeles Times, Orange County Business Council, Orange County Division of League of California Cities, Orange County Register, The Irvine Company, and United Way of Orange County.

    Orange County is the second most populous county in the state and one of California's fastest growing and changing regions. The county is home to three million residents today, having gained approximately one million residents since 1980. Three in four residents were white and non-Hispanic in 1980; today, nearly half are Latinos and Asians, and more population growth and racial/ethnic change are projected for the next several decades. The county's dynamic economy has become one of the leaders in the high-technology industry. The county is a bellwether county in state and national politics and the site of many important local governance issues, including a county government bankruptcy that occurred 10 years ago in December 1994. There are also housing, transportation, land use, and environmental concerns related to development. Public opinion findings are critical to informing discussions and resolving public debates on key issues. The purpose of this study is to inform policymakers, the media, and the general public by providing timely, accurate, and objective information about policy preferences and economic, social, and political trends.

    To measure changes over time, this survey of 1,008 adult residents includes questions from earlier Orange County Annual Surveys. It also includes key indicators from the PPIC Statewide Survey for comparisons with the state and regions of California. We also consider racial/ethnic, income, and political differences. The following issues are explored in this Orange County Survey:

    Orange County Issues: What are the trends over time in consumer confidence and the public's ratings of the quality of life and the economy in Orange County? Do residents recall the Orange County government bankruptcy in 1994, how do they perceive its impacts today, and have attitudes toward the county government recovered in the past 10 years? How satisfied are residents with their local public services and city governments? What are the most important issues facing the county and how do residents rate the problems in their regions? What are their perceptions of commuting and transportation plans and preferences for local transportation taxes?

    Housing Issues: How satisfied are residents with their homes and neighborhoods and how do they perceive their opportunities for buying a home in Orange County? How many residents feel the financial strain of housing costs, perceive the benefits of rising home values, or are seriously considering moving? What housing and neighborhood options are they willing to consider?

    State and National Issues: What is the overall outlook for California and U.S. conditions? How do residents rate the job performances of Governor Arnold Schwarzenegger and President George W. Bush? What are their perceptions of the national election and the second term of the Bush presidency? Has the partisan divide in trust in the federal government increased over time?

  5. US Adult Income

    • kaggle.com
    zip
    Updated Jul 14, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    John Olafenwa (2017). US Adult Income [Dataset]. https://www.kaggle.com/forums/f/4741/us-adult-income
    Available download formats: zip (719385 bytes)
    Dataset updated
    Jul 14, 2017
    Authors
    John Olafenwa
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    US Adult Census data relating income to social factors such as age, education, and race.

    The US Adult Income dataset was extracted by Barry Becker from the 1994 US Census Database. The data set consists of anonymous information such as occupation, age, native country, race, capital gain, capital loss, education, work class and more. Each row is labelled with a salary bracket of either ">50K" or "<=50K".

    This data set is split into two CSV files, named adult-training.txt and adult-test.txt.

    The goal here is to train a binary classifier on the training dataset to predict the column income_bracket which has two possible values ">50K" and "<=50K" and evaluate the accuracy of the classifier with the test dataset.
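
    Before training a real model, a majority-class baseline gives a floor for the accuracy figure described above. This is a minimal sketch with hypothetical label lists standing in for the income_bracket column of the two files; it is not the method used in the linked kernel.

```python
from collections import Counter


def fit_majority(train_labels):
    """'Train' by remembering the most frequent label in the training set."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical stand-ins for the income_bracket column of each file.
train_labels = ["<=50K", "<=50K", ">50K", "<=50K"]
test_labels = ["<=50K", ">50K", "<=50K", "<=50K", ">50K"]

majority = fit_majority(train_labels)
baseline_accuracy = accuracy([majority] * len(test_labels), test_labels)
```

    Any classifier trained on the real features should beat this number on the test file; if it does not, the pipeline is worth rechecking before tuning the model.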

    Note that the dataset is made up of categorical and continuous features. It also contains missing values.

    The categorical columns are: workclass, education, marital_status, occupation, relationship, race, gender, native_country

    The continuous columns are: age, education_num, capital_gain, capital_loss, hours_per_week
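
    Mixed categorical and continuous columns are typically turned into one numeric vector before training. The sketch below one-hot encodes the categorical columns and casts the continuous ones to float, using only the standard library. The two sample rows are hypothetical, and treating a missing-value marker as its own category is an assumption, not something the dataset prescribes.

```python
CATEGORICAL = ["workclass", "education", "marital_status", "occupation",
               "relationship", "race", "gender", "native_country"]
CONTINUOUS = ["age", "education_num", "capital_gain", "capital_loss",
              "hours_per_week"]


def build_vocab(rows):
    """Collect the sorted set of observed values for each categorical column."""
    return {col: sorted({row[col] for row in rows}) for col in CATEGORICAL}


def encode(row, vocab):
    """Continuous values first, then one 0/1 slot per categorical value."""
    vector = [float(row[col]) for col in CONTINUOUS]
    for col in CATEGORICAL:
        vector.extend(1.0 if row[col] == value else 0.0 for value in vocab[col])
    return vector


# Hypothetical rows; the second differs in age, workclass, and gender.
row_a = {
    "age": "39", "workclass": "State-gov", "education": "Bachelors",
    "education_num": "13", "marital_status": "Never-married",
    "occupation": "Adm-clerical", "relationship": "Not-in-family",
    "race": "White", "gender": "Male", "capital_gain": "2174",
    "capital_loss": "0", "hours_per_week": "40",
    "native_country": "United-States",
}
row_b = dict(row_a, age="28", workclass="Private", gender="Female")

vocab = build_vocab([row_a, row_b])
vector_a = encode(row_a, vocab)
vector_b = encode(row_b, vocab)
```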

    This dataset was obtained from the UCI repository, where it can be found at:

    https://archive.ics.uci.edu/ml/datasets/census+income, http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/

    USAGE
    This dataset is well suited to developing and testing wide linear classifiers, deep neural network classifiers, and a combination of both. For more information on combined wide and deep model classifiers, refer to the research paper by Google: https://arxiv.org/abs/1606.07792
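
    To make the "combination of both" concrete, the sketch below shows a forward pass that adds a wide (sparse linear) logit to a deep (small feed-forward) logit and applies a sigmoid. It is a simplified illustration in plain Python: the weights, feature names, and layer sizes are hypothetical, the deep part takes dense inputs directly instead of learned embeddings, and no training step is shown.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def wide_logit(active_features, weights):
    """Linear model over sparse binary features, including crossed ones."""
    return sum(weights.get(feature, 0.0) for feature in active_features)


def deep_logit(dense, hidden_weights, hidden_biases, output_weights):
    """One ReLU hidden layer followed by a linear output."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, dense)) + bias)
        for row, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden))


def predict(active_features, dense, params):
    """P(income > 50K) from the summed wide and deep logits."""
    z = (
        wide_logit(active_features, params["wide"])
        + deep_logit(dense, params["hidden_w"], params["hidden_b"], params["out_w"])
        + params["bias"]
    )
    return sigmoid(z)


# Hypothetical parameters and inputs.
params = {
    "wide": {
        "education=Bachelors": 0.4,
        "education=Bachelors&occupation=Exec-managerial": 0.8,  # crossed feature
    },
    "hidden_w": [[1.0, -1.0], [0.5, 0.5]],
    "hidden_b": [0.0, 0.0],
    "out_w": [1.0, 2.0],
    "bias": -1.0,
}
active = ["education=Bachelors", "education=Bachelors&occupation=Exec-managerial"]
dense_inputs = [0.5, 1.0]  # e.g. scaled age and hours_per_week (assumed)
probability = predict(active, dense_inputs, params)
```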

    Refer to this kernel for sample usage: https://www.kaggle.com/johnolafenwa/wage-prediction

    A complete tutorial is available at http://johnolafenwa.blogspot.com.ng/2017/07/machine-learning-tutorial-1-wage.html?m=1


