15 datasets found
  1. internet_usage

    • openml.org
    Updated Sep 26, 2014
    Cite
    Graphics; Visualization; & Usability Center; College of Computing; Georgia Institute of Technology (2014). internet_usage [Dataset]. https://www.openml.org/search?type=data&status=active&id=372
    Dataset updated
    Sep 26, 2014
    Authors
    Graphics; Visualization; & Usability Center; College of Computing; Georgia Institute of Technology
    Description

    Author:
    Source: Unknown - Date unknown
    Please cite:

    Internet Usage Data

    Data Type

    multivariate
    

    Abstract

    This data contains general demographic information on internet users
    in 1997.
    

    Sources

     Original Owner
    

    [1] Graphics, Visualization, & Usability Center, College of Computing, Georgia Institute of Technology, Atlanta, GA

     Donor
    

    [2] Dr Di Cook, Department of Statistics, Iowa State University

    Date Donated: June 30, 1999
    

    Data Characteristics

    This data comes from a survey conducted by the Graphics and
    Visualization Unit at Georgia Tech from October 10 to November 16, 1997.
    Full details of the survey are available [3] here.
    
    The particular subset of the survey provided here is the "general
    demographics" of internet users. The data have been recoded as
    entirely numeric, with an index to the codes described in the "Coding"
    file.
    
    The full survey is available from the web site above, along with
    summaries, tables and graphs of their analyses. In addition there is
    information on other parts of the survey, including technology
    demographics and web commerce.
    

    Data Format

    The data are stored in ASCII files with one observation per line.
    Fields are separated by spaces.
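    As a minimal sketch, the one-observation-per-line, space-separated layout can be read with the Python standard library. The function name and the sample text are illustrative assumptions, not part of the original distribution. The sketch assumes every field is numeric, as the recoding note above states, and it does not interpret the codes, which are documented in the "Coding" file.

```python
from typing import List


def parse_observations(text: str) -> List[List[float]]:
    """Parse ASCII text with one observation per line and space-separated fields.

    Assumes every field is numeric (the data were recoded as entirely numeric).
    Blank lines are skipped.
    """
    rows: List[List[float]] = []
    for line in text.splitlines():
        fields = line.split()  # split() with no argument handles runs of spaces
        if not fields:
            continue
        rows.append([float(field) for field in fields])
    return rows


# Hypothetical sample; a real file would be read with open(path).read().
sample = "1 34 2 5\n\n2 41 1 3\n"
print(parse_observations(sample))
```

    The values in the sample are invented; the meaning of each column must be looked up in the "Coding" file.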
    

    Past Usage

    This data was used in the American Statistical Association Statistical
    Graphics and Computing Sections 1999 Data Exposition.
    
    
     [4]The UCI KDD Archive
     [5]Information and Computer Science
     [6]University of California, Irvine
     Irvine, CA 92697-3425
    
    Last modified: June 30, 1999
    

    References

    1. http://www.gvu.gatech.edu/gvu/user_surveys/survey-1997-10/
    2. http://www.public.iastate.edu/~dicook/
    3. http://www.cc.gatech.edu/gvu/user_surveys/survey-1997-10/
    4. http://kdd.ics.uci.edu/
    5. http://www.ics.uci.edu/
    6. http://www.uci.edu/
    

    Information about the dataset CLASSTYPE: nominal CLASSINDEX: none specific

  2. ‘Cervical Cancer vs Demographic, Habits, MedHistory’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 4, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Cervical Cancer vs Demographic, Habits, MedHistory’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-cervical-cancer-vs-demographic-habits-medhistory-1a0b/latest
    Dataset updated
    Aug 4, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Cervical Cancer vs Demographic, Habits, MedHistory’ provided by Analyst-2 (analyst-2.ai), based on the source dataset retrieved from https://www.kaggle.com/yamqwe/cervical-cancer-risk-factorse on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Source:

    Kelwin Fernandes (kafc at inesctec dot pt) - INESC TEC & FEUP, Porto, Portugal. Jaime S. Cardoso - INESC TEC & FEUP, Porto, Portugal. Jessica Fernandes - Universidad Central de Venezuela, Caracas, Venezuela.

    Data Set Information:

    The dataset was collected at 'Hospital Universitario de Caracas' in Caracas, Venezuela. The dataset comprises demographic information, habits, and historic medical records of 858 patients. Several patients decided not to answer some of the questions because of privacy concerns (missing values).
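    Because some patients declined to answer certain questions, missing values need explicit handling. A minimal sketch, assuming the data come as a CSV file in which unanswered fields are marked with "?" (this marker is an assumption; check it against your copy of the file). The column names in the sample come from the attribute list below, but the values are made up.

```python
import csv
import io
from typing import Dict, List, Optional

MISSING_MARKER = "?"  # assumption: verify against the actual file


def read_with_missing(text: str) -> List[Dict[str, Optional[float]]]:
    """Read CSV text, mapping the missing-value marker to None and other fields to float."""
    rows: List[Dict[str, Optional[float]]] = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            name: None if value.strip() == MISSING_MARKER else float(value)
            for name, value in record.items()
        })
    return rows


def missing_share(rows: List[Dict[str, Optional[float]]], column: str) -> float:
    """Fraction of rows whose value in `column` is missing."""
    return sum(row[column] is None for row in rows) / len(rows)


sample = "Age,Smokes\n18,0\n20,?\n"
rows = read_with_missing(sample)
print(missing_share(rows, "Smokes"))  # 0.5
```

    Keeping missing answers as None, rather than filling them with 0, preserves the distinction between "no" and "did not answer".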

    Attribute Information:

    • (int) Age
    • (int) Number of sexual partners
    • (int) First sexual intercourse (age)
    • (int) Num of pregnancies
    • (bool) Smokes
    • (bool) Smokes (years)
    • (bool) Smokes (packs/year)
    • (bool) Hormonal Contraceptives
    • (int) Hormonal Contraceptives (years)
    • (bool) IUD
    • (int) IUD (years)
    • (bool) STDs
    • (int) STDs (number)
    • (bool) STDs:condylomatosis
    • (bool) STDs:cervical condylomatosis
    • (bool) STDs:vaginal condylomatosis
    • (bool) STDs:vulvo-perineal condylomatosis
    • (bool) STDs:syphilis
    • (bool) STDs:pelvic inflammatory disease
    • (bool) STDs:genital herpes
    • (bool) STDs:molluscum contagiosum
    • (bool) STDs:AIDS
    • (bool) STDs:HIV
    • (bool) STDs:Hepatitis B
    • (bool) STDs:HPV
    • (int) STDs: Number of diagnosis
    • (int) STDs: Time since first diagnosis
    • (int) STDs: Time since last diagnosis
    • (bool) Dx:Cancer
    • (bool) Dx:CIN
    • (bool) Dx:HPV
    • (bool) Dx
    • (bool) Hinselmann: target variable
    • (bool) Schiller: target variable
    • (bool) Cytology: target variable
    • (bool) Biopsy: target variable

    Relevant Papers:

    Kelwin Fernandes, Jaime S. Cardoso, and Jessica Fernandes. 'Transfer Learning with Partial Observability Applied to Cervical Cancer Screening.' Iberian Conference on Pattern Recognition and Image Analysis. Springer International Publishing, 2017.

    Citation Request:

    Kelwin Fernandes, Jaime S. Cardoso, and Jessica Fernandes. 'Transfer Learning with Partial Observability Applied to Cervical Cancer Screening.' Iberian Conference on Pattern Recognition and Image Analysis. Springer International Publishing, 2017.

    Source: http://archive.ics.uci.edu/ml/datasets/Cervical+cancer+(Risk+Factors)

    This dataset was created by UCI and contains around 900 samples with features such as STDs:vulvo-perineal condylomatosis, STDs:pelvic inflammatory disease, STDs (number), Smokes, and more, along with technical information.

    How to use this dataset

    • Analyze Cytology in relation to Biopsy
    • Study the influence of STDs: Time since first diagnosis on Age

    Acknowledgements

    If you use this dataset in your research, please credit UCI.

    --- Original source retains full ownership of the source dataset ---

  3. Adult Census Income Dataset

    • kaggle.com
    Updated Jul 7, 2024
    Cite
    Priyam Choksi (2024). Adult Census Income Dataset [Dataset]. https://www.kaggle.com/datasets/priyamchoksi/adult-census-income-dataset
    Dataset updated
    Jul 7, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Priyam Choksi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The following was retrieved from the UCI Machine Learning Repository.

    This data was extracted from the 1994 Census Bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)). The prediction task is to determine whether a person makes over $50K a year.
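    The extraction conditions can be expressed as a simple record filter. This is a sketch only: the field names come from the condition string, their reading (presumably age, adjusted gross income, final weight, and hours worked per week) is our assumption, and the two records are invented for illustration.

```python
from typing import Dict, List


def passes_extraction(record: Dict[str, float]) -> bool:
    """Apply ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0))."""
    return (
        record["AAGE"] > 16
        and record["AGI"] > 100
        and record["AFNLWGT"] > 1
        and record["HRSWK"] > 0
    )


# Invented records for illustration only.
records: List[Dict[str, float]] = [
    {"AAGE": 39, "AGI": 2500, "AFNLWGT": 77516, "HRSWK": 40},  # passes every condition
    {"AAGE": 15, "AGI": 0, "AFNLWGT": 1200, "HRSWK": 10},      # fails AAGE and AGI
]
clean = [r for r in records if passes_extraction(r)]
print(len(clean))  # 1
```

    All four comparisons are strict, so a record with AFNLWGT exactly 1 is excluded.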

    Description of fnlwgt (final weight)

    The weights on the Current Population Survey (CPS) files are controlled to independent estimates of the civilian noninstitutional population of the US. These are prepared monthly for us by Population Division here at the Census Bureau. We use 3 sets of controls. These are:

    • A single cell estimate of the population 16+ for each state.
    • Controls for Hispanic Origin by age and sex.
    • Controls by Race, age and sex.

    We use all three sets of controls in our weighting program and "rake" through them 6 times so that by the end we come back to all the controls we used. The term estimate refers to population totals derived from CPS by creating "weighted tallies" of any specified socio-economic characteristics of the population. People with similar demographic characteristics should have similar weights. There is one important caveat to remember about this statement: because the CPS sample is actually a collection of 51 state samples, each with its own probability of selection, the statement applies only within a state.
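    Raking (iterative proportional fitting) repeatedly rescales the weights so that the weighted tallies match each set of controls in turn. The sketch below illustrates the idea on a toy sample with two hypothetical control dimensions (sex and an age band) instead of the Census Bureau's three; it is not the Bureau's actual weighting program. Only the default of 6 passes comes from the description above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rake(
    weights: Sequence[float],
    categories: Sequence[Sequence[str]],
    controls: Sequence[Dict[str, float]],
    passes: int = 6,
) -> List[float]:
    """Iterative proportional fitting.

    categories[d][i] is record i's category on control dimension d;
    controls[d][c] is the target population total for category c on dimension d.
    """
    w = list(weights)
    for _ in range(passes):
        for dim_categories, targets in zip(categories, controls):
            # Current weighted tally for each category on this dimension.
            tally: Dict[str, float] = defaultdict(float)
            for i, cat in enumerate(dim_categories):
                tally[cat] += w[i]
            # Rescale so this dimension's tallies equal its control totals.
            for i, cat in enumerate(dim_categories):
                w[i] *= targets[cat] / tally[cat]
    return w


# Toy sample of four records with equal starting weights (hypothetical controls).
sex = ["M", "M", "F", "F"]
age = ["16-44", "45+", "16-44", "45+"]
final = rake(
    [1.0, 1.0, 1.0, 1.0],
    [sex, age],
    [{"M": 60.0, "F": 40.0}, {"16-44": 55.0, "45+": 45.0}],
)
print([round(x, 2) for x in final])
```

    Each pass matches the last dimension exactly and the earlier ones approximately; repeating the passes drives all margins toward their controls, which is why the procedure cycles through the controls several times.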

  4. UCI Communities and Crime Unnormalized Data Set

    • kaggle.com
    Updated Feb 21, 2018
    Cite
    Kavitha (2018). UCI Communities and Crime Unnormalized Data Set [Dataset]. https://www.kaggle.com/datasets/kkanda/communities%20and%20crime%20unnormalized%20data%20set/discussion?sort=undefined
    Dataset updated
    Feb 21, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kavitha
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    Introduction: The dataset used for this experiment consists of real data. It was acquired from the UCI Machine Learning Repository website [13]. The title of the dataset is ‘Crime and Communities’. It combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR [13]. The dataset contains 147 attributes and 2216 instances.

    The per capita crimes variables were calculated using population values included in the 1995 FBI data (which differ from the 1990 Census values).

    Content

    The variables describe the community, such as the percent of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units. The crime attributes that could be predicted (N=18) are the 8 crimes considered 'Index Crimes' by the FBI (Murders, Rape, Robbery, ...), per capita (actually per 100,000 population) versions of each, and Per Capita Violent Crimes and Per Capita Nonviolent Crimes.
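    The per capita crime variables are expressed per 100,000 population. A minimal sketch of that conversion follows; per the note above, the population figure should be the one included in the 1995 FBI data rather than the 1990 Census value. The counts in the example are invented.

```python
def per_100k(count: int, population: int) -> float:
    """Convert a raw crime count to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so exact cases such as 50 / 200,000 give exactly 25.0.
    return count * 100_000 / population


print(per_100k(50, 200_000))  # 25.0
```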

    • Predictive variables: 125
    • Non-predictive variables: 4
    • Potential goal/response variables: 18

    Acknowledgements

    http://archive.ics.uci.edu/ml/datasets/Communities%20and%20Crime%20Unnormalized

    U. S. Department of Commerce, Bureau of the Census, Census Of Population And Housing 1990 United States: Summary Tape File 1a & 3a (Computer Files),

    U.S. Department Of Commerce, Bureau Of The Census Producer, Washington, DC and Inter-university Consortium for Political and Social Research Ann Arbor, Michigan. (1992)

    U.S. Department of Justice, Bureau of Justice Statistics, Law Enforcement Management And Administrative Statistics (Computer File) U.S. Department Of Commerce, Bureau Of The Census Producer, Washington, DC and Inter-university Consortium for Political and Social Research Ann Arbor, Michigan. (1992)

    U.S. Department of Justice, Federal Bureau of Investigation, Crime in the United States (Computer File) (1995)

    Inspiration


    The data in this dataset may not be a complete source of information for identifying the factors that contribute to violent and non-violent crime, as many relevant factors may still be missing.

    However, I would like to try to answer the following questions.

    1. Analyze whether the number of vacant and occupied houses, and the period of time the houses were vacant, contributed to any significant change in violent and non-violent crime rates in communities.

    2. How has unemployment affected the crime rate (violent and non-violent) in the communities?

    3. Were people from a particular age group more vulnerable to crime?

    4. Does ethnicity play a role in crime rate?

    5. Has education played a role in bringing down the crime rate?

  5. Subject Demographics and Neuropathological Assessments.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alex E. Roher; David H. Cribbs; Ronald C. Kim; Chera L. Maarouf; Charisse M. Whiteside; Tyler A. Kokjohn; Ian D. Daugs; Elizabeth Head; Carolyn Liebsack; Geidy Serrano; Christine Belden; Marwan N. Sabbagh; Thomas G. Beach (2023). Subject Demographics and Neuropathological Assessments. [Dataset]. http://doi.org/10.1371/journal.pone.0059735.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alex E. Roher; David H. Cribbs; Ronald C. Kim; Chera L. Maarouf; Charisse M. Whiteside; Tyler A. Kokjohn; Ian D. Daugs; Elizabeth Head; Carolyn Liebsack; Geidy Serrano; Christine Belden; Marwan N. Sabbagh; Thomas G. Beach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The total plaque score has a maximum of 15. The total WMR score has a maximum of 12. Some neuropathological assessments are not available because the 2 institutions involved in the study used different classification protocols. Abbreviations: NDC, non-demented control; NI-AD, non-immunized Alzheimer’s disease; BSHRI, Banner Sun Health Research Institute; Bapi-AD, bapineuzumab immunized Alzheimer’s disease; UCI, University of California, Irvine; yrs, years; F, female; M, male; PMI, postmortem interval; g, grams; MMSE, mini mental state examination; APOE, apolipoprotein E; GT, genotype; CAA, cerebral amyloid angiopathy; Occ., occipital; pariet., parietal; Mod., moderate; WMR, white matter rarefaction; N/A, not available.

  6. optimized_adult_census

    • huggingface.co
    Updated Feb 2, 2025
    Cite
    Mada (2025). optimized_adult_census [Dataset]. https://huggingface.co/datasets/Databoost/optimized_adult_census
    Dataset updated
    Feb 2, 2025
    Authors
    Mada
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Documentation of the Dataset - Adult Census Income Dataset Optimized

      1. General Description of the Dataset
    

    This dataset, called Adult Census Income Dataset Optimized, is an optimized version of the Adult Census Income Dataset. The latter comes from the UCI Machine Learning Repository and is commonly used in classification tasks to predict whether a person earns more or less than $50,000 per year based on various demographic characteristics. We optimized the dataset by… See the full description on the dataset page: https://huggingface.co/datasets/Databoost/optimized_adult_census.

  7. Annual Survey of Orange County 2000

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Oct 31, 2014
    Cite
    Mark Baldassare (2014). Annual Survey of Orange County 2000 [Dataset]. http://doi.org/10.7280/D15P48
    Available download formats: zip
    Dataset updated
    Oct 31, 2014
    Authors
    Mark Baldassare
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Area covered
    California, Orange County
    Description

    This 19th Orange County Annual Survey at UCI continues to monitor social, economic and political trends. The Orange County Consumer Confidence Index now stands at 112, the highest score since the study began tracking this five-question measure in 1986, surpassing the U.S. index, which is at 109. The 2000 survey was conducted May 3-14, 2000, and includes random telephone interviews with 1,005 Orange County adults in English and Spanish. Online data analysis and additional documentation are available at the link below.

    Methods: The 2000 Orange County Annual Survey was directed by Mark Baldassare, professor and Johnson Chair in Civic Governance at UCI, and Senior Fellow at the Public Policy Institute of California. Cheryl Katz, research associate, was co-director. The random telephone survey included interviews with 1,005 Orange County adult residents conducted May 3-14, 2000. We follow the methods used in the 18 previous surveys.

    Interviewing was conducted on weekend days and weekday nights, using a computer-generated random sample of telephone numbers. Within a household, adult respondents were randomly chosen for interview. Each interview took an average of 20 minutes to complete. The interviewing was conducted in English and Spanish as needed. The completion rate was 67%. Telephone interviewing was conducted by Interviewing Services of America in Van Nuys, CA. The sample's demographic characteristics were comparable to data from the U.S. Census, the California Department of Finance, and previous Orange County Annual Surveys.

    The sampling error for this survey is ±3% at the 95% confidence level. This means that 95 times out of 100, the results will be within 3 percentage points of what they would be if all adults in Orange County were interviewed. The sampling error for any subgroup would be larger. Sampling error is just one type of error to which surveys are subject. Results may also be affected by factors such as question wording, ordering, and survey timing.

    Throughout the report, we refer to two geographic regions. North refers to cities and communities north of the 55 Freeway, including Anaheim, Orange, Villa Park, La Habra, Brea, Buena Park, Fullerton, Placentia, Yorba Linda, La Palma, Cypress, Los Alamitos, Rossmoor, Seal Beach, Westminster, Midway City, Stanton, Fountain Valley, Huntington Beach, Santa Ana, Garden Grove, Tustin, Tustin Foothills and Costa Mesa. South refers to cities and communities south of the 55 Freeway, including Newport Beach, Irvine, Lake Forest, Newport Coast, Aliso Viejo, Laguna Hills, Laguna Niguel, Laguna Woods, Mission Viejo, Portola Hills, Rancho Santa Margarita, Foothill Ranch, Coto de Caza, Trabuco, Laguna Beach, Dana Point, San Clemente, Capistrano Beach and San Juan Capistrano. In the analysis of questions on the proposed El Toro airport, we include Newport Beach in the North County.

    Some of the questions in this survey are repeated from national surveys conducted by the University of Michigan in 2000, the Pew Research Center in 1999, the Wall Street Journal and NBC News in 1999, CBS News in 1999, Fox News in 2000, and the Gallup Organization in 1999. Questions with California comparisons are repeated from the Public Policy Institute of California's Statewide Surveys in 2000, directed by Mark Baldassare.
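    The stated ±3% sampling error for 1,005 interviews can be roughly reproduced with the textbook margin of error for a proportion, z * sqrt(p * (1 - p) / n). This sketch assumes simple random sampling, the worst case p = 0.5, and z = 1.96 for 95% confidence. It ignores weighting and design effects, so it is an approximation rather than the survey's own calculation.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(1005)  # 1,005 interviews in the 2000 survey
print(f"±{moe * 100:.1f} percentage points")  # ±3.1 percentage points
```

    Because the margin shrinks with the square root of n, any subgroup (smaller n) has a larger sampling error, which matches the caveat in the text.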

  8. Determination of the ranges of different surface temperature intervals.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Xiangli Wu; Lin Zhang; Shuying Zang (2023). Determination of the ranges of different surface temperature intervals. [Dataset]. http://doi.org/10.1371/journal.pone.0217850.t003
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xiangli Wu; Lin Zhang; Shuying Zang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Determination of the ranges of different surface temperature intervals.

  9. Annual Survey of Orange County 1990

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Oct 31, 2014
    Cite
    Mark Baldassare (2014). Annual Survey of Orange County 1990 [Dataset]. http://doi.org/10.7280/D1G59S
    Available download formats: zip
    Dataset updated
    Oct 31, 2014
    Authors
    Mark Baldassare
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Orange County
    Description

    The theme of this year's survey is "Issues of the 1990s." The 1990 Orange County Annual Survey explores three interrelated issues that are central topics in discussions about the county's future: the environment, regional government, and transportation alternatives. The sample size is 1,017 Orange County adult residents. Online data analysis and additional documentation are available at the link below.

    Methods: The Orange County Annual Survey was co-directed by Mark Baldassare, a UC Irvine professor of social ecology, and Cheryl Katz, research associate. For the survey, 1,017 adult Orange County residents were interviewed by telephone Sept. 5 to 21. In Orange County, where more than 97 percent of households have telephones, this method of interview gives highly representative data.

    Interviewing was conducted on weekend days and weekday nights, using a random sample of 4,000 listed and unlisted telephone numbers. These were generated by computer from a list of working blocks of telephone exchanges. The telephone sample was generated by Pijacki and Associates of Shoreham, N.Y. The field work was conducted at the Center for Survey Research by UCI's Public Policy Research Organization.

    Of the telephone numbers called, 25 percent resulted in completed interviews and 15 percent were refusals. The completion rate for the survey (completions divided by completions plus refusals) was 63 percent, consistent with earlier annual surveys. Other telephone outcomes included the following: 21 percent disconnected numbers; 3 percent computer or fax lines; 15 percent businesses and other non-Orange County households; 20 percent persistent no answers; and 1 percent persistently unavailable respondents. Two percent were not completed because of language problems, including non-English-speaking households and hearing impairment. These figures are also consistent with earlier annual surveys.

    Within a household, respondents were chosen for interview using the Troldahl-Carter method. This method randomly selects a household member from a grid that includes information on the number of adult household members and the number of adult men in the household. Up to six callbacks were attempted per telephone number.

    Each interview included 95 questions and took an average of 20 minutes to complete. Most interviews ranged in length from 15 to 25 minutes. The surveys were designed in three stages over eight months. The first stage involved feedback on survey topics and questions from the annual survey's Steering Committee, Advisory Committee and UCI colleagues. In the second stage, during the spring, UCI graduate students conducted focus group interviews on Orange County topics and pretested survey questions. After a draft was reviewed by the Advisory Committee, final revisions of the survey questions were made.

    The interview began with questions about housing, consumer confidence and perceptions of life in Orange County. These were followed by questions on the environment, regional governance and transportation alternatives. A major section of the interview was devoted to questions about mass transit. Later in the interview, we turned to other topics, including privacy, civic participation and charitable giving. The conclusion of the survey was devoted to questions about work, commuting, demographic characteristics and political attitudes.

    The survey's validity was checked by comparing the sample characteristics to Orange County population data. We compared the 1990 survey results to previous annual surveys and other recent survey data. Age, income and other demographic features of our sample were comparable with those noted in other studies.

    For the purposes of analysis, we statistically weighted the sample to represent the actual regional distribution of Orange County residents. The population estimates for north, west, central and south county regions were from data issued by the Demographic Research Unit, County of Orange. The 1990 U.S. Census preliminary population estimates by city were also reviewed.

    Several other efforts were made to correct for possible errors in interviewing or data processing. Approximately 10 percent of the completed interviews were verified through callbacks. All questionnaires were checked by a supervisor immediately after completion. Finally, keypunched data were verified for all respondents in the survey sample.

    The sampling error for this survey is ±3 percent at the 95 percent confidence level. This means that 95 times out of 100, the results will be within 3 percentage points of what they would be if all adults in Orange County were interviewed. The sampling error for any subgroup would be larger. Sampling error is just one type of error to which surveys are subject. Results may also be influenced by factors such as question wording, survey timing and other aspects of survey design.
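    The completion-rate definition used in this survey (completions divided by completions plus refusals) can be checked against the reported figures. Because both inputs are percentages of all numbers called, their ratio does not depend on how many numbers were dialed. A small sketch:

```python
def completion_rate(completed: float, refused: float) -> float:
    """Completions divided by completions plus refusals."""
    return completed / (completed + refused)


rate_1990 = completion_rate(25, 15)  # 25% completed, 15% refused
print(f"{rate_1990 * 100:.1f} percent")  # 62.5 percent, reported as 63 percent
```

    The same formula applied to the 1989 survey's 23 percent completions and 15 percent refusals gives about 60.5 percent, reported there as 61 percent.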

  10. Default of Credit Card Clients Dataset

    • kaggle.com
    Updated Nov 3, 2016
    Cite
    UCI Machine Learning (2016). Default of Credit Card Clients Dataset [Dataset]. https://www.kaggle.com/datasets/uciml/default-of-credit-card-clients-dataset/suggestions
    Dataset updated
    Nov 3, 2016
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    UCI Machine Learning
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Information

    This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.

    Content

    There are 25 variables:

    • ID: ID of each client
    • LIMIT_BAL: Amount of given credit in NT dollars (includes individual and family/supplementary credit)
    • SEX: Gender (1=male, 2=female)
    • EDUCATION: (1=graduate school, 2=university, 3=high school, 4=others, 5=unknown, 6=unknown)
    • MARRIAGE: Marital status (1=married, 2=single, 3=others)
    • AGE: Age in years
    • PAY_0: Repayment status in September, 2005 (-1=pay duly, 1=payment delay for one month, 2=payment delay for two months, ... 8=payment delay for eight months, 9=payment delay for nine months and above)
    • PAY_2: Repayment status in August, 2005 (scale same as above)
    • PAY_3: Repayment status in July, 2005 (scale same as above)
    • PAY_4: Repayment status in June, 2005 (scale same as above)
    • PAY_5: Repayment status in May, 2005 (scale same as above)
    • PAY_6: Repayment status in April, 2005 (scale same as above)
    • BILL_AMT1: Amount of bill statement in September, 2005 (NT dollar)
    • BILL_AMT2: Amount of bill statement in August, 2005 (NT dollar)
    • BILL_AMT3: Amount of bill statement in July, 2005 (NT dollar)
    • BILL_AMT4: Amount of bill statement in June, 2005 (NT dollar)
    • BILL_AMT5: Amount of bill statement in May, 2005 (NT dollar)
    • BILL_AMT6: Amount of bill statement in April, 2005 (NT dollar)
    • PAY_AMT1: Amount of previous payment in September, 2005 (NT dollar)
    • PAY_AMT2: Amount of previous payment in August, 2005 (NT dollar)
    • PAY_AMT3: Amount of previous payment in July, 2005 (NT dollar)
    • PAY_AMT4: Amount of previous payment in June, 2005 (NT dollar)
    • PAY_AMT5: Amount of previous payment in May, 2005 (NT dollar)
    • PAY_AMT6: Amount of previous payment in April, 2005 (NT dollar)
    • default.payment.next.month: Default payment (1=yes, 0=no)
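    The PAY_0 to PAY_6 columns share the repayment-status scale listed above. A minimal decoder for that documented scale follows; the function name is hypothetical, and any code the description does not list is reported as undocumented rather than guessed.

```python
def describe_pay_status(code: int) -> str:
    """Translate a PAY_* repayment-status code using the documented scale."""
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        unit = "month" if code == 1 else "months"
        return f"payment delay for {code} {unit}"
    if code == 9:
        return "payment delay for nine months and above"
    return "undocumented code"


for code in (-1, 1, 2, 9, 0):
    print(code, describe_pay_status(code))
```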

    Inspiration

    Some ideas for exploration:

    1. How does the probability of default payment vary by categories of different demographic variables?
    2. Which variables are the strongest predictors of default payment?
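    The first question can be approached by computing the share of clients with default.payment.next.month = 1 within each category of a demographic variable. A standard-library sketch, with a few invented rows standing in for the real file:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

TARGET = "default.payment.next.month"


def default_rate_by(rows: Iterable[Mapping[str, int]], column: str) -> Dict[int, float]:
    """Share of rows with TARGET == 1, for each value of `column`."""
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # value -> [defaults, total]
    for row in rows:
        tally = counts[row[column]]
        tally[0] += row[TARGET]
        tally[1] += 1
    return {value: defaults / total for value, (defaults, total) in counts.items()}


# Invented rows; SEX uses the documented coding (1=male, 2=female).
rows = [
    {"SEX": 1, TARGET: 1},
    {"SEX": 1, TARGET: 0},
    {"SEX": 2, TARGET: 0},
    {"SEX": 2, TARGET: 0},
]
print(default_rate_by(rows, "SEX"))  # {1: 0.5, 2: 0.0}
```

    The same function works for EDUCATION or MARRIAGE by changing the column name.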

    Acknowledgements

    Any publications based on this dataset should acknowledge the following:

    Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

    The original dataset can be found at the UCI Machine Learning Repository.

  11. Patient characteristics, presented for the total population and separately...

    • plos.figshare.com
    xls
    Updated Jan 9, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Marlieke de Fouw; Melissa W. M. Boere; Carolyn Nakisige; Mariam Nabwire; Jane Namugga; Israel Luutu; Jackson Orem; Jan M. M. van Lith; Jogchum J. Beltman (2025). Patient characteristics, presented for the total population and separately for patients with early and advanced stage disease classified with the FIGO 2009 staging system. [Dataset]. http://doi.org/10.1371/journal.pone.0316323.t001
    Available download formats: xls
    Dataset updated
    Jan 9, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Marlieke de Fouw; Melissa W. M. Boere; Carolyn Nakisige; Mariam Nabwire; Jane Namugga; Israel Luutu; Jackson Orem; Jan M. M. van Lith; Jogchum J. Beltman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The proportions are calculated over the total number of patients for whom the variable is recorded; the proportion of missing values is calculated over the total number of patients.

  12. Summary of the publicly-available UCI Machine Learning Repository datasets...

    • plos.figshare.com
    bin
    Updated Aug 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Esha Datta; Aditya Ballal; Javier E. López; Leighton T. Izu (2023). Summary of the publicly-available UCI Machine Learning Repository datasets used for method comparison. [Dataset]. http://doi.org/10.1371/journal.pdig.0000307.t002
    Available download formats: bin
    Dataset updated
    Aug 9, 2023
    Dataset provided by
    PLOS Digital Health
    Authors
    Esha Datta; Aditya Ballal; Javier E. López; Leighton T. Izu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary of the publicly-available UCI Machine Learning Repository datasets used for method comparison.

  13. Surface temperature statistics in 2016.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Xiangli Wu; Lin Zhang; Shuying Zang (2023). Surface temperature statistics in 2016. [Dataset]. http://doi.org/10.1371/journal.pone.0217850.t001
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xiangli Wu; Lin Zhang; Shuying Zang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Surface temperature statistics in 2016.

  14. Annual Survey of Orange County 1989

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    zip
    Updated Oct 31, 2014
    + more versions
    Cite
    Mark Baldassare (2014). Annual Survey of Orange County 1989 [Dataset]. http://doi.org/10.7280/D1KW29
    Available download formats: zip
    Dataset updated
    Oct 31, 2014
    Authors
    Mark Baldassare
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Orange County
    Description

    The theme of this year's survey is "Orange County: Approaching the 1990s." The survey is designed to provide an extensive analysis of issues affecting the next decade. The sample size is 1,085 Orange County adult residents. Online data analysis and additional documentation are available at the link below.

    Methods

    The Orange County Annual Survey was directed by Mark Baldassare, a professor of social ecology at UC Irvine. For the survey, 1,085 adult Orange County residents were interviewed by telephone Sept. 6 to 23. In Orange County, where more than 97 percent of households have telephones, this method of interview gives highly representative data.

    Interviewing was conducted on weekend days and weekday nights, using a random sample of 4,500 listed and unlisted telephone numbers. These were generated by computer from a list of working blocks of telephone exchanges. The telephone sample was generated by Pijacki and Associates of Shoreham, N.Y. The field work was conducted at the Center for Survey Research by UCI's Public Policy Research Organization.

    Of the telephone numbers called, 23 percent resulted in completed interviews and 15 percent were refusals. The completion rate for the survey (completions divided by completions plus refusals) was 61 percent. Other telephone outcomes included the following: 21 percent disconnected numbers; 3 percent computer or fax lines; 15 percent businesses and other non-Orange County households; 20 percent persistent no answers; and 1 percent persistently unavailable respondents. Two percent were not completed because of language problems, including non-English-speaking households, and hearing impairment.

    Within a household, respondents were chosen for interview using the Troldahl-Carter method. This method randomly selects a household member from a grid that includes information on the number of adult household members and the number of adult men in the household. Up to six callbacks were attempted per telephone number. Each interview included 90 questions and took an average of 20 minutes to complete. Most interviews ranged in length from 15 to 25 minutes.

    The surveys were designed in three stages over several months. In the first stage, UCI undergraduate students conducted face-to-face interviews on Orange County topics with randomly selected adult residents. The second stage involved feedback on questions and topics from the annual survey's Steering Committee, Advisory Committee and colleagues. The final stage included pre-tests by students, followed by final revisions of the questions.

    The interview began with questions about housing, moving preferences, consumer confidence and perceptions of life in Orange County. These were followed by questions on growth, transportation and crime issues. A major section of the interview was then devoted to questions about air pollution and the Air Quality Management Plan. Later in the interview, we turned to the topic of charities. The conclusion of the survey was devoted to questions about work and commuting patterns, personal characteristics, household status and political attitudes.

    The survey's validity was checked by comparing the sample's characteristics to available information on Orange County's population. We compared the 1989 survey results to the 1980 U.S. Census, previous annual surveys and other recent survey data. Age, income and other demographic features of our sample were comparable with those noted in other studies. For data analyses, we statistically weighted the sample to represent the actual regional distribution of Orange County residents. The 1989 population estimates for north, west, central and south county regions were issued by the Demographic Research Unit, County of Orange.

    Other efforts were made to correct for possible errors in the course of interviewing and data processing. Approximately 10 percent of the completed interviews were verified through callbacks. All questionnaires were checked by the interviewer supervisor immediately after completion. Finally, keypunched data were double-checked for all cases in the survey sample.

    The sampling error for this survey is ±3 percent at the 95 percent confidence level. This means that 95 times out of 100, the results will be within 3 percentage points of what they would be if all adults in Orange County were interviewed. The sampling error for any subgroup would be larger. Sampling error is just one type of error to which surveys are subject. Results may also be influenced by factors such as question wording, survey timing and other aspects of survey design.
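    The 1989 outcome shares and completion rate can be checked with a few lines of Python. This is a sketch under stated assumptions: it treats the persistently-unavailable share as 1 percent, and the regional weighting step uses simple cell (post-stratification) weights with hypothetical regional sample counts and population shares, because the description does not give the actual regional figures or say exactly how its weights were computed.

```python
# Telephone outcome shares for 1989, in percent of numbers called.
OUTCOMES = {
    "completed interviews": 23,
    "refusals": 15,
    "disconnected numbers": 21,
    "computer or fax lines": 3,
    "businesses and non-Orange County households": 15,
    "persistent no answers": 20,
    "persistently unavailable respondents": 1,  # assumed 1 percent
    "language problems and hearing impairment": 2,
}


def completion_rate(completions: float, refusals: float) -> float:
    """Completions divided by completions plus refusals."""
    return completions / (completions + refusals)


def cell_weights(sample_counts: dict[str, int],
                 population_shares: dict[str, float]) -> dict[str, float]:
    """Hypothetical post-stratification weight per region: population share / sample share."""
    n = sum(sample_counts.values())
    return {region: population_shares[region] / (count / n)
            for region, count in sample_counts.items()}


total = sum(OUTCOMES.values())
rate = completion_rate(OUTCOMES["completed interviews"], OUTCOMES["refusals"])
print(total)          # 100
print(f"{rate:.0%}")  # 61%

# Hypothetical regional counts (summing to 1,085) and hypothetical population shares.
sample = {"north": 400, "west": 250, "central": 250, "south": 185}
population = {"north": 0.35, "west": 0.25, "central": 0.20, "south": 0.20}
weights = cell_weights(sample, population)
# After weighting, each region's share of the sample equals its population share.
weighted_share = {r: sample[r] * weights[r] / sum(sample.values()) for r in sample}
print({r: round(s, 2) for r, s in weighted_share.items()})
```

    The outcome categories sum to 100 percent, and 23 / (23 + 15) rounds to the reported 61 percent.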

  15. Annual Survey of Orange County 1998

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    zip
    Updated Oct 31, 2014
    Cite
    Mark Baldassare (2014). Annual Survey of Orange County 1998 [Dataset]. http://doi.org/10.7280/D1F59G
    Available download formats: zip
    Dataset updated
    Oct 31, 2014
    Authors
    Mark Baldassare
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Area covered
    California, Orange County
    Description

    This seventeenth Orange County Annual Survey continues to track trends over time in the county's important social, economic and political issues. This year, there is a special focus on understanding the impacts of increasing urbanization and the changing demographics of Orange County. The sample size is 1,000 Orange County adult residents. Online data analysis and additional documentation are available at the link below.

    Methods

    The 1998 Orange County Annual Survey was co-directed by Mark Baldassare, professor at UCI and senior fellow at the Public Policy Institute of California, and Cheryl Katz, research associate. The random telephone survey included interviews with 2,002 Orange County adult residents conducted Sept. 1-13, 1998. We followed the methods used in the 16 previous surveys, with two exceptions. This year, we doubled the sample size of the Orange County Annual Survey, which is usually about 1,000 interviews, so that we could expand our analysis of the Latino and Asian populations. We also conducted interviews in Vietnamese as well as in English and Spanish.

    Interviewing was conducted on weekend days and weekday nights, using a computer-generated random sample of telephone numbers. Within a household, adult respondents were randomly chosen for interview. Each interview took an average of 20 minutes to complete. The interviewing was conducted in English, Spanish or Vietnamese, as needed. The completion rate was 74 percent. The telephone interviewing was conducted by Interviewing Services of America in Van Nuys, CA.

    The survey sample was compared with the U.S. Census and state figures by city for Orange County, and was found to represent the actual regional distribution of Orange County residents. The sample's demographic characteristics also were closely comparable to the census and other survey data, including previous Orange County Annual Surveys.

    The sampling error for this survey is ±2% at the 95% confidence level. This means that 95 times out of 100, the results will be within two percentage points of what they would be if all adults in Orange County were interviewed. The sampling error for any subgroup would be larger. Sampling error is just one type of error to which surveys are subject. Results may also be affected by question wording, ordering, and survey timing.

    Throughout the report, we refer to two geographic regions. North County includes Anaheim, Orange, Villa Park, La Habra, Brea, Buena Park, Fullerton, Placentia, Yorba Linda, La Palma, Cypress, Los Alamitos, Rossmoor, Seal Beach, Westminster, Midway City, Stanton, Fountain Valley, Huntington Beach, Santa Ana, Garden Grove, Tustin, Tustin Foothills and Costa Mesa. South County includes Newport Beach, Irvine, Lake Forest, Aliso Viejo, Laguna Hills, Laguna Niguel, Mission Viejo, Portola Hills, Rancho Santa Margarita, Foothill Ranch, Coto de Caza, Trabuco Highlands, El Toro Station, Laguna Beach, Dana Point, San Clemente, Capistrano Beach and San Juan Capistrano. In the analysis of questions on the proposed El Toro airport, we include Newport Beach in North County.
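    The ±2 and ±3 point sampling errors quoted for the 1998 and 1989 surveys are consistent with the textbook margin of error for a proportion, z * sqrt(p * (1 - p) / n). The sketch below assumes simple random sampling, the conservative p = 0.5 and z = 1.96 for 95 percent confidence; it ignores regional weighting and any design effect, which the surveys' own figures may account for. The subgroup size of 300 is purely hypothetical and only illustrates why subgroup errors are larger.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes simple random sampling. p = 0.5 gives the widest (most conservative)
    margin, and z = 1.96 corresponds to 95 percent confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


# 1,085 (1989 survey), about 1,000 (usual size), 2,002 (1998 survey),
# and 300 (a hypothetical subgroup).
for n in (1085, 1000, 2002, 300):
    print(f"n={n}: +/-{100 * margin_of_error(n):.1f} percentage points")
# n=1085: +/-3.0 percentage points
# n=1000: +/-3.1 percentage points
# n=2002: +/-2.2 percentage points
# n=300: +/-5.7 percentage points
```

    Because the margin shrinks with the square root of n, doubling the sample in 1998 narrowed it by a factor of about 1/sqrt(2) rather than by half.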

Cite
Graphics; Visualization; & Usability Center; College of Computing; Georgia Institute of Technology (2014). internet_usage [Dataset]. https://www.openml.org/search?type=data&status=active&id=372

internet_usage

Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 26, 2014
Authors
Graphics; Visualization; & Usability Center; College of Computing; Georgia Institute of Technology
Description

Author:
Source: Unknown - Date unknown
Please cite:

Internet Usage Data

Data Type

multivariate

Abstract

This data contains general demographic information on internet users
in 1997.

Sources

 Original Owner

[1] Graphics, Visualization, & Usability Center, College of Computing, Georgia Institute of Technology, Atlanta, GA

 Donor

[2] Dr. Di Cook, Department of Statistics, Iowa State University

Date Donated: June 30, 1999

Data Characteristics

This data comes from a survey conducted by the Graphics and
Visualization Unit at Georgia Tech from October 10 to November 16, 1997.
Full details of the survey are available at [3].

The particular subset of the survey provided here is the "general
demographics" of internet users. The data have been recoded as
entirely numeric, with an index to the codes described in the "Coding"
file.

The full survey is available from the web site above, along with
summaries, tables and graphs of their analyses. In addition there is
information on other parts of the survey, including technology
demographics and web commerce.

Data Format

The data are stored in ASCII files with one observation per line.
Fields are separated by spaces.
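A minimal reader for this layout might look like the following. It assumes each field parses as an integer or decimal number (the description says the data were recoded as entirely numeric) and that blank lines can be skipped; the codebook passed to the hypothetical `decode` helper is a stand-in for the "Coding" file, whose actual structure is not shown here.

```python
from typing import Iterable, Union

Number = Union[int, float]


def parse_value(token: str) -> Number:
    """Convert one space-separated field to an int if possible, else a float."""
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_observations(lines: Iterable[str]) -> list[list[Number]]:
    """One observation per line, fields split on whitespace, blank lines skipped."""
    return [[parse_value(tok) for tok in line.split()] for line in lines if line.strip()]


def load(path: str) -> list[list[Number]]:
    """Read an ASCII data file; no particular file name is assumed."""
    with open(path, encoding="ascii") as fh:
        return parse_observations(fh)


def decode(row: list[Number], codebook: dict[int, dict[Number, str]]) -> list[object]:
    """Replace numeric codes with labels where the hypothetical codebook has them.

    codebook maps column index -> {code: label}; unknown codes are kept as numbers.
    """
    return [codebook.get(i, {}).get(v, v) for i, v in enumerate(row)]


sample_text = "1 2 35\n2 1 41.5\n\n"
rows = parse_observations(sample_text.splitlines())
print(rows)  # [[1, 2, 35], [2, 1, 41.5]]
hypothetical_codebook = {1: {1: "label_1", 2: "label_2"}}
print(decode(rows[0], hypothetical_codebook))  # [1, 'label_2', 35]
```

Splitting with `str.split()` and no argument treats any run of spaces as a single separator, which is forgiving if some fields are padded.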

Past Usage

This data was used in the American Statistical Association Statistical
Graphics and Computing Sections 1999 Data Exposition.


 [4] The UCI KDD Archive
 [5] Information and Computer Science
 [6] University of California, Irvine
 Irvine, CA 92697-3425

Last modified: June 30, 1999

References

1. http://www.gvu.gatech.edu/gvu/user_surveys/survey-1997-10/
2. http://www.public.iastate.edu/~dicook/
3. http://www.cc.gatech.edu/gvu/user_surveys/survey-1997-10/
4. http://kdd.ics.uci.edu/
5. http://www.ics.uci.edu/
6. http://www.uci.edu/

Information about the dataset: CLASSTYPE: nominal; CLASSINDEX: none specific
