100+ datasets found
  1. SAPRIN Individual Demographic Dataset 2018 - South Africa

    • datafirst.uct.ac.za
    Updated Jul 9, 2020
    Cite
    Prof Steve Tollman (2020). SAPRIN Individual Demographic Dataset 2018 - South Africa [Dataset]. http://www.datafirst.uct.ac.za/Dataportal/index.php/catalog/study/zaf-saprin-sidd-2018-v1
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Dr Kobus Herbst
    Prof Mark Collinson
    Prof Deenan Pillay
    Prof Steve Tollman
    Prof Marianne Alberts
    Time period covered
    1993 - 2017
    Area covered
    South Africa
    Description

    Abstract

    The South African Population Research Infrastructure Network (SAPRIN) is a national research infrastructure funded through the Department of Science and Technology and hosted by the South African Medical Research Council. One of SAPRIN’s initial goals has been to harmonise the legacy longitudinal data from the three current Health and Demographic Surveillance System (HDSS) Nodes. These long-standing nodes are the MRC/Wits University Agincourt HDSS in Bushbuckridge District, Mpumalanga, established in 1993, with a population of 116 000 people; the University of Limpopo DIMAMO HDSS in the Capricorn District of Limpopo, established in 1996, with a current population of 100 000; and the Africa Health Research Institute (AHRI) HDSS in uMkhanyakude District, KwaZulu-Natal, established in 2000, with a current population of 125 000.

    SAPRIN data are processed for longitudinal analysis by organising the demographic data into residence episodes at a geographical location, and membership episodes within a household. Start events include enumeration, birth, in-migration and relocating into a household from within the study population; exit events include death (by cause), out-migration, and relocating to another location in the study population. Variables routinely updated at individual level include health care utilisation, marital status, labour status and education status; household asset status is also recorded. Anticipated outcomes of SAPRIN include: (i) regular releases of up-to-date, longitudinal data, representative of South Africa’s fast-changing poorer communities for research, interpretation and calibration of national datasets; (ii) national statistics triangulation, whereby longitudinal SAPRIN data are triangulated with National Census data for calibration of national statistics and studying the mechanisms driving the national statistics; (iii) an interdisciplinary research platform for conducting observational and interventional research at population level; (iv) policy engagement to provide evidence to underpin policy-making for cost evaluation and targeting intervention programmes, thereby improving the accuracy and efficiency of pro-poor, health and wellbeing interventions; (v) scientific education through training at related universities; and (vi) community engagement, whereby coordinated engagement with communities will enable two-way learning between researchers and community members, and enable research site communities and service providers to have access to and make effective use of research results.
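
    The residence-episode structure described above can be sketched in Python. This is a minimal illustration, not SAPRIN's actual schema: the class name, field names and event labels are assumptions chosen to mirror the start and exit events listed in the abstract.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical labels mirroring the events named in the abstract;
# the codes used in the real SAPRIN files may differ.
START_EVENTS = {"enumeration", "birth", "in-migration", "relocation-in"}
EXIT_EVENTS = {"death", "out-migration", "relocation-out"}


@dataclass
class ResidenceEpisode:
    """One spell of residence by an individual at a geographical location."""
    individual_id: int
    location_id: int
    start_date: date
    start_event: str
    end_date: date
    end_event: str

    def __post_init__(self):
        # Reject events that are not in the illustrative vocabularies above.
        if self.start_event not in START_EVENTS:
            raise ValueError(f"unknown start event: {self.start_event}")
        if self.end_event not in EXIT_EVENTS:
            raise ValueError(f"unknown exit event: {self.end_event}")
        if self.end_date < self.start_date:
            raise ValueError("episode ends before it starts")

    def days(self) -> int:
        """Length of the episode in days."""
        return (self.end_date - self.start_date).days


def person_years(episodes) -> float:
    """Total exposure time across episodes, in years of 365.25 days."""
    return sum(e.days() for e in episodes) / 365.25
```

    Summing episode lengths in this way gives the exposure time that longitudinal rates are usually computed against, which is consistent with the "Exposure episodes" analysis unit listed for this dataset.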

    Geographic coverage

    The Agincourt HDSS covers an area of approximately 420 km² and is located in Bushbuckridge District, Mpumalanga in the rural north-east of South Africa close to the Mozambique border. DIMAMO is located in the Capricorn district, Limpopo Province approximately 40 km from Polokwane, the capital city of Limpopo Province and 15-50 km from the University of Limpopo (Turfloop Campus). The site covers an area of approximately 200 km². AHRI is situated in the south-east portion of the Umkhanyakude district of KwaZulu-Natal province near the town of Mtubatuba. It is bounded on the west by the Umfolozi-Hluhluwe nature reserve, on the south by the Umfolozi river, on the east by the N2 highway (except for portions where the KwaMsane township straddles the highway) and in the north by the Inyalazi river for portions of the boundary. The area is 438 km².

    Analysis unit

    Exposure episodes

    Universe

    Households resident in dwellings within the study area will be eligible for inclusion in the household component of SAPRIN. All individuals identified by the household proxy informant as a member of the household will be enumerated. A resident household member is an individual that intends to sleep the majority of time at the dwelling occupied by the household over a four-month period. Households will include resident and non-resident members. An individual is a non-resident member if they have close ties to the household, but do not physically reside with the household most of the time. They can also be called temporary migrants and they are enumerated within the household list. Because household membership is not tied to physical residency, an individual may be a member of more than one household.

    Kind of data

    Event/transaction data

    Sampling procedure

    This dataset is not based on a sample but contains information from the complete demographic surveillance areas.

  2. TSS Individual Results with Comments Data Dictionary

    • catalog.data.gov
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    Updated May 6, 2025
    Cite
    Office of Government-wide Policy (2025). TSS Individual Results with Comments Data Dictionary [Dataset]. https://catalog.data.gov/dataset/tss-individual-results-with-comments-data-dictionary
    Dataset updated
    May 6, 2025
    Dataset provided by
    Office of Government-wide Policy
    Description

    A Data Dictionary for the TSS Individual Reports with Comments reports.

  3. Latest Data Professionals Salary Dataset

    • kaggle.com
    Updated Jul 9, 2023
    Cite
    Aman Chauhan (2023). Latest Data Professionals Salary Dataset [Dataset]. https://www.kaggle.com/datasets/whenamancodes/data-professionals-salary-dataset-2022
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 9, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aman Chauhan
    Description

    About Dataset

    Context

    Analytics refers to the methodical examination and calculation of data or statistics. Its purpose is to uncover, interpret, and convey meaningful patterns found within the data. Additionally, analytics involves utilizing these data patterns to make informed decisions. It proves valuable in domains abundant with recorded information, employing a combination of statistics, computer programming, and operations research to measure performance.

    Businesses can leverage analytics to describe, predict, and enhance their overall performance. Various branches of analytics encompass predictive analytics, prescriptive analytics, enterprise decision management, descriptive analytics, cognitive analytics, Big Data Analytics, retail analytics, supply chain analytics, store assortment and stock-keeping unit optimization, marketing optimization and marketing mix modeling, web analytics, call analytics, speech analytics, sales force sizing and optimization, price and promotion modeling, predictive science, graph analytics, credit risk analysis, and fraud analytics. Due to the extensive computational requirements involved (particularly with big data), analytics algorithms and software utilize state-of-the-art methods from computer science, statistics, and mathematics.

    Data Dictionary

    Column | Description
    Company Name | Company Name refers to the name of the organization or company where an individual is employed. It represents the specific entity that provides job opportunities and is associated with a particular industry or sector.
    Job Title | Job Title refers to the official designation or position held by an individual within a company or organization. It represents the specific role or responsibilities assigned to the person in their professional capacity.
    Salaries Reported | Salaries Reported indicates the information or data related to the salaries of employees within a company or industry. This data may be collected and reported through various sources, such as surveys, employee disclosures, or public records.
    Location | Location refers to the specific geographical location or area where a company or job position is situated. It provides information about the physical location or address associated with the company's operations or the job's work environment.
    Salary | Salary refers to the monetary compensation or remuneration received by an employee in exchange for their work or services. It represents the amount of money paid to an individual on a regular basis, typically in the form of wages or a fixed annual income.

    Content

    This Dataset consists of salaries for Data Scientists, Machine Learning Engineers, Data Analysts, and Data Engineers in various cities across India (2022).

    - Salary Dataset.csv
    - Partially Cleaned Salary Dataset.csv

    Acknowledgements

    This Dataset is created from https://www.glassdoor.co.in/. If you want to learn more, you can visit the Website.

  4. EQA30 - Individuals who experienced discrimination in social settings

    • datasalsa.com
    • data.europa.eu
    csv, json-stat, px +1
    Updated Jun 20, 2025
    Cite
    Central Statistics Office (2025). EQA30 - Individuals who experienced discrimination in social settings [Dataset]. https://datasalsa.com/dataset/?catalogue=data.gov.ie&name=eqa30-individuals-who-experienced-discrimination-in-social-settings
    Available download formats: json-stat, xlsx, px, csv
    Dataset updated
    Jun 20, 2025
    Dataset provided by
    Central Statistics Office Ireland (https://www.cso.ie/en/)
    Authors
    Central Statistics Office
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 25, 2025
    Description

    EQA30 - Individuals who experienced discrimination in social settings. Published by Central Statistics Office. Available under the license Creative Commons Attribution 4.0 (CC-BY-4.0). Individuals who experienced discrimination in social settings...

  5. Dataset for Targeted GC-MS Analysis of Firefighters' Exhaled Breath

    • catalog.data.gov
    • gimi9.com
    • +2 more
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Dataset for Targeted GC-MS Analysis of Firefighters' Exhaled Breath [Dataset]. https://catalog.data.gov/dataset/dataset-for-targeted-gc-ms-analysis-of-firefighters-exhaled-breath
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    This dataset includes a table of the VOC concentrations detected in firefighter breath samples. QQ-plots for benzene, toluene, and ethylbenzene levels in breath samples as well as box-and-whisker plots of pre-, post-, and 1 h post-exposure breath levels of VOCs for firefighters participating in attack, search, and outside ventilation positions are provided. Graphs detailing the responses of individuals to pre-, post-, and 1 h post-exposure concentrations of benzene, toluene, and ethylbenzene are shown.

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

    It can be accessed through the following means: The original dataset contains identification information for the firefighters who participated in the controlled structure burns. The analyzed tables and graphs can be made publicly available.

    Format: The original dataset contains identification information for the firefighters who participated in the controlled structure burns. The analyzed tables and graphs can be made publicly available.

    This dataset is associated with the following publication: Wallace, A., J. Pleil, K. Oliver, D. Whitaker, S. Mentese, K. Fent, and G. Horn. Targeted GC-MS analysis of firefighters’ exhaled breath: Exploring biomarker response at the individual level. JOURNAL OF OCCUPATIONAL AND ENVIRONMENTAL HYGIENE. Taylor & Francis, Inc., Philadelphia, PA, USA, 16(5): 355-366, (2019).

  6. Quarterly Labour Force Survey Household Dataset, April - June, 2021

    • beta.ukdataservice.ac.uk
    Updated 2023
    Cite
    Office For National Statistics (2023). Quarterly Labour Force Survey Household Dataset, April - June, 2021 [Dataset]. http://doi.org/10.5255/ukda-sn-8852-3
    Dataset updated
    2023
    Dataset provided by
    DataCite (https://www.datacite.org/)
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office For National Statistics
    Description
    Background
    The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973-1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.

    Household datasets
    Up to 2015, the LFS household datasets were produced twice a year (April-June and October-December) from the corresponding quarter's individual-level data. From January 2015 onwards, they are now produced each quarter alongside the main QLFS. The household datasets include all the usual variables found in the individual-level datasets, with the exception of those relating to income, and are intended to facilitate the analysis of the economic activity patterns of whole households. It is recommended that the existing individual-level LFS datasets continue to be used for any analysis at individual level, and that the LFS household datasets be used for analysis involving household or family-level data. From January 2011, a pseudonymised household identifier variable (HSERIALP) is also included in the main quarterly LFS dataset instead.

    Change to coding of missing values for household series
    From 1996-2013, all missing values in the household datasets were set to one '-10' category instead of the separate '-8' and '-9' categories. For that period, the ONS introduced a new imputation process for the LFS household datasets and it was necessary to code the missing values into one new combined category ('-10'), to avoid over-complication. This was also in line with the Annual Population Survey household series of the time. The change was applied to the back series during 2010 to ensure continuity for analytical purposes. From 2013 onwards, the -8 and -9 categories have been reinstated.
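
    Because the missing-value coding differs between periods, an analysis spanning them may want to treat all three codes alike. The following is a minimal Python sketch under that assumption; the function names and the row-as-dictionary layout are illustrative and not part of any ONS tooling.

```python
# Codes named in the description above: '-8' and '-9' are the separate
# missing categories, '-10' is the combined category used for 1996-2013.
MISSING_CODES = {-8, -9, -10}


def recode_missing(value):
    """Return None for any LFS missing-value code, else the value unchanged."""
    return None if value in MISSING_CODES else value


def harmonise(rows, columns):
    """Apply recode_missing to the named columns of each row (a dict).

    Columns not listed are left untouched, since a negative number may be
    a legitimate value elsewhere.
    """
    return [
        {key: (recode_missing(val) if key in columns else val)
         for key, val in row.items()}
        for row in rows
    ]
```

    Collapsing the codes loses the distinction between '-8' and '-9', so this only suits analyses that do not depend on the reason a value is missing.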

    LFS Documentation
    The documentation available from the Archive to accompany LFS datasets largely consists of the latest version of each volume alongside the appropriate questionnaire for the year concerned. However, LFS volumes are updated periodically by ONS, so users are advised to check the ONS LFS User Guidance page before commencing analysis.

    Additional data derived from the QLFS
    The Archive also holds further QLFS series: End User Licence (EUL) quarterly datasets; Secure Access datasets (see below); two-quarter and five-quarter longitudinal datasets; quarterly, annual and ad hoc module datasets compiled for Eurostat; and some additional annual Northern Ireland datasets.

    End User Licence and Secure Access QLFS Household datasets
    Users should note that there are two discrete versions of the QLFS household datasets. One is available under the standard End User Licence (EUL) agreement, and the other is a Secure Access version. Secure Access household datasets for the QLFS are available from 2009 onwards, and include additional, detailed variables not included in the standard EUL versions. Extra variables that typically can be found in the Secure Access versions but not in the EUL versions relate to: geography; date of birth, including day; education and training; household and family characteristics; employment; unemployment and job hunting; accidents at work and work-related health problems; nationality, national identity and country of birth; occurrence of learning difficulty or disability; and benefits. For full details of variables included, see data dictionary documentation. The Secure Access version (see SN 7674) has more restrictive access conditions than those made available under the standard EUL. Prospective users will need to gain ONS Accredited Researcher status, complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables. Users are strongly advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements.

    Changes to variables in QLFS Household EUL datasets
    In order to further protect respondent confidentiality, ONS have made some changes to variables available in the EUL datasets. From July-September 2015 onwards, 4-digit industry class is available for main job only, meaning that 3-digit industry group is the most detailed level available for second and last job.

    Review of imputation methods for LFS Household data - changes to missing values
    A review of the imputation methods used in LFS Household and Family analysis resulted in a change from the January-March 2015 quarter onwards. It was no longer considered appropriate to impute any personal characteristic variables (e.g. religion, ethnicity, country of birth, nationality, national identity, etc.) using the LFS donor imputation method. This method is primarily focused on ensuring that the 'economic status' of all individuals within a household is known, allowing analysis of the combined economic status of households. This means that from 2015 larger amounts of missing values ('-8'/'-9') will be present in the data for these personal characteristic variables than before. Therefore, if users need to carry out any time series analysis of households/families which also includes personal characteristic variables covering this time period, they are advised to filter off 'ioutcome=3' cases from all periods to remove this inconsistent treatment of non-responders.
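
    The recommended filter can be sketched in a few lines of Python. This assumes each record is a dictionary with a numeric `ioutcome` field spelled as in the text above; the actual variable casing and storage type in the released files may differ, and the function name is hypothetical.

```python
def drop_ioutcome_3(rows):
    """Return only the records whose ioutcome value is not 3.

    Apply this to every period in a time series, not only to 2015 onwards,
    so that non-responders are treated the same way throughout.
    """
    return [row for row in rows if row.get("ioutcome") != 3]
```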

    Occupation data for 2021 and 2022 data files

    The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).

    Latest edition information

    For the third edition (September 2023), the variables NSECM20, NSECMJ20, SC2010M, SC20SMJ, SC20SMN, SOC20M and SOC20O have been replaced with new versions. Further information on the SOC revisions can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).

  7. Data from: Dataset: survey about research data management in agricultural...

    • openagrar.de
    Updated Oct 22, 2021
    Cite
    Matthias Senft; Ulrike Stahl; Nikolai Svoboda (2021). Dataset: survey about research data management in agricultural sciences in Germany [Dataset]. http://doi.org/10.5073/20211013-105447
    Dataset updated
    Oct 22, 2021
    Dataset provided by
    Leibniz Centre for Agricultural Landscape Research (ZALF), Müncheberg, Germany
    Leibniz Institute for Agricultural Engineering and Bioeconomy (reg. assoc.) (ATB), Potsdam, Germany
    Julius Kühn-Institute (JKI), Federal Research Centre for Cultivated Plants, Data Processing Department, Quedlinburg, Germany
    Authors
    Matthias Senft; Ulrike Stahl; Nikolai Svoboda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is the result of an online survey the authors conducted in the German agricultural science community in 2020. The survey inquires not only about the status quo, but also explicitly about the wishes and needs of users, representing the agricultural scientific research domain, of the in-progress NFDI (national research data infrastructure). Questions cover information about produced and (re-)used data, data quality aspects, information about the use of standards, publication practices and legal aspects of agricultural research data, the current situation in research data management in regards to awareness, consulting and curricula as well as needs of the agricultural community in respect to future developments. In total, the questionnaire contained 52 questions and was conducted using the Community Edition of the Open Source Survey Tool LimeSurvey (Version 3.19.3; LimeSurvey GmbH). The questions were accessible in English and German. The first set of questions (Questions 1-4) addressed the respondent’s professional background (i.e. career status, affiliation and subject area, but no personal data) and the user group. The user groups included data users, data providers as well as infrastructure service and information service providers. Subsequent questions were partly user group specific. All questions, the corresponding question types and addressed user groups can be found in the questionnaire files (Survey-Questions-2020-DE.pdf German Version; Survey-Questions-2020-EN.pdf English Version). The survey was accessible online between June 26th and July 21st 2020, could be completed anonymously and took about 20 minutes. The survey was promoted in an undirected manner via mail lists of agricultural institutes and agricultural-specific professional societies in Germany, via social media (e.g. Twitter) and announced during the first community workshop of NFDI4Agri on July 15th 2020 and other scientific events. 
    After closing the survey, we exported the data from the LimeSurvey tool and initially screened it. We considered all questionnaires that contained at least one answered question in addition to the respondent’s professional background information (Questions 1-4). In total, we received 196 questionnaires of which 160 were completed in full (although not every answer option was always used; empty cells are filled with “N/A”). The main data set contains all standardized answers from the respondents. For anonymization, respondents’ individual answers, for instance, free text answers, comments and details in the category “other” were removed from the main data set. The main data set only lists whether such information was provided (“Yes”) or not (“No” or “N/A”). In an additional file respondents’ individual answers of the questions 4-52 are listed alphabetically, so that it is not possible to trace the data back. In the rare cases where only one person has provided such individual information in an answer, it is traceable but does not contain any sensitive data. The main data set containing answers of the 196 questionnaires received can be found in the file Survey-2020-Main-DataSet-Answers.xlsx. The subsidiary data set containing the respondents’ individual answers (most answers are in German and are not translated) of the questions 4-52, for instance, free text answers, comments and details in the category “other” (alphabetically listed) can be found in Survey-2020-Subsidary-DataSet-Free_Text_Answers.xlsx.

  8. Cognitive Triad Dataset: Understanding Beck's Cognitive Triad Mechanism in...

    • narcis.nl
    • data.mendeley.com
    Updated Jun 28, 2021
    Cite
    Jere, S (via Mendeley Data) (2021). Cognitive Triad Dataset: Understanding Beck's Cognitive Triad Mechanism in an Individual from Social Media Interactions [Dataset]. http://doi.org/10.17632/wb2n39sgbp.1
    Dataset updated
    Jun 28, 2021
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Jere, S (via Mendeley Data)
    Description

    The Cognitive Triad Dataset (CTD) is used to understand Beck's cognitive triad mechanism in an individual, which is crucial for early diagnosis and prognosis of depression. The CTD contains 5886 messages, including 4706 from Twitter, 600 from the Time-to-Change blog, and 580 from Beyond Blue personal stories. Six well-trained annotators manually labeled the data. This data includes six classes: self-negative, world-negative, future-negative, self-positive, world-positive, and future-positive. The CTD was evaluated on various sentiment classification algorithms. The dataset will assist in understanding Beck's Cognitive Triad Inventory (CTI) items in an individual's social media messages.

  9. Data from: Individual Responses to Affirmative Action Issues in Criminal...

    • catalog.data.gov
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • +2 more
    Updated Mar 12, 2025
    Cite
    National Institute of Justice (2025). Individual Responses to Affirmative Action Issues in Criminal Justice Agencies, 1981: [United States] [Dataset]. https://catalog.data.gov/dataset/individual-responses-to-affirmative-action-issues-in-criminal-justice-agencies-1981-united-2cfae
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Institute of Justice (http://nij.ojp.gov/)
    Area covered
    United States
    Description

    These data, which are part of a larger study undertaken by the University of Wisconsin-Milwaukee, evaluate the responses of criminal justice employees to affirmative action within criminal justice agencies. Information is provided on employees' (1) general mood, (2) attitudes across various attributes, such as race, sex, rank, education and length of service, and (3) demographic characteristics including age, sex, race, educational level, parents' occupations, and living arrangements. The use of criminal justice employees as the units of analysis provides attitudinal and perceptual data in assessing affirmative action programs within each agency. Variables include reasons for becoming a criminal justice employee, attitudes toward affirmative action status in general, and attitudes about affirmative action in criminal justice settings.

  10. PSYCHE-D: predicting change in depression severity using person-generated...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 18, 2024
    Cite
    Mariko Makhmutova (2024). PSYCHE-D: predicting change in depression severity using person-generated health data (DATASET) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5085145
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    Marta Ferreira
    Jae Min
    Martin Jaggi
    Ieuan Clay
    Mariko Makhmutova
    Raghu Kainkaryam
    Description

    This dataset is made available under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). See LICENSE.pdf for details.

    Dataset description

    Parquet file, with:

    35694 rows

    154 columns

    The file is indexed on [participant]_[month], such that 34_12 means month 12 from participant 34. All participant IDs have been replaced with randomly generated integers and the conversion table deleted.
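
    As a small illustration of the index format, the following Python sketch splits a key such as 34_12 into its participant and month parts. The function name is hypothetical; only the [participant]_[month] layout comes from the description above.

```python
from typing import Tuple


def parse_index(key: str) -> Tuple[int, int]:
    """Split a '[participant]_[month]' key into (participant, month).

    Participant IDs are described as randomly generated integers, so both
    parts are converted with int(); a malformed key raises ValueError.
    """
    participant, sep, month = key.partition("_")
    if not sep:
        raise ValueError(f"expected '[participant]_[month]', got {key!r}")
    return int(participant), int(month)
```

    For example, parse_index("34_12") returns (34, 12), that is, month 12 from participant 34.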

    Column names and explanations are included as a separate tab-delimited file. Detailed descriptions of feature engineering are available from the linked publications.

    File contains aggregated, derived feature matrix describing person-generated health data (PGHD) captured as part of the DiSCover Project (https://clinicaltrials.gov/ct2/show/NCT03421223). This matrix focuses on individual changes in depression status over time, as measured by PHQ-9.

    The DiSCover Project is a 1-year long longitudinal study consisting of 10,036 individuals in the United States, who wore consumer-grade wearable devices throughout the study and completed monthly surveys about their mental health and/or lifestyle changes, between January 2018 and January 2020.

    The data subset used in this work comprises the following:

    Wearable PGHD: step and sleep data from the participants’ consumer-grade wearable devices (Fitbit) worn throughout the study

    Screener survey: prior to the study, participants self-reported socio-demographic information, as well as comorbidities

    Lifestyle and medication changes (LMC) survey: every month, participants were requested to complete a brief survey reporting changes in their lifestyle and medication over the past month

    Patient Health Questionnaire (PHQ-9) score: every 3 months, participants were requested to complete the PHQ-9, a 9-item questionnaire that has proven to be reliable and valid to measure depression severity

    From these input sources we define a range of input features, both static (defined once, remain constant for all samples from a given participant throughout the study, e.g. demographic features) and dynamic (varying with time for a given participant, e.g. behavioral features derived from consumer-grade wearables).

    The dataset contains a total of 35,694 rows for each month of data collection from the participants. We can generate 3-month long, non-overlapping, independent samples to capture changes in depression status over time with PGHD. We use the notation ‘SM0’ (sample month 0), ‘SM1’, ‘SM2’ and ‘SM3’ to refer to relative time points within each sample. Each 3-month sample consists of: PHQ-9 survey responses at SM0 and SM3, one set of screener survey responses, LMC survey responses at SM3 (as well as SM1, SM2, if available), and wearable PGHD for SM3 (and SM1, SM2, if available). The wearable PGHD includes data collected from 8 to 14 days prior to the PHQ-9 label generation date at SM3. Doing this generates a total of 10,866 samples from 4,036 unique participants.

  11. Family food datasets

    • gov.uk
    Updated Oct 17, 2024
    Cite
    Department for Environment, Food & Rural Affairs (2024). Family food datasets [Dataset]. https://www.gov.uk/government/statistical-data-sets/family-food-datasets
    Explore at:
    Dataset updated
    Oct 17, 2024
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Environment, Food & Rural Affairs
    Description

    These family food datasets contain more detailed information than the ‘Family Food’ report and mainly provide statistics from 2001 onwards. The UK household purchases and the UK household expenditure spreadsheets include statistics from 1974 onwards. These spreadsheets are updated annually when a new edition of the ‘Family Food’ report is published.

    The ‘purchases’ spreadsheets give the average quantity of food and drink purchased per person per week for each food and drink category. The ‘nutrient intake’ spreadsheets give the average nutrient intake (eg energy, carbohydrates, protein, fat, fibre, minerals and vitamins) from food and drink per person per day. The ‘expenditure’ spreadsheets give the average amount spent in pence per person per week on each type of food and drink. Several different breakdowns are provided in addition to the UK averages including figures by region, income, household composition and characteristics of the household reference person.

    UK (updated with new FYE 2023 data)

    countries and regions (CR) (updated with FYE 2022 data)

    equivalised income decile group (EID) (updated with FYE 2022 data)

  12. Individuals and Households Program - Valid Registrations

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jun 7, 2025
    Cite
    FEMA/Response and Recovery/Recovery Directorate (2025). Individuals and Households Program - Valid Registrations [Dataset]. https://catalog.data.gov/dataset/individuals-and-households-program-valid-registrations-nemis
    Explore at:
    Dataset updated
    Jun 7, 2025
    Dataset provided by
    Federal Emergency Management Agency (http://www.fema.gov/)
    Description

    This dataset contains FEMA applicant-level data for the Individuals and Households Program (IHP). All PII has been removed. The location is represented by county, city, and zip code. This dataset contains Individual Assistance (IA) applications from DR1439 (declared in 2002) to those declared over 30 days ago. The full data set is refreshed on an annual basis and refreshed weekly to update disasters declared in the last 18 months. This dataset includes all major disasters and includes only valid registrants (applied in a declared county, within the registration period, having damage due to the incident and damage within the incident period). Information about individual data elements and descriptions is listed in the metadata information within the dataset.

    Valid registrants may be eligible for IA assistance, which is intended to meet basic needs and supplement disaster recovery efforts. IA assistance is not intended to return disaster-damaged property to its pre-disaster condition. Disaster damage to secondary or vacation homes does not qualify for IHP assistance.

    Data comes from FEMA's National Emergency Management Information System (NEMIS) with raw, unedited, self-reported content and is subject to a small percentage of human error.

    Any financial information is derived from NEMIS and not FEMA's official financial systems. Due to differences in reporting periods, status of obligations and application of business rules, this financial information may differ slightly from official publication on public websites such as usaspending.gov. This dataset is not intended to be used for any official federal reporting.

    Citation: The Agency’s preferred citation for datasets (API usage or file downloads) can be found on the OpenFEMA Terms and Conditions page, Citing Data section: https://www.fema.gov/about/openfema/terms-conditions.

    Due to the size of this file, tools other than a spreadsheet may be required to analyze, visualize, and manipulate the data. MS Excel will not be able to process files this large without data loss. It is recommended that a database (e.g., MS Access, MySQL, PostgreSQL, etc.) be used to store and manipulate data. Other programming tools such as R, Apache Spark, and Python can also be used to analyze and visualize data. Further, basic Linux/Unix tools can be used to manipulate, search, and modify large files.

    If you have media inquiries about this dataset, please email the FEMA News Desk at FEMA-News-Desk@fema.dhs.gov or call (202) 646-3272. For inquiries about FEMA's data and Open Government program, please email the OpenFEMA team at OpenFEMA@fema.dhs.gov.

    This dataset is scheduled to be superseded by Valid Registrations Version 2 by early CY 2024.
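
    One way to follow the advice about avoiding spreadsheets is to stream the file row by row rather than loading it whole. The sketch below uses only Python's standard `csv` module. The column headers `county` and `zip` are hypothetical stand-ins, since the real header names are defined in the dataset's own metadata, and the in-memory sample replaces the actual download.

```python
import csv
import io
from collections import Counter


def count_by_column(lines, column):
    """Stream CSV rows one at a time and count the values seen in one column.

    `lines` is any iterable of text lines (an open file works), so the whole
    file never has to fit in memory.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


# Tiny in-memory stand-in with made-up rows; for a real file, pass
# open(path, newline="") instead of the StringIO object.
sample = io.StringIO("county,zip\nTravis,78701\nTravis,78702\nHarris,77002\n")
print(count_by_column(sample, "county"))
```

    Because the reader yields one row at a time, memory use stays roughly constant regardless of file size.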

  13. Study of prescription claims data and smoke exposure in children

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Apr 22, 2025
    Cite
    U.S. EPA Office of Research and Development (ORD) (2025). Study of prescription claims data and smoke exposure in children [Dataset]. https://catalog.data.gov/dataset/study-of-prescription-claims-data-and-smoke-exposure-in-children
    Explore at:
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Data used in this study has PII and cannot be made public. Please contact the corresponding author of the manuscript. Dhingra R, Keeler C, Staley BS, Jardel HV, Ward-Caviness C, Rebuli ME, Xi Y, Rappazzo K, Hernandez M, Chelminski AN, Jaspers I, Rappold AG. Wildfire smoke exposure and early childhood respiratory health: a study of prescription claims data. Environ Health. 2023 Jun 27;22(1):48. doi: 10.1186/s12940-023-00998-5. PMID: 37370168; PMCID: PMC10294519. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Please reach out to the corresponding author. Format: Data includes HIPAA protected information. This dataset is associated with the following publication: Dhingra, R., C. Keeler, B. Staley, H. Jardel, C. Ward-Caviness, M. Rebuli, Y. Xi, K. Rappazzo, M. Hernandez, A. Chelminski, I. Jaspers, and A. Rappold. Wildfire smoke exposure and early childhood respiratory health: a study of prescription claims data. ENVIRONMENTAL HEALTH. BioMed Central Ltd, London, UK, 22(1): 48, (2023).

  14. Open Data Dictionary Template Individual

    • opendata.dc.gov
    • catalog.data.gov
    • +2more
    Updated Jan 5, 2023
    Cite
    City of Washington, DC (2023). Open Data Dictionary Template Individual [Dataset]. https://opendata.dc.gov/documents/cb6a686b1e344eeb8136d0103c942346
    Explore at:
    Dataset updated
    Jan 5, 2023
    Dataset authored and provided by
    City of Washington, DC
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This template covers section 2.5 Resource Fields: Entity and Attribute Information of the Data Discovery Form cited in the Open Data DC Handbook (2022). It completes documentation elements that are required for publication. Each field column (attribute) in the dataset needs a description clarifying the contents of the column. Data originators are encouraged to enter the code values (domains) of the column to help end-users translate the contents of the column where needed, especially when lookup tables do not exist.

  15. EQA69 - Individuals who experienced discrimination in accessing/using health...

    • datasalsa.com
    • data.europa.eu
    csv, json-stat, px +1
    Updated Jun 17, 2025
    Cite
    Central Statistics Office (2025). EQA69 - Individuals who experienced discrimination in accessing/using health services [Dataset]. https://datasalsa.com/dataset/?catalogue=data.gov.ie&name=eqa69-individuals-who-experienced-discrimination-in-accessingusing-health-services
    Explore at:
    Available download formats: json-stat, xlsx, csv, px
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Central Statistics Office Ireland (https://www.cso.ie/en/)
    Authors
    Central Statistics Office
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Aug 14, 2025
    Description

    EQA69 - Individuals who experienced discrimination in accessing/using health services. Published by Central Statistics Office. Available under the license Creative Commons Attribution 4.0 (CC-BY-4.0). Individuals who experienced discrimination in accessing/using health services...

  16. Data from: YJMob100K: City-Scale and Longitudinal Dataset of Anonymized...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 21, 2024
    Cite
    Shimizu, Toru (2024). YJMob100K: City-Scale and Longitudinal Dataset of Anonymized Human Mobility Trajectories [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8111992
    Explore at:
    Dataset updated
    Apr 21, 2024
    Dataset provided by
    Tsubouchi, Kota
    Shimizu, Toru
    Yabe, Takahiro
    Sekimoto, Yoshihide
    Pentland, Alex
    Sezaki, Kaoru
    Moro, Esteban
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The YJMob100K human mobility datasets (YJMob100K_dataset1.csv.gz and YJMob100K_dataset2.csv.gz) contain the movement of a total of 100,000 individuals across a 75-day period, discretized into 30-minute intervals and 500-meter grid cells. The first dataset contains the movement of 80,000 individuals across a 75-day business-as-usual period, while the second dataset contains the movement of 20,000 individuals across a 75-day period with unusual behavior (the last 15 days fall during an emergency).

    While the name or location of the city is not disclosed, the participants are provided with points-of-interest (POIs; e.g., restaurants, parks) data for each grid cell (~85 dimensional vector) as supplementary information (cell_POIcat.csv.gz). The list of 85 POI categories can be found in POI_datacategories.csv.

    For details of the dataset, see Data Descriptor:

    Yabe, T., Tsubouchi, K., Shimizu, T., Sekimoto, Y., Sezaki, K., Moro, E., & Pentland, A. (2024). YJMob100K: City-scale and longitudinal dataset of anonymized human mobility trajectories. Scientific Data, 11(1), 397. https://www.nature.com/articles/s41597-024-03237-9

    --- Details about the Human Mobility Prediction Challenge 2023 (ended November 13, 2023) ---

    The challenge takes place in a mid-sized and highly populated metropolitan area, somewhere in Japan. The area is divided into 500 meters x 500 meters grid cells, resulting in a 200 x 200 grid cell space.

    The human mobility datasets (task1_dataset.csv.gz and task2_dataset.csv.gz) contain the movement of a total of 100,000 individuals across a 90 day period, discretized into 30-minute intervals and 500 meter grid cells. The first dataset contains the movement of a 75 day business-as-usual period, while the second dataset contains the movement of a 75 day period during an emergency with unusual behavior.
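
    The discretization described above (30-minute time slots, 500-meter grid cells) can be turned back into clock times and distances with a little arithmetic. The following is a minimal sketch, not code from the dataset authors. It assumes that time slot 0 begins at midnight and that a cell is given as an (x, y) pair of integer grid indices; both assumptions should be checked against the Data Descriptor before use.

```python
import math

CELL_SIZE_M = 500                        # grid cell edge length, from the description
SLOT_MINUTES = 30                        # length of one time slot, from the description
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 48 slots in a day


def slot_to_clock(slot: int) -> str:
    """Return the HH:MM start time of a slot, assuming slot 0 starts at midnight."""
    if not 0 <= slot < SLOTS_PER_DAY:
        raise ValueError(f"slot must be in 0..{SLOTS_PER_DAY - 1}")
    minutes = slot * SLOT_MINUTES
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def cell_distance_m(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Straight-line distance in meters between the centers of two grid cells."""
    return CELL_SIZE_M * math.hypot(a[0] - b[0], a[1] - b[1])


print(slot_to_clock(17))                 # 08:30
print(cell_distance_m((0, 0), (3, 4)))   # 2500.0
```

    The distance function depends only on index differences, so it gives the same result whether the grid indices start at 0 or at 1.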

    There are 2 tasks in the Human Mobility Prediction Challenge.

    In Task 1, participants are provided with the full time series data (75 days) for 80,000 individuals, and partial (only 60 days) time series movement data for the remaining 20,000 individuals (task1_dataset.csv.gz). Given the provided data, Task 1 of the challenge is to predict the movement patterns of those remaining 20,000 individuals during days 60-74. Task 2 is a similar task but uses a smaller dataset of 25,000 individuals in total, 2,500 of whom have their locations during days 60-74 masked and need to be predicted (task2_dataset.csv.gz).

    While the name or location of the city is not disclosed, the participants are provided with points-of-interest (POIs; e.g., restaurants, parks) data for each grid cell (~85 dimensional vector) as supplementary information (which is optional for use in the challenge) (cell_POIcat.csv.gz).

    For more details, see https://connection.mit.edu/humob-challenge-2023

  17. Data from: What We Eat In America (WWEIA) Database

    • catalog.data.gov
    • cloud.csiss.gmu.edu
    • +4more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). What We Eat In America (WWEIA) Database [Dataset]. https://catalog.data.gov/dataset/what-we-eat-in-america-wweia-database-f7f35
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Area covered
    United States
    Description

    What We Eat in America (WWEIA) is the dietary intake interview component of the National Health and Nutrition Examination Survey (NHANES). WWEIA is conducted as a partnership between the U.S. Department of Agriculture (USDA) and the U.S. Department of Health and Human Services (DHHS). Two days of 24-hour dietary recall data are collected through an initial in-person interview, and a second interview conducted over the telephone within three to 10 days. Participants are given three-dimensional models (measuring cups and spoons, a ruler, and two household spoons) and/or USDA's Food Model Booklet (containing drawings of various sizes of glasses, mugs, bowls, mounds, circles, and other measures) to estimate food amounts. WWEIA data are collected using USDA's dietary data collection instrument, the Automated Multiple-Pass Method (AMPM). The AMPM is a fully computerized method for collecting 24-hour dietary recalls either in-person or by telephone. For each 2-year data release cycle, the following dietary intake data files are available: Individual Foods File - Contains one record per food for each survey participant. Foods are identified by USDA food codes. Each record contains information about when and where the food was consumed, whether the food was eaten in combination with other foods, amount eaten, and amounts of nutrients provided by the food. Total Nutrient Intakes File - Contains one record per day for each survey participant. Each record contains daily totals of food energy and nutrient intakes, daily intake of water, intake day of week, total number foods reported, and whether intake was usual, much more than usual or much less than usual. The Day 1 file also includes salt use in cooking and at the table; whether on a diet to lose weight or for other health-related reason and type of diet; and frequency of fish and shellfish consumption (examinees one year or older, Day 1 file only). 
DHHS is responsible for the sample design and data collection, and USDA is responsible for the survey’s dietary data collection methodology, maintenance of the databases used to code and process the data, and data review and processing. USDA also funds the collection and processing of Day 2 dietary intake data, which are used to develop variance estimates and calculate usual nutrient intakes. Resources in this dataset: Resource Title: What We Eat In America (WWEIA) main web page. File Name: Web Page, url: https://www.ars.usda.gov/northeast-area/beltsville-md-bhnrc/beltsville-human-nutrition-research-center/food-surveys-research-group/docs/wweianhanes-overview/ Contains data tables, research articles, documentation data sets and more information about the WWEIA program. (Link updated 05/13/2020)

  18. Bluetooth Travel Sensors - Individual Traffic Match Files (ITMF)

    • catalog.data.gov
    • datahub.austintexas.gov
    • +4more
    Updated Apr 25, 2025
    Cite
    data.austintexas.gov (2025). Bluetooth Travel Sensors - Individual Traffic Match Files (ITMF) [Dataset]. https://catalog.data.gov/dataset/bluetooth-travel-sensors-individual-traffic-match-files-itmf
    Explore at:
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    data.austintexas.gov
    Description

    For information about the City of Austin's Bluetooth travel sensor data, visit our documentation page: https://github.com/cityofaustin/hack-the-traffic/tree/master/docs

    Each row in this dataset represents one Bluetooth-enabled device that was detected at two locations in the City of Austin's Bluetooth sensor network. Each record contains a detected device’s anonymized Media Access Control (MAC) address along with information about the origin and destination points at which the device was detected, as well as the time, date, and distance traveled.

    How does the City of Austin use the Bluetooth travel sensor data? The data enables transportation engineers to better understand short and long-term trends in Austin’s traffic patterns, supporting decisions about systems planning and traffic signal timing.

    What information does the data contain? The sensor data is available in three datasets:

    • Individual Address Records ( https://data.austintexas.gov/dataset/Bluetooth-Travel-Sensors-Individual-Addresses/qnpj-zrb9/data ): Each row in this dataset represents a Bluetooth device that was detected by one of our sensors. Each record contains a detected device’s anonymized Media Access Control (MAC) address along with the time and location the device was detected. These records alone are not traffic data but can be post-processed to measure the movement of detected devices through the roadway network.
    • Individual Traffic Matches ( https://data.austintexas.gov/dataset/Bluetooth-Travel-Sensors-Individual-Traffic-Matche/x44q-icha/data ): Each row in this dataset represents one Bluetooth-enabled device that was detected at two locations in the roadway network. Each record contains a detected device’s anonymized Media Access Control (MAC) address along with information about the origin and destination points at which the device was detected, as well as the time, date, and distance traveled.
    • Traffic Summary Records ( https://data.austintexas.gov/dataset/Bluetooth-Travel-Sensors-Match-Summary-Records/v7zg-5jg9 ): The traffic summary records contain aggregate travel time and speed summaries based on the individual traffic match records. Each row in the dataset summarizes average travel time and speed along a sensor-equipped roadway segment in 15-minute intervals.

    Does this data contain personally identifiable information? No. The Media Access Control (MAC) addresses in these datasets are randomly generated.
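
    To illustrate how individual matches could be rolled up into 15-minute summaries, here is a rough sketch for a single roadway segment. It is not the City of Austin's actual aggregation method. The input shape (a start timestamp paired with a travel time in seconds) and the function name `summarize` are assumptions made for the example, and the sample values are invented.

```python
from collections import defaultdict
from datetime import datetime


def summarize(matches, bin_minutes=15):
    """Average travel time per time bin for one roadway segment.

    `matches` is an iterable of (start_time, travel_seconds) pairs.
    `bin_minutes` should divide 60 evenly so that bins align with the hour.
    """
    bins = defaultdict(list)
    for start, seconds in matches:
        # Floor the timestamp to the start of its bin, e.g. 08:14 -> 08:00.
        floored = start.replace(
            minute=start.minute - start.minute % bin_minutes,
            second=0,
            microsecond=0,
        )
        bins[floored].append(seconds)
    return {bin_start: sum(v) / len(v) for bin_start, v in bins.items()}


# Made-up matches for illustration only.
matches = [
    (datetime(2025, 4, 25, 8, 3), 100),
    (datetime(2025, 4, 25, 8, 14), 200),
    (datetime(2025, 4, 25, 8, 16), 300),
]
for bin_start, avg in sorted(summarize(matches).items()):
    print(bin_start.strftime("%H:%M"), avg)
```

    An average speed per bin could be derived the same way by dividing each match's distance by its travel time before averaging, once the units of both fields are confirmed from the documentation page.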

  19. Person County, NC Population Breakdown by Gender and Age Dataset: Male and...

    • neilsberg.com
    csv, json
    Updated Feb 19, 2024
    Cite
    Neilsberg Research (2024). Person County, NC Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/8e4472ea-c989-11ee-9145-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 19, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    North Carolina, Person County
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Person County by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Person County. The dataset can be utilized to understand the population distribution of Person County by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in Person County. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for Person County.

    Key observations

    Largest age group (population): Male # 60-64 years (1,638) | Female # 65-69 years (1,626). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Scope of gender :

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.

    Variables / Data Columns

    • Age Group: This column displays the age group for the Person County population analysis. There are 18 expected values, defined above in the age groups section.
    • Population (Male): The male population in the Person County is shown in the following column.
    • Population (Female): The female population in the Person County is shown in the following column.
    • Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Person County for each age group.
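
    The Gender Ratio column follows directly from the two population columns. A minimal sketch of the calculation is shown here. The function name and the sample counts are invented for illustration and are not taken from the dataset.

```python
def gender_ratio(males: int, females: int) -> float:
    """Number of males per 100 females for one age group."""
    if females == 0:
        raise ValueError("ratio is undefined when the female population is zero")
    return 100 * males / females


# Made-up counts for illustration only.
print(gender_ratio(1200, 1000))  # 120.0
```

    A value above 100 means more males than females in that age group, and a value below 100 means the reverse.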

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Person County Population by Gender.

  20. Indian Personal Finance and Spending Habits

    • kaggle.com
    Updated Oct 7, 2024
    Cite
    Shriyash Jagtap (2024). Indian Personal Finance and Spending Habits [Dataset]. https://www.kaggle.com/datasets/shriyashjagtap/indian-personal-finance-and-spending-habits
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 7, 2024
    Dataset provided by
    Kaggle
    Authors
    Shriyash Jagtap
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains detailed financial and demographic data for 20,000 individuals, focusing on income, expenses, and potential savings across various categories. The data aims to provide insights into personal financial management and spending patterns.

    • Income & Demographics:
      • Income: Monthly income in currency units.
      • Age: Age of the individual.
      • Dependents: Number of dependents supported by the individual.
      • Occupation: Type of employment or job role.
      • City_Tier: A categorical variable representing the living area tier (e.g., Tier 1, Tier 2).
    • Monthly Expenses:
      • Categories like Rent, Loan_Repayment, Insurance, Groceries, Transport, Eating_Out, Entertainment, Utilities, Healthcare, Education, and Miscellaneous record various monthly expenses.
    • Financial Goals & Savings:
      • Desired_Savings_Percentage and Desired_Savings: Targets for monthly savings.
      • Disposable_Income: Income remaining after all expenses are accounted for.
    • Potential Savings:
      • Includes estimates of potential savings across different spending areas such as Groceries, Transport, Eating_Out, Entertainment, Utilities, Healthcare, Education, and Miscellaneous.
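
    The relationship between the income, expense, and Disposable_Income columns can be sketched as follows. This is an illustration, not the dataset author's code. It assumes that Disposable_Income equals Income minus the sum of the listed monthly expense columns, and the sample record uses invented figures.

```python
# Expense column names as listed in the description above.
EXPENSE_COLUMNS = [
    "Rent", "Loan_Repayment", "Insurance", "Groceries", "Transport",
    "Eating_Out", "Entertainment", "Utilities", "Healthcare",
    "Education", "Miscellaneous",
]


def disposable_income(record: dict) -> float:
    """Income left after subtracting every listed monthly expense.

    Expense columns missing from the record are treated as zero.
    """
    total_expenses = sum(record.get(col, 0) for col in EXPENSE_COLUMNS)
    return record["Income"] - total_expenses


# Made-up record for illustration only.
person = {"Income": 50000, "Rent": 10000, "Groceries": 6000, "Transport": 2000}
print(disposable_income(person))  # 32000
```

    Comparing this computed value with the dataset's own Disposable_Income column is a quick way to check whether the assumed definition matches the published one.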
Cite
Prof Steve Tollman (2020). SAPRIN Individual Demographic Dataset 2018 - South Africa [Dataset]. http://www.datafirst.uct.ac.za/Dataportal/index.php/catalog/study/zaf-saprin-sidd-2018-v1

SAPRIN Individual Demographic Dataset 2018 - South Africa

Explore at:
Dataset updated
Jul 9, 2020
Dataset provided by
Dr Kobus Herbst
Prof Mark Collinson
Prof Deenan Pillay
Prof Steve Tollman
Prof Marianne Alberts
Time period covered
1993 - 2017
Area covered
South Africa
Description

Abstract

The South African Population Research Infrastructure Network (SAPRIN) is a national research infrastructure funded through the Department of Science and Technology and hosted by the South African Medical Research Council. One of SAPRIN’s initial goals has been to harmonise the legacy longitudinal data from the three current Health and Demographic Surveillance System (HDSS) Nodes. These long-standing nodes are the MRC/Wits University Agincourt HDSS in Bushbuckridge District, Mpumalanga, established in 1993, with a population of 116 000 people; the University of Limpopo DIMAMO HDSS in the Capricorn District of Limpopo, established in 1996, with a current population of 100 000; and the Africa Health Research Institute (AHRI) HDSS in uMkhanyakude District, KwaZulu-Natal, established in 2000, with a current population of 125 000.

SAPRIN data are processed for longitudinal analysis by organising the demographic data into residence episodes at a geographical location, and membership episodes within a household. Start events include enumeration, birth, in-migration and relocating into a household from within the study population; exit events include death (by cause), out-migration, and relocating to another location in the study population. Variables routinely updated at individual level include health care utilisation, marital status, labour status, education status, as well as recording household asset status. Anticipated outcomes of SAPRIN include: (i) regular releases of up-to-date, longitudinal data, representative of South Africa’s fast-changing poorer communities for research, interpretation and calibration of national datasets; (ii) national statistics triangulation, whereby longitudinal SAPRIN data are triangulated with National Census data for calibration of national statistics and studying the mechanisms driving the national statistics; (iii) An interdisciplinary research platform for conducting observational and interventional research at population level; (iv) policy engagement to provide evidence to underpin policy-making for cost evaluation and targeting intervention programmes, thereby improving the accuracy and efficiency of pro-poor, health and wellbeing interventions; (v) scientific education through training at related universities; and (vi) community engagement, whereby coordinated engagement with communities will enable two-way learning between researchers and community members, and enabling research site communities and service providers to have access to and make effective use of research results.

Geographic coverage

The Agincourt HDSS covers an area of approximately 420 km² and is located in Bushbuckridge District, Mpumalanga in the rural north-east of South Africa close to the Mozambique border. DIMAMO is located in the Capricorn district, Limpopo Province approximately 40 km from Polokwane, the capital city of Limpopo Province and 15-50 km from the University of Limpopo (Turfloop Campus). The site covers an area of approximately 200 km². AHRI is situated in the south-east portion of the Umkhanyakude district of KwaZulu-Natal province near the town of Mtubatuba. It is bounded on the west by the Umfolozi-Hluhluwe nature reserve, on the south by the Umfolozi river, on the east by the N2 highway (except for portions where the KwaMsane township straddles the highway) and in the north by the Inyalazi river for portions of the boundary. The area is 438 km².

Analysis unit

Exposure episodes

Universe

Households resident in dwellings within the study area will be eligible for inclusion in the household component of SAPRIN. All individuals identified by the household proxy informant as a member of the household will be enumerated. A resident household member is an individual that intends to sleep the majority of time at the dwelling occupied by the household over a four-month period. Households will include resident and non-resident members. An individual is a non-resident member if they have close ties to the household, but do not physically reside with the household most of the time. They can also be called temporary migrants and they are enumerated within the household list. Because household membership is not tied to physical residency, an individual may be a member of more than one household.

Kind of data

Event/transaction data

Sampling procedure

This dataset is not based on a sample but contains information from the complete demographic surveillance areas.
