100+ datasets found

Individual difference measures: Descriptive statistics and variable...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Apr 6, 2023
Cite
Deist, Melanie; Fourie, Melike M. (2023). Individual difference measures: Descriptive statistics and variable intercorrelations. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001091396
Dataset updated
Apr 6, 2023
Authors
Deist, Melanie; Fourie, Melike M.
Description
Individual difference measures: Descriptive statistics and variable intercorrelations.
Adult income is over $50,000 a year.
kaggle.com
zip
Updated Oct 16, 2024
Cite
M Atif Latif (2024). Adult income is over $50,000 a year. [Dataset]. https://www.kaggle.com/datasets/matiflatif/adult-income-is-over-50000-a-year
Available download formats: zip (724624 bytes)
Dataset updated
Oct 16, 2024
Authors
M Atif Latif
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Context

This dataset contains information about individuals' demographic and employment attributes to predict whether their income exceeds $50,000 per year. It originates from the 1994 U.S. Census database and has been widely used in classification problems, making it an excellent resource for machine learning, data analysis, and statistical modeling.

Content

The dataset includes various features related to personal and work-related attributes. The target variable is whether an individual's income exceeds $50,000 annually.

Key features include:

Age: Age of the individual.

Workclass: Employment type (e.g., private, government, self-employed).

Education: Highest level of education achieved.

Education-Num: Number corresponding to the level of education.

Marital Status: Marital status of the individual.

Occupation: Profession or job role.

Relationship: Family role (e.g., husband, wife, not in family).

Race: Race of the individual.

Sex: Gender of the individual.

Capital Gain: Income from investment sources other than salary.

Capital Loss: Losses from investment sources.

Hours Per Week: Average number of hours worked per week.

Native Country: Country of origin of the individual.

Variables

Age: Continuous variable representing the age of the individual.

Workclass: Categorical variable indicating the type of employment (e.g., Private, Self-Employed, Government).

Education: Categorical variable showing the highest level of education achieved (e.g., Bachelors, Masters).

Education-Num: Numerical representation of the education level.

Marital Status: Categorical variable representing marital status (e.g., Married, Never-Married).

Occupation: Categorical variable indicating the job role or occupation.

Relationship: Categorical variable describing the family relationship (e.g., Husband, Wife).

Race: Categorical variable showing the race of the individual.

Sex: Categorical variable indicating the gender of the individual.

Capital Gain: Continuous variable representing income from capital gains.

Capital Loss: Continuous variable representing losses from investments.

Hours Per Week: Continuous variable showing the average working hours per week.

Native Country: Categorical variable indicating the country of origin.

Income: Target variable (binary), indicating whether the individual earns more than $50,000 (>50K) or not (<=50K).
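
As a minimal sketch of how the binary target described above might be prepared for a classifier, the Python snippet below maps the two income labels to 0 and 1. The dictionary keys and the sample records are hypothetical, and the handling of stray whitespace or a trailing period is an assumption about how a file could be formatted, not something this listing states.

```python
def encode_income(label):
    """Map an income label to 1 (>50K) or 0 (<=50K); raise on anything else."""
    # Assumption: labels may carry surrounding whitespace or a trailing period.
    cleaned = label.strip().rstrip(".")
    if cleaned == ">50K":
        return 1
    if cleaned == "<=50K":
        return 0
    raise ValueError(f"unexpected income label: {label!r}")


# Hypothetical records; the real file's column names may differ.
rows = [
    {"age": "39", "hours-per-week": "40", "income": " <=50K"},
    {"age": "52", "hours-per-week": "45", "income": ">50K"},
]

targets = [encode_income(row["income"]) for row in rows]
print(targets)
```

Raising on unknown labels, rather than silently mapping them to 0, keeps mislabeled or missing targets from leaking into the negative class.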

Acknowledgements

This dataset was derived from the 1994 U.S. Census database and has been made publicly available for research and educational purposes. It is not affiliated with any specific organization. Users are encouraged to comply with ethical data usage guidelines while working with this dataset.
Variables for Alzheimer's analysis (without PII data)
catalog.data.gov
datasets.ai
Updated Dec 13, 2021
Cite
U.S. EPA Office of Research and Development (ORD) (2021). Variables for Alzheimer's analysis (without PII data) [Dataset]. https://catalog.data.gov/dataset/variables-for-alzheimers-analysis-without-pii-data
Dataset updated
Dec 13, 2021
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Organized by zipcode: rates of Alzheimer's disease, percent of landcover types, modelled PM2.5, and socioeconomic variables.

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Lucas Neas (CPHEA/PHESD/EB) is the owner of the copy of this dataset that was used. Format: Medicare database.

This dataset is associated with the following publication: Wu, J., and L. Jackson. Greenspace inversely associated with the risk of Alzheimer’s disease in the mid-Atlantic United States. Earth. MDPI AG, Basel, SWITZERLAND, 2(1): 140-150, (2021).
Household Health Survey 2012-2013, Economic Research Forum (ERF)...
catalog.ihsn.org
datacatalog.ihsn.org
Updated Jun 26, 2017
Cite
Central Statistical Organization (CSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://catalog.ihsn.org/index.php/catalog/6937
Dataset updated
Jun 26, 2017
Dataset provided by
Economic Research Forum
Central Statistical Organization (CSO)
Kurdistan Regional Statistics Office (KRSO)
Time period covered
2012 - 2013
Area covered
Iraq
Description
Abstract

The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.

----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between the CSO and the World Bank (WB), the Central Statistical Organization (CSO) and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year, covering all governorates including those in the Kurdistan Region.

The survey has six main objectives. These objectives are:

Provide data for poverty analysis and measurement, and monitor, evaluate and update the implementation of the Poverty Reduction National Strategy issued in 2009.

Provide a comprehensive data system to assess household social and economic conditions and prepare the indicators related to human development.

Provide data that meet the needs and requirements of national accounts.

Provide detailed indicators on consumption expenditure that support decision-making related to production, consumption, export and import.

Provide detailed indicators on the sources of household and individual income.

Provide the data necessary for formulating a new consumer price index number.

The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum to create a version comparable with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf, provided in the external resources, for further information on the mapping of the original variables to the harmonized ones, as well as notes on the variables' availability in both survey years and relevant comments.

Geographic coverage

National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.

Analysis unit

1- Household/family. 2- Individual/person.

Universe

The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

Kind of data

Sample survey data [ssd]

Sampling procedure

----> Design:

The sample size was 25488 households for the whole of Iraq: 216 households in each of the 118 districts, in 2832 clusters of 9 households each, distributed across districts and governorates for rural and urban areas.

----> Sample frame:

The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all governorates, including the Kurdistan Region, as the frame for selecting households. The sample was selected in two stages. Stage 1: Primary sampling units (blocks) within each stratum (district), for urban and rural areas, were systematically selected with probability proportional to size to reach 2832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to create a cluster. The total sample was therefore 25488 households distributed across the governorates, with 216 households in each district.

----> Sampling Stages:

In each district, the sample was selected in two stages. Stage 1: Based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit breakdown by urban and rural areas and the geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal-probability sampling. Sampling frames for each stage can be developed based on the 2010 building listing and numbering without updating household lists. In some small districts, random selection of primary sampling units may yield fewer than 24 units; in that case a sampling unit is selected more than once, and two or more clusters may be drawn from the same enumeration unit when necessary.
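
The two-stage design above can be sketched in Python as follows. This is an illustrative sketch only, not the survey's actual selection program: the block sizes are randomly generated placeholders, and the function names are hypothetical. Stage 1 draws 24 sample points per district by systematic sampling with probability proportional to size (which can select the same block more than once when a district has few or very large blocks), and Stage 2 draws 9 households per sample point by systematic equal-probability sampling.

```python
import random


def systematic_pps(sizes, n_select, rng):
    """Systematic PPS selection; returns unit indices (repeats are possible)."""
    step = sum(sizes) / n_select
    start = rng.random() * step
    picks, cumulative, idx = [], 0, 0
    for k in range(n_select):
        target = start + k * step
        # Advance to the unit whose cumulative size interval contains target.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        picks.append(idx)
    return picks


def systematic_equal(n_units, n_select, rng):
    """Systematic equal-probability selection of positions within one unit."""
    step = n_units / n_select
    start = rng.random() * step
    return [min(int(start + k * step), n_units - 1) for k in range(n_select)]


rng = random.Random(0)
# Hypothetical district: 60 blocks with made-up household counts.
block_sizes = [rng.randint(50, 400) for _ in range(60)]

sample_points = systematic_pps(block_sizes, 24, rng)            # Stage 1
households = [systematic_equal(block_sizes[b], 9, rng) for b in sample_points]  # Stage 2

per_district = sum(len(h) for h in households)
print(per_district)            # households sampled in this district
print(118 * 24, 118 * per_district)  # clusters and households nationally
```

The totals follow directly from the design: 24 sample points × 9 households = 216 households per district, and 118 districts give 118 × 24 = 2832 clusters and 118 × 216 = 25488 households.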

Mode of data collection

Face-to-face [f2f]

Research instrument

----> Preparation:

The questionnaire of the 2006 survey was adopted in designing the questionnaire of the 2012 survey, with many revisions. Two rounds of pre-tests were carried out. Revisions were made based on the feedback of the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, further revisions were made based on the challenges and feedback that emerged during its implementation, producing the final version used in the actual survey.

----> Questionnaire Parts:

The questionnaire consists of four parts, each with several sections:

Part 1: Socio – Economic Data:
- Section 1: Household Roster
- Section 2: Emigration
- Section 3: Food Rations
- Section 4: housing
- Section 5: education
- Section 6: health
- Section 7: Physical measurements
- Section 8: job seeking and previous job

Part 2: Monthly, Quarterly and Annual Expenditures:
- Section 9: Expenditures on Non – Food Commodities and Services (past 30 days).
- Section 10: Expenditures on Non – Food Commodities and Services (past 90 days).
- Section 11: Expenditures on Non – Food Commodities and Services (past 12 months).
- Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days).
- Section 12, Table 1: Meals Had Within the Residential Unit.
- Section 12, Table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.

Part 3: Income and Other Data:
- Section 13: Job
- Section 14: paid jobs
- Section 15: Agriculture, forestry and fishing
- Section 16: Household non – agricultural projects
- Section 17: Income from ownership and transfers
- Section 18: Durable goods
- Section 19: Loans, advances and subsidies
- Section 20: Shocks and strategy of dealing in the households
- Section 21: Time use
- Section 22: Justice
- Section 23: Satisfaction in life
- Section 24: Food consumption during past 7 days

Part 4: Diary of Daily Expenditures: The diary of expenditure is an essential component of this survey. It is left at the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.), during 7 days. Two pages were allocated for recording the expenditures of each day, so the roster consists of 14 pages.

Cleaning operations

----> Raw Data:

Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:

1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct.

2. Local Supervisor: Checks that questions have been correctly completed.

3. Statistical analysis: After exporting data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values, in addition to auditing some variables.

4. World Bank consultants in coordination with the CSO data management team: The World Bank technical consultants use additional programs in SPSS and STATA to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.

----> Harmonized Data:

The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.

The harmonization process starts with raw data files received from the Statistical Office.

A program is generated for each dataset to create harmonized variables.

Data is saved on the household and individual level, in SPSS and then converted to STATA, to be disseminated.

Response rate

The Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest rate was in Sulaimaniya (92%).
Survivor Statistics Dataset
kaggle.com
zip
Updated Jun 11, 2025
Cite
Lance Tobey (2025). Survivor Statistics Dataset [Dataset]. https://www.kaggle.com/datasets/lancetobey/survivor-statistics-dataset/data
Available download formats: zip (109357 bytes)
Dataset updated
Jun 11, 2025
Authors
Lance Tobey
Description
This is a dataset I have been working on to document the voting stats of every Survivor Player so far.

Currently encompasses Survivor: Borneo to Survivor 48

Current plan is to update with any noticed errors, add data currently not on the sheet that is suggested, and wait for 48.

If you see an error, please feel free to DM me and I will make sure to fix it. For any more information, including on the variables and my choices for certain variable values, please reference the README
Overview of the data sets used and the operationalization of the variables...
datasetcatalog.nlm.nih.gov
figshare.com
Updated Sep 2, 2022
Cite
Lohse, Stefan; Rissland, Jürgen; Holleczek, Bernd; Firk, Christiane; Stoffel, Harry; Berkó-Göttel, Barbara; Lehr, Thorsten; Giesen, Martin; Brandt, Florian; Schöpe, Jakob; Müller, Hanna; Schanzenbach, Alexandra; Werthner, Quirin; Hauptmann, Gunter; Wagenpfeil, Stefan; Weber, Gero; Hohmann, Heike; Smola, Sigrun; Sternjakob-Marthaler, Anna; Taurian, Emeline; Lamberty, Thomas; Selzer, Dominik (2022). Overview of the data sets used and the operationalization of the variables for the individual research questions. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000392238
Dataset updated
Sep 2, 2022
Authors
Lohse, Stefan; Rissland, Jürgen; Holleczek, Bernd; Firk, Christiane; Stoffel, Harry; Berkó-Göttel, Barbara; Lehr, Thorsten; Giesen, Martin; Brandt, Florian; Schöpe, Jakob; Müller, Hanna; Schanzenbach, Alexandra; Werthner, Quirin; Hauptmann, Gunter; Wagenpfeil, Stefan; Weber, Gero; Hohmann, Heike; Smola, Sigrun; Sternjakob-Marthaler, Anna; Taurian, Emeline; Lamberty, Thomas; Selzer, Dominik
Description
Overview of the data sets used and the operationalization of the variables for the individual research questions.
LIFE EXPECTANCY
kaggle.com
zip
Updated Oct 21, 2024
Cite
DavidGatt (2024). LIFE EXPECTANCY [Dataset]. https://www.kaggle.com/datasets/davidgatt222/life-expectansy-dataset
Available download formats: zip (1089650 bytes)
Dataset updated
Oct 21, 2024
Authors
DavidGatt
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Overview

This project analyzes life expectancy across countries, utilizing data from 2000 to 2015. The study examines how key socioeconomic and health factors influence life expectancy. Factors such as GDP, adult mortality, schooling, HIV/AIDS prevalence, and BMI are included in the analysis, which uses multiple linear regression and mixed-effects modeling to determine which variables significantly affect life expectancy.

Data Description

The dataset includes life expectancy information and its influencing factors from various countries over a 15-year period (2000-2015). The data was sourced from the WHO Life Expectancy Dataset available on Kaggle. It comprises both continuous and categorical variables, including:

• Life Expectancy (Dependent Variable): Average number of years an individual is expected to live.

Continuous Variables:
o GDP per capita
o Adult Mortality (per 1000 individuals aged 15-65)
o Schooling (mean years of education)
o Alcohol consumption per capita

Categorical Variables:
o HIV/AIDS prevalence
o Country status (Developed vs. Developing)
o BMI category (Underweight, Normal, Overweight, Obese)

Problem Statement

Life expectancy is a crucial metric for assessing the overall health and well-being of populations. It varies significantly between countries due to economic, social, and health factors. This project seeks to identify the most important variables that predict life expectancy, offering insights for policymakers on improving public health and longevity in their populations.

Hypotheses

1. Higher GDP leads to higher life expectancy.
2. Higher adult mortality results in lower life expectancy.
3. More years of schooling increase life expectancy.
4. Higher HIV/AIDS prevalence reduces life expectancy.
5. Living in a developed country increases life expectancy.
6. Higher BMI (underweight or obese) correlates with reduced life expectancy.
7. Higher alcohol consumption reduces life expectancy.

Methodology

• Data Preprocessing: Missing values were handled by imputation, and skewed variables (like GDP) were log-transformed to improve model performance.
• Exploratory Data Analysis: Visualizations (histograms, scatterplots, and box plots) were used to understand the relationships between independent variables and life expectancy.
• Modeling:
o Multiple Linear Regression was used to examine how each continuous and categorical variable impacts life expectancy.
o Mixed-effects modeling was applied to account for country-specific effects, capturing variability across different nations.

Key Results

1. GDP: Log-transformed GDP had a significant positive effect on life expectancy, with an adjusted R² of 0.29. Higher income is positively correlated with longer life expectancy.
2. Adult Mortality: Increased adult mortality significantly reduced life expectancy. For every unit increase in adult mortality, life expectancy decreased by 0.042 years.
3. Schooling: More years of schooling was strongly correlated with longer life expectancy, reflecting the importance of education in enhancing health outcomes.
4. HIV/AIDS: Countries with higher HIV/AIDS prevalence had lower life expectancy, with significant negative coefficients for all levels of prevalence.
5. Country Status: Developed countries had significantly higher life expectancy than developing countries, with an average difference of about 1.52 years.
6. BMI: While underweight and obese categories were significant predictors, the relationship between BMI and life expectancy was complex, suggesting that high-income countries might offset health risks through medical care.
7. Alcohol Consumption: Contrary to initial expectations, alcohol consumption did not have a statistically significant effect on life expectancy in this model.
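
To illustrate the log-transform and single-predictor regression step described in the methodology, here is a small Python sketch using ordinary least squares with only the standard library. The GDP and life expectancy values are invented for illustration and are not taken from the dataset; the function name is hypothetical, and the sketch covers neither the multi-variable model nor the mixed-effects model.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1.0 - ss_resid / ss_total


# Made-up values, for illustration only.
gdp_per_capita = [500.0, 1500.0, 5000.0, 20000.0, 45000.0]
life_expectancy_years = [55.0, 62.0, 70.0, 77.0, 81.0]

# Log-transform the skewed predictor before fitting.
log_gdp = [math.log(g) for g in gdp_per_capita]
slope, intercept, r_squared = fit_line(log_gdp, life_expectancy_years)
print(slope, intercept, r_squared)
```

With a log-transformed predictor, the slope is read as the change in life expectancy per one-unit increase in log GDP, that is, per roughly 2.7-fold increase in GDP.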
Simulation Data Set
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.

Metadata (including data dictionary)

• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code

Abstract

We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description

“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.

“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Optional Information (complete as necessary)

Required R packages:

• For running “CWVS_LMC.txt”:
  • msm: Sampling from the truncated normal distribution
  • mnormt: Sampling from the multivariate normal distribution
  • BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
  • plotrix: Plotting the posterior means and credible intervals

Instructions for Use

Reproducibility (Mandatory)

What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.

How to use the information:

• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.

Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:

Data

The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability

Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.

Description

Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
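
The weekly standardization described above (subtract the week's median, divide by the week's IQR) can be sketched in Python as follows. The provided files are R code and an R workspace, so this is only an illustration of the arithmetic, not the authors' implementation. It assumes an exposure matrix laid out with one row per individual and one column per week, and it uses one particular quantile convention (`method="inclusive"`); the source does not say which convention was used.

```python
import statistics


def standardize_week(values):
    """Centre one week's exposures on the median and scale by the IQR."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]


def standardize_exposures(matrix):
    """Standardize each column (week) of a rows-by-weeks exposure matrix."""
    columns = [standardize_week(list(column)) for column in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


# Hypothetical raw exposures: 5 individuals, 2 weeks.
raw = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
    [5.0, 50.0],
]
standardized = standardize_exposures(raw)
print(standardized)
```

Because the medians and IQRs are withheld from the published dataset, the standardized values cannot be mapped back to raw exposure levels, which is the protective effect the description refers to.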
A Comparison of Aggregate P-Value Methods and Multivariate Statistics for...
plos.figshare.com
datasetcatalog.nlm.nih.gov
tiff
Updated Jun 11, 2023
Cite
Matthew W. Mitchell (2023). A Comparison of Aggregate P-Value Methods and Multivariate Statistics for Self-Contained Tests of Metabolic Pathway Analysis [Dataset]. http://doi.org/10.1371/journal.pone.0125081
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0125081
Dataset updated
Jun 11, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Matthew W. Mitchell
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
For pathway analysis of genomic data, the most common methods involve combining p-values from individual statistical tests. However, there are several multivariate statistical methods that can be used to test whether a pathway has changed. Because of the large number of variables and pathway sizes in genomics data, some of these statistics cannot be computed. However, in metabolomics data, the number of variables and pathway sizes are typically much smaller, making such computations feasible. Of particular interest is being able to detect changes in pathways that may not be detected for the individual variables. We compare the performance of both the p-value methods and multivariate statistics for self-contained tests with an extensive simulation study and a human metabolomics study. Permutation tests, rather than asymptotic results, are used to assess the statistical significance of the pathways. Furthermore, both one- and two-sided alternative hypotheses are examined. From the human metabolomic study, many pathways were statistically significant, although the majority of the individual variables in the pathway were not. Overall, the p-value methods perform at least as well as the multivariate statistics for these scenarios.
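
As one example of what "combining p-values" can mean, the Python sketch below implements Fisher's method, a widely used aggregate p-value approach. The description does not say which combination methods the study compares, so this choice is an assumption made for illustration. The closed form used here relies on the individual tests being independent, whereas the study assesses significance with permutation tests instead.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) is chi-square with 2k degrees of
    freedom under the null; for even degrees of freedom the upper tail is
    exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for the variables in one pathway.
pathway_p_values = [0.08, 0.11, 0.06, 0.15]
combined = fisher_combined_p(pathway_p_values)
print(combined)
```

In this made-up example no single p-value falls below 0.05, yet the combined p-value does, which mirrors the abstract's point that a pathway can be significant even when most of its individual variables are not.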
Divergent trends in life expectancy across the rural-urban gradient and...
catalog.data.gov
datasets.ai
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Divergent trends in life expectancy across the rural-urban gradient and association with specific racial proportions in the contiguous United States 2000-2005 [Dataset]. https://catalog.data.gov/dataset/divergent-trends-in-life-expectancy-across-the-rural-urban-gradient-and-association-w-2000
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Area covered
Contiguous United States, United States
Description
We used individual-level death data to estimate county-level life expectancy at 25 (e25) for White, Black, AIAN and Asian populations in the contiguous US for 2000-2005. Race-sex-stratified models were used to examine the associations among e25, rurality and specific race proportion, adjusted for socioeconomic variables. Individual death data from the National Center for Health Statistics were aggregated as death counts into five-year age groups by county and race-sex groups for the contiguous US for years 2000-2005 (National Center for Health Statistics 2000-2005). We used bridged-race population estimates to calculate five-year mortality rates. The bridged population data mapped 31 race categories, as specified in the 1997 Office of Management and Budget standards for the collection of data on race and ethnicity, to the four race categories specified under the 1977 standards (the same as race categories in mortality registration) (Ingram et al. 2003). The urban-rural gradient was represented by the 2003 Rural Urban Continuum Codes (RUCC), which distinguished metropolitan counties by population size, and nonmetropolitan counties by degree of urbanization and adjacency to a metro area (United States Department of Agriculture 2016). We obtained county-level sociodemographic data for 2000-2005 from the US Census Bureau. These included median household income, percent of population attaining greater than high school education (high school%), and percent of county occupied rental units (rent%). We obtained county violent crime data from Uniform Crime Reports and used them to calculate the mean number of violent crimes per capita (Federal Bureau of Investigation 2010).

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Request to author. Format: Data are stored as csv files.

This dataset is associated with the following publication: Jian, Y., L. Neas, L. Messer, C. Gray, J. Jagai, K. Rappazzo, and D. Lobdell. Divergent trends in life expectancy across the rural-urban gradient among races in the contiguous United States. International Journal of Public Health. Springer Basel AG, Basel, SWITZERLAND, 64(9): 1367-1374, (2019).
2018 Census individual part 2 total NZ by statistical area 1 (2018 Census...
datafinder.stats.govt.nz
csv, dbf (dbase iii) +4
Updated Apr 14, 2020
Cite
Stats NZ (2020). 2018 Census individual part 2 total NZ by statistical area 1 (2018 Census only) look up table [Dataset]. https://datafinder.stats.govt.nz/table/104570-2018-census-individual-part-2-total-nz-by-statistical-area-1-2018-census-only-look-up-table/attachments/22592/
Explore at:
Available download formats: geodatabase, csv, mapinfo mif, dbf (dbase iii), mapinfo tab, geopackage / sqlite
Dataset updated
Apr 14, 2020
Dataset provided by
Statistics New Zealand (http://www.stats.govt.nz/)
Authors
Stats NZ
License
https://datafinder.stats.govt.nz/license/attribution-4-0-international/
Area covered
New Zealand
Description
This lookup table relates to the web service 2018 Census individual part 2 by SA1. The web service contains data from the 2018 Census only; no data from previous censuses have been included.

The individual (part 2) dataset is displayed by statistical area 1 geography and contains information on:
• Religious affiliation (total responses)
• Cigarette smoking behaviour
• Difficulty seeing even if wearing glasses
• Difficulty hearing even if using a hearing aid
• Difficulty walking or climbing steps
• Difficulty remembering or concentrating
• Difficulty washing all over or dressing
• Difficulty communicating using your usual language, for example being understood by others
• Legally registered relationship status
• Partnership status in current relationship
• Individual home ownership
• Number of children born
• Highest qualification
• Study participation
• Total personal income (grouped), including median total personal income
• Sources of personal income (total responses)
• Main means of travel to education, by usual residence address (2018 only)
• Main means of travel to education, by educational institution address (2018 only)

The data uses fixed random rounding to protect confidentiality. Some counts of less than 6 are suppressed according to 2018 confidentiality rules. Values of ‘-999’ indicate suppressed data, and values of ‘Null’ indicate data not collected.
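
As a minimal sketch of how the sentinel values described above might be handled when reading the CSV download: the column names and sample rows are hypothetical, and it is an assumption that suppressed and not-collected cells appear literally as -999 and Null in the file.

```python
import csv
import io

SUPPRESSED = "-999"      # documented marker for suppressed counts
NOT_COLLECTED = "Null"   # documented marker for data not collected

def parse_count(raw):
    """Return (value, status); value is None unless a real count was published."""
    text = raw.strip()
    if text == SUPPRESSED:
        return None, "suppressed"
    if text == NOT_COLLECTED:
        return None, "not_collected"
    return int(text), "ok"

# Hypothetical extract: an SA1 code plus one count column.
sample = io.StringIO(
    "SA1_code,smokers\n"
    "7000001,12\n"
    "7000002,-999\n"
    "7000003,Null\n"
)
rows = [(r["SA1_code"], *parse_count(r["smokers"])) for r in csv.DictReader(sample)]

# Only published counts contribute to totals; sentinels are never summed as numbers.
published_total = sum(value for _, value, status in rows if status == "ok")
```

Keeping the status next to the value makes it harder to treat -999 as a real count by accident, and it preserves the distinction between suppressed and uncollected cells.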

For further information on this dataset please refer to the Statistical area 1 dataset for 2018 Census webpage - footnotes for individual part 2, Excel workbooks, and CSV files are available to download. Data quality ratings for 2018 Census variables, summarising the quality rating and priority levels for 2018 Census variables, are available.

For information on the statistical area 1 geography please refer to the Statistical standard for geographic areas 2018.
Household Expenditure and Income Survey 2010, Economic Research Forum (ERF)...
catalog.ihsn.org
Updated Mar 29, 2019
Cite
The Hashemite Kingdom of Jordan Department of Statistics (DOS) (2019). Household Expenditure and Income Survey 2010, Economic Research Forum (ERF) Harmonization Data - Jordan [Dataset]. https://catalog.ihsn.org/index.php/catalog/7662
Explore at:
Dataset updated
Mar 29, 2019
Dataset authored and provided by
The Hashemite Kingdom of Jordan Department of Statistics (DOS)
Time period covered
2010 - 2011
Area covered
Jordan
Description
Abstract

The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.

Data collected through the survey helped in achieving the following objectives:
1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
6. Provide the necessary income data to serve in calculating poverty indices and identifying the characteristics of the poor as well as drawing poverty maps
7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty

Geographic coverage

National

Analysis unit

Households

Individuals

Kind of data

Sample survey data [ssd]

Sampling procedure

The Household Expenditure and Income Survey sample for 2010 was designed to serve the basic objectives of the survey by providing a relatively large sample in each sub-district to enable drawing a poverty map of Jordan. The General Census of Population and Housing in 2004 provided a detailed framework for housing and households for different administrative levels in the country. Jordan is administratively divided into 12 governorates; each governorate is composed of a number of districts, and each district (Liwa) includes one or more sub-districts (Qada). In each sub-district, there are a number of communities (cities and villages). Each community was divided into a number of blocks, each containing between 60 and 100 houses. Nomads and persons living in collective dwellings such as hotels, hospitals and prisons were excluded from the survey framework.

A two-stage stratified cluster sampling technique was used. In the first stage, a sample of clusters was selected with probability proportional to size, where the number of households in each cluster was considered the weight of the cluster. At the second stage, a sample of 8 households was selected from each cluster, in addition to another 4 households selected as a backup for the basic sample, using a systematic sampling technique. Those 4 households were to be used during the first visit to the block in case the visit to the originally selected household was not possible for any reason. For the purposes of this survey, each sub-district was considered a separate stratum to ensure the possibility of producing results on the sub-district level. In this respect, the survey framework adopted that provided by the General Census of Population and Housing in dividing the sample strata. To estimate the sample size, the coefficient of variation and the design effect of the expenditure variable provided in the Household Expenditure and Income Survey for the year 2008 were calculated for each sub-district. These results were used to estimate the sample size on the sub-district level so that the coefficient of variation for the expenditure variable in each sub-district is less than 10%, with a minimum of 6 clusters in each sub-district. This is to ensure adequate representation of clusters in different administrative areas to enable drawing an indicative poverty map.
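
The two stages described above can be illustrated with a rough sketch. This is not the survey's actual procedure: the block sizes are simulated, the function names are hypothetical, and drawing the 4 backup households systematically from the households left over after the main draw is an assumption made only for illustration.

```python
import random

def pps_systematic(sizes, n, rng):
    """Stage 1: pick n clusters with probability proportional to size.

    Selection points are spaced total/n apart on the cumulative size scale;
    everything is multiplied by n so the arithmetic stays in integers.
    """
    total = sum(sizes)
    start = rng.randrange(total)
    chosen, cumulative, k = [], 0, 0
    for cluster_id, size in enumerate(sizes):
        cumulative += size
        while k < n and start + k * total < cumulative * n:
            chosen.append(cluster_id)
            k += 1
    return chosen

def systematic_sample(items, n, rng):
    """Stage 2: take every (len/n)-th item after a random start."""
    step = len(items) / n
    start = rng.random() * step
    last = len(items) - 1
    return [items[min(int(start + i * step), last)] for i in range(n)]

rng = random.Random(2010)
# Hypothetical stratum: 40 blocks of 60 to 100 households each.
block_sizes = [rng.randint(60, 100) for _ in range(40)]
selected_blocks = pps_systematic(block_sizes, 6, rng)

# Within the first selected block: 8 main households plus 4 backups.
households = list(range(block_sizes[selected_blocks[0]]))
main = systematic_sample(households, 8, rng)
remaining = [h for h in households if h not in main]
backup = systematic_sample(remaining, 4, rng)
```

With this scheme a block larger than the sampling interval could be selected more than once; that cannot happen here because no simulated block is that large.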

It should be noted that in addition to the standard non response rate assumed, higher rates were expected in areas where poor households are concentrated in major cities. Therefore, those were taken into consideration during the sampling design phase, and a higher number of households were selected from those areas, aiming at well covering all regions where poverty spreads.

Mode of data collection

Face-to-face [f2f]

Research instrument

General form

Expenditure on food commodities form

Expenditure on non-food commodities form

Cleaning operations

Raw Data:
- Organizing forms/questionnaires: A compatible archive system was used to classify the forms according to different rounds throughout the year. A registry was prepared to indicate different stages of the process of data checking, coding and entry till forms were back to the archive system.
- Data office checking: This phase was achieved concurrently with the data collection phase in the field where questionnaires completed in the field were immediately sent to data office checking phase.
- Data coding: A team was trained to work on the data coding phase, which in this survey is only limited to education specialization, profession and economic activity. In this respect, international classifications were used, while for the rest of the questions, coding was predefined during the design phase.
- Data entry/validation: A team consisting of system analysts, programmers and data entry personnel were working on the data at this stage. System analysts and programmers started by identifying the survey framework and questionnaire fields to help build computerized data entry forms. A set of validation rules were added to the entry form to ensure accuracy of data entered. A team was then trained to complete the data entry process. Forms prepared for data entry were provided by the archive department to ensure forms are correctly extracted and put back in the archive system. A data validation process was run on the data to ensure the data entered is free of errors.
- Results tabulation and dissemination: After the completion of all data processing operations, ORACLE was used to tabulate the survey final results. Those results were further checked using similar outputs from SPSS to ensure that tabulations produced were correct. A check was also run on each table to guarantee consistency of figures presented, together with required editing for tables' titles and report formatting.

Harmonized Data:
- The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets.
- The harmonization process started with cleaning all raw data files received from the Statistical Office.
- Cleaned data files were then merged to produce one data file on the individual level containing all variables subject to harmonization.
- A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
- A post-harmonization cleaning process was run on the data.
- Harmonized data was saved on the household as well as the individual level, in SPSS and converted to STATA format.
2018 Census Individual (part 3a) total New Zealand by Statistical Area 1
datafinder.stats.govt.nz
csv, dwg, geodatabase +6
Updated May 18, 2020
Cite
Stats NZ (2020). 2018 Census Individual (part 3a) total New Zealand by Statistical Area 1 [Dataset]. https://datafinder.stats.govt.nz/layer/104621-2018-census-individual-part-3a-total-new-zealand-by-statistical-area-1/
Explore at:
Available download formats: mapinfo tab, pdf, mapinfo mif, geodatabase, kml, shapefile, csv, geopackage / sqlite, dwg
Dataset updated
May 18, 2020
Dataset provided by
Statistics New Zealand (http://www.stats.govt.nz/)
Authors
Stats NZ
License
https://datafinder.stats.govt.nz/license/attribution-4-0-international/
Area covered
New Zealand
Description
This individual (part 3a) dataset is displayed by statistical area 1 geography and contains information on:

• Work and labour force status

• Status in employment

• Occupation – major group, by usual residence address

• Occupation – major group, by workplace address*

• Industry (division), by usual residence address

• Industry (division), by workplace address*

* Workplace address is coded from information supplied by respondents about their workplaces. Where respondents do not supply sufficient information, their responses are coded to ‘not further defined’. The statistical area 1 dataset for 2018 Census excludes these ‘not further defined’ areas.

This dataset contains counts at statistical area 1 for selected variables from the 2018, 2013, and 2006 censuses. The geography corresponds to 2018 boundaries.

The data uses fixed random rounding to protect confidentiality. Some counts of less than 6 are suppressed according to 2018 confidentiality rules. Values of ‘-999’ indicate suppressed data.

For further information on this dataset please refer to the Statistical area 1 dataset for 2018 Census webpage - footnotes for individual part 3a, Excel workbooks, and CSV files are available to download. Data quality ratings for 2018 Census variables, summarising the quality rating and priority levels for 2018 Census variables, are available.

For information on the statistical area 1 geography please refer to the Statistical standard for geographic areas 2018.
R code dataset derivation centralized.
plos.figshare.com
txt
Updated Nov 14, 2024
Cite
Romain Jégou; Camille Bachot; Charles Monteil; Eric Boernert; Jacek Chmiel; Mathieu Boucher; David Pau (2024). R code dataset derivation centralized. [Dataset]. http://doi.org/10.1371/journal.pone.0312697.s011
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0312697.s011
Dataset updated
Nov 14, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Romain Jégou; Camille Bachot; Charles Monteil; Eric Boernert; Jacek Chmiel; Mathieu Boucher; David Pau
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Methods: The objective of this project was to determine the capability of a federated analysis approach using DataSHIELD to maintain the level of results of a classical centralized analysis in a real-world setting. This research was carried out on an anonymous synthetic longitudinal real-world oncology cohort randomly split into three local databases, mimicking three healthcare organizations, stored in a federated data platform integrating DataSHIELD. No individual data were transferred; statistics were calculated simultaneously, in parallel, within each healthcare organization, and only summary statistics (aggregates) were provided back to the federated data analyst. Descriptive statistics, survival analysis, regression models and correlation were first performed on the centralized approach and then reproduced on the federated approach. The results were then compared between the two approaches.
Results: The cohort was split into three samples (N1 = 157 patients, N2 = 94 and N3 = 64), and 11 derived variables and four types of analyses were generated. All analyses were successfully reproduced using DataSHIELD, except for one descriptive variable due to data disclosure limitation in the federated environment, showing the good capability of DataSHIELD. For descriptive statistics, exactly equivalent results were found for the federated and centralized approaches, except some differences for position measures. Estimates of univariate regression models were similar, with a loss of accuracy observed for multivariate models due to source database variability.
Conclusion: Our project showed a practical implementation and use case of a real-world federated approach using DataSHIELD. The capability and accuracy of common data manipulation and analysis were satisfying, and the flexibility of the tool enabled the production of a variety of analyses while preserving the privacy of individual data. The DataSHIELD forum was also a practical source of information and support. In order to find the right balance between privacy and accuracy of the analysis, privacy requirements should be established prior to the start of the analysis, together with a data quality review of the participating healthcare organizations.
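
To make the idea of returning only aggregates concrete, here is a small sketch in plain Python. It does not use or imitate the DataSHIELD API; the function names and the three tiny site samples are hypothetical, and the choice of count, sum and sum of squares as the shared aggregates is an assumption made for illustration.

```python
import statistics
from math import sqrt

def site_summary(values):
    """What a site shares: count, sum and sum of squares, never row-level data."""
    return {"n": len(values), "sum": sum(values), "sum_sq": sum(v * v for v in values)}

def pooled_mean_sd(summaries):
    """Combine site aggregates into the overall mean and sample standard deviation."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, sqrt(variance)

# Hypothetical ages held by three separate organizations.
site_a = [61, 70, 58, 66]
site_b = [72, 64, 69]
site_c = [55, 63]

federated_mean, federated_sd = pooled_mean_sd(
    [site_summary(site) for site in (site_a, site_b, site_c)]
)

# Centralized reference, possible here only because the toy data sit in one place.
everyone = site_a + site_b + site_c
central_mean, central_sd = statistics.mean(everyone), statistics.stdev(everyone)
```

Means and standard deviations can be rebuilt exactly from such sums, whereas quantiles such as the median cannot, which is one plausible reason a federated analysis may differ from a centralized one on position measures.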
Tabular statistical summay of data analysis - Calawah River Riverscape Study...
catalog.data.gov
s.cnmilf.com
Updated May 24, 2025
Cite
(Point of Contact, Custodian) (2025). Tabular statistical summay of data analysis - Calawah River Riverscape Study [Dataset]. https://catalog.data.gov/dataset/tabular-statistical-summay-of-data-analysis-calawah-river-riverscape-study3
Explore at:
Dataset updated
May 24, 2025
Dataset provided by
(Point of Contact, Custodian)
Area covered
Calawah River
Description
The objective of this study was to identify the patterns of juvenile salmonid distribution and relative abundance in relation to habitat correlates. It is the first dataset of its kind because the entire river was snorkeled by one person in multiple years. During two consecutive summers, we completed a census of juvenile salmonids and stream habitat across a stream network. We used the data to test the ability of habitat models to explain the distribution of juvenile coho salmon (Oncorhynchus kisutch), young-of-the-year (age 0) steelhead (Oncorhynchus mykiss), and steelhead parr (= age 1) for a network consisting of several different sized streams. Our network-scale models, which included five stream habitat variables, explained 27%, 11%, and 19% of the variation in the density of juvenile coho salmon, age 0 steelhead, and steelhead parr, respectively. We found weak to strong levels of spatial auto-correlation in the model residuals (Moran's I values ranging from 0.25 - 0.71). Explanatory power of base habitat models increased substantially and the level of spatial auto-correlation decreased with sequential inclusion of variables accounting for stream size, year, stream, and reach location. The models for specific streams underscored the variability that was implied in the network-scale models. Associations between juvenile salmonids and individual habitat variables were rarely linear and ranged from negative to positive, and the variable accounting for location of the habitat within a stream was often more important than any individual habitat variable. The limited success in predicting the summer distribution and density of juvenile coho salmon and steelhead with our network-scale models was apparently related to variation in the strength and shape of fish-habitat associations across and within streams and years. Summary of statistical analysis of the Calawah Riverscape data. NOAA was not involved and did not pay for the collection of this data. 
This data represents the statistical analysis carried out by Martin Liermann as a NOAA employee.
Historic US Census - 1940
redivis.com
application/jsonl +7
Updated Jan 10, 2020
Cite
Stanford Center for Population Health Sciences (2020). Historic US Census - 1940 [Dataset]. http://doi.org/10.57761/660g-eq95
Explore at:
Available download formats: avro, arrow, sas, application/jsonl, spss, parquet, stata, csv
Unique identifier
https://doi.org/10.57761/660g-eq95
Dataset updated
Jan 10, 2020
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Time period covered
Jan 1, 1940 - Dec 31, 1940
Area covered
United States
Description
Abstract

The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The IPUMS microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household data.

Before Manuscript Submission

All manuscripts (and other items you'd like to publish) must be submitted to

phsdatacore@stanford.edu for approval prior to journal submission.

We will check your cell sizes and citations.

For more information about how to cite PHS and PHS datasets, please visit:

https://phsdocs.developerhub.io/need-help/citing-phs-data-core

Documentation

Historic data are scarce and often only exist in aggregate tables. The key advantage of historic US census data is the availability of individual and household level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.

In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.

The historic US 1940 census data were collected in April 1940. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.

Notes

We provide IPUMS household and person data separately so that it is convenient to explore the descriptive statistics on each level. In order to obtain a full dataset, merge the household and person on the variables SERIAL and SERIALP. In order to create a longitudinal dataset, merge datasets on the variable HISTID.
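
The two merges described in this note could look roughly like the following sketch. It assumes SERIAL is the household identifier in the household file and SERIALP is the matching identifier in the person file; the miniature records, the second census year and the `join` helper are all hypothetical.

```python
# Hypothetical miniature extracts; the real files carry many more variables.
households = [
    {"SERIAL": 1, "OWNERSHP": 1},
    {"SERIAL": 2, "OWNERSHP": 2},
]
persons_1940 = [
    {"SERIALP": 1, "HISTID": "H-0001", "SEX": 1},
    {"SERIALP": 1, "HISTID": "H-0002", "SEX": 2},
    {"SERIALP": 2, "HISTID": "H-0003", "SEX": 2},
]
# Assumed records from another census year, used only to show the HISTID link.
persons_other_year = [{"HISTID": "H-0002", "MARST": 1}]

def join(left, right, left_key, right_key):
    """Inner join two lists of dicts on the given key columns."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [
        {**match, **row}
        for row in left
        for match in index.get(row[left_key], [])
    ]

# Full 1940 dataset: every person row gains its household's variables.
full_1940 = join(persons_1940, households, "SERIALP", "SERIAL")

# Longitudinal dataset: the same individual matched across years on HISTID.
linked = join(persons_1940, persons_other_year, "HISTID", "HISTID")
```

An inner join keeps only matched rows; whether unmatched persons or households should be kept instead depends on the analysis.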

Households with more than 60 people in the original data were broken up for processing purposes. Every person in these large households is considered to be in their own household. The original large households can be identified using the variable SPLIT40, reconstructed using the variable SERIAL40, and the original count is found in the variable NUMPREC40.

Some variables are missing from this data set for specific enumeration districts. The enumeration districts with missing data can be identified using the variable EDMISS. These variables will be added in a future release.

Coded variables derived from string variables are still in progress. These variables include: occupation, industry and migration status.

Missing observations have been allocated and some inconsistencies have been edited for the following variables: SURSIM, SEX, SCHOOL, RELATE, RACE, OCC1950, MTONGUE, MBPL, FBPL, BPL, MARST, EMPSTAT, CITIZEN, OWNERSHP. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the ‘Select data quality flags’ box on the extract summary page.

Most inconsistent information was not edited for this release, thus there are observations outside of the universe for many variables. In particular, the variables GQ, and GQTYPE have known inconsistencies and will be improved with the next r
Synthetic Data for an Imaginary Country, Sample, 2023 - World
microdata.worldbank.org
nada-demo.ihsn.org
Updated Jul 7, 2023
Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
Explore at:
Dataset updated
Jul 7, 2023
Dataset authored and provided by
Development Data Group, Data Analytics Unit
Time period covered
2023
Area covered
World
Description
Abstract

The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

The full-population dataset (with about 10 million individuals) is also distributed as open data.

Geographic coverage

The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

Analysis unit

Household, Individual

Universe

The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

Kind of data

ssd

Sampling procedure

The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
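
The arithmetic behind this design (8,000 households at 25 per enumeration area gives 320 enumeration areas) can be sketched as follows. This is not the R script mentioned above: the stratum sizes are invented, measuring stratum size in households is an assumption, and the largest-remainder rounding is one possible way to make the allocation add up.

```python
import random

SAMPLE_HOUSEHOLDS = 8000
HOUSEHOLDS_PER_EA = 25
N_EAS = SAMPLE_HOUSEHOLDS // HOUSEHOLDS_PER_EA  # enumeration areas to select

def allocate(stratum_sizes, n_units):
    """Split n_units across strata in proportion to size (largest remainder)."""
    total = sum(stratum_sizes.values())
    alloc, remainders = {}, {}
    for stratum, size in stratum_sizes.items():
        alloc[stratum], remainders[stratum] = divmod(n_units * size, total)
    leftover = n_units - sum(alloc.values())
    # Hand any units lost to rounding down to the strata with the largest remainders.
    for stratum in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[stratum] += 1
    return alloc

# Hypothetical strata (geo_1 by urban/rural) with household counts.
strata = {
    ("province_1", "urban"): 400_000,
    ("province_1", "rural"): 600_000,
    ("province_2", "urban"): 250_000,
    ("province_2", "rural"): 750_000,
}
eas_per_stratum = allocate(strata, N_EAS)

# Second stage: 25 households drawn at random within one selected enumeration area.
rng = random.Random(2023)
ea_household_ids = list(range(180))  # hypothetical enumeration area of 180 households
drawn = rng.sample(ea_household_ids, HOUSEHOLDS_PER_EA)
```

Integer arithmetic is used for the allocation so that the stratum totals always sum to exactly the required number of enumeration areas.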

Mode of data collection

other

Research instrument

The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

Cleaning operations

The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.

Response rate

This is a synthetic dataset; the "response rate" is 100%.
COVID-19 National Longitudinal Phone Survey 2020 – World Bank LSMS...
microdata.worldbank.org
catalog.ihsn.org
Updated Oct 25, 2021
Cite
National Bureau of Statistics (NBS) (2021). COVID-19 National Longitudinal Phone Survey 2020 – World Bank LSMS Harmonized Dataset - Nigeria [Dataset]. https://microdata.worldbank.org/index.php/catalog/3856
Explore at:
Dataset updated
Oct 25, 2021
Dataset authored and provided by
National Bureau of Statistics (NBS)
Time period covered
2018 - 2021
Area covered
Nigeria
Description
Abstract

To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created the harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.

The LSMS team has extracted and harmonized variables from these surveys, based on the harmonized definitions and ensuring the same variable names. These variables include demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables are collected using questions that are asked differently. The harmonized datafiles include the best available variables with harmonized definitions.

Two harmonized datafiles are prepared for each survey:
1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security, and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractor and crop sales.
2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.

Geographic coverage

National coverage

Analysis unit

Households

Individuals

Universe

The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.

Kind of data

Sample survey data [ssd]

Sampling procedure

See “Nigeria - General Household Survey, Panel 2018-2019, Wave 4” and “Nigeria - COVID-19 National Longitudinal Phone Survey 2020” available in the Microdata Library for details.

Mode of data collection

Computer Assisted Personal Interview [capi]

Cleaning operations

Nigeria General Household Survey, Panel (GHS-Panel) 2018-2019 and Nigeria COVID-19 National Longitudinal Phone Survey (COVID-19 NLPS) 2020 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).

The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round is noted with “_rX” in the variable name, where X is the round number. For example, a variable with “_r3” was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey, which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.
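
The naming convention above can be read mechanically. The following is a minimal Python sketch, assuming the suffix always appears at the very end of the variable name; the function name `split_round` and the example variable names are hypothetical and do not come from the harmonized datafiles.

```python
import re

# Assumed pattern: a trailing "_r" followed by the round number, as described in the text.
_ROUND_SUFFIX = re.compile(r"_r(\d+)$")


def split_round(variable_name: str) -> tuple[str, int]:
    """Return (base_name, round_number); names without a suffix map to Round 0."""
    match = _ROUND_SUFFIX.search(variable_name)
    if match is None:
        return variable_name, 0
    return variable_name[:match.start()], int(match.group(1))


# Hypothetical variable names, for illustration only.
for name in ("hhsize", "work_r3"):
    print(name, split_round(name))
```

If the real files place the suffix elsewhere in the name, the pattern would need to be adjusted accordingly.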

Response rate

See “Nigeria - General Household Survey, Panel 2018-2019, Wave 4” and “Nigeria - COVID-19 National Longitudinal Phone Survey 2020” available in the Microdata Library for details.
Pre and Post-Exercise Heart Rate Analysis
kaggle.com
zip
Updated Sep 29, 2024
Cite
Abdullah M Almutairi (2024). Pre and Post-Exercise Heart Rate Analysis [Dataset]. https://www.kaggle.com/datasets/abdullahmalmutairi/pre-and-post-exercise-heart-rate-analysis
Explore at:
Available download formats: zip (3857 bytes)
Dataset updated
Sep 29, 2024
Authors
Abdullah M Almutairi
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Overview:

This dataset contains simulated (hypothetical) but reasonably realistic (AI-based) data on the sleep, heart rate, and exercise habits of 500 individuals. It includes both pre-exercise and post-exercise resting heart rates, allowing for analyses such as a dependent t-test (paired-sample t-test) to observe changes in heart rate after an exercise program. The dataset also includes additional health-related variables, such as age, hours of sleep per night, and exercise frequency.

The data is designed for tasks involving hypothesis testing, health analytics, or even machine learning applications that predict changes in heart rate based on personal attributes and exercise behavior. It can be used to understand the relationships between exercise frequency, sleep, and changes in heart rate.

File:

Filename: heart_rate_data.csv
File Format: CSV

Features (Columns):

Age
Description: The age of the individual.
Type: Integer
Range: 18-60 years
Relevance: Age is an important factor in determining heart rate and the effects of exercise.

Sleep Hours
Description: The average number of hours the individual sleeps per night.
Type: Float
Range: 3.0 - 10.0 hours
Relevance: Sleep is a crucial health metric that can impact heart rate and exercise recovery.

Exercise Frequency (Days/Week)
Description: The number of days per week the individual engages in physical exercise.
Type: Integer
Range: 1-7 days/week
Relevance: More frequent exercise may lead to greater heart rate improvements and better cardiovascular health.

Resting Heart Rate Before
Description: The individual’s resting heart rate measured before beginning a 6-week exercise program.
Type: Integer
Range: 50 - 100 bpm (beats per minute)
Relevance: This is a key health indicator, providing a baseline measurement for the individual’s heart rate.

Resting Heart Rate After
Description: The individual’s resting heart rate measured after completing the 6-week exercise program.
Type: Integer
Range: 45 - 95 bpm (lower than the "Resting Heart Rate Before" due to the effects of exercise)
Relevance: This variable is essential for understanding how exercise affects heart rate over time, and it can be used to perform a dependent t-test analysis.

Max Heart Rate During Exercise
Description: The maximum heart rate the individual reached during exercise sessions.
Type: Integer
Range: 120 - 190 bpm
Relevance: This metric helps in understanding cardiovascular strain during exercise and can be linked to exercise frequency or fitness levels.

Potential Uses: Dependent T-Test Analysis: The dataset is particularly suited for a dependent (paired) t-test where you compare the resting heart rate before and after the exercise program for each individual.

Exploratory Data Analysis (EDA): Investigate relationships between sleep, exercise frequency, and changes in heart rate. Potential analyses include correlations between sleep hours and resting heart rate improvement, or regression analyses to predict heart rate after exercise.

Machine Learning: Use the dataset for predictive modeling, and build a beginner regression model to predict post-exercise heart rate using age, sleep, and exercise frequency as features.

Health and Fitness Insights: This dataset can be useful for studying how different factors like sleep and age influence heart rate changes and overall cardiovascular health.
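
The dependent (paired) t-test named under Potential Uses can be sketched with the Python standard library alone. This is a minimal illustration, not the dataset author's analysis: the function name `paired_t_statistic` is hypothetical, and the sample values are made up rather than taken from heart_rate_data.csv. It returns only the t statistic and degrees of freedom; a p-value would need a t-distribution table or a statistics package.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test on the differences before - after."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up resting heart rates (bpm), for illustration only.
before = [72, 80, 65, 90, 77]
after = [68, 75, 66, 84, 73]
t_stat, df = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

A positive t here means the "before" values are higher on average than the "after" values, which matches the direction the dataset description expects.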

License: Choose an appropriate open license, such as:

CC BY 4.0 (Attribution 4.0 International).

Inspiration for Kaggle Users: How does exercise frequency influence the reduction in resting heart rate? Is there a relationship between sleep and heart rate improvements post-exercise? Can we predict the post-exercise heart rate using other health variables? How do age and exercise frequency interact to affect heart rate?

Acknowledgments: This is a simulated dataset for educational purposes, generated to demonstrate statistical and machine learning applications in the field of health analytics.
2023 Census totals by topic for individuals by statistical area 1 – part 1
datafinder.stats.govt.nz
csv, dwg, geodatabase +6
Updated Nov 14, 2024
Cite
Stats NZ (2024). 2023 Census totals by topic for individuals by statistical area 1 – part 1 [Dataset]. https://datafinder.stats.govt.nz/layer/120766-2023-census-totals-by-topic-for-individuals-by-statistical-area-1-part-1/
Explore at:
Available download formats: geodatabase, dwg, mapinfo mif, shapefile, csv, kml, geopackage / sqlite, pdf, mapinfo tab
Dataset updated
Nov 14, 2024
Dataset provided by
Statistics New Zealand (http://www.stats.govt.nz/)
Authors
Stats NZ
License
https://datafinder.stats.govt.nz/license/attribution-4-0-international/
Description
Dataset contains counts and measures for individuals from the 2013, 2018, and 2023 Censuses. Data is available by statistical area 1.

The variables included in this dataset are for the census usually resident population count (unless otherwise stated). All data is for level 1 of the classification (unless otherwise stated).

The variables for part 1 of the dataset are:

Census usually resident population count

Census night population count

Age (5-year groups)

Age (life cycle groups)

Median age

Birthplace (NZ born/overseas born)

Birthplace (broad geographic areas)

Ethnicity (total responses) for level 1 and ‘Other Ethnicity’ grouped by ‘New Zealander’ and ‘Other Ethnicity nec’

Māori descent indicator

Languages spoken (total responses)

Official language indicator

Gender

Sex at birth

Rainbow/LGBTIQ+ indicator for the census usually resident population count aged 15 years and over

Sexual identity for the census usually resident population count aged 15 years and over

Legally registered relationship status for the census usually resident population count aged 15 years and over

Partnership status in current relationship for the census usually resident population count aged 15 years and over

Number of children born for the sex at birth female census usually resident population count aged 15 years and over

Average number of children born for the sex at birth female census usually resident population count aged 15 years and over

Religious affiliation (total responses)

Cigarette smoking behaviour for the census usually resident population count aged 15 years and over

Disability indicator for the census usually resident population count aged 5 years and over

Difficulty communicating for the census usually resident population count aged 5 years and over

Difficulty hearing for the census usually resident population count aged 5 years and over

Difficulty remembering or concentrating for the census usually resident population count aged 5 years and over

Difficulty seeing for the census usually resident population count aged 5 years and over

Difficulty walking for the census usually resident population count aged 5 years and over

Difficulty washing for the census usually resident population count aged 5 years and over.

Download lookup file for part 1 from Stats NZ ArcGIS Online or embedded attachment in Stats NZ geographic data service. Download data table (excluding the geometry column for CSV files) using the instructions in the Koordinates help guide.

Footnotes

Te Whata

Under the Mana Ōrite Relationship Agreement, Te Kāhui Raraunga (TKR) will be publishing Māori descent and iwi affiliation data from the 2023 Census in partnership with Stats NZ. This will be available on Te Whata, a TKR platform.

Geographical boundaries

Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.

Subnational census usually resident population

The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city.

Population counts

Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts.

Caution using time series

Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data), while the 2013 Census used a full-field enumeration methodology (with no use of administrative data).

Study participation time series

In the 2013 Census study participation was only collected for the census usually resident population count aged 15 years and over.

About the 2023 Census dataset

For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. Where we were confident that people who hadn’t completed a census form should be counted, we added real data about real people to the dataset (this is known as admin enumeration). We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.

Data quality

The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.

Concept descriptions and quality ratings

Data quality ratings for 2023 Census variables has additional details about variables found within totals by topic, for example, definitions and data quality.

Disability indicator

This data should not be used as an official measure of disability prevalence. Disability prevalence estimates are only available from the 2023 Household Disability Survey. Household Disability Survey 2023: Final content has more information about the survey.

Activity limitations are measured using the Washington Group Short Set (WGSS). The WGSS asks about six basic activities that a person might have difficulty with: seeing, hearing, walking or climbing stairs, remembering or concentrating, washing all over or dressing, and communicating. A person was classified as disabled in the 2023 Census if there was at least one of these activities that they had a lot of difficulty with or could not do at all.
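
The classification rule above can be written as a small check. This is a minimal Python sketch, not Stats NZ's processing code: the domain keys, the response labels as lowercase strings, and the function name `is_disabled` are assumptions made for illustration.

```python
# The six WGSS activity domains listed in the text; the key names are hypothetical.
WGSS_DOMAINS = ("seeing", "hearing", "walking", "remembering", "washing", "communicating")

# Responses that count toward the indicator, following the rule stated in the text.
SEVERE_RESPONSES = {"a lot of difficulty", "cannot do at all"}


def is_disabled(responses: dict[str, str]) -> bool:
    """True if at least one of the six domains has a severe response."""
    return any(
        responses.get(domain, "").strip().lower() in SEVERE_RESPONSES
        for domain in WGSS_DOMAINS
    )


# Illustrative respondent: only one domain is severe, which is enough.
example = {"seeing": "some difficulty", "hearing": "a lot of difficulty"}
print(is_disabled(example))
```

Missing domains are treated as not severe in this sketch; how non-response is actually handled is not stated in the source.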

Using data for good

Stats NZ expects that work with census data is done with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that “data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.

Confidentiality

The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.
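
To make the idea of fixed random rounding to base 3 concrete, here is a rough Python sketch. It rests on assumptions the text does not state: that a count already divisible by 3 is left unchanged, that other counts go to the nearer multiple of 3 with probability 2/3 and to the farther one with probability 1/3, and that "fixed" means the same cell always rounds the same way. The hashing scheme, the `cell_key` parameter, and the function name are hypothetical, and suppression of sensitive counts is not modeled.

```python
import hashlib


def fixed_random_round_base3(count: int, cell_key: str) -> int:
    """Round a non-negative count to a multiple of 3, repeatably for a given cell_key."""
    if count < 0:
        raise ValueError("count must be non-negative")
    remainder = count % 3
    if remainder == 0:
        return count  # multiples of 3 are left unchanged
    lower = count - remainder
    upper = lower + 3
    nearest, other = (lower, upper) if remainder == 1 else (upper, lower)
    # Derive a repeatable draw in {0, 1, 2} from the cell key and the count,
    # so the same cell is always rounded the same way.
    digest = hashlib.sha256(f"{cell_key}:{count}".encode("utf-8")).digest()
    draw = int.from_bytes(digest[:8], "big") % 3
    # Assumed probabilities: 2/3 to the nearest multiple, 1/3 to the other.
    return nearest if draw < 2 else other


# Hypothetical cell identifier, for illustration only.
print(fixed_random_round_base3(7, "SA1-0000001|age-15-29"))
```

Because each cell is rounded independently, rounded components need not add up to a rounded total, which is consistent with the note that individual figures may not always sum to stated totals.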

Measures

Measures like averages, medians, and other quantiles are calculated from unrounded counts, with input noise added to or subtracted from each contributing value during measures calculations. Averages and medians based on less than six units (e.g. individuals, dwellings, households, families, or extended families) are suppressed. This suppression threshold changes for other quantiles. Where the cells have been suppressed, a placeholder value has been used.

Percentages

To calculate percentages, divide the figure for the category of interest by the figure for 'Total stated' where this applies.

Symbol

-997 Not available

-999 Confidential
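
The percentage rule and the placeholder codes listed here can be combined in one small helper. This is a minimal Python sketch under the assumption that -997 and -999 appear as literal values in the count columns; the function name `percentage` and the sample figures are hypothetical.

```python
from typing import Optional

# Placeholder codes listed under "Symbol" in the text.
NOT_AVAILABLE = -997
CONFIDENTIAL = -999
PLACEHOLDERS = {NOT_AVAILABLE, CONFIDENTIAL}


def percentage(category_count: float, total_stated: float) -> Optional[float]:
    """Percentage of 'Total stated', or None if a value is a placeholder or the total is 0."""
    if category_count in PLACEHOLDERS or total_stated in PLACEHOLDERS:
        return None
    if total_stated == 0:
        return None
    return 100.0 * category_count / total_stated


# Made-up counts, for illustration only.
print(percentage(30, 120))
print(percentage(CONFIDENTIAL, 120))
```

Treating the placeholders as missing before any arithmetic avoids negative percentages from the -997 and -999 codes.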

Inconsistencies in definitions

Please note that there may be differences in definitions between census classifications and those used for other data collections.
