Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
A simple table time series for school probability and statistics. We have to learn how to investigate data: value via time. What we try to do: - mean: average is the sum of all values divided by the number of values. It is also sometimes referred to as mean. - median is the middle number, when in order. Mode is the most common number. Range is the largest number minus the smallest number. - standard deviation s a measure of how dispersed the data is in relation to the mean.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the used measures described in the paper. For each subject, it includes multiple columns: A. a sequential student ID B an ID that defines a random group label and the notation C. the used notation: user Story or use Cases D. the case they were assigned to: IFA, Sim, or Hos E. the subject's exam grade (total points out of 100). Empty cells mean that the subject did not take the first exam F. a categorical representation of the grade L/M/H, where H is greater or equal to 80, M is between 65 included and 80 excluded, L otherwise G. the total number of classes in the student's conceptual model H. the total number of relationships in the student's conceptual model I. the total number of classes in the expert's conceptual model J. the total number of relationships in the expert's conceptual model K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below) P. the researchers' judgement on how well the derivation process explanation was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping ), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than class, or
(ii) using a generic term (e.g., user'' instead ofurban
planner'');
System-oriented (SO) - A class in CM-Stud that denotes a technical
implementation aspect, e.g., access control. Classes that represent legacy
system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in
CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade) . The primary focus in this study is on the number of classes. However, we also provided the size ratio for the number of relationships between student and expert model.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that is fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR) and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
For sheet 4 as well as for the following four sheets, diverging stacked bar
charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated witch solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness is compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness is compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness is compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness is compared by the exam grades, converted to categorical values High, Low , and Medium.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Examples demonstrating how confidence intervals change depending on the level of confidence (90% versus 95% versus 99%) and on the size of the sample (CI for n=20 versus n=10 versus n=2). Developed for BIO211 (Statistics and Data Analysis: A Conceptual Approach) at Stony Brook University in Fall 2015.
Facebook
TwitterThe previous review in this series introduced the notion of data description and outlined some of the more common summary measures used to describe a dataset. However, a dataset is typically only of interest for the information it provides regarding the population from which it was drawn. The present review focuses on estimation of population values from a sample.
Facebook
TwitterExamples of descriptive statistics that can be gleaned from Tracker data that could not be determined from standard beam cross data (Track CASK-β N = 30, Track Control N = 29).
Facebook
TwitterThe harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.
----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:
Iraq is considered a leader in household expenditure and income surveys where the first was conducted in 1946 followed by surveys in 1954 and 1961. After the establishment of Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years in (1971/ 1972, 1976, 1979, 1984/ 1985, 1988, 1993, 2002 / 2007). Implementing the cooperation between CSO and WB, Central Statistical Organization (CSO) and Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year covering all governorates including those in Kurdistan Region.
The survey has six main objectives. These objectives are:
The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.
National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.
1- Household/family. 2- Individual/person.
The survey was carried out over a full year covering all governorates including those in Kurdistan Region.
Sample survey data [ssd]
----> Design:
Sample size was (25488) household for the whole Iraq, 216 households for each district of 118 districts, 2832 clusters each of which includes 9 households distributed on districts and governorates for rural and urban.
----> Sample frame:
Listing and numbering results of 2009-2010 Population and Housing Survey were adopted in all the governorates including Kurdistan Region as a frame to select households, the sample was selected in two stages: Stage 1: Primary sampling unit (blocks) within each stratum (district) for urban and rural were systematically selected with probability proportional to size to reach 2832 units (cluster). Stage two: 9 households from each primary sampling unit were selected to create a cluster, thus the sample size of total survey clusters was 25488 households distributed on the governorates, 216 households in each district.
----> Sampling Stages:
In each district, the sample was selected in two stages: Stage 1: based on 2010 listing and numbering frame 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit breakdown urban and rural and geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal probability sampling. Sampling frames of each stages can be developed based on 2010 building listing and numbering without updating household lists. In some small districts, random selection processes of primary sampling may lead to select less than 24 units therefore a sampling unit is selected more than once , the selection may reach two cluster or more from the same enumeration unit when it is necessary.
Face-to-face [f2f]
----> Preparation:
The questionnaire of 2006 survey was adopted in designing the questionnaire of 2012 survey on which many revisions were made. Two rounds of pre-test were carried out. Revision were made based on the feedback of field work team, World Bank consultants and others, other revisions were made before final version was implemented in a pilot survey in September 2011. After the pilot survey implemented, other revisions were made in based on the challenges and feedbacks emerged during the implementation to implement the final version in the actual survey.
----> Questionnaire Parts:
The questionnaire consists of four parts each with several sections: Part 1: Socio – Economic Data: - Section 1: Household Roster - Section 2: Emigration - Section 3: Food Rations - Section 4: housing - Section 5: education - Section 6: health - Section 7: Physical measurements - Section 8: job seeking and previous job
Part 2: Monthly, Quarterly and Annual Expenditures: - Section 9: Expenditures on Non – Food Commodities and Services (past 30 days). - Section 10 : Expenditures on Non – Food Commodities and Services (past 90 days). - Section 11: Expenditures on Non – Food Commodities and Services (past 12 months). - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days). - Section 12, Table 1: Meals Had Within the Residential Unit. - Section 12, table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.
Part 3: Income and Other Data: - Section 13: Job - Section 14: paid jobs - Section 15: Agriculture, forestry and fishing - Section 16: Household non – agricultural projects - Section 17: Income from ownership and transfers - Section 18: Durable goods - Section 19: Loans, advances and subsidies - Section 20: Shocks and strategy of dealing in the households - Section 21: Time use - Section 22: Justice - Section 23: Satisfaction in life - Section 24: Food consumption during past 7 days
Part 4: Diary of Daily Expenditures: Diary of expenditure is an essential component of this survey. It is left at the household to record all the daily purchases such as expenditures on food and frequent non-food items such as gasoline, newspapers…etc. during 7 days. Two pages were allocated for recording the expenditures of each day, thus the roster will be consists of 14 pages.
----> Raw Data:
Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages: 1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct. 2. Local Supervisor: Checks to make sure that questions has been correctly completed. 3. Statistical analysis: After exporting data files from excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values in addition to auditing some variables. 4. World Bank consultants in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.
----> Harmonized Data:
Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. Number of households refused to response was 305, response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%) while the lowest rates were in Sulaimaniya (92%).
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code Examples from the Book
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Overview:
This dataset contains simulated (hypothetical) but almost realistic (based on AI) data related to sleep, heart rate, and exercise habits of 500 individuals. It includes both pre-exercise and post-exercise resting heart rates, allowing for analyses such as a dependent t-test (Paired Sample t-test) to observe changes in heart rate after an exercise program. The dataset also includes additional health-related variables, such as age, hours of sleep per night, and exercise frequency.
The data is designed for tasks involving hypothesis testing, health analytics, or even machine learning applications that predict changes in heart rate based on personal attributes and exercise behavior. It can be used to understand the relationships between exercise frequency, sleep, and changes in heart rate.
File: Filename: heart_rate_data.csv File Format: CSV
- Features (Columns):
Age: Description: The age of the individual. Type: Integer Range: 18-60 years Relevance: Age is an important factor in determining heart rate and the effects of exercise.
Sleep Hours: Description: The average number of hours the individual sleeps per night. Type: Float Range: 3.0 - 10.0 hours Relevance: Sleep is a crucial health metric that can impact heart rate and exercise recovery.
Exercise Frequency (Days/Week): Description: The number of days per week the individual engages in physical exercise. Type: Integer Range: 1-7 days/week Relevance: More frequent exercise may lead to greater heart rate improvements and better cardiovascular health.
Resting Heart Rate Before: Description: The individual’s resting heart rate measured before beginning a 6-week exercise program. Type: Integer Range: 50 - 100 bpm (beats per minute) Relevance: This is a key health indicator, providing a baseline measurement for the individual’s heart rate.
Resting Heart Rate After: Description: The individual’s resting heart rate measured after completing the 6-week exercise program. Type: Integer Range: 45 - 95 bpm (lower than the "Resting Heart Rate Before" due to the effects of exercise). Relevance: This variable is essential for understanding how exercise affects heart rate over time, and it can be used to perform a dependent t-test analysis.
Max Heart Rate During Exercise: Description: The maximum heart rate the individual reached during exercise sessions. Type: Integer Range: 120 - 190 bpm Relevance: This metric helps in understanding cardiovascular strain during exercise and can be linked to exercise frequency or fitness levels.
Potential Uses: Dependent T-Test Analysis: The dataset is particularly suited for a dependent (paired) t-test where you compare the resting heart rate before and after the exercise program for each individual.
Exploratory Data Analysis (EDA):Investigate relationships between sleep, exercise frequency, and changes in heart rate. Potential analyses include correlations between sleep hours and resting heart rate improvement, or regression analyses to predict heart rate after exercise.
Machine Learning: Use the dataset for predictive modeling, and build a beginner regression model to predict post-exercise heart rate using age, sleep, and exercise frequency as features.
Health and Fitness Insights: This dataset can be useful for studying how different factors like sleep and age influence heart rate changes and overall cardiovascular health.
License: Choose an appropriate open license, such as:
CC BY 4.0 (Attribution 4.0 International).
Inspiration for Kaggle Users: How does exercise frequency influence the reduction in resting heart rate? Is there a relationship between sleep and heart rate improvements post-exercise? Can we predict the post-exercise heart rate using other health variables? How do age and exercise frequency interact to affect heart rate?
Acknowledgments: This is a simulated dataset for educational purposes, generated to demonstrate statistical and machine learning applications in the field of health analytics.
Facebook
Twitteranalyze the current population survey (cps) annual social and economic supplement (asec) with r the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics ( bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups b y state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be t reated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show. this new github repository contains three scripts: 2005-2012 asec - download all microdata.R down load the fixed-width file containing household, family, and person records import by separating this file into three tables, then merge 'em together at the person-level download the fixed-width file containing the person-level replicate weights merge the rectangular person-level file with the replicate weights, then store it in a sql database create a new variable - one - in the data table 2012 asec - analysis examples.R connect to the sql database created by the 'download all microdata' progr am create the complex sample survey object, using the replicate weights perform a boatload of analysis examples replicate census estimates - 2011.R connect to the sql database created by the 'download all microdata' program create the complex sample survey object, using the replicate weights match the sas output shown in the png file below 2011 asec replicate weight sas output.png statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document. click here to view these three scripts for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page the bureau of labor statistics' current population survey page the current population survey's wikipedia article notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current populat ion survey to talk about america, subract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research. confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Purpose: The analysis of Ecological Momentary Assessment (EMA) data can be difficult to conceptualize due to the complexity of how the data are collected. The goal of this tutorial is to provide an overview of statistical considerations for analyzing observational data arising from EMA studies.Method: EMA data are collected in a variety of ways, complicating the statistical analysis. We focus on fundamental statistical characteristics of the data and general purpose statistical approaches to analyzing EMA data. We implement those statistical approaches using a recent study involving EMA.Results: The linear or generalized linear mixed-model statistical approach can adequately capture the challenges resulting from EMA collected data if properly set up. Additionally, while sample size depends on both the number of participants and the number of survey responses per participant, having more participants is more important than the number of responses per participant.Conclusion: Using modern statistical methods when analyzing EMA data and adequately considering all of the statistical assumptions being used can lead to interesting and important findings when using EMA.Supplemental Material S1. Power for given effect sizes, number of participants, and number of surveys per individual for a two independent groups comparison.Supplemental Material S2. Power for given effect sizes, number of participants, and number of surveys per individual for a paired groups comparison.Oleson, J. J., Jones, M. A., Jorgensen, E. J., & Wu, Y.-H. (2021). Statistical considerations for analyzing Ecological Momentary Assessment data. Journal of Speech, Language, and Hearing Research. Advance online publication. https://doi.org/10.1044/2021_JSLHR-21-00081
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Examples of boilerplate text from PLOS ONE papers based on targeted n-gram searches (sentence level).
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Αρχεία εργασίας για το βιβλίο μεθοδολογία έρευνάς
Facebook
Twitterhttps://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.15139/S3/JKLBZFhttps://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.15139/S3/JKLBZF
This Excel file contains example data as would be provided by an investigator to a collaborative statistician to analyze. Data are a permuted and edited version of real data provided to the authors during a statistical collaboration. The data are presented as commonly collected by investigators prior to working with a statistician, including several tabs of data in different domains (Set1, Set2, Demographics), colored cells, merged cells, cells with more than one data type, etc. as well as incomplete data and two systems of ID numbers. The file also includes a tab to link the different ID systems as well as tabs that have a "cleaned" version of the data (REVISEDSet1, REVISEDSet2) that would typically be provided after quality control identified some issues with the data that were then resolved by the investigator.
Facebook
TwitterThe 1998 Ghana Demographic and Health Survey (GDHS) is the latest in a series of national-level population and health surveys conducted in Ghana and it is part of the worldwide MEASURE DHS+ Project, designed to collect data on fertility, family planning, and maternal and child health.
The primary objective of the 1998 GDHS is to provide current and reliable data on fertility and family planning behaviour, child mortality, children’s nutritional status, and the utilisation of maternal and child health services in Ghana. Additional data on knowledge of HIV/AIDS are also provided. This information is essential for informed policy decisions, planning and monitoring and evaluation of programmes at both the national and local government levels.
The long-term objectives of the survey include strengthening the technical capacity of the Ghana Statistical Service (GSS) to plan, conduct, process, and analyse the results of complex national sample surveys. Moreover, the 1998 GDHS provides comparable data for long-term trend analyses within Ghana, since it is the third in a series of demographic and health surveys implemented by the same organisation, using similar data collection procedures. The GDHS also contributes to the ever-growing international database on demographic and health-related variables.
National
Sample survey data
The major focus of the 1998 GDHS was to provide updated estimates of important population and health indicators including fertility and mortality rates for the country as a whole and for urban and rural areas separately. In addition, the sample was designed to provide estimates of key variables for the ten regions in the country.
The list of Enumeration Areas (EAs) with population and household information from the 1984 Population Census was used as the sampling frame for the survey. The 1998 GDHS is based on a two-stage stratified nationally representative sample of households. At the first stage of sampling, 400 EAs were selected using systematic sampling with probability proportional to size (PPS-Method). The selected EAs comprised 138 in the urban areas and 262 in the rural areas. A complete household listing operation was then carried out in all the selected EAs to provide a sampling frame for the second stage selection of households. At the second stage of sampling, a systematic sample of 15 households per EA was selected in all regions, except in the Northern, Upper West and Upper East Regions. In order to obtain adequate numbers of households to provide reliable estimates of key demographic and health variables in these three regions, the number of households in each selected EA in the Northern, Upper West and Upper East regions was increased to 20. The sample was weighted to adjust for over sampling in the three northern regions (Northern, Upper East and Upper West), in relation to the other regions. Sample weights were used to compensate for the unequal probability of selection between geographically defined strata.
The survey was designed to obtain completed interviews of 4,500 women age 15-49. In addition, all males age 15-59 in every third selected household were interviewed, to obtain a target of 1,500 men. In order to take cognisance of non-response, a total of 6,375 households nation-wide were selected.
Note: See detailed description of sample design in APPENDIX A of the survey report.
Face-to-face
Three types of questionnaires were used in the GDHS: the Household Questionnaire, the Women’s Questionnaire, and the Men’s Questionnaire. These questionnaires were based on model survey instruments developed for the international MEASURE DHS+ programme and were designed to provide information needed by health and family planning programme managers and policy makers. The questionnaires were adapted to the situation in Ghana and a number of questions pertaining to on-going health and family planning programmes were added. These questionnaires were developed in English and translated into five major local languages (Akan, Ga, Ewe, Hausa, and Dagbani).
The Household Questionnaire was used to enumerate all usual members and visitors in a selected household and to collect information on the socio-economic status of the household. The first part of the Household Questionnaire collected information on the relationship to the household head, residence, sex, age, marital status, and education of each usual resident or visitor. This information was used to identify women and men who were eligible for the individual interview. For this purpose, all women age 15-49, and all men age 15-59 in every third household, whether usual residents of a selected household or visitors who slept in a selected household the night before the interview, were deemed eligible and interviewed. The Household Questionnaire also provides basic demographic data for Ghanaian households. The second part of the Household Questionnaire contained questions on the dwelling unit, such as the number of rooms, the flooring material, the source of water and the type of toilet facilities, and on the ownership of a variety of consumer goods.
The Women’s Questionnaire was used to collect information on the following topics: respondent’s background characteristics, reproductive history, contraceptive knowledge and use, antenatal, delivery and postnatal care, infant feeding practices, child immunisation and health, marriage, fertility preferences and attitudes about family planning, husband’s background characteristics, women’s work, knowledge of HIV/AIDS and STDs, as well as anthropometric measurements of children and mothers.
The Men’s Questionnaire collected information on respondent’s background characteristics, reproduction, contraceptive knowledge and use, marriage, fertility preferences and attitudes about family planning, as well as knowledge of HIV/AIDS and STDs.
A total of 6,375 households were selected for the GDHS sample. Of these, 6,055 were occupied. Interviews were completed for 6,003 households, which represent 99 percent of the occupied households. A total of 4,970 eligible women from these households and 1,596 eligible men from every third household were identified for the individual interviews. Interviews were successfully completed for 4,843 women or 97 percent and 1,546 men or 97 percent. The principal reason for nonresponse among individual women and men was the failure of interviewers to find them at home despite repeated callbacks.
Note: See summarized response rates by place of residence in Table 1.1 of the survey report.
The estimates from a sample survey are affected by two types of errors: (1) nonsampling errors, and (2) sampling errors. Nonsampling errors are the results of shortfalls made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 1998 GDHS to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 1998 GDHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 1998 GDHS sample is the result of a two-stage stratified design, and, consequently, it was necessary to use more complex formulae. The computer software used to calculate sampling errors for the 1998 GDHS is the ISSA Sampling Error Module. This module uses the Taylor linearization method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
Data Quality Tables - Household age distribution - Age distribution of eligible and interviewed women - Age distribution of eligible and interviewed men - Completeness of reporting - Births by calendar years - Reporting of age at death in days - Reporting of age at death in months
Note: See detailed tables in APPENDIX C of the survey report.
Facebook
TwitterDescriptive statistics of sample 1 and sample 2.
Facebook
TwitterDescriptive statistics of sample, split by counterbalance group.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Rahul Kate
Released under Apache 2.0
Facebook
TwitterThis data set includes summary statistics and trend analysis results of various analytes from groundwater samples in Idaho (1990 to 2016).
Facebook
TwitterAll population characteristics in the table were identical for the synthetic microdata and the American Community Survey data.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introductory statistical inference texts and courses treat the point estimation, hypothesis testing, and interval estimation problems separately, with primary emphasis on large-sample approximations. Here, I present an alternative approach to teaching this course, built around p-values, emphasizing provably valid inference for all sample sizes. Details about computation and marginalization are also provided, with several illustrative examples, along with a course outline. Supplementary materials for this article are available online.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
A simple table time series for school probability and statistics. We have to learn how to investigate data: value via time. What we try to do: - mean: average is the sum of all values divided by the number of values. It is also sometimes referred to as mean. - median is the middle number, when in order. Mode is the most common number. Range is the largest number minus the smallest number. - standard deviation s a measure of how dispersed the data is in relation to the mean.