
After One to Many and Many to One Merge in Stata
kaggle.com
zip
Updated Feb 1, 2023
iFinance Tutor (2023). After One to Many and Many to One Merge in Stata [Dataset]. https://www.kaggle.com/datasets/ifinancetutor/after-one-to-many-and-many-to-one-merge-in-stata
Available download formats: zip (2929 bytes)
Dataset updated
Feb 1, 2023
Authors
iFinance Tutor
Description
Dataset

This dataset was created by iFinance Tutor

Current Population Survey (CPS)
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
Unique identifier
https://doi.org/10.7910/DVN/AK4FDD
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description
analyze the current population survey (cps) annual social and economic supplement (asec) with r

the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, and RSQLite then puts everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
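The rectangular layout described above is a many-to-one merge: every person record picks up the fields of the household and family it belongs to. Below is a minimal Python sketch using only the standard library. The key names `h_seq` (household) and `f_seq` (family within household) and all field values are hypothetical placeholders, not actual CPS variable names, and this is not the parse.SAScii/RSQLite pipeline itself.

```python
from typing import Dict, List, Tuple


def rectangularize(households: List[dict], families: List[dict],
                   persons: List[dict]) -> List[dict]:
    """Attach household- and family-level fields to each person record.

    Key names (h_seq, f_seq) are hypothetical placeholders, not CPS variable names.
    """
    # Index the "one" sides of the many-to-one merge by their keys.
    hh_by_key: Dict[int, dict] = {h["h_seq"]: h for h in households}
    fam_by_key: Dict[Tuple[int, int], dict] = {
        (f["h_seq"], f["f_seq"]): f for f in families
    }

    rows = []
    for p in persons:
        # A KeyError here means a person record has no matching household/family.
        hh = hh_by_key[p["h_seq"]]
        fam = fam_by_key[(p["h_seq"], p["f_seq"])]
        # Later dicts win on name clashes, so person-level values take priority.
        rows.append({**hh, **fam, **p})
    return rows


households = [{"h_seq": 1, "hh_income": 52000}, {"h_seq": 2, "hh_income": 31000}]
families = [{"h_seq": 1, "f_seq": 1, "fam_size": 2},
            {"h_seq": 2, "f_seq": 1, "fam_size": 1}]
persons = [{"h_seq": 1, "f_seq": 1, "p_seq": 1, "age": 44},
           {"h_seq": 1, "f_seq": 1, "p_seq": 2, "age": 41},
           {"h_seq": 2, "f_seq": 1, "p_seq": 1, "age": 67}]

rows = rectangularize(households, families, persons)
print(len(rows), rows[1]["hh_income"])  # 3 52000
```

Household 1's income appears on both of its person records, which is exactly the duplication a rectangular person-level file accepts in exchange for one row per person.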
this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png: statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes:

interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.

as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
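Replicate weights turn standard errors into arithmetic. You compute the statistic once with the full-sample weight and once per replicate weight, then scale the sum of squared deviations. Here is a minimal sketch. The default scale factor 4/160 reflects our understanding of the successive-difference replication used for the cps-asec person replicate weights (160 replicates). It is an assumption, so check it against the census usage instructions mentioned above before relying on it. Summing a column of ones gives a weighted population count, which is a common use for a variable like `one`. The toy data and the toy scale of 0.5 are made up.

```python
import math
from typing import Sequence, Tuple


def weighted_total(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of a variable, e.g. an estimated population total."""
    return sum(v * w for v, w in zip(values, weights))


def replicate_se(values: Sequence[float],
                 full_weights: Sequence[float],
                 replicate_weights: Sequence[Sequence[float]],
                 scale: float = 4 / 160) -> Tuple[float, float]:
    """Return (estimate, standard error) for a weighted total.

    var = scale * sum_r (theta_r - theta_0)^2; the default scale ASSUMES
    160 successive-difference replicates (check the census documentation).
    """
    theta0 = weighted_total(values, full_weights)
    squared_devs = sum((weighted_total(values, rw) - theta0) ** 2
                       for rw in replicate_weights)
    return theta0, math.sqrt(scale * squared_devs)


# Toy data: a variable of ones counts people; two made-up replicates.
one = [1, 1, 1]
full = [100.0, 200.0, 150.0]
reps = [[110.0, 190.0, 150.0], [90.0, 210.0, 170.0]]
est, se = replicate_se(one, full, reps, scale=0.5)  # toy scale, not the cps value
print(est, round(se, 4))  # 450.0 14.1421
```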
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
Survey of Consumer Finances (SCF)
dataverse.harvard.edu
Updated May 30, 2013
Anthony Damico (2013). Survey of Consumer Finances (SCF) [Dataset]. http://doi.org/10.7910/DVN/FRMKMF
Unique identifier
https://doi.org/10.7910/DVN/FRMKMF
Dataset updated
May 30, 2013
Dataset provided by
Harvard Dataverse
Authors
Anthony Damico
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
analyze the survey of consumer finances (scf) with r

the survey of consumer finances (scf) tracks the wealth of american families. every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on their cars. plenty of surveys collect annual income, but only the survey of consumer finances captures such detailed asset data. responses are at the primary economic unit-level (peu) - the economically dominant, financially interdependent family members within a sampled household. norc at the university of chicago administers the data collection, but the board of governors of the federal reserve pays the bills and therefore calls the shots. if you were so brazen as to open up the microdata and run a simple weighted median, you'd get the wrong answer. the five to six thousand respondents actually gobble up twenty-five to thirty thousand records in the final public use files. why oh why? well, those tables contain not one, not two, but five records for each peu. wherever missing, these data are multiply-imputed, meaning answers to the same question for the same household might vary across implicates. each analysis must account for all that, lest your confidence intervals be too tight. to calculate the correct statistics, you'll need to break the single file into five, necessarily complicating your life. this can be accomplished with the meanit sas macro buried in the 2004 scf codebook (search for meanit - you'll need the sas iml add-on). or you might blow the dust off this website referred to in the 2010 codebook as the home of an alternative multiple imputation technique, but all i found were broken links. perhaps it's time for plan c, and by c, i mean free. read the imputation section of the latest codebook (search for imputation), then give these scripts a whirl. they've got that new r smell.
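One textbook way to "account for all that" is Rubin's combining rules for multiply-imputed data. You average the five implicate estimates, then add the between-implicate spread to the average within-implicate sampling variance. This is a generic sketch. It is not the meanit macro or the scripts' own code, and the federal reserve's published scf variance recipe may combine replicate-weight and imputation variance somewhat differently. The inputs below are made-up numbers.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def rubin_combine(estimates: Sequence[float],
                  variances: Sequence[float]) -> Tuple[float, float]:
    """Combine per-implicate estimates and their sampling variances.

    Returns (pooled estimate, pooled standard error) using
    T = W + (1 + 1/m) * B, where W is the mean within-implicate variance
    and B is the between-implicate variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two implicates with matching variances")
    q_bar = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Five implicates of some statistic with equal sampling variances (made-up numbers).
est, se = rubin_combine([10, 12, 11, 9, 13], [4, 4, 4, 4, 4])
print(f"{est:.1f} {se:.3f}")  # 11.0 2.646
```

Ignoring the between-implicate term (using only the first implicate, or averaging variances alone) would shrink the standard error from about 2.646 to 2, which is the "too tight" confidence interval the text warns about.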
the lion's share of the respondents in the survey of consumer finances get drawn from a pretty standard sample of american dwellings - no nursing homes, no active-duty military. then there's this secondary sample of richer households to even out the statistical noise at the higher end of the income and assets spectrum. you can read more if you like, but at the end of the day the weights just generalize to civilian, non-institutional american households. one last thing before you start your engine: read everything you always wanted to know about the scf. my favorite part of that title is the word always.

this new github repository contains three scripts:

1989-2010 download all microdata.R
- initiate a function to download and import any survey of consumer finances zipped stata file (.dta)
- loop through each year specified by the user (starting at the 1989 re-vamp) to download the main, extract, and replicate weight files, then import each into r
- break the main file into five implicates (each containing one record per peu) and merge the appropriate extract data onto each implicate
- save the five implicates and replicate weights to an r data file (.rda) for rapid future loading

2010 analysis examples.R
- prepare two survey of consumer finances-flavored multiply-imputed survey analysis functions
- load the r data files (.rda) necessary to create a multiply-imputed, replicate-weighted survey design
- demonstrate how to access the properties of a multiply-imputed survey design object
- cook up some descriptive statistics and export examples, calculated with scf-centric variance quirks
- run a quick t-test and regression, but only because you asked nicely

replicate FRB SAS output.R
- reproduce each and every statistic provided by the friendly folks at the federal reserve
- create a multiply-imputed, replicate-weighted survey design object
- re-reproduce (and yes, i said/meant what i meant/said) each of those statistics, now using the multiply-imputed survey design object to highlight the statistically-theoretically-irrelevant differences

click here to view these three scripts

for more detail about the survey of consumer finances (scf), visit:
- the federal reserve board of governors' survey of consumer finances homepage
- the latest scf chartbook, to browse what's possible. (spoiler alert: everything.)
- the survey of consumer finances wikipedia entry
- the official frequently asked questions

notes:

nationally-representative statistics on the financial health, wealth, and assets of american households might not be monopolized by the survey of consumer finances, but there isn't much competition aside from the assets topical module of the survey of income and program participation (sipp). on one hand, the scf interview questions contain more detail than sipp. on the other hand, scf's smaller sample precludes analyses of acute subpopulations. and for any three-handed martians in the audience, there are also a few biases between these two data sources that you ought to consider. the survey methodologists at the federal reserve take their job...
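Breaking the main file into five implicates, as the download script description above puts it, is a group-by on an implicate indicator. Below is a minimal sketch. It assumes, hypothetically, that each record carries a case id `y1` whose last digit is the implicate number (1-5) and a household id `yy1`. Check the codebook for the actual identifiers in the year you use. The final check confirms that each implicate holds exactly one record per peu. The records are made up.

```python
from collections import defaultdict
from typing import Dict, List


def split_implicates(records: List[dict], id_field: str = "y1",
                     n_implicates: int = 5) -> List[List[dict]]:
    """Split a stacked file into one list of records per implicate.

    ASSUMES (hypothetically) that the last digit of the case id is the
    implicate number, 1..n_implicates.
    """
    groups: Dict[int, List[dict]] = defaultdict(list)
    for rec in records:
        k = rec[id_field] % 10
        if not 1 <= k <= n_implicates:
            raise ValueError(f"unexpected implicate number {k} in {rec[id_field]}")
        groups[k].append(rec)
    return [groups[k] for k in range(1, n_implicates + 1)]


def one_record_per_peu(implicate: List[dict], peu_field: str = "yy1") -> bool:
    """True if no household (peu) id appears twice within an implicate."""
    ids = [rec[peu_field] for rec in implicate]
    return len(ids) == len(set(ids))


# Two made-up households, five implicates each; net worth varies across implicates.
records = [{"yy1": hh, "y1": hh * 10 + k, "networth": 1000 * hh + k}
           for hh in (1, 2) for k in range(1, 6)]
implicates = split_implicates(records)
print([len(imp) for imp in implicates])  # [2, 2, 2, 2, 2]
print(all(one_record_per_peu(imp) for imp in implicates))  # True
```

Each of the five lists can then be analyzed separately and the results pooled across implicates.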
Longitudinal Study of Generations, California, 1971, 1985, 1988, 1991, 1994,...
search.gesis.org
Updated Feb 26, 2021
Inter-University Consortium for Political and Social Research (2021). Longitudinal Study of Generations, California, 1971, 1985, 1988, 1991, 1994, 1997, 2000, 2005 - LSOG - Version 3 [Dataset]. http://doi.org/10.3886/ICPSR22100.v3
Unique identifier
https://doi.org/10.3886/ICPSR22100.v3
Dataset updated
Feb 26, 2021
Dataset provided by
Inter-University Consortium for Political and Social Research
GESIS search
License
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de459163
Description
Abstract (en): The Longitudinal Study of Generations (LSOG), initiated in 1971, began as a survey of intergenerational relations among 300 three-generation California families with grandparents (then in their sixties), middle-aged parents (then in their early forties), and grandchildren (then aged 15 to 26). The study broadened in 1991 and now includes a fourth generation, the great-grandchildren of these same families. The LSOG, with a fully elaborated generation-sequential design, allows comparisons of sets of aging parents and children at the same stage of life but during different historical periods. These comparisons make possible the investigation of the effects of social change on inter-generational solidarity or conflict across 35 years and four generations, as well as the effects of social change on the ability of families to buffer stressful life transitions (e.g., aging, divorce and remarriage, higher female labor force participation, changes in work and the economy, and possible weakening of family norms of obligation), and the effects of social change on the transmission of values, resources, and behaviors across generations. The LSOG contains information on family structure, household composition, affectual solidarity and conflict, values, attitudes, behaviors, role importance, marital relationships, health and fitness, mental health and well-being, caregiving, leisure activities, and life events and concerns. Demographic variables include age, sex, income, employment status, marital status, socioeconomic history, education, religion, ethnicity, and military service. 
Presence of Common Scales: Affectual Solidarity Reliability, Consensual Solidarity (Socialization), Associational Solidarity, Functional Solidarity, Intergenerational Social Support, Normative Solidarity, Familism, Structural Solidarity, Intergenerational Feelings of Conflict, Management of Conflict Tactics, Rosenberg Self-Esteem, Depression (CES-D), Locus of Control, Bradburn Affect Balance, Eysenck Extraversion/Neuroticism, Anxiety (Hopkins Symptom Checklist), Activities of Daily Living (IADL/ADL), Religious Ideology, Political Conservatism, Gender Role Ideology, Individualism/Collectivism, Materialism/Humanism, Work Satisfaction, Gilford-Bengtson Marital Satisfaction

Datasets:
DS0: Study-Level Files
DS1: Waves 1-7
DS2: Wave 8

Universe: Multi-generation families in California.

Smallest Geographic Unit: None

Sampling: Families were drawn randomly from a subscriber list of 840,000 members of a California Health Maintenance Organization in Los Angeles. Families were recruited by enlisting a grandfather over the age of 60 who was part of a three-generation family that was willing to participate.

Version history:
2019-08-21 The data were updated and resupplied by the data producer; ICPSR has updated the data and documentation to reflect these changes. Additionally, the data producer provided a Stata do-file with syntax to merge the two datasets, which is available for download in the study zip folder. The study title was also updated.
2016-07-06 Merril Silverstein was added to the collection as a P.I.
2015-07-16 Wave 8 was added, including SPSS, SAS, and Stata datasets as well as an ICPSR Variable Description and Frequencies codebook. The codebook for part one was recompiled into a collection-level codebook, including both parts one and two. A user guide for the collection has also been added.
2009-05-12 Setup files have been updated.

Funding institution(s): United States Department of Health and Human Services. National Institutes of Health. National Institute on Aging (2R01AG00799-21A2).
Mode of data collection: computer-assisted self interview (CASI), face-to-face interview, mail questionnaire, self-enumerated questionnaire, telephone interview
Code for Merging Waves of the Crime Survey of England and Wales and the...
datacatalogue.ukdataservice.ac.uk
Updated Jul 7, 2025
Blom, N, University of Manchester (2025). Code for Merging Waves of the Crime Survey of England and Wales and the British Crime Survey, 1982-2024 [Dataset]. http://doi.org/10.5255/UKDA-SN-857928
Unique identifier
https://doi.org/10.5255/UKDA-SN-857928
Dataset updated
Jul 7, 2025
Authors
Blom, N, University of Manchester
Time period covered
Jan 1, 1982 - Mar 31, 2024
Area covered
United Kingdom
Description
This code merges multiple years of the Crime Survey of England and Wales (CSEW) and/or the British Crime Survey (BCS). The current version merges the BCS and CSEW up to the CSEW 2023/2024. The purpose of this code is to help researchers quickly and easily combine multiple survey sweeps of the CSEW and BCS.

By combining multiple survey sweeps, people are able to look at, for instance, trends in violence. Furthermore, a combined file lets you look at specific offences, population groups, or consequences that occur too rarely to analyse using only a single year.

This is a Stata do-file, so access to Stata is required, as is access to all the BCS and CSEW sweeps that you want to merge. When specifying the code, you can decide which files to merge: which years of the Crime Surveys, and whether to include the bolt-on datasets that provide uncapped codes, the adolescent and young adult panels, and/or the ‘non-white’ panel. This code does not harmonize variables that differ between years.
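Combining sweeps means appending rows rather than matching them on a key. The deposited code is a Stata do-file. As a language-neutral illustration only, here is a Python sketch that stacks sweeps and tags each row with a hypothetical `sweep` label. Variables absent from a sweep are left as None, mirroring the note that the code does not harmonize variables across years. Sweep labels, variable names, and values are made up.

```python
from typing import Dict, List, Optional


def stack_sweeps(sweeps: Dict[str, List[dict]]) -> List[Dict[str, Optional[object]]]:
    """Append survey sweeps into one long file, tagging each row with its sweep.

    Variables missing from a sweep become None; nothing is recoded or harmonized.
    """
    # Union of variable names, in first-seen order.
    all_vars: List[str] = []
    seen = set()
    for rows in sweeps.values():
        for row in rows:
            for var in row:
                if var not in seen:
                    seen.add(var)
                    all_vars.append(var)

    stacked = []
    for label, rows in sweeps.items():
        for row in rows:
            out = {"sweep": label}
            out.update({var: row.get(var) for var in all_vars})
            stacked.append(out)
    return stacked


sweeps = {
    "2022/23": [{"victim_violence": 1, "age": 30}, {"victim_violence": 0, "age": 52}],
    "2023/24": [{"victim_violence": 1, "age": 45, "new_item": 2}],
}
pooled = stack_sweeps(sweeps)
# Pooling raises the count of a rare outcome available for analysis.
print(len(pooled), sum(r["victim_violence"] for r in pooled))  # 3 2
```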

All original data resources are available via Related Resources.
Replication Data for: Trajectories of mental health problems in childhood...
dataverse.harvard.edu
search.dataone.org
Updated Dec 12, 2022
Lisa-Christine Girard; Martin Okolikj (2022). Replication Data for: Trajectories of mental health problems in childhood and adult voting behaviour: Evidence from the 1970s British Cohort Study [Dataset]. http://doi.org/10.7910/DVN/S6UUBF
Unique identifier
https://doi.org/10.7910/DVN/S6UUBF
Dataset updated
Dec 12, 2022
Dataset provided by
Harvard Dataverse
Authors
Lisa-Christine Girard; Martin Okolikj
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This file describes the replication material for: Trajectories of mental health problems in childhood and adult voting behaviour: Evidence from the 1970s British Cohort Study. Authors: Lisa-Christine Girard & Martin Okolikj. Accepted in Political Behavior. This dataverse holds the following 4 replication files:

1. data_cleaning_traj.R - This file loads, merges and cleans the datasets for the estimation of trajectories, and rescales the age 10 Rutter scale. Prepared using R version 4.1.1.
2. traj_estimation.do - Using the dataset merged by data_cleaning_traj.R, this file is run in Stata to create and estimate the trajectories to be included in the full dataset. Prepared using Stata 17.0.
3. data_cleaning.R - This file loads, merges and cleans all datasets into one, in preparation for the main analysis following the trajectory estimation. Prepared using R version 4.1.1.
4. POBE Analysis.do - This analysis file generates the results in the tables of the published paper, along with all supplementary materials. Prepared using Stata 17.0.

The data can be accessed at the following address; access requires user registration under special licence conditions: http://discover.ukdataservice.ac.uk/series/?sn=200001. If you have any questions or spot any errors, please contact g.lisachristine@gmail.com or martin.okolic@gmail.com.
UKHLS
datacatalogue.ukdataservice.ac.uk
Updated Oct 21, 2025
University of Essex, Institute for Social and Economic Research (2025). UKHLS [Dataset]. http://doi.org/10.5255/UKDA-SN-9471-1
Unique identifier
https://doi.org/10.5255/UKDA-SN-9471-1
Dataset updated
Oct 21, 2025
Dataset provided by
UK Data Service (https://ukdataservice.ac.uk/)
Authors
University of Essex, Institute for Social and Economic Research
Area covered
United Kingdom
Description
Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates the British Household Panel Survey (BHPS), which began in 1991.

The Understanding Society: Calendar Year Dataset, 2023, is designed for analysts to conduct cross-sectional analysis for the 2023 calendar year. The Calendar Year datasets combine data collected in a specific year from across multiple waves and these are released as separate calendar year studies, with appropriate analysis weights, starting with the 2020 Calendar Year dataset. Each subsequent year, an additional yearly study is released.

The Calendar Year data is designed to enable timely cross-sectional analysis of individuals and households in a calendar year. Such analysis can, however, only involve variables that are collected in every wave (excluding rotating content, which is only collected in some waves). Because fieldwork periods overlap, the data files combine data collected in the three waves that make up a calendar year. Analysis cannot be restricted to data collected in one wave during a calendar year, as this subset will not be representative of the population. Further details and guidance on this study can be found in the document 'xxxx_main_survey_calendar_year_user_guide_2023'.
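The calendar-year logic can be pictured as selecting interviews by date rather than by wave. Below is a minimal sketch with hypothetical field names (`person_id`, `wave`, `interview_year`), made-up wave labels, and made-up records. The released Calendar Year files already do this selection and ship with their own analysis weights, so the sketch only illustrates why any single wave covers just part of a calendar year's interviews.

```python
from collections import Counter
from typing import List, Tuple


def calendar_year_sample(records: List[dict], year: int) -> Tuple[List[dict], Counter]:
    """Keep interviews conducted in `year`, whatever wave they belong to."""
    sample = [r for r in records if r["interview_year"] == year]
    # How many of the year's interviews came from each wave.
    return sample, Counter(r["wave"] for r in sample)


# Overlapping fieldwork: each hypothetical wave spans more than one calendar year.
records = [
    {"person_id": 1, "wave": "A", "interview_year": 2022},
    {"person_id": 2, "wave": "A", "interview_year": 2023},
    {"person_id": 3, "wave": "B", "interview_year": 2023},
    {"person_id": 4, "wave": "B", "interview_year": 2024},
    {"person_id": 5, "wave": "C", "interview_year": 2023},
]
sample, by_wave = calendar_year_sample(records, 2023)
print(len(sample), sorted(by_wave.items()))  # 3 [('A', 1), ('B', 1), ('C', 1)]
```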

These calendar year datasets should be used for cross-sectional analysis only. For those interested in longitudinal analyses using Understanding Society please access the main survey datasets: Safeguarded (End User Licence) version or Safeguarded/Special Licence version.

Understanding Society: the UK Household Longitudinal Study started in 2009 with a general population sample (GPS) of around 26,000 households of UK residents living in private households, and an ethnic minority boost sample (EMBS) of 4,000 households. All members of these responding households and their descendants became part of the core sample, who were eligible to be interviewed every year. Anyone who joined these households after this initial wave was also interviewed, as long as they lived with these core sample members, to provide the household context. At each annual interview, some basic demographic information is collected about every household member, information about the household is collected from one household member, all household members aged 16+ are eligible for adult interviews, 10-15 year old household members are eligible for youth interviews, and some information is collected about 0-9 year olds from their parents or guardians. From 1991 until 2008/9 a similar survey, the British Household Panel Survey (BHPS), was fielded. The surviving members of this survey sample were incorporated into Understanding Society in 2010. In 2015, an immigrant and ethnic minority boost sample (IEMBS) of around 2,500 households was added. In 2022, a GPS boost sample (GPS2) of around 5,700 households was added. For more on the sample design, following rules, interview modes, incentives, consent, and questionnaire content, please see the study overview and user guide.

Co-funders

In addition to the Economic and Social Research Council, co-funders for the study included the Department for Work and Pensions, the Department for Education, the Department for Transport, the Department for Culture, Media and Sport, the Department for Communities and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department for Environment, Food and Rural Affairs, and the Food Standards Agency.

End User Licence and Special Licence versions:

There are two versions of the Calendar Year 2023 data. One is available under the standard End User Licence (EUL) agreement, and the other is a Special Licence (SL) version. The SL version contains month and year of birth variables instead of just age, more detailed country and occupation coding for a number of variables, and various income variables that have not been top-coded (see document '9471_eul_vs_sl_variable_differences' for more details). Users are advised to first obtain the standard EUL version of the data to see whether it is sufficient for their research requirements. The SL data have more restrictive access conditions; prospective users of the SL version will need to complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables in order to get permission to use that version. The main longitudinal versions of the Understanding Society study may be found under SNs 6614 (Safeguarded (EUL)) and 6931 (Safeguarded/SL).

Low- and Medium-level geographical identifiers produced for the mainstage longitudinal dataset can be used with this Calendar Year 2023 dataset, subject to SL access conditions. See the User Guide for further details.

Suitable data analysis software

These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata. Transfer to other formats may result in unforeseen issues. Stata SE or MP software is needed to analyse the larger files, which contain about 1,800 variables.
Multilevel modeling of time-series cross-sectional data reveals the dynamic...
data.niaid.nih.gov
dataone.org
zip
Updated Mar 6, 2020
Kodai Kusano (2020). Multilevel modeling of time-series cross-sectional data reveals the dynamic interaction between ecological threats and democratic development [Dataset]. http://doi.org/10.5061/dryad.547d7wm3x
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.547d7wm3x
Dataset updated
Mar 6, 2020
Dataset provided by
University of Nevada, Reno
Authors
Kodai Kusano
License
CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)
Description
What is the relationship between environment and democracy? The framework of cultural evolution suggests that societal development is an adaptation to ecological threats. Pertinent theories assume that democracy emerges as societies adapt to ecological factors such as higher economic wealth, lower pathogen threats, less demanding climates, and fewer natural disasters. However, previous research confused within-country processes with between-country processes and erroneously interpreted between-country findings as if they generalize to within-country mechanisms. In this article, we analyze a time-series cross-sectional dataset to study the dynamic relationship between environment and democracy (1949-2016), accounting for previous misconceptions in levels of analysis. By separating within-country processes from between-country processes, we find that the relationship between environment and democracy not only differs by countries but also depends on the level of analysis. Economic wealth predicts increasing levels of democracy in between-country comparisons, but within-country comparisons show that democracy declines as countries become wealthier over time. This relationship is only prevalent among historically wealthy countries but not among historically poor countries, whose wealth also increased over time. By contrast, pathogen prevalence predicts lower levels of democracy in both between-country and within-country comparisons. Our longitudinal analyses identifying temporal precedence reveal that not only reductions in pathogen prevalence drive future democracy, but also democracy reduces future pathogen prevalence and increases future wealth. These nuanced results contrast with previous analyses using narrow, cross-sectional data. As a whole, our findings illuminate the dynamic process by which environment and democracy shape each other.
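Separating within-country from between-country processes is usually done by splitting each time-varying predictor into two parts. The country mean is the between component, and the yearly deviation from that mean is the within component. Below is a minimal sketch of that general decomposition. It is not the authors' Stata or R code, and the country labels and wealth values are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def within_between(rows: List[dict], unit: str = "country",
                   var: str = "wealth") -> List[dict]:
    """Add <var>_between (unit mean) and <var>_within (deviation from it)."""
    values: Dict[str, List[float]] = defaultdict(list)
    for r in rows:
        values[r[unit]].append(r[var])
    unit_means = {u: mean(v) for u, v in values.items()}

    out = []
    for r in rows:
        m = unit_means[r[unit]]
        out.append({**r, f"{var}_between": m, f"{var}_within": r[var] - m})
    return out


panel = [
    {"country": "A", "year": 1950, "wealth": 1.0},
    {"country": "A", "year": 1960, "wealth": 3.0},
    {"country": "B", "year": 1950, "wealth": 10.0},
    {"country": "B", "year": 1960, "wealth": 14.0},
]
for r in within_between(panel):
    print(r["country"], r["year"], r["wealth_between"], r["wealth_within"])
# A 1950 2.0 -1.0
# A 1960 2.0 1.0
# B 1950 12.0 -2.0
# B 1960 12.0 2.0
```

Entering both components in a multilevel model lets the between-country and within-country coefficients differ. The abstract describes this kind of difference for wealth, where the between and within relationships have opposite signs.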

Methods: Our time-series cross-sectional data combine various online databases. Country names were first identified and matched using the R package “countrycode” (Arel-Bundock, Enevoldsen, & Yetman, 2018) before all datasets were merged. Occasionally, we modified unidentified country names to be consistent across datasets. We then transformed “wide” data into “long” data and merged them using R’s Tidyverse framework (Wickham, 2014). Our analysis begins with the year 1949 because one of the key time-variant level-1 variables, pathogen prevalence, was only available from 1949 on. See our Supplemental Material for all data, Stata syntax, R Markdown for visualization, supplemental analyses and detailed results (available at https://osf.io/drt8j/).
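The data-preparation steps are: harmonize country names, reshape wide to long, then merge on country and year. They can be sketched as follows. The authors used the R packages countrycode and the tidyverse. This Python stand-in uses a small, hypothetical hand-written alias table in place of countrycode, and every column name and value is made up.

```python
from typing import Dict, List, Tuple

# Hypothetical alias table standing in for a country-name matcher like countrycode.
ALIASES: Dict[str, str] = {
    "USA": "United States",
    "United States of America": "United States",
}


def normalize(name: str) -> str:
    """Map variant spellings to one canonical country name."""
    cleaned = name.strip()
    return ALIASES.get(cleaned, cleaned)


def wide_to_long(rows: List[dict], prefix: str, value_name: str) -> List[dict]:
    """Turn columns like 'gdp_1949', 'gdp_1950' into one row per country-year."""
    long_rows = []
    for row in rows:
        country = normalize(row["country"])
        for col, val in row.items():
            if col.startswith(prefix):
                year = int(col[len(prefix):])
                long_rows.append({"country": country, "year": year, value_name: val})
    return long_rows


def merge_long(left: List[dict], right: List[dict]) -> List[dict]:
    """Inner join two long tables on (country, year)."""
    index: Dict[Tuple[str, int], dict] = {(r["country"], r["year"]): r for r in right}
    merged = []
    for left_row in left:
        key = (left_row["country"], left_row["year"])
        if key in index:
            merged.append({**left_row, **index[key]})
    return sorted(merged, key=lambda r: (r["country"], r["year"]))


gdp = wide_to_long([{"country": "USA", "gdp_1949": 1.0, "gdp_1950": 1.1}], "gdp_", "gdp")
path = wide_to_long([{"country": "United States of America",
                      "path_1949": 0.30, "path_1950": 0.29}], "path_", "pathogen")
for row in merge_long(gdp, path):
    print(row)
# {'country': 'United States', 'year': 1949, 'gdp': 1.0, 'pathogen': 0.3}
# {'country': 'United States', 'year': 1950, 'gdp': 1.1, 'pathogen': 0.29}
```

Without the name harmonization step, "USA" and "United States of America" would never match and the inner join would silently return no rows.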
UKHLS
datacatalogue.ukdataservice.ac.uk
Updated Oct 21, 2025
University of Essex, Institute for Social and Economic Research (2025). UKHLS [Dataset]. http://doi.org/10.5255/UKDA-SN-9472-1
Unique identifier
https://doi.org/10.5255/UKDA-SN-9472-1
Dataset updated
Oct 21, 2025
Dataset provided by
UK Data Service (https://ukdataservice.ac.uk/)
Authors
University of Essex, Institute for Social and Economic Research
Area covered
United Kingdom
Description
Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates the British Household Panel Survey (BHPS), which began in 1991.

The Understanding Society: Calendar Year Dataset, 2023, is designed for analysts to conduct cross-sectional analysis for the 2023 calendar year. The Calendar Year datasets combine data collected in a specific year from across multiple waves and these are released as separate calendar year studies, with appropriate analysis weights, starting with the 2020 Calendar Year dataset. Each subsequent year, an additional yearly study is released.

The Calendar Year data is designed to enable timely cross-sectional analysis of individuals and households in a calendar year. Such analysis can, however, only involve variables that are collected in every wave (excluding rotating content, which is only collected in some waves). Because fieldwork periods overlap, the data files combine data collected in the three waves that make up a calendar year. Analysis cannot be restricted to data collected in one wave during a calendar year, as this subset will not be representative of the population. Further details and guidance on this study can be found in the document '9472_main_survey_calendar_year_user_guide_2023'.

These calendar year datasets should be used for cross-sectional analysis only. For those interested in longitudinal analyses using Understanding Society please access the main survey datasets: Safeguarded (End User Licence) version or Safeguarded/Special Licence version.

Understanding Society: the UK Household Longitudinal Study started in 2009 with a general population sample (GPS) of around 26,000 households of UK residents living in private households, and an ethnic minority boost sample (EMBS) of 4,000 households. All members of these responding households and their descendants became part of the core sample, who were eligible to be interviewed every year. Anyone who joined these households after this initial wave was also interviewed, as long as they lived with these core sample members, to provide the household context. At each annual interview, some basic demographic information is collected about every household member, information about the household is collected from one household member, all household members aged 16+ are eligible for adult interviews, 10-15 year old household members are eligible for youth interviews, and some information is collected about 0-9 year olds from their parents or guardians. From 1991 until 2008/9 a similar survey, the British Household Panel Survey (BHPS), was fielded. The surviving members of this survey sample were incorporated into Understanding Society in 2010. In 2015, an immigrant and ethnic minority boost sample (IEMBS) of around 2,500 households was added. In 2022, a GPS boost sample (GPS2) of around 5,700 households was added. For more on the sample design, following rules, interview modes, incentives, consent, and questionnaire content, please see the study overview and user guide.

Co-funders

In addition to the Economic and Social Research Council, co-funders for the study included the Department for Work and Pensions, the Department for Education, the Department for Transport, the Department for Culture, Media and Sport, the Department for Communities and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department for Environment, Food and Rural Affairs, and the Food Standards Agency.

End User Licence and Special Licence versions:

There are two versions of the Calendar Year 2023 data: one available under the standard End User Licence (EUL) agreement, and a Special Licence (SL) version. Compared with the EUL version, the SL version contains month and year of birth variables instead of just age, has more detailed country and occupation coding for a number of variables, and does not top-code various income variables (see the document '9472_eul_vs_sl_variable_differences' for details). Users are advised to obtain the standard EUL version first to check whether it is sufficient for their research requirements. The SL data have more restrictive access conditions: prospective users of the SL version must complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables before permission to use that version is granted. The main longitudinal versions of the Understanding Society study are available under SNs 6614 (Safeguarded (EUL)) and 6931 (Safeguarded/SL).
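To make the top-coding difference concrete, here is a minimal Python sketch of what top-coding does to an income variable. The cap and the incomes are hypothetical; the variables affected and the thresholds actually used are documented in '9472_eul_vs_sl_variable_differences', not reproduced here.

```python
def top_code(values, cap):
    """Return (top-coded values, number of values that were capped).

    Values above `cap` are replaced by `cap`; values at or below it are
    left unchanged. `cap` is a hypothetical threshold, not the one used
    in the Understanding Society EUL files.
    """
    coded = [min(v, cap) for v in values]
    n_capped = sum(1 for v in values if v > cap)
    return coded, n_capped


# Hypothetical annual incomes and a hypothetical cap.
incomes = [18_500, 42_000, 310_000]
coded, n_capped = top_code(incomes, cap=150_000)
print(coded, n_capped)  # [18500, 42000, 150000] 1
```

Because capped values lose their upper tail, statistics that depend on the top of the distribution (means, top income shares) can differ between the EUL and SL versions.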

Low- and Medium-level geographical identifiers produced for the mainstage longitudinal dataset can be used with this Calendar Year 2023 dataset, subject to SL access conditions. See the User Guide for further details.

Suitable data analysis software

These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata. Transfer to other formats may result in unforeseen issues. Stata SE or MP software is needed to analyse the larger files, which contain about 1,800 variables.
Replication Data for "The Micro-Foundations of Party Competition and Issue...
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Neundorf, Anja; Adams, James (2023). Replication Data for "The Micro-Foundations of Party Competition and Issue Ownership" [Dataset]. http://doi.org/10.7910/DVN/OWF8RW
Unique identifier
https://doi.org/10.7910/DVN/OWF8RW
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Neundorf, Anja; Adams, James
Description
The article uses two datasets that cannot be deposited online but are freely available to registered users. The data of the German Socio-Economic Panel can be requested via http://www.diw.de/en/diw_02.c.222836.en/access.html; the article uses version 24 (DOI: 10.5684/soep.v24). The data of the British Household Panel Survey can be requested via http://discover.ukdataservice.ac.uk/series/?sn=20000. This deposit provides two Stata do-files, one for each dataset, that create the working file used in the article. The do-files merge and recode the original data.
Night lights along the PL-DE border 1992-2012
data-staging.niaid.nih.gov
Updated Mar 13, 2021
Michal Myck; Ronny Freier; Mateusz Najsztub (2021). Night lights along the PL-DE border 1992-2012 [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_4600684
Dataset updated
Mar 13, 2021
Dataset provided by
Technical University of Applied Science at Wildau
Centre for Economic Analysis, CenEA
Authors
Michal Myck; Ronny Freier; Mateusz Najsztub
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets and software code (in the form of Stata do-files) relate to the publication in Applied Economics entitled "Lights along the frontier: convergence of economic activity in the proximity of the Polish-German border, 1992-2012".

The analysis dataset, in Stata format, is created by combining data from:

1) NOAA Version 4 DMSP-OLS Nighttime Lights Time Series (https://ngdc.noaa.gov/eog/dmsp/downloadV4composites.html);

2) Map data copyrighted OpenStreetMap (OSM) contributors and available from https://www.openstreetmap.org;

3) Administrative division of Poland, municipality level Shapefiles for 2018, PRG (http://www.gugik.gov.pl/pzgik/dane-bez-oplat/dane-z-panstwowego-rejestru-granic-i-powierzchni-jednostek-podzialow-terytorialnych-kraju-prg);

4) Map of the municipalities and districts of Germany as of 31.12.2013, VG250 and VG250-EW, © GeoBasis-DE / BKG 2013 (https://gdz.bkg.bund.de/).

Geographical data (nighttime lights, municipality borders for Poland and Germany, and OpenStreetMap data) were imported into a PostgreSQL database with the PostGIS extension, using batch processing in Python. Nighttime light intensities for municipalities were computed by intersecting the vector municipality borders with the raster lights data for each available year and satellite. Light totals and averages were calculated from pixel values calibrated with the second-degree polynomial intercalibration parameters from Elvidge et al., "National Trends in Satellite Observed Lighting: 1992-2009". Bridge crossings were identified using contemporary map data and OSM. OSM data were used to calculate road travel times and distances using pgRouting in PostgreSQL. The data were exported to CSV using Python, then imported and merged in Stata to create the initial dataset.
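The intercalibration step can be sketched in Python as applying a second-degree polynomial to each pixel's digital number (DN) and then summing or averaging over the pixels that fall inside a municipality. This is an illustration under stated assumptions: the coefficients are placeholders rather than the published Elvidge et al. values, the municipality's pixels are assumed to be already extracted (the authors did this step in PostGIS), and the source does not say whether unlit (DN = 0) pixels also receive the intercept, so this sketch simply calibrates every pixel it is given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intercalibration:
    """Second-degree polynomial coefficients for one satellite-year.

    The values used below are hypothetical placeholders, not the
    published Elvidge et al. parameters.
    """
    c0: float
    c1: float
    c2: float

    def apply(self, dn: float) -> float:
        # calibrated = c0 + c1 * DN + c2 * DN^2
        return self.c0 + self.c1 * dn + self.c2 * dn ** 2


def municipality_lights(pixels, coeffs):
    """Return (total, average) calibrated light over one municipality's pixels."""
    if not pixels:
        return 0.0, 0.0
    calibrated = [coeffs.apply(dn) for dn in pixels]
    total = sum(calibrated)
    return total, total / len(calibrated)


coeffs = Intercalibration(c0=0.5, c1=1.0, c2=0.01)  # hypothetical coefficients
total, mean = municipality_lights([10, 20], coeffs)
print(round(total, 6), round(mean, 6))  # 36.0 18.0
```

Since each satellite-year has its own coefficients, a full pipeline would look up one `Intercalibration` per satellite and year before aggregating.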
Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset...
datacatalogue.ukdataservice.ac.uk
Updated Jul 29, 2011
University of Manchester, Cathie Marsh Centre for Census and Survey Research, ESDS Government (2011). Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset [Dataset]. http://doi.org/10.5255/UKDA-SN-6792-1
Unique identifier
https://doi.org/10.5255/UKDA-SN-6792-1
Dataset updated
Jul 29, 2011
Dataset provided by
UK Data Service (https://ukdataservice.ac.uk/)
Authors
University of Manchester, Cathie Marsh Centre for Census and Survey Research, ESDS Government
Area covered
England
Description
The Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset was prepared as a resource for those interested in learning introductory small area estimation techniques. It was first presented as part of a workshop entitled 'Introducing small area estimation techniques and applying them to the Health Survey for England using Stata'. The data are accompanied by a guide that includes a practical case study enabling users to derive estimates of disability for districts in the absence of survey estimates. This is achieved using various models that combine information from ESDS government surveys with other aggregate data that are reliably available for sub-national areas. Analysis is undertaken using Stata statistical software; all relevant syntax is provided in the accompanying '.do' files.

The data files included in this teaching resource contain HSE variables and data from the Census and Mid-year population estimates and projections that were developed originally by the National Statistical agencies, as follows:
The main data file, 'hse_data.dta', is a reduced version of the HSE for 2000 and 2001. In order to combine data from two years of the HSE in a consistent way some changes have been made to the weights in each year. Additionally, some recoding of the limiting long term illness (LLTI), disability and the age variable has also been undertaken.
File 'practical_1_task_5_data.dta' contains population counts and model-based mobility disability rates (estimated during Practical 1) by single year of age and sex for the six case study districts.
File 'practical_2_data.dta' contains the aggregate data required for Practical 2, including age- and sex-specific rates of LLTI (Census) for the six UK case study districts, age- and sex-specific rates of mobility disability for England (HSE), and population counts for the six districts.
File 'pop_data_practical_3.dta' contains population counts for the six districts (by age, sex and LLTI status) required for Practical 3.
The original HSEs for 2000 and 2001 are held at the UK Data Archive under SNs 4628 and 4912 respectively. Full details of the recoding of HSE variables and how the aggregate data was produced can be found in the data documentation.
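One of the simplest small area techniques, synthetic estimation, applies national age- and sex-specific rates to a district's population counts. The Python sketch below illustrates that idea only; the guide's practicals use their own models and Stata syntax, and the rates and counts here are made up rather than taken from the teaching files.

```python
def synthetic_estimate(rates, population):
    """Synthetic small area estimate of a count.

    rates:      national prevalence by (age group, sex) cell
    population: district population counts for the same cells
    Returns the expected number of people with the condition in the district.
    """
    missing = set(population) - set(rates)
    if missing:
        raise KeyError(f"no national rate for cells: {sorted(missing)}")
    return sum(rates[cell] * count for cell, count in population.items())


# Hypothetical rates and counts, not taken from the teaching files.
rates = {("16-44", "M"): 0.02, ("16-44", "F"): 0.03,
         ("45+", "M"): 0.10, ("45+", "F"): 0.12}
district = {("16-44", "M"): 1000, ("16-44", "F"): 1000,
            ("45+", "M"): 500, ("45+", "F"): 500}

count = synthetic_estimate(rates, district)
rate = count / sum(district.values())
print(round(count, 1), round(rate, 4))  # 160.0 0.0533
```

The estimate differs between districts only through their age-sex structure; model-based approaches like those in the practicals add area-level covariates (such as Census LLTI rates) to capture other local variation.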

This unrestricted access data collection is freely available to download under an Open Government Licence from the UK Data Service. Note that the files should be unzipped/saved to the C: drive of the computer to be used; all syntax assumes files are saved at this location.
Area Resource File (ARF)
dataverse.harvard.edu
Updated May 30, 2013
Anthony Damico (2013). Area Resource File (ARF) [Dataset]. http://doi.org/10.7910/DVN/8NMSFV
Unique identifier
https://doi.org/10.7910/DVN/8NMSFV
Dataset updated
May 30, 2013
Dataset provided by
Harvard Dataverse
Authors
Anthony Damico
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
analyze the area resource file (arf) with r. the arf is fun to say out loud. it's also a single county-level data table with about 6,000 variables, produced by the united states health resources and services administration (hrsa). the file contains health information and statistics for over 3,000 us counties. like many government agencies, hrsa provides only a sas importation script and an ascii file.

this new github repository contains two scripts. 2011-2012 arf - download.R: download the zipped area resource file directly onto your local computer, load the entire table into a temporary sql database, and save the condensed file as an R data file (.rda), comma-separated value file (.csv), and/or stata-readable file (.dta). 2011-2012 arf - analysis examples.R: limit the arf to the variables necessary for your analysis, sum up a few county-level statistics, merge the arf onto other data sets using both fips and ssa county codes, and create a sweet county-level map.

for more detail about the area resource file (arf), visit the arf home page and the hrsa data warehouse. notes: the arf may not be a survey data set itself, but it's particularly useful to merge onto other survey data. confidential to sas, spss, stata, and sudaan users: time to put down the abacus. time to transition to r. :D
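The scripts above are written in R. As a language-neutral illustration, here is a minimal Python sketch of the FIPS-based merge: a 5-digit county FIPS code is the 2-digit state code followed by the 3-digit county code, zero-padded, and county-level ARF fields are attached to every person-level record in that county (a many-to-one merge). The record layout and the `arf_var` variable are hypothetical, and the SSA county codes mentioned above would need a separate crosswalk that is not shown.

```python
def county_fips(state: int, county: int) -> str:
    """5-digit county FIPS code: 2-digit state code + 3-digit county code."""
    if not (0 < state < 100 and 0 < county < 1000):
        raise ValueError("state/county codes must be positive with at most 2/3 digits")
    return f"{state:02d}{county:03d}"


def attach_county_data(records, county_table):
    """Many-to-one left join: copy county-level fields onto each record.

    Records whose county is missing from `county_table` are kept, with
    only the added "fips" field.
    """
    merged = []
    for rec in records:
        key = county_fips(rec["state"], rec["county"])
        merged.append({**rec, "fips": key, **county_table.get(key, {})})
    return merged


# Hypothetical survey records and a hypothetical county-level variable.
people = [{"id": 1, "state": 6, "county": 37}, {"id": 2, "state": 1, "county": 1}]
counties = {"06037": {"arf_var": 2.7}}
for row in attach_county_data(people, counties):
    print(row)
```

Keeping the codes as zero-padded strings rather than integers avoids losing the leading zero of states such as Alabama (01), a common source of silent merge failures.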
Understanding Society, Waves 1-, 2008- : Safeguarded/Special Licence
datacatalogue.ukdataservice.ac.uk
Updated Jul 22, 2022
University of Essex, Institute for Social and Economic Research (2022). Understanding Society, Waves 1-, 2008- : Safeguarded/Special Licence [Dataset]. http://doi.org/10.5255/UKDA-SN-8987-1
Unique identifier
https://doi.org/10.5255/UKDA-SN-8987-1
Dataset updated
Jul 22, 2022
Dataset provided by
UK Data Service (https://ukdataservice.ac.uk/)
Authors
University of Essex, Institute for Social and Economic Research
Time period covered
Jan 1, 2020 - Dec 31, 2020
Area covered
United Kingdom
Description
Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates the British Household Panel Survey (BHPS), which began in 1991.

The Understanding Society: Calendar Year Dataset, 2020, is designed to enable cross-sectional analysis of individuals and households relating specifically to their annual interviews conducted in 2020, and therefore combines data collected in three waves (Waves 10, 11 and 12). It has been produced from the same data collected in the main Understanding Society study and released in the longitudinal datasets SN 6614 (End User Licence) and SN 6931 (Special Licence). Such cross-sectional analysis can, however, only involve variables that are collected in every wave, so that data are available for the full panel sample. The 2020 dataset is the first of a series of planned Calendar Year Datasets to facilitate cross-sectional analysis of specific years. Full details of the Calendar Year Dataset sample structure (including why some individual interviews from 2021 are included), data structure and additional supporting information can be found in the document '8987_calendar_year_dataset_2020_user_guide'.
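The pooling idea can be sketched in Python: keep the interviews conducted in the target year, whichever wave they belong to. The `int_year` field and the records are hypothetical, and the real selection rules are more involved (the user guide explains, for example, why some 2021 interviews are included).

```python
from collections import Counter


def calendar_year_sample(waves, year):
    """Pool interviews from several waves, keeping those conducted in `year`.

    waves: mapping of wave number -> list of interview records, each with a
           hypothetical "int_year" field (year the interview took place).
    """
    return [
        {**rec, "wave": wave}
        for wave, records in sorted(waves.items())
        for rec in records
        if rec["int_year"] == year
    ]


# Hypothetical records: each wave's fieldwork spans more than one calendar year.
waves = {
    10: [{"pid": 1, "int_year": 2019}, {"pid": 2, "int_year": 2020}],
    11: [{"pid": 3, "int_year": 2019}, {"pid": 4, "int_year": 2020}],
    12: [{"pid": 5, "int_year": 2020}, {"pid": 6, "int_year": 2021}],
}
sample = calendar_year_sample(waves, 2020)
print(Counter(rec["wave"] for rec in sample))  # Counter({10: 1, 11: 1, 12: 1})
```

Each wave contributes only part of the year's interviews, which is why a single wave's 2020 interviews are not representative on their own.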

As a multi-topic study, Understanding Society aims to understand the short- and long-term effects of social and economic change in the UK at the household and individual levels. The study has a strong emphasis on the domains of family and social ties, employment, education, financial resources, and health. Understanding Society is an annual survey of each adult member of a nationally representative sample. The same individuals are re-interviewed in each wave approximately 12 months apart. When individuals move, they are followed within the UK, and anyone joining their households is also interviewed as long as they are living with them. The fieldwork period for a single wave is 24 months. Data collection uses computer-assisted personal interviewing (CAPI) and web interviews (from wave 7), and includes a telephone mop-up. From March 2020 (the end of wave 10 and the second year of wave 11), due to the coronavirus pandemic, face-to-face interviews were suspended and the survey has been conducted by web and telephone only, but has otherwise continued as before. One person completes the household questionnaire. Each person aged 16 or older participates in the individual adult interview and self-completed questionnaire. Youths aged 10 to 15 are asked to respond to a paper self-completion questionnaire. In 2020, an additional frequent web survey was separately issued to sample members to capture data on the rapid changes in people’s lives due to the COVID-19 pandemic (see SN 8644). The COVID-19 Survey data are not included in this dataset.

Further information may be found on the Understanding Society main stage webpage (https://www.understandingsociety.ac.uk/documentation/mainstage), and links to publications based on the study can be found on the Understanding Society Latest Research webpage.
Co-funders
In addition to the Economic and Social Research Council, co-funders for the study included the Department for Work and Pensions, the Department for Education, the Department for Transport, the Department for Culture, Media and Sport, the Department for Communities and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department for Environment, Food and Rural Affairs, and the Food Standards Agency.

End User Licence and Special Licence versions:
There are two versions of the Calendar Year 2020 data: one available under the standard End User Licence (EUL) agreement, and a Special Licence (SL) version. Compared with the EUL version, the SL version contains month and year of birth variables instead of just age, has more detailed country and occupation coding for a number of variables, and does not top-code various income variables (see xxxx_eul_vs_sl_variable_differences for more details). Users are advised to obtain the standard EUL version first to check whether it is sufficient for their research requirements. The SL data have more restrictive access conditions: prospective users of the SL version must complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables before permission to use that version is granted. The main longitudinal versions of the Understanding Society study are available under SNs 6614 (EUL) and 6931 (SL).

Low- and Medium-level geographical identifiers produced for the mainstage longitudinal dataset can be used with this Calendar Year 2020 dataset, subject to SL access conditions. See the User Guide for further details.

Suitable data analysis software
These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata. Transfer to other formats may result in unforeseen issues. Stata SE or MP software is needed to analyse the larger files, which contain about 1,900 variables.
Mexican Wealth Distribution 1810-1910
gimi9.com
researchdata.se
Updated Dec 2, 2023
(2023). Mexican Wealth Distribution 1810-1910 [Dataset]. https://gimi9.com/dataset/eu_https-doi-org-10-57804-q8sr-qz06
Dataset updated
Dec 2, 2023
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The zip files contain several files with wills from Mexico between 1810 and 1910, collected to measure the Mexican wealth distribution in the country's first century of independence. The main file, wills_clean.xlsx, contains the full collection of wills; it includes variables for year, state, gross wealth (not net of debts), debts, and net wealth. You can combine this file with the do-file cleaningroutine_for_social_tables to produce the detailed social tables. The remaining files consist of data files with the social tables (for comparison) and xlsx files with the wills from the main file divided by decade, to facilitate calculations using the do-file inequality_analysis_ routine_clean.do, with which you can reproduce the rest of the analysis (unbalanced sample, generalized beta, lognormal, etc.). Note: the calculation programs are .do files, so they require Stata to run. Some of the detailed social tables are .dta files, which are also Stata files; you can open them in R and work with them or convert them to any other data format. The wills come from five Mexican archives: Archivo Histórico de Notarias de la Ciudad de México, Archivo General del Estado de Yucatán, Archivo Municipal de Saltillo, Archivo Histórico de la Ciudad de Morelia, and Testamentos del Colegio de Sonora.
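The deposit's inequality analysis is done in Stata. As an illustration only, here is a minimal Python sketch that derives net wealth from gross wealth and debts and summarises it with a Gini coefficient. The Gini coefficient is a standard inequality measure, but the source does not say it is the statistic the authors report; the wills below are made up, and the formula assumes non-negative values, so wills with negative net wealth would need separate handling.

```python
def net_wealth(gross, debts):
    """Net wealth = gross wealth minus debts."""
    return gross - debts


def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality).

    Uses the sorted-values formula
        G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n,  i = 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive sum")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical wills: (gross wealth, debts).
wills = [(1200, 200), (500, 0), (15000, 3000), (800, 800)]
nets = [net_wealth(g, d) for g, d in wills]
print(nets, round(gini(nets), 3))  # [1000, 500, 12000, 0] 0.676
```

The sorted formula avoids comparing every pair of wills, which matters little for a few thousand records but keeps the cost at a single sort.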
Datasets used in the study "Trends in medication use after the onset of the...
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Jun 3, 2023
Mattsson, M; Hong, JA; Frazer, JS; Frazer, RG; Moriarty, F (2023). Datasets used in the study "Trends in medication use after the onset of the COVID-19 pandemic in the Republic of Ireland: an interrupted time series study" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7999791
Dataset updated
Jun 3, 2023
Dataset provided by
University of Oxford
RCSI University of Medicine and Health Sciences
Queen's University Belfast
Authors
Mattsson, M; Hong, JA; Frazer, JS; Frazer, RG; Moriarty, F
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Ireland
Description
This record contains datasets analysed as part of the study "Trends in medication use after the onset of the COVID-19 pandemic in the Republic of Ireland: an interrupted time series study".

Two datasets were used, one relating to therapeutic subgroups defined by ATC codes (atc_wide_freq_avg.csv) and one relating to individual medications (drugs_wide_freq_avg.csv). Datasets were collated by combining monthly data reported by HSE Primary Care Reimbursement Services in Ireland relating to dispensing on the General Medical Services scheme at https://www.sspcrs.ie/portal/annual-reporting/
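Interrupted time series studies are commonly analysed with segmented regression, Y = b0 + b1*time + b2*post + b3*time_since. A minimal Python sketch that builds those covariates from a monthly series follows; the study's actual model specification is in its protocol, and the window and interruption month used here are hypothetical. Fitting the model is left to a regression package.

```python
from datetime import date


def month_range(start: date, n: int):
    """First-of-month dates for n consecutive months starting at `start`."""
    months = []
    y, m = start.year, start.month
    for _ in range(n):
        months.append(date(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return months


def its_design(months, intervention):
    """Covariates for Y = b0 + b1*time + b2*post + b3*time_since."""
    t0 = months.index(intervention)  # raises ValueError if the month is absent
    rows = []
    for t, month in enumerate(months):
        post = 1 if t >= t0 else 0
        rows.append({"month": month, "time": t, "post": post,
                     "time_since": (t - t0) * post})
    return rows


# Hypothetical window and interruption month, chosen for illustration only.
months = month_range(date(2019, 10, 1), 9)
for row in its_design(months, date(2020, 3, 1)):
    print(row["month"].isoformat(), row["time"], row["post"], row["time_since"])
```

Here b2 captures an immediate level change at the interruption and b3 a change in slope afterwards.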

Code used to collate datasets and for data management is included in Stata format (compile_data_export_for_analysis_final.do).

The study protocol is available at https://doi.org/10.17605/OSF.IO/B4RTM
DISCERN: Duke Innovation & SCientific Enterprises Research Network
zenodo.org
pdf, zip
Updated Aug 1, 2024
Arora, Ashish; Belenzon, Sharon; Sheer, Lia (2024). DISCERN: Duke Innovation & SCientific Enterprises Research Network [Dataset]. http://doi.org/10.5281/zenodo.3594743
Available download formats: zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.3594743
Dataset updated
Aug 1, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Arora, Ashish; Belenzon, Sharon; Sheer, Lia
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This database links patent data to Compustat firms. When using the data, please cite "WHY DO FIRMS INVEST IN RESEARCH?" (Arora, Belenzon and Sheer), NBER WP 23187.

Please follow the Stata DO files to merge the data into Compustat (using the field "gvkey"). The program “main_do_file.do” is the main do file. It runs all the other do files. See the Readme file for more detail.
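The gvkey merge itself is done by the authors' Stata do-files. To illustrate the semantics of a many-to-one merge in the style of Stata's `merge m:1 gvkey using ...`, here is a minimal Python sketch on lists of records. It assumes, hypothetically, a file with one row per gvkey merged onto Compustat firm-years; the real DISCERN files may be keyed differently (for example by gvkey and year), so check the Readme before relying on this layout. Row order is not meant to match Stata's output.

```python
def merge_m1(master, using, key):
    """Rough analogue of Stata's `merge m:1 key using ...` on lists of dicts.

    Adds a `_merge` code: 1 = master only, 2 = using only, 3 = matched.
    Master values win when both sides have the same field.
    """
    lookup = {}
    for row in using:
        if row[key] in lookup:
            raise ValueError(f"{key} does not uniquely identify using rows: {row[key]!r}")
        lookup[row[key]] = row

    merged, matched = [], set()
    for row in master:
        other = lookup.get(row[key])
        if other is None:
            merged.append({**row, "_merge": 1})
        else:
            matched.add(row[key])
            merged.append({**other, **row, "_merge": 3})
    for k, row in lookup.items():
        if k not in matched:
            merged.append({**row, "_merge": 2})
    return merged


# Hypothetical firm-years (master) and a hypothetical gvkey-level file (using).
compustat = [{"gvkey": "001004", "fyear": 2000},
             {"gvkey": "001004", "fyear": 2001},
             {"gvkey": "009999", "fyear": 2000}]
discern = [{"gvkey": "001004", "n_patents": 5},
           {"gvkey": "001078", "n_patents": 2}]
for row in merge_m1(compustat, discern, "gvkey"):
    print(row)
```

Raising an error when the using side has duplicate keys mirrors the uniqueness requirement of an m:1 merge, and the `_merge` codes make unmatched firms easy to inspect before dropping them.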

This project introduces major data extension and improvement to the historical NBER patent dataset, which should be valuable for all researchers working with patent data linked to firms. In updating the data to match between Compustat and patents to 2015, we address two major challenges: name changes and ownership changes. These challenges are central to how patents are assigned to firms over time. To be consistent over the sample period, we reconstruct the complete historical data covered in the NBER data files.

About 30% of the Compustat firms in our sample change their name at least once. Accounting for name changes improves the accuracy and scope of matches to patents (and other assets), ownership structure, and dynamic reassignments of GVKEY codes to companies. Dynamic reassignment means that, for instance, if a sample firm merges with another firm, the patents of the merged firm are included in the stock of patents linked to the Compustat record from that point onward, but not before.
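The "from that point onward, but not before" rule can be sketched in Python as follows. Firm identifiers, grant years, and the merger are hypothetical, and the sketch ignores complications the authors handle, such as name changes and divestitures.

```python
def patent_stock(firm, year, grants, acquisitions):
    """Cumulative patents linked to `firm` as of `year`.

    grants:       firm -> list of grant years of its own patents
    acquisitions: target firm -> (acquirer, year of the merger)
    A target's patents count toward the acquirer only from the merger
    year onward. Assumes the acquisitions contain no cycles.
    """
    stock = sum(1 for y in grants.get(firm, []) if y <= year)
    for target, (acquirer, merger_year) in acquisitions.items():
        if acquirer == firm and year >= merger_year:
            stock += patent_stock(target, year, grants, acquisitions)
    return stock


# Hypothetical firms: A acquires B in 2002.
grants = {"A": [2000, 2003], "B": [1999, 2001, 2004]}
acquisitions = {"B": ("A", 2002)}
print([patent_stock("A", y, grants, acquisitions) for y in (2001, 2002, 2004)])  # [1, 3, 5]
```

The recursion also handles chains (a target that had itself acquired another firm), which is why the no-cycles assumption matters.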

For ownership and subsidiary data, we rely on a wide range of M&A data, including SDC, historical snapshots of ORBIS files for 2002-2015, 10-K SEC filings, and NBER2006, and we perform extensive manual checks that help us uncover firms’ structure and ownership changes before proceeding to the patent match. Thus, we have extended and improved the NBER patent data. In the enclosed "Data Appendix", we document our data construction work, present several examples (“case studies”), and outline the improvements we made to the existing NBER historical patent data.
Integrated Household Panel Survey 2010-2013-2016-2019 (Long-Term Panel, 102...
catalog.ihsn.org
datacatalog.ihsn.org
Updated Jul 19, 2023
National Statistical Office (NSO) (2023). Integrated Household Panel Survey 2010-2013-2016-2019 (Long-Term Panel, 102 EAs) - Malawi [Dataset]. http://catalog.ihsn.org/catalog/8702
Dataset updated
Jul 19, 2023
Dataset authored and provided by
National Statistical Office (NSO)
Time period covered
2010 - 2019
Area covered
Malawi
Description
Abstract

The 2016 Integrated Household Panel Survey (IHPS) was launched in April 2016 as part of the Malawi Fourth Integrated Household Survey fieldwork operation. The IHPS 2016 targeted 1,989 households that were interviewed in the IHPS 2013 and that could be traced back to half of the 204 enumeration areas originally sampled as part of the Third Integrated Household Survey (IHS3) 2010/11. The 2019 IHPS was launched in April 2019 as part of the Malawi Fifth Integrated Household Survey fieldwork operations, targeting the 2,508 households that were interviewed in 2016. The panel sample expanded with each wave through the tracking of split-off individuals and the new households they formed. Available as part of this project are the IHPS 2019 data, the IHPS 2016 data, and the re-released IHPS 2010 and 2013 data, restricted to the subsample of 102 EAs and with updated panel weights. Additionally, the IHPS 2016 was the first survey to receive complementary financial and technical support from the Living Standards Measurement Study – Plus (LSMS+) initiative, which was established with grants from the Umbrella Facility for Gender Equality Trust Fund, the World Bank Trust Fund for Statistical Capacity Building, and the International Fund for Agricultural Development, and is implemented by the World Bank Living Standards Measurement Study (LSMS) team, in collaboration with the World Bank Gender Group and partner national statistical offices.
The LSMS+ aims to improve the availability and quality of individual-disaggregated household survey data, and was, at the outset, a direct response to the World Bank IDA18 commitment to support 6 IDA countries in collecting intra-household, sex-disaggregated household survey data on 1) ownership of and rights to selected physical and financial assets, 2) work and employment, and 3) entrepreneurship, following international best practices in questionnaire design and minimizing the use of proxy respondents while collecting personal information. This dataset is included here.

Geographic coverage

National coverage

Analysis unit

Households

Individuals

Children under 5 years

Consumption expenditure commodities/items

Communities

Agricultural household/holder/crop

Universe

The IHPS 2016 and 2019 attempted to track all IHPS 2013 households stemming from 102 of the original 204 baseline panel enumeration areas, as well as individuals who moved away from the 2013 dwellings between 2013 and 2016, provided that they were neither servants nor guests at the time of the IHPS 2013, were projected to be at least 12 years of age, and were known to be residing in mainland Malawi, excluding those on Likoma Island and those in institutions, including prisons, police compounds, and army barracks.
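The tracking rules above can be read as a single eligibility predicate. A minimal Python sketch, with a hypothetical coding of relationship, location, and institution status (the IHPS data files use their own variables):

```python
from dataclasses import dataclass


@dataclass
class Mover:
    """An individual who left an IHPS 2013 dwelling (hypothetical coding)."""
    relation_2013: str    # e.g. "member", "servant", "guest"
    projected_age: int    # projected age at the time of the follow-up survey
    location: str         # e.g. "mainland", "Likoma", "abroad"
    in_institution: bool  # prison, police compound, army barracks, etc.


def eligible_for_tracking(p: Mover) -> bool:
    """Apply the tracking rules described in the Universe section."""
    return (
        p.relation_2013 not in {"servant", "guest"}
        and p.projected_age >= 12
        and p.location == "mainland"
        and not p.in_institution
    )


movers = [
    Mover("member", 15, "mainland", False),  # tracked
    Mover("guest", 30, "mainland", False),   # guest in 2013
    Mover("member", 10, "mainland", False),  # too young
    Mover("member", 40, "Likoma", False),    # Likoma Island excluded
    Mover("member", 25, "mainland", True),   # living in an institution
]
print([eligible_for_tracking(p) for p in movers])  # [True, False, False, False, False]
```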

Kind of data

Sample survey data [ssd]

Sampling procedure

A subsample of IHS3 2010 sample enumeration areas (EAs) (i.e., 204 of the 768 EAs) was selected before the start of the IHS3 fieldwork, with the intention (i) to track and resurvey these households in 2013, in accordance with the IHS3 fieldwork timeline and as part of the Integrated Household Panel Survey (IHPS 2013), and (ii) to visit a total of 3,246 households in these EAs twice, to reduce the recall error associated with different aspects of agricultural data collection. At baseline, the IHPS sample was selected to be representative at the national, regional, and urban/rural levels and for each of the following six strata: (i) Northern Region - Rural, (ii) Northern Region - Urban, (iii) Central Region - Rural, (iv) Central Region - Urban, (v) Southern Region - Rural, and (vi) Southern Region - Urban. The IHPS 2013 main fieldwork took place during April-October 2013, with residual tracking operations in November-December 2013.

Given budget and resource constraints, for the IHPS 2016 the number of sample EAs in the panel was reduced to 102 out of the 204 EAs. As a result, the domains of analysis are limited to the national, urban and rural areas. Although the results of the IHPS 2016 cannot be tabulated by region, the stratification of the IHPS by region, urban and rural strata was maintained. The IHPS 2019 tracked all individuals 12 years or older from the 2016 households.

Mode of data collection

Computer Assisted Personal Interview [capi]

Cleaning operations

Data Entry Platform: To ensure data quality and timely availability of data, the IHPS 2019 was implemented using the World Bank’s Survey Solutions CAPI software. For the IHPS 2019, each team supervisor was assigned one laptop computer and a wireless internet router, and each enumerator had an 8-inch GPS-enabled Lenovo tablet computer provided by the NSO. Survey Solutions made the data available in real time: each completed interview was approved by the supervisor and synced to the headquarters server as frequently as possible. While administering the first module of the questionnaire, the enumerators also used their tablets to record the GPS coordinates of the dwelling units. Geo-referenced household locations from the tablets complemented the GPS measurements taken with Garmin eTrex 30 handheld devices, and these were linked with publicly available geospatial databases to enable the inclusion of a number of geospatial variables (extensive measures of distance, such as distance to the nearest market, as well as climatology, soil and terrain, and other environmental factors) in the analysis.

Data Management: The IHPS 2019 Survey Solutions CAPI-based data entry application was designed to streamline the data collection process in the field. IHPS 2019 interviews were mainly collected in “sample” mode (assignments generated from headquarters), which gave the NSO more control over the sample, and a few in “census” mode (new interviews created by interviewers from a template). This hybrid approach was necessary to aid the tracking operations: an enumerator could quickly create a tracking assignment, which mattered because enumerators were mostly working in areas with poor network connections and could not quickly receive tracking cases from headquarters.

The range and consistency checks built into the application were informed by the LSMS-ISA experience with the IHS3 2010/11, IHPS 2013, and IHPS 2016. Programming the data entry application in advance allowed a wide variety of range and consistency checks to be conducted and reported, and potential issues to be investigated and corrected before the assigned enumeration area was closed. Headquarters (the NSO management) assigned work to the supervisors based on their regions of coverage. The supervisors then made assignments to the enumerators linked to their supervisor accounts. Work assignments and the syncing of completed interviews took place through a Wi-Fi connection to the IHPS 2019 server. Because the data were available in real time, they were monitored closely throughout the data collection period, and upon receipt at headquarters the data were exported to Stata for further consistency checks, data cleaning, and analysis.
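A minimal Python sketch of a rule-based range and consistency check of the kind described in this section, for instance flagging a 12-year-old reported as having postgraduate education. The minimum ages per education level are hypothetical, and the sketch only mimics the idea that a flagged response can stand if the enumerator explains it in a comment; it is not Survey Solutions' validation syntax.

```python
from dataclasses import dataclass

# Hypothetical minimum ages per education level; not the IHPS rules.
MIN_AGE_FOR_LEVEL = {"none": 0, "primary": 5, "secondary": 12,
                     "tertiary": 17, "postgraduate": 21}


@dataclass
class Response:
    age: int
    education: str
    comment: str = ""  # enumerator's explanation, if any


def check(resp):
    """Return (message, explained) pairs for each failed rule.

    A failed rule does not reject the response; `explained` records
    whether the enumerator left a comment for the supervisor to review.
    """
    problems = []
    if not 0 <= resp.age <= 120:
        problems.append("age out of range")
    min_age = MIN_AGE_FOR_LEVEL.get(resp.education)
    if min_age is None:
        problems.append(f"unknown education level {resp.education!r}")
    elif resp.age < min_age:
        problems.append(f"{resp.education} education reported for age {resp.age}")
    return [(msg, bool(resp.comment)) for msg in problems]


print(check(Response(12, "postgraduate")))
# [('postgraduate education reported for age 12', False)]
print(check(Response(12, "postgraduate", comment="confirmed: completed graduate studies")))
# [('postgraduate education reported for age 12', True)]
print(check(Response(35, "postgraduate")))  # []
```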

Data Cleaning: The data cleaning process was done in several stages over the course of fieldwork and through preliminary analysis. The first stage of data cleaning was conducted in the field by the field teams, using error messages generated by the Survey Solutions application when a response did not fit the rules for a particular question. For questions that flagged an error, the enumerators were expected to record a comment within the questionnaire explaining the reason for the error to their supervisor and confirming that they had double-checked the response with the respondent. The supervisors were expected to sync the enumerator tablets as frequently as possible, to avoid having many questionnaires on a tablet and to enable daily checks of questionnaires. Some supervisors preferred to review completed interviews on the tablets before syncing, but still recorded notes in the supervisor account and rejected questionnaires accordingly. The second stage of data cleaning was also done in the field and resulted from additional error reports generated in Stata, which were sent to the field teams via email or Dropbox. The field supervisors collected the reports for their assignments and, in coordination with the enumerators, reviewed, investigated, and corrected the errors. Because of the quick turnaround in error reporting, it was possible to conduct call-backs while the team was still operating in the EA when required. Corrections to the data were entered in the rejected questionnaires and sent back to headquarters.

The data cleaning process was done in several stages over the course of the fieldwork and through preliminary analyses. The first stage was during the interview itself. Because CAPI software was used, error messages appeared immediately whenever the information an enumerator recorded did not match the rules previously defined for that variable; for example, an error would appear if the education level of a 12-year-old respondent were recorded as postgraduate. The second stage occurred during the Field Supervisor's review of the questionnaire. The Survey Solutions software allows errors to remain in the data if the enumerator does not make a correction, and the enumerator can write a comment explaining why the data appear to be incorrect, for example if the 12-year-old in question was in fact a genius who had completed graduate studies. The next stage occurred when the data were transferred to headquarters, where NSO staff again reviewed the data for errors and verified the comments from the
Descriptive statistics on government regulation (means and standard...
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Apr 8, 2025
Tuem Gebre Abraha; Haftom Teshale Gebre (2025). Descriptive statistics on government regulation (means and standard deviations). [Dataset]. http://doi.org/10.1371/journal.pone.0320681.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0320681.t004
Dataset updated
Apr 8, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Tuem Gebre Abraha; Haftom Teshale Gebre
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Descriptive statistics on government regulation (means and standard deviations).

Contents

Current Population Survey (CPS)

Survey of Consumer Finances (SCF)

Longitudinal Study of Generations, California, 1971, 1985, 1988, 1991, 1994,...

Code for Merging Waves of the Crime Survey of England and Wales and the...

Replication Data for: Trajectories of mental health problems in childhood...

UKHLS

Multilevel modeling of time-series cross-sectional data reveals the dynamic...

UKHLS

Replication Data for "The Micro-Foundations of Party Competition and Issue...

Night lights along the PL-DE border 1992-2012

Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset...

Area Resource File (ARF)

Understanding Society, Waves 1-, 2008- : Safeguarded/Special Licence

Mexican Wealth Distribution 1810-1910

Datasets used in the study "Trends in medication use after the onset of the...

DISCERN: Duke Innovation & SCientific Enterprises Research Network

Integrated Household Panel Survey 2010-2013-2016-2019 (Long-Term Panel, 102...

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Cleaning operations

Descriptive statistics on government regulation (means and standard...
