9 datasets found
  1. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r

    the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

    the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R
    • download the fixed-width file containing household, family, and person records
    • import by separating this file into three tables, then merge 'em together at the person-level
    • download the fixed-width file containing the person-level replicate weights
    • merge the rectangular person-level file with the replicate weights, then store it in a sql database
    • create a new variable - one - in the data table

    2012 asec - analysis examples.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • perform a boatload of analysis examples

    replicate census estimates - 2011.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • match the sas output shown in the png file below

    2011 asec replicate weight sas output.png: statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

    click here to view these three scripts

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
    • the census bureau's current population survey page
    • the bureau of labor statistics' current population survey page
    • the current population survey's wikipedia article

    notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

    confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
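
    to give a flavor of the workflow those scripts automate, here's a minimal sketch in r -- not the repository's actual code. the file names are placeholders, the real script merges the replicate weights on first, and the weight names and fay adjustment shown here should be checked against the census replicate weight usage instructions.

    ```r
    library(SAScii)    # parse/read fixed-width files using sas importation code
    library(survey)    # complex sample designs with replicate weights

    # read the nber sas importation code to learn the fixed-width layout,
    # then import the microdata with those column positions (placeholder
    # file names)
    asec <- read.SAScii("asec2012.dat", "cpsmar2012.sas")

    # build the replicate-weighted design. the weight variable names
    # (marsupwt plus pwwgt1-pwwgt160) follow census documentation; the
    # fay adjustment here is illustrative
    asec_design <- svrepdesign(
      weights    = ~marsupwt,
      repweights = "pwwgt[1-9]",
      type       = "Fay",
      rho        = 0.5,
      data       = asec
    )

    # one replicate-weighted estimate with its standard error
    # (htotval = household total income in the asec data dictionary)
    svymean(~htotval, asec_design, na.rm = TRUE)
    ```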

  2. Data and Code for Democracy and Aid Donorship

    • openicpsr.org
    delimited, stata, zip
    Updated Oct 25, 2021
    + more versions
    Cite
    Angelika J. Budjan; Andreas Fuchs (2021). Data and Code for Democracy and Aid Donorship [Dataset]. https://www.openicpsr.org/openicpsr/project/120068/version/V2/view?path=/openicpsr/120068/fcr:versions/V2/Analyse-data.do&type=file
    Dataset provided by
    American Economic Association
    Authors
    Angelika J. Budjan; Andreas Fuchs
    Time period covered
    1950 - 2015
    Area covered
    global
    Description
    README for Democracy and Aid Donorship, Budjan, Angelika J., and Andreas Fuchs, American Economic Journal: Economic Policy.

    AEA Data and Code Repository project ID: 120068

    The replication material consists of four Stata do files, 20 raw input data files, five analysis datasets, and two shapefiles contained in the “outputdata” folder. Analyses were performed with Stata version 14.0. Running the master do file (“Democracy and Aid Donorship replication file MAIN.do”) will call the configuration do file (“config.do”), the data cleaning do file (“Prepare data.do”), and the data analysis do file (“Analyse data.do”). The configuration do file creates five new folders: the “ado” folder, where necessary ado files are stored; the “outputdata” folder, where the generated analysis datasets are stored; the “tables” folder, where results tables are stored; the “figures” folder, where generated figures are stored; and the “tempdata” folder, where temporary datasets are stored and from which they are automatically deleted by the end of the script.

    In order to run the master do file (“Democracy and Aid Donorship replication file MAIN.do”), insert the correct folder path in line 19.

    The data analysis do file (“Analyse data.do”) generates four regression datasets in the “outputdata” folder. For copyright reasons, we had to omit some raw databases from the “input” data folder (Marshall et al. 2016; Banks and Wilson 2016; Freedom House 2016; Bormann et al. 2017; Correlates of War Project 2017). Since several “input” datasets are omitted from the download package, the do file will neither run without error nor produce the complete datasets required for the analysis; we therefore provide those datasets in their entirety in the “outputdata” folder. The four regression datasets are the following:
    • “new_donors_MAIN.dta” is needed to create Tables 1-3, Figures 2-4, and most tables and figures of the Online Appendix.
    • “new_donors_limited.dta” and “new_donors_3yaverages.dta” are needed to create the robustness test of Table B3 in the Online Appendix.
    • “new_donors_sample_firstaid.dta” is needed to create robustness tests of Table C2 in the Online Appendix.
    Figure 1 and Appendix Figure C1 were not produced with Stata. Data from our New Aid Donors Database was merged with country boundaries and saved in shapefile format in the output folder using R. This step can be replicated with the file “Prepare_figure1_figureC1.R”; to run the code, insert the correct folder path in line 9. To create the maps, open the resulting files in QGIS and format the layer “donoryear” as in the manuscript.
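
    As a rough illustration of that R step, the sketch below (not the authors' actual “Prepare_figure1_figureC1.R”) merges the analysis data with country boundaries using the haven and sf packages; the boundary file path and the join key are placeholders.

    ```r
    library(haven)   # read Stata .dta files
    library(sf)      # simple features: read/write spatial data

    # the analysis dataset named in this readme
    donors <- read_dta("outputdata/new_donors_MAIN.dta")

    # placeholder boundary file and join key -- the authoritative
    # versions are in "Prepare_figure1_figureC1.R"
    borders <- st_read("country_boundaries.shp")
    map_df  <- merge(borders, donors, by = "iso3")

    # write the merged layer to the output folder for styling in QGIS
    st_write(map_df, "outputdata/donoryear.shp", delete_layer = TRUE)
    ```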

    Lines 510-544 of “Prepare_data.do” produce our main variable of interest, “democracy”, as a temporary datafile (“tempdata\acemoglu_democ.dta”), using the inputs Polity IV Project version 4 (Marshall et al. 2016), Bjørnskov-Rode regime data (Bjørnskov and Rode 2020), and Freedom in the World Country and Territory Ratings and Statuses (Freedom House 2016). This file is then merged into the final analysis datasets. Since our analysis was performed prior to the publication of Acemoglu et al. (2019), and since we require a longer time period for our analysis, the employed data is our own replication and extension of Acemoglu et al.’s democracy variable. To allow users to generate Figure A3 without having executed “Prepare_data.do” first, we also included “acemoglu_democ.dta” in the “outputdata” folder.


  3. FGDs patients’ characteristics Stata format dataset and its do file.

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Jun 16, 2023
    Cite
    Peter M. Karoli; Grace A. Shayo; Elizabeth H. Shayo; Christine V. Wood; Theresia A. Ottaru; Claudia A. Hawkins; Erasto V. Mbugi; Sokoine L. Kivuyo; Sayoki G. Mfinanga; Sylvia F. Kaaya; Eric J. Mgina; Lisa R. Hirschhorn (2023). FGDs patients’ characteristics Stata format dataset and its do file. [Dataset]. http://doi.org/10.1371/journal.pgph.0001024.s004
    Dataset provided by
    PLOS Global Public Health
    Authors
    Peter M. Karoli; Grace A. Shayo; Elizabeth H. Shayo; Christine V. Wood; Theresia A. Ottaru; Claudia A. Hawkins; Erasto V. Mbugi; Sokoine L. Kivuyo; Sayoki G. Mfinanga; Sylvia F. Kaaya; Eric J. Mgina; Lisa R. Hirschhorn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We imported the Excel sheet of FGD patients’ characteristics into Stata to conduct simple descriptive analyses. A saved dataset and its do file have therefore been shared with editors and reviewers for their reference. (ZIP)

  4. National Health and Nutrition Examination Survey (NHANES)

    • dataverse.harvard.edu
    Updated May 30, 2013
    Cite
    Anthony Damico (2013). National Health and Nutrition Examination Survey (NHANES) [Dataset]. http://doi.org/10.7910/DVN/IMWQPJ
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    analyze the national health and nutrition examination survey (nhanes) with r

    nhanes is this fascinating survey where doctors and dentists accompany survey interviewers in a little mobile medical center that drives around the country. while the survey folks are interviewing people, the medical professionals administer laboratory tests and conduct a real doctor's examination. the blood work and medical exam allow researchers like you and me to answer tough questions like, "how many people have diabetes but don't know they have diabetes?" conducting the lab tests and the physical isn't cheap, so a new nhanes data set becomes available once every two years and only includes about twelve thousand respondents. since the number of respondents is so small, analysts often pool multiple years of data together. the replication scripts below give a few different examples of how multiple years of data can be pooled with r. the survey gets conducted by the centers for disease control and prevention (cdc), and generalizes to the united states non-institutional, non-active duty military population.

    most of the data tables produced by the cdc include only a small number of variables, so importation with the foreign package's read.xport function is pretty straightforward. but that makes merging the appropriate data sets trickier, since it might not be clear what to pull for which variables. for every analysis, start with the table with 'demo' in the name -- this file includes basic demographics, weighting, and complex sample survey design variables. since it's quick to download the files directly from the cdc's ftp site, there's no massive ftp download automation script.

    this new github repository contains five scripts:

    2009-2010 interview only - download and analyze.R
    • download, import, save the demographics and health insurance files onto your local computer
    • load both files, limit them to the variables needed for the analysis, merge them together
    • perform a few example variable recodes
    • create the complex sample survey object, using the interview weights
    • run a series of pretty generic analyses on the health insurance questions

    2009-2010 interview plus laboratory - download and analyze.R
    • download, import, save the demographics and cholesterol files onto your local computer
    • load both files, limit them to the variables needed for the analysis, merge them together
    • perform a few example variable recodes
    • create the complex sample survey object, using the mobile examination component (mec) weights
    • perform a direct-method age-adjustment and match figure 1 of this cdc cholesterol brief

    replicate 2005-2008 pooled cdc oral examination figure.R
    • download, import, save, pool, recode, create a survey object, run some basic analyses
    • replicate figure 3 from this cdc oral health databrief - the whole barplot

    replicate cdc publications.R
    • download, import, save, pool, merge, and recode the demographics file plus cholesterol laboratory, blood pressure questionnaire, and blood pressure laboratory files
    • match the cdc's example sas and sudaan syntax file's output for descriptive means
    • match the cdc's example sas and sudaan syntax file's output for descriptive proportions
    • match the cdc's example sas and sudaan syntax file's output for descriptive percentiles

    replicate human exposure to chemicals report.R (user-contributed)
    • download, import, save, pool, merge, and recode the demographics file plus urinary bisphenol a (bpa) laboratory files
    • log-transform some of the columns to calculate the geometric means and quantiles
    • match the 2007-2008 statistics shown on pdf page 21 of the cdc's fourth edition of the report

    click here to view these five scripts

    for more detail about the national health and nutrition examination survey (nhanes), visit:
    • the cdc's nhanes homepage
    • the national cancer institute's page of nhanes web tutorials

    notes: nhanes includes interview-only weights and interview + mobile examination component (mec) weights. if you only use questions from the basic interview in your analysis, use the interview-only weights (the sample size is a bit larger). i haven't really figured out a use for the interview-only weights -- nhanes draws most of its power from the combination of the interview and the mobile examination component variables. if you're only using variables from the interview, see if you can use a data set with a larger sample size like the current population survey (cps), national health interview survey (nhis), or medical expenditure panel survey (meps) instead.

    confidential to sas, spss, stata, sudaan users: why are you still riding around on a donkey after we've invented the internal combustion engine? time to transition to r. :D
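
    to give a flavor of the merge-then-weight pattern these scripts follow, here's a minimal sketch -- not the repository's actual code. the file names are the real cdc names for the 2009-2010 cycle; treat the exact variables as illustrative.

    ```r
    library(foreign)   # read.xport: import sas transport (.xpt) files
    library(survey)    # complex sample survey designs

    # 2009-2010 demographics and total cholesterol files; every nhanes
    # table merges on the respondent identifier seqn
    demo  <- read.xport("DEMO_F.XPT")
    tchol <- read.xport("TCHOL_F.XPT")
    x <- merge(demo, tchol, by = "SEQN")

    # a laboratory variable is involved, so use the mec weights
    # (wtmec2yr); an interview-only analysis would use wtint2yr
    nhanes_design <- svydesign(
      id      = ~SDMVPSU,
      strata  = ~SDMVSTRA,
      weights = ~WTMEC2YR,
      nest    = TRUE,
      data    = x
    )

    # mean total cholesterol (mg/dl) with a design-based standard error
    svymean(~LBXTC, nhanes_design, na.rm = TRUE)
    ```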

  5. Replication Data for: A High Court Plays the Accordion: Validating Ex Ante Case Complexity on Oral Arguments

    • search.dataone.org
    • dataverse.no
    • +1more
    Updated Jan 5, 2024
    Cite
    Bentsen, Henrik L.; Grendstad, Gunnar; Shaffer, William R.; Waltenburg, Eric N. (2024). Replication Data for: A High Court Plays the Accordion: Validating Ex Ante Case Complexity on Oral Arguments [Dataset]. http://doi.org/10.18710/DWIX6Y
    Dataset provided by
    DataverseNO
    Authors
    Bentsen, Henrik L.; Grendstad, Gunnar; Shaffer, William R.; Waltenburg, Eric N.
    Description

    The data set (saved in Stata *.dta and *.txt) contains all observations (Norwegian Supreme Court cases 2008-2018 decided in five-justice panels) and variables (independent variables measuring the complexity of cases and the dependent variable measuring time in hours scheduled for oral arguments) relevant for a complete replication of the study.

    ABSTRACT OF STUDY: While high courts with fixed time for oral arguments deprive researchers of the opportunity to extract temporal variance, courts that apply the “accordion model” institutional design and adjust the time for oral arguments according to the perceived complexity of a case are a boon for research that seeks to validate case complexity well ahead of the courts’ opinion writing. We analyse an original data set of all 1,402 merits decisions of the Norwegian Supreme Court from 2008 to 2018, where the justices set time for oral arguments to accommodate the anticipated difficulty of the case. Our validation model empirically tests whether and how attributes of a case associated with ex ante complexity are linked with time allocated for oral arguments. Cases that deal with international law and civil law, have several legal players, or are cross-appeals from lower courts are indicative of greater case complexity. We argue that these results speak powerfully to the use of case attributes and/or the time reserved for oral arguments as ex ante measures of case complexity. To enhance the external validity of our findings, future studies should examine whether these results are confirmed in high courts with similar institutional designs for oral arguments. Subsequent analyses should also test the degree to which complex cases and/or time for oral arguments have predictive validity on more divergent opinions among the justices and on the time courts and justices need to render a final opinion.

  6. Data from the 'Parenting with Anxiety' trial (2022)

    • sussex.figshare.com
    bin
    Updated Mar 18, 2025
    Cite
    Samantha Cartwright-Hatton; Abigail Dunn; James Alvarez; Amy Arbon; Stephen Bremner; Chloe Elsby-Pearson; Richard Emsley; Christopher Iain Jones; Peter J. Lawrence; Kathryn J Lester; Mirjana Majdandžić; Natalie Morson; Nicky Perry; J. Simner; Abigail Thomson (2025). Data from the 'Parenting with Anxiety' trial (2022) [Dataset]. http://doi.org/10.25377/sussex.25428244.v2
    Dataset provided by
    University of Sussex
    Authors
    Samantha Cartwright-Hatton; Abigail Dunn; James Alvarez; Amy Arbon; Stephen Bremner; Chloe Elsby-Pearson; Richard Emsley; Christopher Iain Jones; Peter J. Lawrence; Kathryn J Lester; Mirjana Majdandžić; Natalie Morson; Nicky Perry; J. Simner; Abigail Thomson
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Parenting with Anxiety was a randomised controlled trial of a web-based intervention for parents with anxiety difficulties, aimed at preventing anxiety in offspring. Two datasets have been prepared for sharing: pwa_parents_share.dta contains the data recorded from parents who took part in the study, and pwa_cores_share.dta contains data provided by an additional adult nominated by the index parent. A single dataset has not been prepared because the two datasets come from different database exports. Full exports were provided for the final SWAT (study within a trial) analyses and the final full analyses, but the SWAT analyses occurred before data collection was complete for the parents. Hence the co-respondent dataset was derived from a database export on 11th May 2023, and the parent dataset was derived from the final database export on 8th June 2023. The sharable datasets represent the datasets that were used for each of the respective analyses. Datasets were saved in Stata format (.dta). This format was chosen because it was the format used for analyses, retains metadata (e.g. variable and category labels), and can be opened directly in SPSS, SAS or R (using the haven package). More information about the preparation of the datasets can be found in pwa_dataset_preparation.pdf
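
    As a brief illustration of that last point, the sketch below opens the two shared files in R with the haven package; only the file names come from the description, and "some_variable" is a hypothetical column name.

    ```r
    library(haven)   # reads Stata .dta files, keeping variable and value labels

    parents <- read_dta("pwa_parents_share.dta")   # index-parent data
    cores   <- read_dta("pwa_cores_share.dta")     # co-respondent data

    # the Stata metadata survives the import: inspect a variable label
    # ("some_variable" is hypothetical) and convert labelled values to
    # factors for analysis
    attr(parents$some_variable, "label")
    parents_factors <- as_factor(parents)
    ```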

    Background: Anxiety is the most common childhood mental health condition and is associated with impaired child outcomes, including increased risk of mental health difficulties in adulthood. Anxiety runs in families: when a parent has anxiety, their child has a 50% higher chance of developing it themselves. Environmental factors are predominant in the intergenerational transmission of anxiety and, of these, parenting processes play a major role. Interventions that target parents to support them to limit the impact of any anxiogenic parenting behaviors are associated with reduced anxiety in their children. A brief UK-based group intervention delivered to parents within the UK National Health Service led to a 16% reduction in children meeting the criteria for an anxiety disorder. However, this intervention is not widely accessible. To widen access, a 9-module web-based version of this intervention has been developed. This course comprises psychoeducation and home practice delivered through text, video, animations, and practice tasks.

    Objective: This study seeks to evaluate the feasibility of delivering this web-based intervention and assess its effectiveness in reducing child anxiety symptoms.

    Methods: This is the protocol for a randomized controlled trial (RCT) with a community sample of 1754 parents who have self-identified high levels of anxiety and a child aged 2-11 years. Parents in the intervention arm will receive access to the web-based course, which they undertake at a self-determined rate. The control arm receives no intervention. Follow-up data collection takes place at month 6 and at months 9-21. Intention-to-treat analysis will be conducted on outcomes including child anxiety, child mental health symptoms, and well-being; parental anxiety and well-being; and parenting behaviors.

    Results: Funding was received in April 2020, and recruitment started in February 2021, ending in October 2022. A total of 1350 participants were recruited as of May 2022. Trial outcomes are pending publication in late 2024.

    Conclusions: The results of this RCT will provide evidence on the utility of a web-based course in preventing intergenerational transmission of anxiety and increase the understanding of familial anxiety.

    Trial Registration: ClinicalTrials.gov NCT04755933; https://clinicaltrials.gov/ct2/show/NCT04755933

    International Registered Report Identifier (IRRID): DERR1-10.2196/40707

    JMIR Res Protoc 2022;11(11):e40707

  7. Area Resource File (ARF)

    • dataverse.harvard.edu
    Updated May 30, 2013
    Cite
    Anthony Damico (2013). Area Resource File (ARF) [Dataset]. http://doi.org/10.7910/DVN/8NMSFV
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    analyze the area resource file (arf) with r

    the arf is fun to say out loud. it's also a single county-level data table with about 6,000 variables, produced by the united states health resources and services administration (hrsa). the file contains health information and statistics for over 3,000 us counties. like many government agencies, hrsa provides only a sas importation script and an ascii file.

    this new github repository contains two scripts:

    2011-2012 arf - download.R
    • download the zipped area resource file directly onto your local computer
    • load the entire table into a temporary sql database
    • save the condensed file as an R data file (.rda), comma-separated value file (.csv), and/or stata-readable file (.dta)

    2011-2012 arf - analysis examples.R
    • limit the arf to the variables necessary for your analysis
    • sum up a few county-level statistics
    • merge the arf onto other data sets, using both fips and ssa county codes
    • create a sweet county-level map

    click here to view these two scripts

    for more detail about the area resource file (arf), visit:
    • the arf home page
    • the hrsa data warehouse

    notes: the arf may not be a survey data set itself, but it's particularly useful to merge onto other survey data.

    confidential to sas, spss, stata, and sudaan users: time to put down the abacus. time to transition to r. :D
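
    here's a minimal sketch of that merge-onto-other-data idea -- not the repository's actual code. the .rda file name, the loaded object name, and the fips field code are all placeholders; look up the real arf field codes in its documentation.

    ```r
    # loads the condensed data.frame saved by the download script
    # (assumed here to be an object named `arf`)
    load("arf2011-2012.rda")

    # hypothetical analysis file keyed on five-digit fips county codes
    my_counties <- data.frame(
      fips    = c("01001", "01003"),
      outcome = c(0.12, 0.09)
    )

    # attach county-level arf statistics by fips code; "f00002" stands
    # in for the arf's fips county field
    merged <- merge(my_counties, arf, by.x = "fips", by.y = "f00002")
    ```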

  8. Monitoring COVID-19 Impact on Refugees in Ethiopia: High-Frequency Phone Survey of Refugees 2020 - Ethiopia

    • microdata.unhcr.org
    • datacatalog.ihsn.org
    • +2more
    Updated Jul 5, 2022
    Cite
    World Bank-UNHCR Joint Data Center on Forced Displacement (JDC) (2022). Monitoring COVID-19 Impact on Refugees in Ethiopia: High-Frequency Phone Survey of Refugees 2020 - Ethiopia [Dataset]. https://microdata.unhcr.org/index.php/catalog/704
    Dataset provided by
    United Nations High Commissioner for Refugees (http://www.unhcr.org/)
    Authors
    World Bank-UNHCR Joint Data Center on Forced Displacement (JDC)
    Time period covered
    2020
    Area covered
    Ethiopia
    Description

    Abstract

    The high-frequency phone survey of refugees monitors the economic and social impact of, and responses to, the COVID-19 pandemic on refugees and nationals by calling a sample of households every four weeks. The main objective is to inform timely and adequate policy and program responses. Since the outbreak of the COVID-19 pandemic in Ethiopia, two rounds of data collection from refugees were completed between September and November 2020. The first round of the joint national and refugee HFPS was implemented between 24 September and 17 October 2020, and the second round between 20 October and 20 November 2020.

    Analysis unit

    Household

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample was drawn using a simple random sample without replacement. Expecting a high non-response rate based on experience from the HFPS-HH, we drew a stratified sample of 3,300 refugee households for the first round. More details on sampling methodology are provided in the Survey Methodology Document available for download as Related Materials.

    Mode of data collection

    Computer Assisted Telephone Interview [cati]

    Research instrument

    The Ethiopia COVID-19 High Frequency Phone Survey of Refugee questionnaire consists of the following sections:

    • Interview Information
    • Household Roster
    • Camp Information
    • Knowledge Regarding the Spread of COVID-19
    • Behaviour and Social Distancing - Access to Basic Services
    • Employment
    • Income Loss
    • Coping/Shocks
    • Social Relations
    • Food Security
    • Aid and Support/ Social Safety Nets.

    A more detailed description of the questionnaire is provided in Table 1 of the Survey Methodology Document, provided as Related Materials. The Round 1 and Round 2 questionnaires are available for download.

    Cleaning operations

    DATA CLEANING: At the end of data collection, the raw dataset was cleaned by the research team. This included formatting and correcting results based on monitoring issues, enumerator feedback, and survey changes. The data cleaning carried out is detailed below.

    Variable naming and labeling:
    • Variable names were changed to reflect the lowercase question name in the paper survey copy, plus a word or two related to the question.
    • Variables were labeled with longer descriptions of their contents, and the full question text was stored in Notes for each variable.
    • “Other, specify” variables were named similarly to their related question, with “_other” appended to the name.
    • Value labels were assigned where relevant, with options shown in English for all variables, unless preloaded from the roster in Amharic.

    Variable formatting:
    • Variables were formatted as their object type (string, integer, decimal, time, date, or datetime).
    • Multi-select variables were saved both as space-separated single variables and as multiple binary variables showing the yes/no value of each possible response (see the sketch after this list).
    • Time and date variables were stored as POSIX timestamp values and formatted to show Gregorian dates.
    • Location information was left in separate ID and Name variables, following the format of the incoming roster. IDs were formatted to include only the variable-level digits, not the higher-level prefixes (2-3 digits only).
    • Only consented surveys were kept in the dataset, and all personal information and internal survey variables were dropped from the clean dataset.
    • Roster data is separated from the main data set and kept in long form, but can be merged on the key variable (the key can also be used to merge with the raw data).
    • The variables were arranged in the same order as the paper instrument, with observations arranged according to their submission time.
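
    As an illustration of the multi-select convention above, the sketch below (written in R for illustration; the survey's own cleaning pipeline is not published here) derives one binary variable per option from a space-separated string. All variable names and option codes are hypothetical.

    ```r
    # a toy multi-select column: each cell holds the selected option
    # codes separated by spaces
    responses <- data.frame(
      id     = 1:3,
      coping = c("1 3", "2", "1 2 3")
    )

    # create one yes/no (1/0) variable per possible option
    for (opt in c("1", "2", "3")) {
      responses[[paste0("coping_", opt)]] <- vapply(
        strsplit(responses$coping, " "),
        function(x) as.integer(opt %in% x),
        integer(1)
      )
    }

    responses   # columns: id, coping, coping_1, coping_2, coping_3
    ```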

    Backcheck data review: Results of the backcheck survey are compared against the originally captured survey results using the bcstats command in Stata. This function delivers a comparison of variables and identifies any discrepancies. Any discrepancies identified are then examined individually to determine if they are within reason.

    Data appraisal

    The following data quality checks were completed:
    • Daily SurveyCTO monitoring: This included outlier checks, skipped questions, a review of “Other, specify” and other text responses, and enumerator comments. Enumerator comments were used to suggest new response options or to highlight situations where existing options should be used instead. Monitoring also included a review of variable relationship logic checks and checks of the logic of answers. Finally, outliers in phone variables such as survey duration or the percentage of time audio was at a conversational level were monitored. A survey duration of close to 15 minutes and a conversation-level audio percentage of around 40% were considered normal.
    • Dashboard review: This included monitoring individual enumerator performance, such as the number of calls logged, duration of calls, percentage of calls responded to, and percentage of non-consents. Non-consent reason rates and attempts per household were monitored as well. Duration analysis using R was used to monitor each module's duration and estimate the time required for subsequent rounds. The dashboard was also used to track overall survey completion and preview the results of key questions.
    • Daily Data Team reporting: The Field Supervisors and the Data Manager reported daily feedback on call progress, enumerator feedback on the survey, and any suggestions to improve the instrument, such as adding options to multiple-choice questions or adjusting translations.
    • Audio audits: Audio recordings were captured during the consent portion of the interview for all completed interviews, for the enumerators' side of the conversation only. The recordings were reviewed for any surveys flagged by enumerators as having data quality concerns and for an additional random sample of 2% of respondents. A range of lengths was selected to observe edge cases. Most consent readings took around one minute, with some longer recordings due to questions on the survey or holding for the respondent. All reviewed audio recordings were completed satisfactorily.
    • Back-check survey: Field Supervisors made back-check calls to a random sample of 5% of the households that completed a survey in Round 1. Field Supervisors called these households and administered a short survey, including (i) identifying the same respondent; (ii) determining the respondent's position within the household; (iii) confirming that a member of the data collection team had completed the interview; and (iv) a few questions from the original survey.

  9. Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1991-2024

    • openicpsr.org
    Updated May 18, 2018
    + more versions
    Cite
    Jacob Kaplan (2018). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1991-2024 [Dataset]. http://doi.org/10.3886/E103500V12
    Dataset provided by
    Princeton University
    Authors
    Jacob Kaplan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1991 - 2024
    Area covered
    United States
    Description

    !!!WARNING~~~ This dataset has a large number of flaws and is unable to properly answer many questions that people generally use it to answer, such as whether national hate crimes are changing (or at least they use the data so improperly that they get the wrong answer). A large number of people using this data (academics, advocates, reporters, US Congress) do so inappropriately and get the wrong answer to their questions as a result. Indeed, many published papers using this data should be retracted. Before using this data I highly recommend that you thoroughly read my book on UCR data, particularly the chapter on hate crimes (https://ucrbook.com/hate-crimes.html), as well as the FBI's own manual on this data. The questions you could potentially answer well are relatively narrow and generally exclude any causal relationships. ~~~WARNING!!!

    For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com

    Version 12 release notes: Adds .parquet file format.
    Version 11 release notes: Adds 2023-2024 data.
    Version 10 release notes: Adds 2022 data.
    Version 9 release notes: Adds 2021 data.
    Version 8 release notes: Adds 2019 and 2020 data. Please note that the FBI has retired UCR data ending in 2020, so this will be the last UCR hate crime data they release. Changes .rda file to .rds.
    Version 7 release notes: Changes the release notes description; does not change data.
    Version 6 release notes: Adds 2018 data.
    Version 5 release notes: Adds data in the following formats: SPSS, SAS, and Excel. Changes project name to avoid confusing this data with the data released by NACJD. Adds data for 1991. Fixes bug where the bias motivation "anti-lesbian, gay, bisexual, or transgender, mixed group (lgbt)" was labeled "anti-homosexual (gay and lesbian)" prior to 2013, causing there to be two columns and zero values for years with the wrong label. All data is now directly from the FBI, not NACJD. The data initially comes as ASCII+SPSS Setup files and is read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R.
    Version 4 release notes: Adds data for 2017. Adds rows that submitted a zero-report (i.e. the agency reported no hate crimes in the year); this is for all years 1992-2017. Made changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time; different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian'), which I made consistent. Made the 'population' column, which is the total population in that agency.
    Version 3 release notes: Adds data for 2016. Orders rows by year (descending) and ORI.
    Version 2 release notes: Fixes bug where Philadelphia Police Department had an incorrect FIPS county code.

    The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. Please note that the files are quite large and may take some time to open. Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9-character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.). The only changes I made to the data are the following: minor changes to column names to make all column names 32 characters or fewer (so the data can be saved in a Stata format), making all character values lower case, and reordering columns. I also generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
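
    As a rough illustration of the import route named above, the sketch below reads one year's ASCII + SPSS setup pair with the asciiSetupReader package and rebuilds the unique ID. The file names and column names are assumptions for illustration, not the author's actual cleaning code.

    ```r
    library(asciiSetupReader)   # reads fixed-width ascii data via spss/sas setup files

    # placeholder names for one year's data file and its spss setup file
    hate <- read_ascii_setup("hate_crime_2017.txt", "hate_crime_2017.sps")

    # rebuild the unique id described above from year, ori9, and
    # incident number (assumed column names -- check the actual data)
    hate$unique_id <- paste(hate$year, hate$ori9, hate$incident_number, sep = "_")
    ```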

