analyze the current population survey (cps) annual social and economic supplement (asec) with r

the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png
- statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes:

interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.

as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
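the replicate-weight step in the scripts above boils down to a short variance calculation. here's a minimal sketch in python (not the r survey package the scripts themselves use), assuming the fay-style formula from the census bureau's replicate weights usage instructions: variance = 4/R times the sum of squared deviations of each replicate estimate from the full-sample estimate. the function names and the toy numbers are made up for illustration; the real file carries 160 replicate weights per person, so check the documentation for your year before trusting the scale factor.

```python
import math


def weighted_total(values, weights):
    """Weighted sum of a person-level variable."""
    return sum(v * w for v, w in zip(values, weights))


def replicate_se(values, main_weights, replicate_weights, scale=4.0):
    """Return (estimate, standard error) for a weighted total.

    replicate_weights is a list of weight columns, one per replicate.
    scale=4.0 is the Fay-style factor assumed here
    (variance = scale / R * sum of squared deviations).
    """
    theta0 = weighted_total(values, main_weights)
    reps = [weighted_total(values, rw) for rw in replicate_weights]
    variance = scale / len(reps) * sum((t - theta0) ** 2 for t in reps)
    return theta0, math.sqrt(variance)


# toy data: three people, four replicates (hypothetical numbers)
in_poverty = [1, 0, 1]
main = [100, 200, 300]
replicates = [
    [110, 200, 290],
    [90, 200, 310],
    [100, 200, 320],
    [100, 200, 280],
]

estimate, se = replicate_se(in_poverty, main, replicates)
print(estimate, round(se, 2))
```

the same function works for any statistic if you swap weighted_total for a weighted mean or ratio; only the per-replicate re-estimation changes.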
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
Recent studies have suggested that high levels of social support can encourage better health behaviours and result in improved cardiovascular health. In this study we evaluated the association between social support and ideal cardiovascular health among urban Jamaicans. We conducted a cross-sectional study among urban residents in Jamaica’s south-east health region. Socio-demographic data and information on cigarette smoking, physical activity, dietary practices, blood pressure, body size, cholesterol, and glucose were collected by trained personnel. The outcome variable, ideal cardiovascular health, was defined as having optimal levels of ≥5 of these characteristics (ICH-5) according to the American Heart Association definitions. Social support exposure variables included number of friends (network size), number of friends willing to provide loans (instrumental support) and number of friends providing advice (informational support). Principal component analysis was used to create a social support score using these three variables. Survey-weighted logistic regression models were used to evaluate the association between ICH-5 and social support score. Analyses included 841 participants (279 males, 562 females) with mean age of 47.6 ± 18.42 years. ICH-5 prevalence was 26.6% (95%CI 22.3, 31.0) with no significant sex difference (male 27.5%, female 25.7%). In sex-specific, multivariable logistic regression models, social support score was inversely associated with ICH-5 among males (OR 0.67 [95%CI 0.51, 0.89], p = 0.006) but directly associated among females (OR 1.26 [95%CI 1.04, 1.53], p = 0.020) after adjusting for age and community SES. Living in poorer communities was also significantly associated with higher odds of ICH-5 among males, while living in communities with high property value was associated with higher odds of ICH-5 among females.
In this study, a higher level of social support was associated with better cardiovascular health among women, but poorer cardiovascular health among men in urban Jamaica. Further research should explore these associations and identify appropriate interventions to promote cardiovascular health.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Canada Trademarks Dataset
18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303
Dataset Selection and Arrangement (c) 2021 Jeremy Sheff
Python and Stata Scripts (c) 2021 Jeremy Sheff
Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.
This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.
Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.
Terms of Use:
As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.
The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:
The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.
Details of Repository Contents:
This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:
If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.
The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.
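As a rough illustration of what an "iterparse"-style script does, the sketch below streams an XML file record by record and writes selected fields to a .csv file using only Python's standard library. It is not the repository's code: the element names (`Application`, `ApplicationNumber`, `FilingDate`) and the function name are hypothetical placeholders, and CIPO's actual schema (including any XML namespaces) should be taken from the data dictionary.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_source, csv_target, record_tag, fields):
    """Stream records out of a large XML file into a CSV file.

    xml_source: path or binary file object with the XML data.
    csv_target: text file object opened for writing.
    record_tag and fields are hypothetical element names.
    Returns the number of records written.
    """
    writer = csv.DictWriter(csv_target, fieldnames=fields)
    writer.writeheader()
    count = 0
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == record_tag:
            writer.writerow({f: (elem.findtext(f) or "") for f in fields})
            elem.clear()  # release the record's children to keep memory flat
            count += 1
    return count


# Tiny in-memory sample standing in for a file under /XML_raw.
sample = (
    b"<Applications>"
    b"<Application><ApplicationNumber>0000001</ApplicationNumber>"
    b"<FilingDate>1980-01-02</FilingDate></Application>"
    b"<Application><ApplicationNumber>0000002</ApplicationNumber></Application>"
    b"</Applications>"
)
out = io.StringIO()
n_written = xml_records_to_csv(
    io.BytesIO(sample), out, "Application", ["ApplicationNumber", "FilingDate"]
)
print(n_written)
print(out.getvalue())
```

Streaming matters here because the raw archives are large; loading a whole file into memory before writing would likely be impractical on most machines.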
With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata’s labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.
The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.
This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.
Individual measurements from the speed monitoring of the Basel-Stadt cantonal police from 2024 (based on the start date of the measurement). The data are purely statistical surveys and are not connected to administrative fines or criminal prosecution. The cantonal police Basel-Stadt use the statistical speed measurements to check speed and road safety (e.g. safety at pedestrian crossings) at the location in question. The results are used to decide at which locations some form of speed enforcement is needed. Each measuring device has a single point geometry and usually covers two directions (direction 1 and 2).

Note: The measurements are not necessarily representative of the whole year and must be interpreted in the context of the survey date. In addition, some measurements took place during exceptional traffic conditions (e.g. diversion traffic caused by construction work). Tampering with the devices can lead to incorrect measurements.

The following data sets are available for speed monitoring:
Individual measurements from 2024 (this data set): https://data.bs.ch/explore/dataset/100097/
Individual measurements from 2021 to 2023: https://data.bs.ch/explore/dataset/100358/
Individual measurements until 2020: https://data.bs.ch/explore/dataset/100200/
Key figures per measurement site: https://data.bs.ch/explore/dataset/100112/

Due to the large amount of data, the dataset may not be fully downloadable. If this problem occurs, you can download the complete data set and the individual measurements of the measuring stations here: https://data-bs.ch/stata/kapo/speed monitoring/all_data/speed monitoring_data.csv
Individual measurements per measuring station: https://data-bs.ch/stata/kapo/speed monitoring/data/
The measurement locations are also published on the Geoportal Basel-Stadt: https://map.geo.bs.ch/?map_x=2614442➤_y=1267497➤_zoom=2 ⁇ =en&baselayer_ref=basemap%20coloured&tree_groups=speed&tree_group_layers_speed=RM_speed monitoring
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Despite rising cesarean section (CS) rates in Ethiopia, evidence on determinants of postoperative length of hospital stay (LoS) remains scarce, particularly for rural general hospitals handling most deliveries. This study aimed to assess the length of hospital stay and its associated factors among women who underwent cesarean section in general hospitals of the Sidama region. An institution-based cross-sectional study was conducted among 505 post-CS mothers from 1 January to 20 February 2024. A multistage sampling method was followed to select the study respondents. Data were collected with a structured, pretested, interviewer-administered questionnaire using the Kobo Toolbox system and exported to Stata version 14.0 for management and analysis. Factors associated with the length of hospital stay were determined using a Poisson regression model and reported as adjusted risk ratios (ARR). Statistical significance was set at a p-value of less than 0.05. The median LoS post-CS was 4 days (interquartile range: 3–4). Significant predictors of prolonged LoS included maternal age (ARR = 1.014, 95% CI: 1.004–1.024), neonatal intensive care unit (NICU) admission (ARR = 1.31, 95% CI: 1.16–1.46), surgical site infection (ARR = 2.39, 95% CI: 1.88–3.04), and low postoperative hemoglobin (ARR = 0.94, 95% CI: 0.92–0.97). The median hospital stay after cesarean delivery in general hospitals of Sidama region was 4 days. Prolonged stays were associated with maternal age, NICU admission, surgical site infection, and low post-op hemoglobin. Targeting high-risk mothers with enhanced monitoring and wound care, alongside NICU-maternity service integration and safety-conscious discharge protocols, is recommended to accelerate recovery.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the national health and nutrition examination survey (nhanes) with r

nhanes is this fascinating survey where doctors and dentists accompany survey interviewers in a little mobile medical center that drives around the country. while the survey folks are interviewing people, the medical professionals administer laboratory tests and conduct a real doctor's examination. the blood work and medical exam allow researchers like you and me to answer tough questions like, "how many people have diabetes but don't know they have diabetes?" conducting the lab tests and the physical isn't cheap, so a new nhanes data set becomes available once every two years and only includes about twelve thousand respondents. since the number of respondents is so small, analysts often pool multiple years of data together. the replication scripts below give a few different examples of how multiple years of data can be pooled with r. the survey gets conducted by the centers for disease control and prevention (cdc), and generalizes to the united states non-institutional, non-active duty military population.

most of the data tables produced by the cdc include only a small number of variables, so importation with the foreign package's read.xport function is pretty straightforward. but that makes merging the appropriate data sets trickier, since it might not be clear what to pull for which variables. for every analysis, start with the table with 'demo' in the name -- this file includes basic demographics, weighting, and complex sample survey design variables. since it's quick to download the files directly from the cdc's ftp site, there's no massive ftp download automation script.
this new github repository contains five scripts:

2009-2010 interview only - download and analyze.R
- download, import, save the demographics and health insurance files onto your local computer
- load both files, limit them to the variables needed for the analysis, merge them together
- perform a few example variable recodes
- create the complex sample survey object, using the interview weights
- run a series of pretty generic analyses on the health insurance questions

2009-2010 interview plus laboratory - download and analyze.R
- download, import, save the demographics and cholesterol files onto your local computer
- load both files, limit them to the variables needed for the analysis, merge them together
- perform a few example variable recodes
- create the complex sample survey object, using the mobile examination component (mec) weights
- perform a direct-method age-adjustment and match figure 1 of this cdc cholesterol brief

replicate 2005-2008 pooled cdc oral examination figure.R
- download, import, save, pool, recode, create a survey object, run some basic analyses
- replicate figure 3 from this cdc oral health databrief - the whole barplot

replicate cdc publications.R
- download, import, save, pool, merge, and recode the demographics file plus cholesterol laboratory, blood pressure questionnaire, and blood pressure laboratory files
- match the cdc's example sas and sudaan syntax file's output for descriptive means
- match the cdc's example sas and sudaan syntax file's output for descriptive proportions
- match the cdc's example sas and sudaan syntax file's output for descriptive percentiles

replicate human exposure to chemicals report.R (user-contributed)
- download, import, save, pool, merge, and recode the demographics file plus urinary bisphenol a (bpa) laboratory files
- log-transform some of the columns to calculate the geometric means and quantiles
- match the 2007-2008 statistics shown on pdf page 21 of the cdc's fourth edition of the report

click here to view these five scripts

for more detail about the national health and nutrition examination survey (nhanes), visit:
- the cdc's nhanes homepage
- the national cancer institute's page of nhanes web tutorials

notes:

nhanes includes interview-only weights and interview + mobile examination component (mec) weights. if you only use questions from the basic interview in your analysis, use the interview-only weights (the sample size is a bit larger). i haven't really figured out a use for the interview-only weights -- nhanes draws most of its power from the combination of the interview and the mobile examination component variables. if you're only using variables from the interview, see if you can use a data set with a larger sample size like the current population survey (cps), national health interview survey (nhis), or medical expenditure panel survey (meps) instead.

confidential to sas, spss, stata, sudaan users: why are you still riding around on a donkey after we've invented the internal combustion engine? time to transition to r. :D
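pooling comes up in several of the scripts above, so here's a minimal python sketch (not the r code in the repository) of the weight adjustment involved. it assumes the usual cdc guidance for combining two-year cycles: divide each two-year weight by the number of cycles pooled, so the stacked file still sums to roughly one population instead of several. SEQN and WTMEC2YR are the respondent id and two-year mec weight in the cdc files; the function names and toy numbers are made up, and the earliest cycles have their own special pooled weights, so check the analytic guidelines before using this rule.

```python
def pooled_weight(two_year_weight, n_cycles):
    """Rescale a two-year weight for a file that pools n_cycles cycles."""
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    return two_year_weight / n_cycles


def pool_cycles(cycles, weight_field="WTMEC2YR"):
    """Stack several cycles and attach a rescaled 'pooled_weight' to each record.

    cycles: list of cycles, each a list of dicts (one dict per respondent).
    """
    pooled = []
    for records in cycles:
        for rec in records:
            row = dict(rec)  # copy so the input records stay untouched
            row["pooled_weight"] = pooled_weight(rec[weight_field], len(cycles))
            pooled.append(row)
    return pooled


# toy example: two cycles with hypothetical weights
cycle_a = [{"SEQN": 1, "WTMEC2YR": 1000.0}, {"SEQN": 2, "WTMEC2YR": 3000.0}]
cycle_b = [{"SEQN": 3, "WTMEC2YR": 2000.0}]

stacked = pool_cycles([cycle_a, cycle_b])
print([row["pooled_weight"] for row in stacked])
```

the survey design variables (strata and psu) still need to go into the variance calculation; this sketch only covers the weights.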
Individual measurements from the speed monitoring of the Basel-Stadt cantonal police from 2021 and 2022 (based on the start date of the measurement). The data shown are purely statistical surveys and are not connected to administrative fines or criminal prosecution. The cantonal police Basel-Stadt use the statistical speed measurements to check speed and road safety (e.g. safety at pedestrian crossings) at the location in question. The results are used to decide at which locations some form of speed enforcement is needed. Each measuring device has a single point geometry and usually covers two directions (direction 1 and 2).

Note: The measurements are not necessarily representative of the whole year and must be interpreted in the context of the survey date. In addition, some measurements took place during exceptional traffic conditions (e.g. diversion traffic caused by construction work). Tampering with the devices can lead to incorrect measurements.

The following data sets are available for speed monitoring:
Individual measurements from 2023: https://data.bs.ch/explore/dataset/100097
Individual measurements from 2021 and 2022 (this dataset): https://data.bs.ch/explore/dataset/100358
Individual measurements until 2020: https://data.bs.ch/explore/dataset/100200
Key figures per measurement site: https://data.bs.ch/explore/dataset/100112

Due to the large amount of data, the dataset may not be fully downloadable. If this problem occurs, you can download the complete data set and the individual measurements of the measuring stations here: https://data-bs.ch/stata/kapo/speed monitoring/all_data/speed monitoring_data.csv
Individual measurements per measuring station: https://data-bs.ch/stata/kapo/speed monitoring/data/
The measurement locations are also published on the Geoportal Basel-Stadt: https://map.geo.bs.ch/s/speed
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Version 11 release notes:
- Changes release notes description, does not change data.

Version 10 release notes:
- The data now has the following age categories (which were previously aggregated into larger groups to reduce file size): under 10, 10-12, 13-14, 40-44, 45-49, 50-54, 55-59, 60-64, over 64. These categories are available for female, male, and total (female+male) arrests. The previous aggregated categories (under 15, 40-49, and over 49) have been removed from the data.

Version 9 release notes:
- For each offense, adds a variable indicating the number of months that offense was reported - these variables are labeled as "num_months_[crime]" where [crime] is the offense name. These variables are generated by the number of times one or more arrests were reported per month for that crime. For example, if there was at least one arrest for assault in January, February, March, and August (and no other months), there would be four months reported for assault. Please note that this does not differentiate between an agency not reporting that month and actually having zero arrests. The variable "number_of_months_reported" is still in the data and is the number of months that any offense was reported. So if an agency reports murder arrests every month but no other crimes, the murder number of months variable and the "number_of_months_reported" variable will both be 12 while every other offense number of months variable will be 0.
- Adds data for 2017 and 2018.

Version 8 release notes:
- Adds annual data in R format.
- Changes project name to avoid confusing this data for the ones done by NACJD.
- Fixes bug where bookmaking was excluded as an arrest category.
- Changed the number of categories to include more offenses per category to have fewer total files.
- Added a "total_race" file for each category - this file has total arrests by race for each crime and a breakdown of juvenile/adult by race.

Version 7 release notes:
- Adds 1974-1979 data.
- Adds monthly data (only totals by sex and race, not by age-categories).
- All data now from FBI, not NACJD.
- Changes some column names so all columns are <=32 characters to be usable in Stata.
- Changes how number of months reported is calculated. Now it is the number of unique months with arrest data reported - months of data from the monthly header file (i.e. juvenile disposition data) are not considered in this calculation.

Version 6 release notes:
- Fix bug where juvenile female columns had the same value as juvenile male columns.

Version 5 release notes:
- Removes support for SPSS and Excel data.
- Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
- Adds in agencies that report 0 months of the year.
- Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
- Removes data on runaways.

Version 4 release notes:
- Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.

Version 3 release notes:
- Add data for 2016.
- Order rows by year (descending) and ORI.

Version 2 release notes:
- Fix bug where Philadelphia Police Department had incorrect FIPS county code.

The Arrests by Age, Sex, and Race (ASR) data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data.
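The months-reported rules in the Version 9 notes can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the input layout (a dict of month number to arrest count per offense) are hypothetical and are not how the published files are stored or how the author's code works.

```python
def months_reported_for_offense(monthly_arrests):
    """Count months with at least one reported arrest for a single offense.

    monthly_arrests: dict mapping month number (1-12) to an arrest count,
    with None or 0 for months without reported arrests. As in the release
    notes, non-reporting and true zeros cannot be told apart.
    """
    return sum(1 for count in monthly_arrests.values() if count)


def months_reported_any_offense(arrests_by_offense):
    """Count unique months in which arrests for any offense were reported."""
    months = set()
    for monthly in arrests_by_offense.values():
        months.update(month for month, count in monthly.items() if count)
    return len(months)


# Example from the release notes: assault arrests in January, February,
# March, and August only.
assault = {m: 0 for m in range(1, 13)}
for m in (1, 2, 3, 8):
    assault[m] = 5  # hypothetical counts

# An agency that reports murder arrests every month but nothing else.
agency = {
    "murder": {m: 1 for m in range(1, 13)},
    "assault": {m: 0 for m in range(1, 13)},
}

print(months_reported_for_offense(assault))
print(months_reported_for_offense(agency["murder"]))
print(months_reported_any_offense(agency))
```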
This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1974-2018 into a single file for each group of crimes. Each monthly file is only a single year as my laptop can't handle combining all the years together. These files are quite large and may take some time to load. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each age
https://borealisdata.ca/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.5683/SP3/9M3EZL
The Labour Force Survey provides estimates of employment and unemployment which are among the timeliest and most important measures of performance of the Canadian economy. With the release of the survey results only 10 days after the completion of data collection, the LFS estimates are the first of the major monthly economic data series to be released. The Canadian Labour Force Survey was developed following the Second World War to satisfy a need for reliable and timely data on the labour market. Information was urgently required on the massive labour market changes involved in the transition from a war to a peace-time economy. The main objective of the LFS is to divide the working-age population into three mutually exclusive classifications - employed, unemployed, and not in the labour force - and to provide descriptive and explanatory data on each of these. LFS data are used to produce the well-known unemployment rate as well as other standard labour market indicators such as the employment rate and the participation rate. The LFS also provides employment estimates by industry, occupation, public and private sector, hours worked and much more, all cross-classifiable by a variety of demographic characteristics. Estimates are produced for Canada, the provinces, the territories and a large number of sub-provincial regions. For employees, wage rates, union status, job permanency and workplace size are also produced. These data are used by different levels of government for evaluation and planning of employment programs in Canada. Regional unemployment rates are used by Employment and Social Development Canada to determine eligibility, level and duration of insurance benefits for persons living within a particular employment insurance region.
The data are also used by labour market analysts, economists, consultants, planners, forecasters and academics in both the private and public sector.

This public use microdata file contains non-aggregated data for a wide variety of variables collected from the Labour Force Survey (LFS). It contains both personal characteristics for all individuals in the household and detailed labour force characteristics for household members 15 years of age and over. The personal characteristics include age, sex, marital status, educational attainment, and family characteristics. Detailed labour force characteristics include employment information such as class of worker, usual and actual hours of work, employee hourly and weekly wages, industry and occupation of current or most recent job, public and private sector, union status, paid or unpaid overtime hours, job permanency, hours of work lost, job tenure, and unemployment information such as duration of unemployment, methods of job search and type of job sought. Labour force characteristics are also available for students during the school year and during the summer months as well as school attendance, whether full or part-time, and the type of institution.

LFS revisions: Labour force surveys are revised on a periodic basis, either to adopt the most recent geography, industry and occupation classifications; to use new observations to fine-tune seasonal adjustment factors; or to introduce methodological enhancements. Prior LFS revisions were conducted in 2011, 2015 and 2021. The most recent revisions to the LFS were conducted in 2023. The first major change was a transition to the National Occupational Classification (NOC) 2021 V1.0, with all LFS series from 1987 onwards having been revised to the new classification. The second major change was a set of methodological enhancements to LFS data processing, applied to all LFS series beginning Jan 2006.
The third major change was a revision of seasonal adjustment factors, applied to LFS series Jan 2002 onward. A list of prior versions of this LFS dataset can be found under the ‘Versions’ tab.
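The three-way classification described above maps directly onto the standard labour market indicators. The sketch below uses the conventional definitions (unemployment rate as a share of the labour force; participation and employment rates as shares of the working-age population). The function name and the counts are hypothetical, and published LFS estimates are computed from survey-weighted data rather than raw counts.

```python
def labour_market_rates(employed, unemployed, not_in_labour_force):
    """Compute standard indicators (in percent) from the three LFS groups."""
    labour_force = employed + unemployed
    population = labour_force + not_in_labour_force
    return {
        "unemployment_rate": 100 * unemployed / labour_force,
        "participation_rate": 100 * labour_force / population,
        "employment_rate": 100 * employed / population,
    }


# Hypothetical weighted counts (in thousands) for one region.
rates = labour_market_rates(employed=570, unemployed=30, not_in_labour_force=400)
print(rates)
```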
This version (V3) fixes a bug in Version 2 where 1993 data did not properly deal with missing values, leading to enormous counts of crime being reported. This is a collection of Offenses Known and Clearances By Arrest data from 1960 to 2016. The monthly zip files contain one data file per year(57 total, 1960-2016) as well as a codebook for each year. These files have been read into R using the ASCII and setup files from ICPSR (or from the FBI for 2016 data) using the package asciiSetupReader. The end of the zip folder's name says what data type (R, SPSS, SAS, Microsoft Excel CSV, feather, Stata) the data is in. Due to file size limits on open ICPSR, not all file types were included for all the data. The files are lightly cleaned. What this means specifically is that column names and value labels are standardized. In the original data column names were different between years (e.g. the December burglaries cleared column is "DEC_TOT_CLR_BRGLRY_TOT" in 1975 and "DEC_TOT_CLR_BURG_TOTAL" in 1977). The data here have standardized columns so you can compare between years and combine years together. The same thing is done for values inside of columns. For example, the state column gave state names in some years, abbreviations in others. For the code uses to clean and read the data, please see my GitHub file here. https://github.com/jacobkap/crime_data/blob/master/R_code/offenses_known.RThe zip files labeled "yearly" contain yearly data rather than monthly. These also contain far fewer descriptive columns about the agencies in an attempt to decrease file size. Each zip folder contains two files: a data file in whatever format you choose and a codebook. The data file is aggregated yearly and has already combined every year 1960-2016. 
For the code I used to do this, see here: https://github.com/jacobkap/crime_data/blob/master/R_code/yearly_offenses_known.R
If you find any mistakes in the data or have any suggestions, please email me at jkkaplan6@gmail.com
As a description of what UCR Offenses Known and Clearances By Arrest data contains, the following is copied from ICPSR's 2015 page for the data.
The Uniform Crime Reporting Program Data: Offenses Known and Clearances By Arrest dataset is a compilation of offenses reported to law enforcement agencies in the United States. Due to the vast number of categories of crime committed in the United States, the FBI has limited the type of crimes included in this compilation to those crimes which people are most likely to report to police and those crimes which occur frequently enough to be analyzed across time. Crimes included are criminal homicide, forcible rape, robbery, aggravated assault, burglary, larceny-theft, and motor vehicle theft. Much information about these crimes is provided in this dataset. The number of times an offense has been reported, the number of reported offenses that have been cleared by arrests, and the number of cleared offenses which involved offenders under the age of 18 are the major items of information collected.
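The column standardization described above can be sketched as follows. This is a minimal illustration in Python, not the author's R cleaning code (which is linked above): the two raw column names come from the description, while the standardized name, the helper functions, and the sample values are assumptions made up for the example.

```python
# Map year-specific raw column names to one standardized name.
# The two raw names are quoted in the description; the target name
# "dec_tot_clr_burglary_total" is a hypothetical choice for this sketch.
COLUMN_ALIASES = {
    "DEC_TOT_CLR_BRGLRY_TOT": "dec_tot_clr_burglary_total",  # spelling used in 1975
    "DEC_TOT_CLR_BURG_TOTAL": "dec_tot_clr_burglary_total",  # spelling used in 1977
}


def standardize_columns(row):
    """Return a copy of one record with known aliases renamed."""
    return {COLUMN_ALIASES.get(name, name): value for name, value in row.items()}


def combine_years(rows_by_year):
    """Stack records from several years after renaming their columns."""
    combined = []
    for year in sorted(rows_by_year):
        for row in rows_by_year[year]:
            clean = standardize_columns(row)
            clean["year"] = year
            combined.append(clean)
    return combined


# Made-up values, only to show that both years end up under the same column.
rows_by_year = {
    1975: [{"DEC_TOT_CLR_BRGLRY_TOT": 3}],
    1977: [{"DEC_TOT_CLR_BURG_TOTAL": 5}],
}
combined = combine_years(rows_by_year)
```

Once every year shares the same column names, records can be stacked and compared across years without per-year special cases.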
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the course of 2015, about 800,000 refugees arrived in Germany, a number equal to around one percent of the total population. This migration process was labelled the refugee crisis and was accompanied by a contested debate. On the one hand, there was a widespread willingness to voluntarily help arriving refugees; on the other hand, the number of xenophobic attacks against refugees drastically increased. Our paper focuses on a specific form of xenophobic violence with a strong symbolic meaning: we analyze how arson attacks against collective accommodation facilities spread. Using a comprehensive web chronicle, we collected temporal and spatial data about arson attacks perpetrated on accommodations or facilities for refugees in Germany between 2015 and 2017. We counted 251 attacks, assigned each incident location to its county, merged county characteristics such as population size, proportion of foreigners, and right-wing party support, and, going beyond previous research, added geographically coded media data from two digital archives. Besides newspaper contents of a popular nation-wide tabloid, we use a database that covers local fake news on refugees. Based on these data, we constructed a balanced panel data set with the counties as geographical units and periods of 14 days as the time dimension. Results indicate that social contagion drives the diffusion process of arson attacks. Spatial proximity of previous attacks increased the propensity of attacks in the neighboring counties. Attacks were more likely to occur in counties with larger populations and fewer foreigners. While local newspaper coverage did not impact the diffusion of xenophobic attacks, fake news was relevant, but only in East Germany. We also considered two particularly salient threatening events that received nation-wide media attention, namely Merkel’s “border opening” on the 5th of September 2015 and the sexual assaults occurring during New Year’s 2015/16 in Cologne. 
Both were followed by temporary increases in violence.
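The balanced panel mentioned above (counties as units, 14-day periods as the time dimension) can be illustrated with a small sketch. This is not the authors' code: the county names, attack dates, and function names are placeholders, and the sketch assumes every incident date falls inside the observation window.

```python
from datetime import date
from itertools import product

PERIOD_DAYS = 14  # length of one panel period, as stated in the abstract


def period_index(day, start):
    """Index of the 14-day period that contains `day`, counted from `start`."""
    return (day - start).days // PERIOD_DAYS


def build_balanced_panel(counties, attacks, start, end):
    """Count attacks per (county, period); every combination gets a cell.

    Assumes all attack dates lie between `start` and `end`.
    """
    n_periods = (end - start).days // PERIOD_DAYS + 1
    panel = {(c, p): 0 for c, p in product(counties, range(n_periods))}
    for county, day in attacks:
        panel[(county, period_index(day, start))] += 1
    return panel


# Placeholder inputs, not data from the study.
counties = ["county_a", "county_b"]
attacks = [
    ("county_a", date(2015, 1, 10)),
    ("county_a", date(2015, 1, 12)),
    ("county_b", date(2015, 2, 1)),
]
panel = build_balanced_panel(counties, attacks, date(2015, 1, 1), date(2015, 2, 28))
```

The panel is balanced because cells with zero attacks are kept explicitly, so every county has one row for every period.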
Individual measurements from the speed monitoring of the Cantonal Police of Basel-Stadt up to 2020 (by start date of the measurement). The data shown are exclusively statistical surveys; they are not connected with administrative fines or criminal prosecution. The Cantonal Police of Basel-Stadt use the statistical speed measurements to check speed and road safety (e.g. safety at pedestrian crossings) at the location in question. The results are used to decide at which locations action in the form of speed checks is needed. Each statistical device has a single point geometry and usually covers two directions (direction 1 and 2).
Note: The measurements are not necessarily representative of the whole year and must be interpreted in the context of the survey date. In addition, some measurements were taken during exceptional traffic arrangements (e.g. diverted traffic as a result of construction site activities). Manipulation of devices can lead to incorrect measurements.
The following datasets are available for speed monitoring:
Single measurements from 2023: https://data.bs.ch/explore/dataset/100097
Single measurements from 2021 and 2022: https://data.bs.ch/explore/dataset/100358
Single measurements until 2020 (this dataset): https://data.bs.ch/explore/dataset/100200
Key figures per measurement site: https://data.bs.ch/explore/dataset/100112
Due to the large amount of data, the dataset may not be fully downloadable. If this problem occurs, you can download the complete dataset and the individual measurements of the measuring stations here: https://data-bs.ch/stata/kapo/speed monitoring/all_data/speed monitoring_data.csv
Single measurements of measuring stations: https://data-bs.ch/stata/kapo/speed monitoring/data/
The measurement locations are also published on the Geoportal Basel-Stadt: https://map.geo.bs.ch/s/speed
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Description of the experiment setting: location, influential climatic conditions, controlled conditions (e.g. temperature, light cycle)
In 1986, the Congress enacted Public Laws 99-500 and 99-591, requiring a biennial report on the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC). In response to these requirements, FNS developed a prototype system that allowed for the routine acquisition of information on WIC participants from WIC State Agencies. Since 1992, State Agencies have provided electronic copies of these data to FNS on a biennial basis. FNS and the National WIC Association (formerly National Association of WIC Directors) agreed on a set of data elements for the transfer of information. In addition, FNS established a minimum standard dataset for reporting participation data. For each biennial reporting cycle, each State Agency is required to submit a participant-level dataset containing standardized information on persons enrolled at local agencies for the reference month of April. The 2016 Participant and Program Characteristics (PC2016) is the thirteenth data submission to be completed using the WIC PC reporting system. In April 2016, there were 90 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the American Virgin Islands, and 34 Indian tribal organizations.
Processing methods and equipment used
Specifications on formats (“Guidance for States Providing Participant Data”) were provided to all State agencies in January 2016. This guide specified 20 minimum dataset (MDS) elements and 11 supplemental dataset (SDS) elements to be reported on each WIC participant. Each State Agency was required to submit all 20 MDS items and any SDS items collected by the State agency.
Study date(s) and duration
The information for each participant was from the participant's most current WIC certification as of April 2016. 
Due to management information constraints, Connecticut provided data for a month other than April 2016, specifically August 16 – September 15, 2016.
Study spatial scale (size of replicates and spatial scale of study area)
In April 2016, there were 90 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the American Virgin Islands, and 34 Indian tribal organizations.
Level of true replication
Unknown
Sampling precision (within-replicate sampling or pseudoreplication)
State Agency Data Submissions. PC2016 is a participant dataset consisting of 8,815,472 active records. The records, submitted to USDA by the State Agencies, comprise a census of all WIC enrollees, so there is no sampling involved in the collection of this data.
PII Analytic Datasets. State agency files were combined to create a national census participant file of approximately 8.8 million records. The census dataset contains potentially personally identifiable information (PII) and is therefore not made available to the public.
National Sample Dataset. The public use SAS analytic dataset made available to the public has been constructed from a nationally representative sample drawn from the census of WIC participants, selected by participant category. The nationally representative sample is composed of 60,003 records. The distribution by category is 5,449 pregnant women, 4,661 breastfeeding women, 3,904 postpartum women, 13,999 infants, and 31,990 children.
Level of subsampling (number and repeat or within-replicate sampling)
The proportionate (or self-weighting) sample was drawn by WIC participant category: pregnant women, breastfeeding women, postpartum women, infants, and children. In this type of sample design, each WIC participant has the same probability of selection across all strata. Sampling weights are not needed when the data are analyzed. 
In a proportionate stratified sample, the largest stratum accounts for the highest percentage of the analytic sample.
Study design (before–after, control–impacts, time series, before–after-control–impacts)
None – Non-experimental
Description of any data manipulation, modeling, or statistical analysis undertaken
Each entry in the dataset contains all MDS and SDS information submitted by the State agency on the sampled WIC participant. In addition, the file contains constructed variables used for analytic purposes. To protect individual privacy, the public use file does not include State agency, local agency, or case identification numbers.
Description of any gaps in the data or other limiting factors
Due to management information constraints, Connecticut provided data for a month other than April 2016, specifically August 16 – September 15, 2016.
Outcome measurement methods and equipment used
None
Resources in this dataset:
Resource Title: WIC Participant and Program Characteristics 2016. File Name: wicpc_2016_public.csv Resource Description: The 2016 Participant and Program Characteristics (PC2016) is the thirteenth data submission to be completed using the WIC PC reporting system. In April 2016, there were 90 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the American Virgin Islands, and 34 Indian tribal organizations. Resource Software Recommended: SAS, version 9.4, url: https://www.sas.com/en_us/software/sas9.html
Resource Title: WIC Participant and Program Characteristics 2016 Codebook. File Name: WICPC2016_PUBLIC_CODEBOOK.xlsx Resource Software Recommended: SAS, version 9.4, url: https://www.sas.com/en_us/software/sas9.html
Resource Title: WIC Participant and Program Characteristics 2016 - Zip File with SAS, SPSS and STATA data. File Name: WIC_PC_2016_SAS_SPSS_STATA_Files.zip Resource Description: WIC Participant and Program Characteristics 2016 - Zip File with SAS, SPSS and STATA data
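The composition of the national sample reported in this description can be checked with a few lines of Python. The category counts are taken from the text; the variable names are illustrative, and the sketch only verifies the arithmetic (it does not read the public use file).

```python
# Sample records per WIC participant category, as stated in the description.
sample_counts = {
    "pregnant women": 5449,
    "breastfeeding women": 4661,
    "postpartum women": 3904,
    "infants": 13999,
    "children": 31990,
}

total = sum(sample_counts.values())

# In a proportionate (self-weighting) design, each category's share of the
# sample mirrors its share of the population, so unweighted shares can be
# read directly as estimates.
shares = {category: count / total for category, count in sample_counts.items()}
largest = max(sample_counts, key=sample_counts.get)

for category, share in shares.items():
    print(f"{category}: {share:.1%}")
```

The counts add up to the 60,003 records stated for the sample, and children form the largest stratum, in line with the remark that the largest stratum accounts for the highest percentage of the analytic sample.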
After the completion of the Nepal Living Standards Survey, 1995/96, the Central Bureau of Statistics has given importance to follow-up surveys relating to household consumption. Two basic reasons stand behind this policy. The first is to understand consumption behavior, facilitating the assessment of poverty levels. The second is to support the estimation of national aggregates of consumption required for national accounting. It is along these lines that this consumption survey for rural Nepal has been undertaken solely with Government resources. Sustaining foreign-aid-supported projects in the long run through capacity building is the aim behind such follow-up surveys.
The survey was planned in January 2000 and was launched in the later part of the same year. Hopefully, the survey results will provide some ways of bridging the gap likely to emerge between the earlier and the next round of the Nepal Living Standards Survey, now in the initial preparatory phase and scheduled for 2002/03. The survey followed a methodology similar to that used in the Nepal Labour Force Survey, 1998/99. As a follow-up survey, the sample size has been kept at a moderately low level of 1,968 households. The 1991 Population Census of Nepal was used as a frame for sampling. The sampling was done in such a way that the results are valid nationally for the rural areas.
The basic objectives of this survey were 1. To determine the pattern of household consumption and expenditure on food, non-food, housing, durable goods and own account production of goods and services for rural Nepal, and 2. To provide information required in the estimation of National Accounts aggregates.
Content of the survey 1. General information, 2. Housing expenditures, 3. Food expenditures (including home production), 4. Non-food expenditures and inventory of durable goods, 5. Non-food expenditures (own account production of goods and services), and 6. Income
Rural areas of Nepal Region Ecological belt
Household
The survey covered the whole rural areas of the country and no geographical areas were excluded. All usual residents of rural Nepal were considered eligible for inclusion in the survey but households of diplomatic missions were excluded. As is normal in household surveys, homeless and those people living for six months or more away from the household or in institutions such as school hostels, police barracks, army camps and hospitals were also excluded.
Sample survey data [ssd]
Sample Design The aim of the HCSRN is to determine the consumption pattern of rural households of the country. This is not a baseline survey but is among the first attempts to establish the trend in the consumption pattern of rural households over time. It is envisaged that small surveys of this type will be carried out between the big surveys conducted specifically to measure the level of poverty in the country.
A two-stage sample selection procedure was adopted in the survey. The Primary Sampling Unit (PSU) consisted of a ward or in some cases a sub-ward or an amalgamation of small wards. PSUs were selected with Probability Proportional to Size (PPS) sampling, with the number of households available from the 1991 Population Census as a measure of size. Within the selected PSU, all households were listed in the field and 12 households were selected by systematic sampling with random start. Using PPS sampling at the first stage, 165 PSUs were selected and in the second stage, using systematic sampling 12 households were selected from each PSU. In the process, a total of 1980 households were selected from the rural areas of the country.
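The two-stage procedure described above can be sketched in Python. This is an illustration under stated assumptions, not the code used for the survey: the ward names and household counts are made up, the function names are hypothetical, and systematic PPS is used as one common way of implementing PPS selection (the description does not say which PPS variant was applied). The example draws 10 PSUs to stay small, whereas the survey drew 165.

```python
import random


def pps_systematic(sizes, n_select, rng):
    """Stage 1: pick units with probability proportional to size.

    Walks the cumulative size totals with a fixed step and a random start.
    A unit larger than the step could be selected more than once.
    """
    total = sum(sizes.values())
    step = total / n_select
    start = rng.random() * step
    targets = [start + i * step for i in range(n_select)]
    selected, cumulative, position = [], 0, 0
    for unit, size in sizes.items():
        cumulative += size
        while position < n_select and targets[position] < cumulative:
            selected.append(unit)
            position += 1
    return selected


def systematic_sample(items, n_select, rng):
    """Stage 2: systematic selection with a random start from a listing."""
    step = len(items) / n_select
    start = rng.random() * step
    return [items[min(int(start + i * step), len(items) - 1)] for i in range(n_select)]


rng = random.Random(2000)  # fixed seed so the sketch is reproducible

# Made-up frame: 200 wards with census household counts between 40 and 64.
ward_sizes = {f"ward_{i:03d}": 40 + (i % 25) for i in range(200)}

psus = pps_systematic(ward_sizes, 10, rng)
sample = {}
for psu in psus:
    listing = [f"{psu}_hh_{j}" for j in range(ward_sizes[psu])]  # field listing
    sample[psu] = systematic_sample(listing, 12, rng)
```

With the survey's own figures, 165 PSUs at 12 households each gives the 1,980 selected households quoted in the text.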
Sampling Frame The 1991 Population Census of Nepal provided a base for building a sampling frame for the survey. The frame consisted of the list of wards along with the census count of the number of households in each ward. Because of the increase in the number of urban areas (municipalities) and the decrease in rural wards after the 1991 Population Census, the frame required certain modifications. The 33 municipalities at the time of the census had increased to a total of 58 municipalities. All rural wards converted into urban areas had to be removed from the earlier frame. In a number of cases, new municipalities were created by combining a large number of what were formerly wards in rural VDCs. Hence, the rural areas had a number of wards removed from their earlier list. Fortunately, an exercise in modifying the rural frame had already been done for the purpose of the last Nepal Labour Force Survey, 1998/99. Therefore, the same modified sampling frame of the NLFS was found most appropriate for this survey as well.
Sample Size The sample size was determined on the basis of experiences gained from the previous surveys notably the NLSS and NLFS and the resources available for the survey. The survey obviously had to fix its sample size according to what the available resource could afford to accomplish. The sample size was fixed at 1,980 households.
Since the final "take" was to be 12 households per PSU, it was essential that a selected PSU contained a multiple of 12 households.
Face-to-face [f2f]
The Household Survey Section of CBS developed an initial questionnaire of HCSRN on the basis of the questionnaires used in the NLSS and the Multi-Purpose Household Budget Survey (conducted by the Nepal Rastra Bank). The draft questionnaire was subsequently modified through experience gained from pre-tests. The pre-test was carried out in the rural areas of 12 districts.
Household Questionnaire
The questionnaire contained six sections. The contents of the questionnaire are as follows:
Section 1. General Information
The main purposes of this section were: (i) to identify the members of the household, (ii) to provide basic demographic information such as sex, age and marital status, and (iii) to collect information on literacy.
Section 2. Housing
This section collected information on household's expenditure on housing, utilities and amenities (ownership, rent and expenditure on water, electricity, telephone, cooking fuels, etc.)
Section 3. Food Expenses and Home Production
This section collected information on food expenditure of the household including consumption of food items that the household produced.
Section 4. Non-food Expenditures and Inventory of Durable Goods This section collected information on expenditure on non-food items (fuels, clothing and personal care, etc.)
Section 5. Non-food Expenditures (Own Account Production of Goods and Services) This section collected information on own account production of goods and services (which included making of baskets, fetching water and collecting firewood, etc.)
Section 6. Income
This section collected information on income from different sources as well as information on loans and savings.
Completed questionnaires from the field were brought to the central office (Kathmandu) for data editing. For quality control, range and consistency checks as well as scrutiny were performed during the data entry period.
In one of the PSUs selected for the survey, enumeration work could not be carried out due to unavoidable reasons. The total number of households successfully interviewed was thus reduced to 1,968 of the 1,980 selected. The response rate of this survey is hence 99.4%.
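The response rate follows directly from the sample design figures given in this description. A short Python check, using only numbers stated in the text (variable names are illustrative):

```python
psus_selected = 165        # PSUs drawn at the first stage
households_per_psu = 12    # households drawn per PSU at the second stage
psus_not_enumerated = 1    # one PSU could not be enumerated

selected = psus_selected * households_per_psu
interviewed = selected - psus_not_enumerated * households_per_psu
response_rate = interviewed / selected

print(f"{interviewed} of {selected} households: {response_rate:.1%}")
```

Losing one PSU of 12 households accounts exactly for the difference between the selected and interviewed totals.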
The sampling errors of key aggregates measured in this survey are provided in the Technical Documents. These sampling errors were calculated by means of the STATA 5.0 package that was used for processing the survey results. Sample design and sample size are the main factors that influence the size of the sampling error. In the case of total per capita consumption in rural Nepal, which is Rs. 11,928, the 95 percent lower and upper bounds for the estimate are Rs. 11,605 and Rs. 12,251 respectively. This means that we are 95 percent confident that the average per capita consumption of rural Nepal lies within this range.
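The confidence bounds quoted above imply an approximate standard error, which can be backed out as a rough sketch. The estimate and bounds are from the text; the multiplier of 1.96 is an assumption (a normal approximation), since the description does not state which critical value the software applied.

```python
estimate = 11928   # per capita consumption in rural Nepal, Rs.
lower = 11605      # 95 percent lower bound, Rs.
upper = 12251      # 95 percent upper bound, Rs.

half_width = (upper - lower) / 2

# Assumption: a normal-approximation interval, estimate +/- 1.96 * SE.
Z_95 = 1.96
standard_error = half_width / Z_95
relative_se = standard_error / estimate

print(f"half-width: Rs. {half_width:.0f}")
print(f"approximate standard error: Rs. {standard_error:.1f}")
print(f"relative standard error: {relative_se:.2%}")
```

The interval is symmetric around the estimate, which is consistent with an interval of the form estimate plus or minus a multiple of the standard error.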
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com
Version 15 release notes:
Adds 2021 data.
Version 14 release notes:
Adds 2020 data. Please note that the FBI has retired UCR data ending in 2020 data so this will be the last Arrests by Age, Sex, and Race data they release.
Version 13 release notes:
Changes R files from .rda to .rds.
Fixes bug where the number_of_months_reported variable incorrectly was the largest of the number of months reported for a specific crime variable. For example, if theft was reported Jan-June and robbery was reported July-December in an agency, in total there were 12 months reported. But since each crime (and let's assume no other crime was reported more than 6 months of the year) was only reported 6 months, the number_of_months_reported variable was incorrectly set at 6 months. Now it is the total number of months reported of any crime. So it would be set to 12 months in this example. Thank you to Nick Eubank for alerting me to this issue.
Adds rows even when an agency reported zero arrests that month; all arrest values are set to zero for these rows.
Version 12 release notes:
Adds 2019 data.
Version 11 release notes:
Changes release notes description, does not change data.
Version 10 release notes:
The data now has the following age categories (which were previously aggregated into larger groups to reduce file size): under 10, 10-12, 13-14, 40-44, 45-49, 50-54, 55-59, 60-64, over 64. These categories are available for female, male, and total (female+male) arrests. The previous aggregated categories (under 15, 40-49, and over 49) have been removed from the data.
Version 9 release notes:
For each offense, adds a variable indicating the number of months that offense was reported - these variables are labeled as "num_months_[crime]" where [crime] is the offense name. These variables are generated by the number of times one or more arrests were reported per month for that crime. For example, if there was at least one arrest for assault in January, February, March, and August (and no other months), there would be four months reported for assault. Please note that this does not differentiate between an agency not reporting that month and actually having zero arrests. The variable "number_of_months_reported" is still in the data and is the number of months that any offense was reported. So if an agency reports murder arrests every month but no other crimes, the murder number of months variable and the "number_of_months_reported" variable will both be 12 while every other offense number of months variable will be 0.
Adds data for 2017 and 2018.
Version 8 release notes:
Adds annual data in R format.
Changes project name to avoid confusing this data for the ones done by NACJD.
Fixes bug where bookmaking was excluded as an arrest category.
Changed the number of categories to include more offenses per category to have fewer total files.
Added a "total_race" file for each category - this file has total arrests by race for each crime and a breakdown of juvenile/adult by race.
Version 7 release notes:
Adds 1974-1979 data.
Adds monthly data (only totals by sex and race, not by age-categories).
All data now from FBI, not NACJD.
Changes some column names so all columns are <=32 characters to be usable in Stata.
Changes how number of months reported is calculated. Now it is the number of unique months with arrest data reported - months of data from the monthly header file (i.e. juvenile disposition data) are not considered in this calculation.
Version 6 release notes:
Fix bug where juvenile female columns had the same value as juvenile male columns.
Version 5 release notes:
Removes support for SPSS and Excel data.
Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
Adds in agencies that report 0 months of the year.
Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
Removes data on runaways.
Version 4 release notes:
Changes column names from "p
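The difference between the per-offense month counts and the overall number_of_months_reported variable, including the bug fixed in Version 13, can be illustrated with a small Python sketch. It reuses the theft and robbery example from the release notes; the function names and the input layout are assumptions for illustration and are not taken from the author's R code.

```python
def months_per_crime(monthly_reports):
    """Number of distinct months with reported arrests, per offense."""
    return {crime: len(set(months)) for crime, months in monthly_reports.items()}


def number_of_months_reported(monthly_reports):
    """Number of distinct months in which ANY offense was reported."""
    all_months = set()
    for months in monthly_reports.values():
        all_months.update(months)
    return len(all_months)


# Example from the release notes: theft reported Jan-June, robbery July-December.
reports = {
    "theft": [1, 2, 3, 4, 5, 6],
    "robbery": [7, 8, 9, 10, 11, 12],
}

per_crime = months_per_crime(reports)
old_value = max(per_crime.values())             # pre-Version 13 behaviour: 6
new_value = number_of_months_reported(reports)  # corrected behaviour: 12
```

Taking the union of months across offenses, rather than the maximum of the per-offense counts, is what yields 12 months in this example.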
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de441876
Abstract (en): This collection contains an array of economic time series data pertaining to the United States, the United Kingdom, Germany, and France, primarily between the 1920s and the 1960s, and including some time series from the 18th and 19th centuries. These data were collected by the National Bureau of Economic Research (NBER), and they constitute a research resource of importance to economists as well as to political scientists, sociologists, and historians. Under a grant from the National Science Foundation, ICPSR and the National Bureau of Economic Research converted this collection (which existed heretofore only on handwritten sheets stored in New York) into fully accessible, readily usable, and completely documented machine-readable form. The NBER collection -- containing an estimated 1.6 million entries -- is divided into 16 major categories: (1) construction, (2) prices, (3) security markets, (4) foreign trade, (5) income and employment, (6) financial status of business, (7) volume of transactions, (8) government finance, (9) distribution of commodities, (10) savings and investments, (11) transportation and public utilities, (12) stocks of commodities, (13) interest rates, (14) indices of leading, coincident, and lagging indicators, (15) money and banking, and (16) production of commodities. Data from all categories are available in Parts 1-22. The economic variables are usually observations on the entire nation or large subsets of the nation. Frequently, however, and especially in the United States, separate regional and metropolitan data are included in other variables. This makes cross-sectional analysis possible in many cases. The time span of variables in these files may be as short as one year or as long as 160 years. Most data pertain to the first half of the 20th century. Many series, however, extend into the 19th century, and a few reach into the 18th. 
The oldest series, covering brick production in England and Wales, begins in 1785, and the most recent United States data extend to 1968. The unit of analysis is an interval of time -- a year, a quarter, or a month. The bulk of observations are monthly, and most series of monthly data contain annual values or totals. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: performed consistency checks; standardized missing values; checked for undocumented or out-of-range codes. Time series of economic statistics pertaining to France, Germany, the United Kingdom, and the United States between 1785 and 1968. 2007-03-26: This study, updated from OSIRIS, now includes SAS, SPSS, and Stata setup files, SAS transport (XPORT) files, SPSS portable files, Stata system files, and an updated codebook. Funding institution(s): National Science Foundation. The data were collected between the 1920s and the 1970s, but the exact start and end dates are unclear from the documentation.
Background: Adolescent girls in Kenya are disproportionately affected by early and unintended pregnancies, unsafe abortion and HIV infection. The In Their Hands (ITH) programme in Kenya aims to increase adolescents' use of high-quality sexual and reproductive health (SRH) services through targeted interventions. The ITH programme aims to 1) promote use of contraception and testing for sexually transmitted infections (STIs) including HIV or pregnancy, for sexually active adolescent girls; 2) provide information, products and services on the adolescent girl's terms; and 3) promote community support for girls and boys to access SRH services.
Objectives: The objectives of the evaluation are to assess: a) to what extent and how the new Adolescent Reproductive Health (ARH) partnership model and integrated system of delivery is working to meet its intended objectives and the needs of adolescents; b) adolescent user experiences across key quality dimensions and outcomes; c) how ITH programme has influenced adolescent voice, decision-making autonomy, power dynamics and provider accountability; d) how community support for adolescent reproductive and sexual health initiatives has changed as a result of this programme.
Methodology: The ITH programme is being implemented in two phases: formative planning and experimentation in the first year, from April 2017 to March 2018, and a national roll-out and implementation from April 2018 to March 2020. The second phase is informed by an Annual Programme Review and thorough benchmarking and assessment, which informed critical changes to performance and capacity so that ITH is fit for scale. It is expected that ITH will cover approximately 250,000 adolescent girls aged 15-19 in Kenya by April 2020. The programme is implemented by a consortium of Marie Stopes Kenya (MSK), Well Told Story, and Triggerise. ITH's key implementation strategies seek to increase adolescent motivation for service use; create a user-defined ecosystem and platform to provide girls with a network of accessible, subsidized and discreet SRH services; and launch and sustain a national discourse campaign around adolescent sexuality and rights. The 3-year study will employ a mixed-methods approach with multiple data sources, including secondary data and qualitative and quantitative primary data collected from various stakeholders, to explore their perceptions of and attitudes towards adolescent SRH services. Quantitative data analysis will be done using STATA to provide descriptive statistics and statistical associations / correlations on key variables. All qualitative data will be analyzed using NVIVO software.
Study Duration: 36 months - between 2018 and 2020.
Narok and Homa Bay counties
Households
All adolescent girls aged 15-19 years resident in the household.
The sampling of adolescents for the household survey was based on expected changes in adolescents' intention to use contraception in future. According to the Kenya Demographic and Health Survey 2014, 23.8% of adolescents and young women reported not intending to use contraception in future. This was used as a baseline proportion for the intervention, as it aimed to increase demand and reduce the proportion of sexually active adolescents who did not intend to use contraception in the future. Assuming that the project was to achieve an impact of at least 2.4 percentage points in the intervention counties (i.e. a reduction by 10%), a design effect of 1.5 and a non-response rate of 10%, a sample size of 1885, estimated using Cochran's sample size formula for categorical data, was considered adequate to detect this difference between baseline and end line time points. Based on data from the 2009 Kenya census, there were approximately 0.46 adolescent girls per household, which meant that the study was to include approximately 4876 households from the two counties at both baseline and end line surveys.
We collected data among a representative sample of adolescent girls living in both urban and rural ITH areas to understand adolescents' access to information, use of SRH services and SRH-related decision-making autonomy before the implementation of the intervention. Based on the number of ITH health facilities in the two study counties, Homa Bay and Narok, we purposively sampled 3 sub-counties in Homa Bay (West Kasipul, Ndhiwa and Kasipul) and 3 sub-counties in Narok (Narok Town, Narok South and Narok East). In each of the ITH intervention counties, there were sub-counties that had been prioritized for the project, and our data collection focused on these sub-counties selected for intervention. A stratified sampling procedure was used to select wards within the sub-counties and villages from the wards. Households were then selected from each village after all households in the villages had been listed. The purposive selection of sub-counties closer to ITH intervention facilities meant that urban and semi-urban areas were oversampled, due to the concentration of health facilities in urban areas.
Qualitative Sampling
Focus group discussion (FGD) participants were recruited from the villages where the ITH adolescent household survey was conducted in both counties. A convenience sample of consenting adults living in the villages was invited to participate in the FGDs. A facilitator and a note-taker were trained on how to use the focus group guide, how to facilitate the group to elicit the information sought, and how to take detailed notes. All focus group discussions took place in the local language and were tape-recorded, and the consent process included permission to tape-record the session. Participants were identified only by their first names and were asked not to share what was discussed outside of the focus group. Participants were read an informed consent form and asked to give written consent. In-depth interviews were conducted with a purposively selected sample of consenting adolescent girls who participated in the adolescent survey. We conducted a total of 45 in-depth interviews with adolescent girls (20 in Homa Bay County and 25 in Narok County). In addition, 8 FGDs (4 per county) were conducted with mothers of adolescent girls who were usual residents of the villages identified for the interviews, and another 4 FGDs (2 per county) were conducted with CHVs.
N/A
Face-to-face [f2f] for quantitative data collection and Focus Group Discussions and In Depth Interviews for qualitative data collection
The questionnaire covered: socio-demographic and household information; SRH knowledge and sources of information; sexual activity and relationships; family planning knowledge, access, choice and use when needed; exposure to family planning messages; voice and decision-making autonomy; and quality of care for those who visited health facilities in the 12 months before the survey. The questionnaire was piloted before data collection among a sample of 42 adolescent girls aged 15-19 (two per field interviewer) from a community outside the study counties, and the questions were reviewed for appropriateness, comprehension and flow.
The questionnaire was originally developed in English and later translated into Kiswahili. It was programmed using the ODK-based SurveyCTO platform for data collection and management and was administered through face-to-face interviews.
The survey tools were programmed using the ODK-based SurveyCTO platform for data collection and management. During programming, consistency checks were built into the data capture software to prevent field interviewers from entering missing or implausible values into the database. For example, the application included controls for variable ranges, skip patterns and duplicated individuals, as well as intra- and inter-module consistency checks. This reduced or eliminated errors usually introduced at the data capture stage. Once programmed, the survey tools were tested by the programming team, who together with the project team conducted further testing of the application's usability, built-in consistency checks (skips, variable ranges, duplicated individuals, etc.) and inter-module consistency checks. Any issues raised were documented and tracked on the Issue Tracker and followed up to full and timely resolution. After internal testing, the tools were made available to the project and field teams for user acceptance testing (UAT) to verify and validate that the electronic platform worked exactly as expected in terms of usability, question design, checks and skips.
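The kinds of controls listed above (variable ranges, skip patterns, duplicated individuals) can be illustrated with a short Python sketch. It is not the SurveyCTO form logic used in the study: the field names (`respondent_id`, `age`, `ever_used_fp`, `current_method`) and the specific rules are hypothetical, chosen only to show how such checks are typically expressed.

```python
def check_record(record, seen_ids):
    """Return a list of problems found in one interview record.

    record   -- dict of answers keyed by hypothetical field names
    seen_ids -- set of respondent IDs already accepted (updated in place)
    """
    problems = []

    # Range check: the survey targeted adolescent girls aged 15-19.
    age = record.get("age")
    if age is None:
        problems.append("age is missing")
    elif not 15 <= age <= 19:
        problems.append(f"age {age} outside the 15-19 range")

    # Skip-pattern check: a follow-up question should only be answered
    # when its filter question was answered 'yes' (hypothetical rule).
    if record.get("ever_used_fp") != "yes" and record.get("current_method") is not None:
        problems.append("current_method answered although ever_used_fp is not 'yes'")

    # Duplicate check: each respondent should appear only once.
    rid = record.get("respondent_id")
    if rid is None:
        problems.append("respondent_id is missing")
    elif rid in seen_ids:
        problems.append(f"duplicate respondent_id {rid}")
    else:
        seen_ids.add(rid)

    return problems


if __name__ == "__main__":
    seen = set()
    rows = [
        {"respondent_id": "A1", "age": 17, "ever_used_fp": "yes", "current_method": "pill"},
        {"respondent_id": "A1", "age": 22, "ever_used_fp": "no", "current_method": "pill"},
    ]
    for row in rows:
        print(row["respondent_id"], check_record(row, seen))
```

In a CAPI tool these rules run at entry time and block the interviewer from proceeding; the same rules can be re-run on the exported data as a second line of defence.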
Data cleaning was performed to ensure that data were free of errors and that indicators generated from these data were accurate and consistent. This process began on the first day of data collection, as the first records were uploaded into the database. The data manager used data collected during pilot testing to begin writing scripts in Stata 14 to check the variables in the data in 'real-time'. This ensured the resolution of any inconsistencies that could be addressed by the data collection teams during the fieldwork activities. The Stata 14 scripts that performed real-time checks and cleaned the data also wrote to a .rtf file that detailed every check performed against each variable, any inconsistencies encountered, and all steps that were taken to address these inconsistencies. The .rtf files also reported when a variable was
The Integrated Household Survey is one of the primary instruments implemented by the Government of Malawi through the National Statistical Office (NSO) roughly every 3-5 years to monitor and evaluate the changing conditions of Malawian households. The IHS data have, among other insights, provided benchmark poverty and vulnerability indicators to foster evidence-based policy formulation and monitor the progress of meeting the Millennium Development Goals (MDGs), the goals listed as part of the Malawi Growth and Development Strategy (MGDS) and now the Sustainable Development Goals (SDGs).
National coverage
Members of the following households are not eligible for inclusion in the survey: • All people who live outside the selected EAs, whether in urban or rural areas. • All residents of dwellings other than private dwellings, such as prisons, hospitals and army barracks. • Members of the Malawian armed forces who reside within a military base. (If such individuals reside in private dwellings off the base, however, they should be included among the households eligible for random selection for the survey.) • Non-Malawian diplomats, diplomatic staff, and members of their households. (However, note that non-Malawian residents who are not diplomats or diplomatic staff and are resident in private dwellings are eligible for inclusion in the survey. The survey is not restricted to Malawian citizens alone.) • Non-Malawian tourists and others on vacation in Malawi.
Sample survey data [ssd]
The IHS5 sampling frame is based on the listing information and cartography from the 2018 Malawi Population and Housing Census (PHC); includes the three major regions of Malawi, namely North, Center and South; and is stratified into rural and urban strata. The urban strata include the four major urban areas: Lilongwe City, Blantyre City, Mzuzu City, and the Municipality of Zomba. All other areas are considered rural areas, and each of the 27 districts was considered a separate sub-stratum of the main rural stratum. The sampling frame further excludes the population living in institutions, such as hospitals, prisons and military barracks. Hence, the IHS5 strata are composed of 32 districts in Malawi.
A stratified two-stage sample design was used for the IHS5.
Note: Detailed sample design information is presented in the "Fifth Integrated Household Survey 2019-2020, Basic Information Document".
Computer Assisted Personal Interview [capi]
HOUSEHOLD QUESTIONNAIRE The Household Questionnaire is a multi-topic survey instrument and is near-identical to the content and organization of the IHS3 and IHS4 questionnaires. It encompasses economic activities, demographics, welfare and other sectoral information of households. It covers a wide range of topics, dealing with the dynamics of poverty (consumption, cash and non-cash income, savings, assets, food security, health and education, vulnerability and social protection). Although the IHS5 household questionnaire covers a wide variety of topics in detail it intentionally excludes in-depth information on topics covered in other surveys that are part of the NSO’s statistical plan (such as maternal and child health issues covered at length in the Malawi Demographic and Health Survey).
AGRICULTURE QUESTIONNAIRE All IHS5 households identified as being involved in agricultural or livestock activities were administered the agriculture questionnaire, which is primarily modelled after its IHS3 counterpart. The modules expand on the agricultural content of the IHS4, IHS3, IHS2, AISS, and other regional agricultural surveys, while remaining consistent with the NACAL topical coverage and methodology. The agriculture questionnaire was developed with input from the stakeholders who contributed to the household questionnaire, as well as from outside researchers involved in research and policy discussions pertaining to Malawian agriculture. The agriculture questionnaire allows, among other things, for extensive agricultural productivity analysis through the diligent estimation of land areas, both owned and cultivated, labor and non-labor input use and expenditures, and production figures for main crops and livestock. Although one of the major foci of the agriculture data collection effort was to produce smallholder production estimates for major crops, it is also possible to disaggregate the data by gender and main geographical regions. The IHS5 cross-sectional households supply information on the last completed rainy season (2017/2018 or 2018/2019) and the last completed dry season (2018 or 2019), depending on the timing of their interview.
FISHERIES QUESTIONNAIRE The design of the IHS5 fishery questionnaire is identical to the questionnaire designed for IHS3. The IHS3 fisheries questionnaire was informed by the design and piloting of a fishery questionnaire by the World Fish Center (WFC), which was supported by the LSMS-ISA project for the purpose of assembling a fishery questionnaire that could be integrated into multi-topic household-surveys. The WFC piloted the draft instrument in November 2009 in the Lower Shire region, and the NSO team considered the revised draft in designing the IHS5 fishery questionnaire.
COMMUNITY QUESTIONNAIRE The content of the IHS5 Community Questionnaire follows the content of the IHS3 & IHS4 Community Questionnaires. A “community” is defined as the village or urban location surrounding the enumeration area selected for inclusion in the sample and which most residents recognize as being their community. The IHS5 community questionnaire was administered to each community associated with the cross-sectional EAs interviewed. As in the IHS3 and IHS4, it was administered to a group of several knowledgeable residents such as the village headman, the headmaster of the local school, the agricultural field assistant, religious leaders, local merchants, health workers and long-term knowledgeable residents. The instrument gathers information on a range of community characteristics, including religious and ethnic background, physical infrastructure, access to public services, economic activities, communal resource management, organization and governance, investment projects, and local retail price information for essential goods and services.
MARKET QUESTIONNAIRE The Market Survey consisted of one questionnaire composed of four modules: Module A: Market Identification, Module B: Seasonal Main Crops, Module C: Permanent Crops, and Module D: Food Consumption.
DATA ENTRY PLATFORM To ensure data quality and timely availability of data, the IHS5 was implemented using the World Bank’s Survey Solutions CAPI software. To carry out the IHS5, one laptop computer and a wireless internet router were assigned to each team supervisor, and each enumerator had an 8-inch GPS-enabled Lenovo tablet computer. The use of Survey Solutions allowed for the real-time availability of data, as completed interviews were approved by the Supervisor and synced to the Headquarters server as frequently as possible. While administering the first module of the questionnaire, the enumerator(s) also used their tablets to record the GPS coordinates of the dwelling units. In Survey Solutions, Headquarters can then see the location of the dwellings plotted on a map of Malawi to better enable supervision from afar, checking both the number of interviews performed and the fact that the sample households lie within EA boundaries. Geo-referenced household locations from the tablets complemented the GPS measurements taken by the Garmin eTrex 30 handheld devices, and these were linked with publicly available geospatial databases to enable the inclusion of a number of geospatial variables in the analysis: extensive measures of distance (e.g. distance to the nearest market), climatology, soil and terrain, and other environmental factors.
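As an illustration of how a distance variable such as "distance to the nearest market" can be derived from geo-referenced household locations, here is a small Python sketch using the haversine great-circle formula. It is an assumption-laden example, not the procedure used to build the IHS5 geospatial variables: the function names, the spherical-Earth radius of 6371 km and the sample coordinates are all illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_market_km(household, markets):
    """Distance from one household (lat, lon) to the closest market in a list."""
    if not markets:
        raise ValueError("markets must not be empty")
    return min(haversine_km(household[0], household[1], m[0], m[1]) for m in markets)


if __name__ == "__main__":
    # Made-up coordinates purely for demonstration.
    household = (-13.96, 33.78)
    markets = [(-13.98, 33.77), (-14.20, 34.00)]
    print(round(nearest_market_km(household, markets), 2), "km")
```

Straight-line distance ignores roads and terrain, so published distance variables are often based on travel networks instead; the sketch only shows the simplest case.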
The range and consistency checks built into the application were informed by the LSMS-ISA experience in previous IHS waves. Prior programming of the data entry application allowed a wide variety of range and consistency checks to be conducted and reported, and potential issues to be investigated and corrected before closing the assigned enumeration area. Headquarters (NSO management) assigned work to supervisors based on their regions of coverage. Supervisors then made assignments to the enumerators linked to their Supervisor account. The work assignments and syncing of completed interviews took place through a Wi-Fi connection to the IHS5 server. Because the data were available in real time, they were monitored closely throughout the entire data collection period, and upon receipt at headquarters the data were exported to Stata for further consistency checks, data cleaning, and analysis.
DATA MANAGEMENT The IHS5 Survey Solutions CAPI-based data entry application was designed to streamline the data collection process from the field. IHS5 interviews were collected in “sample” mode (assignments generated from headquarters) as opposed to “census” mode (new interviews created by interviewers from a template) so that the NSO had more control over the sample.
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de447173
Abstract (en): This round of Eurobarometer surveys queried respondents on standard Eurobarometer measures, such as how satisfied they were with their present life, whether they attempted to persuade others close to them to share their views on subjects they held strong opinions about, whether they discussed political matters, what their expectations were for the next 12 months, and how they viewed economic and social issues in their country compared to the European Union (EU). Additional questions focused on the respondents' knowledge of and opinions on the EU, including how well-informed they felt about it, what sources of information about the EU they used, whether their country had benefited from being an EU member (or would benefit from being a future member), and the extent of their personal interest in EU matters. Another major focus of the surveys was personal data privacy. The survey asked respondents about their knowledge of the rules and requirements in protecting personal data, the ability of the law to protect citizens from entities accessing their information, and whether law enforcement should be able to access personal information for the purpose of fighting crime and terrorism. For the second major focus of the survey, the national economy, respondents were asked to evaluate their personal financial situation and their nation's economy, as well as to estimate the official growth rate (Gross Domestic Product), inflation rate, and unemployment rate, and then to compare these rates to those from previous or future years. Respondents also provided their opinion about the use of statistical information, especially for political decision-making. As a final major focus, respondents were asked about their interest in scientific research, including how the media presents information about scientific research and what types of media they access to get information about this topic.
Additional questions were asked of respondents in regard to globalization and involvement of the EU in this process, the 50th anniversary of EU achievements, the development of environmental, foreign, and immigration policies, and the European Council presidency. Demographic and other background information includes respondent's age, gender, nationality, origin of birth (personal and parental), marital status, left-to-right political self-placement, occupation, age when stopped full-time education, household composition, ownership of a fixed or a mobile telephone and other durable goods, type and size of locality, region of residence, and language of interview (select countries). Please review the "Weighting Information" section of the ICPSR codebook for this Eurobarometer study. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: Checked for undocumented or out-of-range codes.
Citizens of the EU aged 15 and over residing in the 27 EU member countries: Austria, Belgium, Bulgaria, Republic of Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, the Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, and the United Kingdom, plus the citizens in the two EU candidate countries: Croatia and Turkey, and the citizens in the Turkish Cypriot Community and the Former Yugoslav Republic of Macedonia.
Smallest Geographic Unit: country
Multistage national probability samples.
2010-06-29 The data have been further processed by GESIS. The SPSS, SAS, and Stata setup files, SPSS and Stata system files, SAS transport (CPORT) file, tab-delimited ASCII data file, and codebook have been updated.
2008-02-20 Data for all previously-embargoed variables are now available. This collection now contains data for the Former Yugoslav Republic of Macedonia (FYROM), and the addition of seven variables. The data have been further processed by the ZA. The codebook, SPSS, SAS and Stata setup files, SPSS and Stata system files, a SAS transport (CPORT) file, and a tab-delimited ASCII data file have been updated.
face-to-face interview
The original data collection was carried out by TNS Opinion and Social on request of the European Commission.T...
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de456828
Abstract (en): The purpose of the Health Interview Survey is to obtain information about the amount and distribution of illness, its effects in terms of disability and chronic impairments, and the kinds of health services people receive. There are five types of records in this core survey, each in a separate data file. The variables in the Household File (Part 1) include type of living quarters, size of family, number of families in the household, presence of a telephone, number of unrelated individuals, and region. The Person File (Part 2) includes information on sex, age, race, marital status, Hispanic origin, education, veteran status, family income, family size, major activities, health status, activity limits, employment status, and industry and occupation. These variables are found in the Condition, Doctor Visit, and Hospital Episode Files as well. The Person File also supplies data on height, weight, bed days, doctor visits, hospital stays, years at residence, and region variables. The Condition File (Part 3) contains information for each reported health condition, with specifics on injury and accident reports. The Hospital Episode File (Part 4) provides information on medical conditions, hospital episodes, type of service, type of hospital ownership, date of admission and discharge, number of nights in hospital, and operations performed. The Doctor Visit File (Part 5) documents doctor visits within the time period and identifies acute or chronic conditions. A sixth file has been added, along with the five core files. The Health Insurance File (Part 6) documents basic demographic information along with medical coverage and health insurance plans, as well as differentiates between hospital, doctor visit, and surgical insurance coverage. Civilian, noninstitutionalized population of the United States. A multistage probability sample was used in selecting housing units. 
2010-09-30 Frequencies and variable labels that were previously incorrect have been corrected.
2010-09-09 A technical error has been found and resolved in the processing procedure, in which defined file sets did not match subsequent data sets.
2010-09-02 SAS, SPSS, and Stata setup files have been added. Some corresponding documentation has been updated and pre-existing data files have been replaced. A sixth dataset has been added in place of the National Health Survey Procedure Documentation, which can now be found with all other corresponding and added documentation.
2006-01-18 File CB8337.PDF was removed from any previous datasets and flagged as a study-level file, so that it will accompany all downloads.
face-to-face interview
These data files contain weights that must be used in any analysis. Per agreement with NCHS, ICPSR distributes the data files and text of the technical documentation for this collection as prepared by NCHS.
analyze the current population survey (cps) annual social and economic supplement (asec) with r
the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
this new github repository contains three scripts:
2005-2012 asec - download all microdata.R
• download the fixed-width file containing household, family, and person records
• import by separating this file into three tables, then merge 'em together at the person-level
• download the fixed-width file containing the person-level replicate weights
• merge the rectangular person-level file with the replicate weights, then store it in a sql database
• create a new variable - one - in the data table
2012 asec - analysis examples.R
• connect to the sql database created by the 'download all microdata' program
• create the complex sample survey object, using the replicate weights
• perform a boatload of analysis examples
replicate census estimates - 2011.R
• connect to the sql database created by the 'download all microdata' program
• create the complex sample survey object, using the replicate weights
• match the sas output shown in the png file below
2011 asec replicate weight sas output.png
statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.
click here to view these three scripts
for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
• the census bureau's current population survey page
• the bureau of labor statistics' current population survey page
• the current population survey's wikipedia article
notes:
interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.
as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
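to give a feel for what "using the replicate weights" means, here is a rough python sketch of a replicate-weight standard error. it is not one of the three scripts described above (those are written in r). the function names are made up for illustration, and the default scaling factor of 4 divided by the number of replicates is an assumption based on the census bureau's published asec replicate-weight instructions, which describe 160 replicates; check the usage document for the file year you are working with before relying on it.

```python
import math


def weighted_total(values, weights):
    """Weighted sum of a variable, e.g. a population count or aggregate income."""
    return sum(v * w for v, w in zip(values, weights))


def replicate_variance(full_sample_estimate, replicate_estimates, scale=None):
    """Variance from replicate estimates: scale * sum((theta_r - theta_0)^2).

    scale defaults to 4 / (number of replicates); this default is an
    assumption taken from the ASEC replicate-weight documentation.
    """
    k = len(replicate_estimates)
    if k == 0:
        raise ValueError("need at least one replicate estimate")
    if scale is None:
        scale = 4.0 / k
    return scale * sum((r - full_sample_estimate) ** 2 for r in replicate_estimates)


def replicate_standard_error(full_sample_estimate, replicate_estimates, scale=None):
    """Square root of the replicate variance."""
    return math.sqrt(replicate_variance(full_sample_estimate, replicate_estimates, scale))


if __name__ == "__main__":
    # toy data: three people, one main weight and two replicate weights each
    income = [10.0, 20.0, 30.0]
    main_weight = [100.0, 100.0, 100.0]
    replicate_weights = [[90.0, 110.0, 100.0], [110.0, 90.0, 100.0]]

    theta0 = weighted_total(income, main_weight)
    thetas = [weighted_total(income, w) for w in replicate_weights]
    print(theta0, replicate_standard_error(theta0, thetas))
```

the idea is the same whatever the statistic: compute it once with the main weight, once with each replicate weight, then summarize how far the replicate answers stray from the main one.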
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D