46 datasets found
  1. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r. the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.

    despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

    the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
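a minimal python sketch of the fixed-width import step described above. the real column positions come from NBER's SAS importation scripts; the offsets, names, and sample records below are invented purely for illustration.

```python
# Hypothetical layout: real CPS column positions come from NBER's SAS
# importation scripts; the offsets and names below are invented.
import io

import pandas as pd

# A two-line stand-in for a fixed-width person-record file.
raw = io.StringIO(
    "0012345 34 1\n"
    "0012346 61 2\n"
)

# colspecs are zero-based, half-open [start, end) character ranges.
colspecs = [(0, 7), (8, 10), (11, 12)]
names = ["person_id", "age", "sex"]

df = pd.read_fwf(raw, colspecs=colspecs, names=names)
print(df)
```

the same idea scales to the full cps file: once the layout is transcribed from the sas script, the reader streams the fixed-width records into a table that can be loaded into a database.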
    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R
    • download the fixed-width file containing household, family, and person records
    • import by separating this file into three tables, then merge 'em together at the person-level
    • download the fixed-width file containing the person-level replicate weights
    • merge the rectangular person-level file with the replicate weights, then store it in a sql database
    • create a new variable - one - in the data table

    2012 asec - analysis examples.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • perform a boatload of analysis examples

    replicate census estimates - 2011.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • match the sas output shown in the png file below

    2011 asec replicate weight sas output.png: statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

    click here to view these three scripts

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
    • the census bureau's current population survey page
    • the bureau of labor statistics' current population survey page
    • the current population survey's wikipedia article

    notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D

  2. The Canada Trademarks Dataset

    • zenodo.org
    pdf, zip
    Updated Jul 19, 2024
    Cite
    Jeremy Sheff; Jeremy Sheff (2024). The Canada Trademarks Dataset [Dataset]. http://doi.org/10.5281/zenodo.4999655
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jeremy Sheff; Jeremy Sheff
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Canada Trademarks Dataset

    18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303

    Dataset Selection and Arrangement (c) 2021 Jeremy Sheff

    Python and Stata Scripts (c) 2021 Jeremy Sheff

    Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.

    This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.

    Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.

    Terms of Use:

    As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.

    The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:

    The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.

    Details of Repository Contents:

    This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:

    • /csv: contains the .csv versions of the data files
    • /do: contains Stata do-files used to convert the .csv files to .dta format and perform the statistical analyses set forth in the paper reporting this dataset
    • /dta: contains the .dta versions of the data files
    • /py: contains the python scripts used to download CIPO’s historical trademarks data via SFTP and generate the .csv data files

    If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.

    The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.
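The "iterparse" naming suggests a streaming XML parse, which is the standard way to convert archives too large to hold in memory. A minimal sketch of that technique with Python's standard library; the tag names below are invented for illustration, and CIPO's actual schema is documented in the data dictionary included in this repository.

```python
# Streaming parse: each <Application> element is handled and then cleared,
# so memory stays flat no matter how large the archive is. Tag names are
# invented; CIPO's real schema is in its data dictionary.
import io
import xml.etree.ElementTree as ET

xml_bytes = io.BytesIO(
    b"<Transactions>"
    b"<Application><ApplicationNumber>100001</ApplicationNumber></Application>"
    b"<Application><ApplicationNumber>100002</ApplicationNumber></Application>"
    b"</Transactions>"
)

numbers = []
for _event, elem in ET.iterparse(xml_bytes, events=("end",)):
    if elem.tag == "Application":
        numbers.append(elem.findtext("ApplicationNumber"))
        elem.clear()  # drop the subtree we just consumed

print(numbers)
```

In a real run the rows collected this way would be appended to the corresponding .csv data file rather than kept in a list.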

    With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata’s labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.

    The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.

    This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.

  3. Dataset on communal popular initiatives in Germany for the year 2018

    • researchdata.se
    • su.figshare.com
    Updated May 4, 2022
    Cite
    Christophe Premat (2022). Dataset on communal popular initiatives in Germany for the year 2018 [Dataset]. http://doi.org/10.17045/STHLMUNI.9601187
    Dataset provided by
    Stockholm University
    Authors
    Christophe Premat
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Germany
    Description

    The dataset is created for the year 2018 in Germany for 267 popular initiatives at the local level where the information was collected regarding the following variables: topic of initiative, State legislation, turnout, validity of the initiative, success of the popular initiative, proportion of yes voters in case of a referendum, size of municipality, profile of initiators (local council or citizens), proportion of no voters, index of mobilization, repartition index (difference between yes and no voters), approval rate, number of inhabitants, correction of initiative, status of the case (open/closed).

    The dataset was created with the help of the existing popular initiatives registered by the University of Wuppertal and the association Mehr Demokratie in Germany. The idea of the dataset is to evaluate in detail the factors on which the success of popular initiatives depends in the different German States (Länder). A repartition index (difference between yes and no voters) and a mobilization index (repartition index multiplied by the turnout) were calculated and added to the dataset. All the other variables were also created in order to balance the result of these initiatives. The final aim is to be able to measure how direct democratic tools influence local politics in Germany. This is why it is important to examine the prevailing factors for the satisfaction of citizens who use these procedures. In this dataset, the fate of an initiative (failure/success) can be taken as the dependent variable and all the others can be classified as independent variables.

    Direct democracy offers possibilities for citizens to influence political decisions, especially at the local level. In Germany, the local political systems have been affected by the introduction of direct democratic tools such as citizen initiatives and local referenda since the Reunification. The State legislations defined new conditions for citizen initiatives and municipal referenda, with a minimum number of valid signatures for initiatives and a minimum approval rate for the referenda. In the attached file, you will find the dataset as an Excel file as well as a .dta file that you can open with the software Stata (https://www.stata.com/).
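The two derived measures can be sketched in a few lines. This assumes vote shares and turnout are expressed as fractions; the dataset's own scaling may differ, and the example figures are invented.

```python
# Assumed definitions, with shares and turnout as fractions; the dataset's
# own scaling may differ.
def repartition_index(yes_share, no_share):
    """Difference between the yes and no vote shares."""
    return yes_share - no_share

def mobilization_index(yes_share, no_share, turnout):
    """Repartition index weighted by turnout."""
    return repartition_index(yes_share, no_share) * turnout

# Hypothetical initiative: 58% yes, 42% no, 35% turnout.
rep = repartition_index(0.58, 0.42)
mob = mobilization_index(0.58, 0.42, 0.35)
print(round(rep, 3), round(mob, 3))
```

Weighting by turnout is what distinguishes the two: a lopsided result on a tiny turnout scores high on repartition but low on mobilization.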

  4. Galvanising the Open Access Community: A Study on the Impact of Plan S -...

    • zenodo.org
    bin, csv
    Updated Oct 15, 2024
    Cite
    W. Benedikt Schmal; W. Benedikt Schmal (2024). Galvanising the Open Access Community: A Study on the Impact of Plan S - Data and Code [Dataset]. http://doi.org/10.5281/zenodo.12523229
    Dataset provided by
    Scidecode
    Authors
    W. Benedikt Schmal; W. Benedikt Schmal
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the datasets and code underpinning Chapter 3 "Counterfactual Impact Evaluation of Plan S" of the report "Galvanising the Open Access Community: A Study on the Impact of Plan S" commissioned by the cOAlition S to scidecode science consulting.

    Two categories of files are part of this repository:

    1. Datasets

    The 21 CSV source files contain the subsets of publications funded by the funding agencies that are part of this study. These files have been provided by OA.Works, with whom scidecode has collaborated for the data collection process. Data sources and collection and processing workflows applied by OA.Works are described on their website and specifically at https://about.oa.report/docs/data.

    The file "plan_s.dta" is the aggregated data file, stored in the ".dta" format, which can be opened with Stata directly or from many programming languages using the respective packages, e.g., R or Python.
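As noted above, a ".dta" file can be read outside Stata. A minimal pandas sketch: since plan_s.dta itself is not bundled here, a toy frame with invented column names is round-tripped through the format instead.

```python
# Round-trip a toy frame through the .dta format, since plan_s.dta itself
# is not bundled here; column names are invented.
import io

import pandas as pd

df = pd.DataFrame({"funder": ["A", "B"], "oa_share": [0.42, 0.57]})

buf = io.BytesIO()
df.to_stata(buf, write_index=False)  # write Stata .dta bytes
buf.seek(0)

restored = pd.read_stata(buf)  # what read_stata("plan_s.dta") would do
print(restored)
```

In R the equivalent one-liner would use haven::read_dta.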

    2. Code files

    The associated code files that have been used to process the data files are:

    • data_prep_and_analysis_script.do
    • coef_plots_script.R

    The first file has been used to process the CSV data files above for data preparation and analysis purposes. Here, data aggregation and preprocessing are executed. Furthermore, all statistical regressions for the counterfactual impact evaluation are listed in this code file. The second code file "coef_plots_script.R" uses the computed results of the counterfactual impact evaluation to create the final graphic plots using the ggplot2 package.

    The first file (".do") has to be run in Stata; the second (".R") is run in an R environment.

    Further information is available in the final report and via the following URLs:
    https://www.coalition-s.org/
    https://scidecode.com/
    https://oa.works/
    https://openalex.org/
    https://sites.google.com/view/wbschmal
  5. Survey of Consumer Finances (SCF)

    • dataverse.harvard.edu
    Updated May 30, 2013
    Cite
    Anthony Damico (2013). Survey of Consumer Finances (SCF) [Dataset]. http://doi.org/10.7910/DVN/FRMKMF
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    analyze the survey of consumer finances (scf) with r. the survey of consumer finances (scf) tracks the wealth of american families. every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on their cars. plenty of surveys collect annual income, but only the survey of consumer finances captures such detailed asset data. responses are at the primary economic unit-level (peu) - the economically dominant, financially interdependent family members within a sampled household. norc at the university of chicago administers the data collection, but the board of governors of the federal reserve pay the bills and therefore call the shots.

    if you were so brazen as to open up the microdata and run a simple weighted median, you'd get the wrong answer. the five to six thousand respondents actually gobble up twenty-five to thirty thousand records in the final public use files. why oh why? well, those tables contain not one, not two, but five records for each peu. wherever missing, these data are multiply-imputed, meaning answers to the same question for the same household might vary across implicates. each analysis must account for all that, lest your confidence intervals be too tight. to calculate the correct statistics, you'll need to break the single file into five, necessarily complicating your life. this can be accomplished with the meanit sas macro buried in the 2004 scf codebook (search for meanit - you'll need the sas iml add-on). or you might blow the dust off this website referred to in the 2010 codebook as the home of an alternative multiple imputation technique, but all i found were broken links. perhaps it's time for plan c, and by c, i mean free. read the imputation section of the latest codebook (search for imputation), then give these scripts a whirl. they've got that new r smell.

    the lion's share of the respondents in the survey of consumer finances get drawn from a pretty standard sample of american dwellings - no nursing homes, no active-duty military. then there's this secondary sample of richer households to even out the statistical noise at the higher end of the income and assets spectrum. you can read more if you like, but at the end of the day the weights just generalize to civilian, non-institutional american households. one last thing before you start your engine: read everything you always wanted to know about the scf. my favorite part of that title is the word always.

    this new github repository contains three scripts:

    1989-2010 download all microdata.R
    • initiate a function to download and import any survey of consumer finances zipped stata file (.dta)
    • loop through each year specified by the user (starting at the 1989 re-vamp) to download the main, extract, and replicate weight files, then import each into r
    • break the main file into five implicates (each containing one record per peu) and merge the appropriate extract data onto each implicate
    • save the five implicates and replicate weights to an r data file (.rda) for rapid future loading

    2010 analysis examples.R
    • prepare two survey of consumer finances-flavored multiply-imputed survey analysis functions
    • load the r data files (.rda) necessary to create a multiply-imputed, replicate-weighted survey design
    • demonstrate how to access the properties of a multiply-imputed survey design object
    • cook up some descriptive statistics and export examples, calculated with scf-centric variance quirks
    • run a quick t-test and regression, but only because you asked nicely

    replicate FRB SAS output.R
    • reproduce each and every statistic provided by the friendly folks at the federal reserve
    • create a multiply-imputed, replicate-weighted survey design object
    • re-reproduce (and yes, i said/meant what i meant/said) each of those statistics, now using the multiply-imputed survey design object to highlight the statistically-theoretically-irrelevant differences

    click here to view these three scripts

    for more detail about the survey of consumer finances (scf), visit:
    • the federal reserve board of governors' survey of consumer finances homepage
    • the latest scf chartbook, to browse what's possible. (spoiler alert: everything.)
    • the survey of consumer finances wikipedia entry
    • the official frequently asked questions

    notes: nationally-representative statistics on the financial health, wealth, and assets of american households might not be monopolized by the survey of consumer finances, but there isn't much competition aside from the assets topical module of the survey of income and program participation (sipp). on one hand, the scf interview questions contain more detail than sipp. on the other hand, scf's smaller sample precludes analyses of acute subpopulations. and for any three-handed martians in the audience, there's also a few biases between these two data sources that you ought to consider. the survey methodologists at the federal reserve take their job...
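a minimal sketch of why the five implicates matter: under multiple imputation, the point estimate is the mean of the per-implicate estimates, and the variance adds a between-implicate term (rubin's rules). all the numbers below are invented, and a real scf analysis must also fold in the replicate weights.

```python
# Rubin's rules: average the per-implicate estimates, then combine the
# within- and between-implicate variances. All numbers are invented;
# a real SCF analysis must also use the replicate weights.
estimates = [52.1, 53.4, 51.8, 52.9, 52.3]  # per-implicate estimates (made up)
variances = [4.0, 4.2, 3.9, 4.1, 4.0]       # per-implicate sampling variances

m = len(estimates)
point = sum(estimates) / m                    # combined point estimate
within = sum(variances) / m                   # average within-implicate variance
between = sum((e - point) ** 2 for e in estimates) / (m - 1)
total_var = within + (1 + 1 / m) * between    # Rubin's total variance

print(point, total_var)
```

skipping the between-implicate term is exactly the "confidence intervals too tight" mistake described above.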

  6. Mexican Wealth Distribution 1810-1910

    • gimi9.com
    • researchdata.se
    Updated Dec 2, 2023
    Cite
    (2023). Mexican Wealth Distribution 1810-1910 [Dataset]. https://gimi9.com/dataset/eu_https-doi-org-10-57804-q8sr-qz06
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The zip files contain several files with wills from Mexico between 1810 and 1910, collected in order to measure Mexican wealth distribution in its first century of independence. The main file is wills_clean.xlsx, which contains the full collection of wills; in that file, you will find variables for year, state, wealth (not excluding debts), debts, and net wealth. You can combine this file with the do-file cleaningroutine_for_social_tables to produce the detailed social tables. The rest of the files consist of data files with the social tables (for comparison) and xlsx files with the wills from the main file divided by decade, to facilitate calculations using the do-file inequality_analysis_routine_clean.do, from which you will be able to reproduce the rest of the analysis (unbalanced sample and generalized beta, lognormal, etc.).

    Note: The calculation programs are .do files; thus, they require Stata to be executed. Some of the detailed social tables are .dta files, and thus also Stata files. You can open them in R and work with them or convert them to any other data format. The wills come from 5 different Mexican archives: Archivo Histórico de Notarias de la Ciudad de México, Archivo General del Estado de Yucatán, Archivo Municipal de Saltillo, Archivo Histórico de la Ciudad de Morelia and Testamentos del Colegio de Sonora.

  7. CAP-2030 Nepal: Open Street Map tracker mapping dataset

    • rdr.ucl.ac.uk
    bin
    Updated Feb 27, 2023
    Cite
    Naomi Saville (2023). CAP-2030 Nepal: Open Street Map tracker mapping dataset [Dataset]. http://doi.org/10.5522/04/22109690.v2
    Dataset provided by
    University College London
    Authors
    Naomi Saville
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Nepal
    Description

    The Stata data file "jumla_kavre_osmtracker_merged.dta" and an equivalent Excel file of the same name comprise data on water, waste management and landmarks collected by adolescent secondary school students during a "Citizen Science" project in the district of Kavre in the central hills of Nepal during April 2022 and in the district of Jumla in the remote mountains of West Nepal during June 2022. The project was part of a CIFF-funded Children in All Policies 2030 (CAP2030) project.

    The data were generated by the students using an open access data collection and mapping application called Open Street Map (OSM) tracker, which had been adapted with Nepali language prompts by Researchers from Kathmandu Living Labs (KLL). Researchers from KLL and University College London (UCL) trained the adolescents to record tracks and way points of certain types of information including categories of waste management (rubbish dumps/bins), water sources and public amenities. The resulting datafile is a summary of the data collected showing the latitude/longitude, name, and category of the type of location and the district. The app and the process of gathering the data are described in a paper entitled "Citizen science for climate change resilience: engaging adolescents to study climate hazards, biodiversity and nutrition in rural Nepal" submitted to Wellcome Open Research in Feb 2023. The data contributed to Table 5, and Figure 4 of this paper.

  8. Data from: Clear as Black and White: The Effects of Ambiguous Rhetoric...

    • dataverse.harvard.edu
    application/x-stata +1
    Updated Jun 12, 2017
    Cite
    Harvard Dataverse (2017). Clear as Black and White: The Effects of Ambiguous Rhetoric Depend on Candidate Race [Dataset]. http://doi.org/10.7910/DVN/JSDUTQ
    Available download formats: application/x-stata-syntax (4597), application/x-stata-syntax (3287), application/x-stata (80379)
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    To replicate our analyses, open the Stata dataset. Then run the "constructing variables" do-file, which makes the variables we use in the analyses. Then run the "replication analyses" do-file, which also notes which analyses go with which table or figure.

  9. Replication Data for: Zorro versus Covid-19: fighting the pandemic with face...

    • dataverse.harvard.edu
    Updated Feb 7, 2021
    Cite
    Damette Olivier (2021). Replication Data for: Zorro versus Covid-19: fighting the pandemic with face masks [Dataset]. http://doi.org/10.7910/DVN/8BDD9T
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset provided by
    Harvard Dataverse
    Authors
    Damette Olivier
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Replication dataset for part 1 of the paper (Tables 1 to 3; impact of mask wearing on the number of infected cases and fatality rates): 'maskpanel2.dta'. Replication dataset for part 2 (Tables 4, 5 and following; drivers of mask wearing): 'drivers.dta'. To be opened and used with Stata software (Stata 13 was used). A do-file (code file) is associated with both .dta files to replicate the results.

  10. Data from: Bullying and Violence on the School Bus: A Mixed-Methods...

    • catalog.data.gov
    • s.cnmilf.com
    • +3 more
    Updated Mar 12, 2025
    Cite
    National Institute of Justice (2025). Bullying and Violence on the School Bus: A Mixed-Methods Assessment of Behavioral Management Strategies, United States, 2016-2018 [Dataset]. https://catalog.data.gov/dataset/bullying-and-violence-on-the-school-bus-a-mixed-methods-assessment-of-behavioral-mana-2016-a2e15
    Dataset provided by
    National Institute of Justice
    Area covered
    United States
    Description

    These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme files for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. The qualitative data are not available as part of the data collection at this time.

    Numerous high-profile events involving student victimization on school buses have raised critical questions regarding the safety of school-based transportation for children, the efforts taken by school districts to protect students on buses, and the most effective transportation-based behavioral management strategies for reducing misconduct. To address these questions, a national web-based survey was administered to public school district-level transportation officials throughout the United States to assess the prevalence of misconduct on buses, identify strategies to address misconduct, and describe effective ways to reduce student misbehavior on buses. Telephone interviews were also conducted with a small group of transportation officials to understand the challenges of transportation-based behavioral management, to determine successful strategies to create safe and positive school bus environments, and to identify data-driven approaches for tracking and assessing disciplinary referrals.

    The collection includes 10 Stata data files:
    • BVSBS_analysis file.dta (n=2,595; 1058 variables)
    • Title Crosswalk File.dta (n=2,594; 3 variables)
    • Lessons Learned and Open Dummies.dta (n=1,543; 200 variables)
    • CCD dataset.dta (n=12,494; 89 variables)
    • BVSB_REGION.dta (n=4; 3 variables)
    • BVSB_SCHOOLS.dta (n=3; 3 variables)
    • BVSB_STUDENTS.dta (n=3; 3 variables)
    • BVSB_URBAN.dta (n=8; 3 variables)
    • BVSB_WHITE.dta (n=3; 3 variables)
    • FINALRAKER.dta (n=2,595; 2 variables)

  11. Handbook on Impact Evaluation: Quantitative Methods and Practices -...

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Nov 20, 2013
    + more versions
    Cite
    S. Khandker, G. Koolwal and H. Samad (2013). Handbook on Impact Evaluation: Quantitative Methods and Practices - Exercises 2009 - Bangladesh [Dataset]. https://microdata.worldbank.org/index.php/catalog/436
    Explore at:
    Dataset updated
    Nov 20, 2013
    Dataset authored and provided by
    S. Khandker, G. Koolwal and H. Samad
    Time period covered
    2009
    Area covered
    Bangladesh
    Description

    Abstract

    This exercise dataset was created for researchers interested in learning how to use the models described in the "Handbook on Impact Evaluation: Quantitative Methods and Practices" by S. Khandker, G. Koolwal and H. Samad, World Bank, October 2009 (permanent URL http://go.worldbank.org/FE8098BI60).

    Public programs are designed to reach certain goals and beneficiaries. Methods to understand whether such programs actually work, as well as the level and nature of impacts on intended beneficiaries, are main themes of this book. Has the Grameen Bank, for example, succeeded in lowering consumption poverty among the rural poor in Bangladesh? Can conditional cash transfer programs in Mexico and Latin America improve health and schooling outcomes for poor women and children? Does a new road actually raise welfare in a remote area in Tanzania, or is it a "highway to nowhere?"

    This handbook reviews quantitative methods and models of impact evaluation. It begins by reviewing the basic issues pertaining to an evaluation of an intervention to reach certain targets and goals. It then focuses on the experimental design of an impact evaluation, highlighting its strengths and shortcomings, followed by discussions of various non-experimental methods. The authors also cover methods to shed light on the nature and mechanisms by which different participants benefit from the program.

    The handbook provides STATA exercises in the context of evaluating major microcredit programs in Bangladesh, such as the Grameen Bank. This dataset provides both the related Stata data files and the Stata programs.

  12. The role of non-performing loans for bank lending rates (replication data)

    • resodate.org
    Updated Oct 6, 2025
    Cite
    Sebastian Bredl (2025). The role of non-performing loans for bank lending rates (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC90aGUtcm9sZS1vZi1ub24tcGVyZm9ybWluZy1sb2Fucy1mb3ItYmFuay1sZW5kaW5nLXJhdGVzLXJlcGxpY2F0aW9uLWRhdGE=
    Explore at:
    Dataset updated
    Oct 6, 2025
    Dataset provided by
    Journal of Economics and Statistics
    ZBW
    ZBW Journal Data Archive
    Authors
    Sebastian Bredl
    Description

    The analysis considers the role of non-performing loans (NPLs) for bank lending rates on newly granted loans. It is based on euro area data. The focus is on an effect caused by the stock of NPLs that extends beyond losses that banks have already incorporated into their reported capital positions. The paper assesses the channels through which such an effect occurs, most importantly whether it runs through banks' idiosyncratic funding costs.

    File 0 contains a description of the data used for the analysis. It does not contain actual data as most data used for the analysis is confidential. The file contains the names of the Stata-dta-Files in which the datasets are stored. These Stata-dta-Files are the starting point for the data processing which is activated by the code in the subsequent Stata-do-Files.

    Files 1-3 contain the code for processing SNL and Bankscope / Orbis data. This data includes the banking group level data for the analysis (most importantly NPL / regulatory capital data). File 1 contains the code for the processing of SNL data. File 2 contains the code for the processing of the Bankscope / Orbis data. File 3 contains the code for merging SNL and Bankscope / Orbis data.

    Files 4-6 contain the code for processing the CSDB data, which includes data on the cost of bond funding on the banking group level; iBSI / iMIR data, which includes data on lending rates and lending volumes on the single bank level; and the macroeconomic data. File 4 contains the code for the processing of the CSDB data. Note that this data is initially on the single security level and is processed such that information on costs of bond funding on the banking group level is retrieved. File 5 contains the code for the processing of the iBSI / iMIR data. File 6 contains the code for the processing of the macroeconomic variables.

    File 7 contains the code for merging all datasets. File 8 contains the code for producing the descriptive statistics in Section 3 of the paper. File 9 contains the code for the estimation of Equations 1 and 3 of the paper. File 10 contains the code for the estimation of Equations 1 and 3 with random samples (Appendix D of the paper). File 11 contains the code for estimations with loan growth as dependent variable (Section 5.2 of the paper).

    Files 12 and 13 contain code for the data processing and estimation of Equation 2 on the banking group level.

  13. CAP-2030 Nepal: Dataset on sociodemographic characteristics, phone and...

    • datasetcatalog.nlm.nih.gov
    • rdr.ucl.ac.uk
    Updated Feb 21, 2023
    Cite
    Saville, Naomi (2023). CAP-2030 Nepal: Dataset on sociodemographic characteristics, phone and internet access and climate change awareness [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001039581
    Explore at:
    Dataset updated
    Feb 21, 2023
    Authors
    Saville, Naomi
    Area covered
    Nepal
    Description

    The Stata data file "CAP_Demographics_Jumla_Kavre_recoded.dta" and the equivalent Excel file of the same name comprise data collected by adolescent secondary school students during a "Citizen Science" project in the district of Kavre in the central hills of Nepal during April 2022 and in the district of Jumla in the remote mountains of West Nepal during June 2022. The project was part of a CIFF-funded Children in All Policies 2030 (CAP2030) project. The data were generated by the students using a mobile device data collection form developed using the "Open Data Kit (ODK) Collect" electronic data collection platform by Kathmandu Living Labs (KLL) and University College London (UCL) for the purposes of this study. Researchers from KLL and UCL trained the adolescents to record basic socio-demographic information about themselves and their households including caste/ethnicity, religion, education, water sources, assets, household characteristics, and income sources. The form also asked about their access to mobile phones or other devices and the internet, and their concerns with respect to climate change. The resulting data describe the participants in the citizen science project, but their names and addresses have been removed. The app and the process of gathering the data are described in a paper entitled "Citizen science for climate change resilience: engaging adolescents to study climate hazards, biodiversity and nutrition in rural Nepal" submitted to Wellcome Open Research in Feb 2023. The data contributed to Tables 2 and 3 of this paper.

  14. Labor Force Survey, LFS 2017 - Palestine

    • erfdataportal.com
    Updated Mar 22, 2021
    + more versions
    Cite
    Palestinian Central Bureau of Statistics (2021). Labor Force Survey, LFS 2017 - Palestine [Dataset]. https://www.erfdataportal.com/index.php/catalog/170
    Explore at:
    Dataset updated
    Mar 22, 2021
    Dataset provided by
    Palestinian Central Bureau of Statistics (https://pcbs.gov/)
    Economic Research Forum
    Time period covered
    2017
    Area covered
    Palestine
    Description

    Abstract

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS

    The Palestinian Central Bureau of Statistics (PCBS) carried out four rounds of the Labor Force Survey 2017 (LFS). The survey rounds covered a total sample of about 23,120 households (5,780 households per quarter).

    The main objective of collecting data on the labour force and its components, including employment, unemployment and underemployment, is to provide basic information on the size and structure of the Palestinian labour force. Data collected at different points in time provide a basis for monitoring current trends and changes in the labour market and in the employment situation. These data, supported with information on other aspects of the economy, provide a basis for the evaluation and analysis of macro-economic policies.

    The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009. During which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.

    Geographic coverage

    The survey covers a representative sample at the region level (West Bank, Gaza Strip), by locality type (urban, rural, camp) and by governorate.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey covered all Palestinian households whose usual residence is in the Palestinian Territory.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The methodology was designed according to the context of the survey, international standards, data processing requirements and comparability of outputs with other related surveys.

    ---> Target Population: It consists of all individuals aged 10 years and above who normally stay with their households in the State of Palestine during 2017.

    ---> Sampling Frame: The sampling frame consists of the master sample, which was updated in 2011: each enumeration area consists of buildings and housing units with an average of about 124 households. The master sample consists of 596 enumeration areas; we used 494 enumeration areas as a framework for the labor force survey sample in 2017 and these units were used as primary sampling units (PSUs).

    ---> Sampling Size: The estimated sample size is 5,780 households in each quarter of 2017.

    ---> Sample Design: The sample is a two-stage stratified cluster sample. First stage: a systematic random sample of 494 enumeration areas was selected for the whole round, excluding enumeration areas with fewer than 40 households. Second stage: a systematic random sample of households was selected from each enumeration area chosen in the first stage: 16 households where the enumeration area has 80 households or more, and 8 households where it has fewer than 80 households.
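    The two-stage design above can be sketched in code. This is a minimal illustration of systematic random sampling, with a made-up frame and a hypothetical helper function, not PCBS's actual implementation:

```python
import random

def systematic_sample(units, n):
    # Systematic random sampling: pick a random start, then take every
    # k-th unit, where the sampling interval is k = N / n.
    k = len(units) / n
    start = random.uniform(0, k)
    return [units[int(start + i * k)] for i in range(n)]

# Stage 1: select 494 enumeration areas (EAs) from the master frame,
# excluding EAs with fewer than 40 households (frame data is invented).
frame = [{"ea": i, "households": 40 + (i % 100)} for i in range(596)]
eligible = [ea for ea in frame if ea["households"] >= 40]
stage1 = systematic_sample(eligible, 494)

# Stage 2: select 16 households per selected EA, or 8 where the EA has
# fewer than 80 households.
households_by_ea = {}
for ea in stage1:
    n_hh = 16 if ea["households"] >= 80 else 8
    households_by_ea[ea["ea"]] = systematic_sample(
        list(range(ea["households"])), n_hh)
```

    In practice the second stage would also respect the governorate and locality-type strata described below; the sketch only shows the systematic selection mechanics.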

    ---> Sample strata: The population was divided by: 1- Governorate (16 governorates) 2- Type of locality (urban, rural, refugee camps).

    ---> Sample Rotation: Each round of the Labor Force Survey covers all of the 494 master sample enumeration areas. Basically, the areas remain fixed over time, but households in 50% of the EAs were replaced in each round. The same households remain in the sample for two consecutive rounds, are left out for the next two rounds, then return to the sample for another two consecutive rounds before being dropped from the sample. An overlap of 50% is thus achieved both between consecutive rounds and between consecutive years (making the sample efficient for monitoring purposes).

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The survey questionnaire was designed according to the International Labour Organization (ILO) recommendations. The questionnaire includes four main parts:

    ---> 1. Identification Data: The main objective for this part is to record the necessary information to identify the household, such as, cluster code, sector, type of locality, cell, housing number and the cell code.

    ---> 2. Quality Control: This part involves groups of controlling standards to monitor the field and office operations and to keep in order the sequence of questionnaire stages (data collection, field and office coding, data entry, editing after entry, and data storage).

    ---> 3. Household Roster: This part involves demographic characteristics about the household, like number of persons in the household, date of birth, sex, educational level…etc.

    ---> 4. Employment Part: This part involves the major research indicators; a questionnaire was answered by every household member aged 15 years and over, to explore their labour force status and recognize their major characteristics with respect to employment status, economic activity, occupation, place of work, and other employment indicators.

    Cleaning operations

    ---> Raw Data: PCBS began collecting data in the 1st quarter of 2017 using hand-held devices (HHDs) in Palestine, excluding Jerusalem inside the borders (J1) and the Gaza Strip. The HHD program, developed by the General Directorate of Information Systems, is built on SQL Server and Microsoft .NET. Using HHDs reduced the number of data processing stages: fieldworkers collect data and send it directly to the server, and the project manager can retrieve the data at any time. In order to work in parallel with the Gaza Strip and Jerusalem inside the borders (J1), an office program was developed with the same techniques, using the same database as the HHDs.

    ---> Harmonized Data:

    • The SPSS package is used to clean and harmonize the datasets.
    • The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
    • All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
    • A post-harmonization cleaning process is then conducted on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.
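    As a rough illustration of that merge-and-harmonize flow, here is a sketch with invented variable names and pandas standing in for the SPSS package actually used; the real ERF variable list and recodes differ:

```python
import pandas as pd

# Hypothetical cleaned household- and individual-level files.
hh = pd.DataFrame({"hh_id": [1, 2], "governorate": ["Ramallah", "Gaza"]})
ind = pd.DataFrame({"hh_id": [1, 1, 2], "age": [34, 10, 52],
                    "sex": [1, 2, 1]})

# Merge to one individual-level file carrying the household variables.
merged = ind.merge(hh, on="hh_id", how="left")

# Recode/rename toward a common harmonized scheme (illustrative names).
merged = merged.rename(columns={"sex": "gender"})
merged["gender"] = merged["gender"].map({1: "male", 2: "female"})

# Save for dissemination in Stata format.
merged.to_stata("lfs_individual_harmonized.dta", write_index=False)
```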

    Response rate

    The survey sample consists of about 30,230 households, of which 23,120 households completed the interview: 14,682 households in the West Bank and 8,438 households in the Gaza Strip. Weights were modified to account for the non-response rate. The response rate in the West Bank reached 82.4%, while in the Gaza Strip it reached 92.7%.

    Sampling error estimates

    ---> Sampling Errors Data of this survey may be affected by sampling errors due to use of a sample and not a complete enumeration. Therefore, certain differences can be expected in comparison with the real values obtained through censuses. Variances were calculated for the most important indicators: the variance table is attached with the final report. There is no problem in disseminating results at national or governorate level for the West Bank and Gaza Strip.

    ---> Non-Sampling Errors Non-statistical errors are probable in all stages of the project, during data collection or processing. This is referred to as non-response errors, response errors, interviewing errors, and data entry errors. To avoid errors and reduce their effects, great efforts were made to train the fieldworkers intensively. They were trained on how to carry out the interview, what to discuss and what to avoid, carrying out a pilot survey, as well as practical and theoretical training during the training course. Also data entry staff were trained on the data entry program that was examined before starting the data entry process. To stay in contact with progress of fieldwork activities and to limit obstacles, there was continuous contact with the fieldwork team through regular visits to the field and regular meetings with them during the different field visits. Problems faced by fieldworkers were discussed to clarify any issues. Non-sampling errors can occur at the various stages of survey implementation whether in data collection or in data processing. They are generally difficult to be evaluated statistically.

    They cover a wide range of errors, including errors resulting from non-response, sampling frame coverage, coding and classification, data processing, and survey response (both respondent and interviewer-related). The use of effective training and supervision and the careful design of questions have a direct bearing on limiting the magnitude of non-sampling errors, and hence on enhancing the quality of the resulting data. The implementation of the survey encountered non-response; the cases "household not present at home during the fieldwork visit" and "housing unit vacant" accounted for the highest percentage of the non-response cases. The total non-response rate reached 14.2%, which is very low compared to the household surveys conducted by PCBS. The refusal rate reached 3.0%, which is a very low percentage compared to the

  15. Effects of community management on user activity in online communities

    • zenodo.org
    • data.niaid.nih.gov
    Updated Apr 24, 2025
    Cite
    Alberto Cottica; Alberto Cottica (2025). Effects of community management on user activity in online communities [Dataset]. http://doi.org/10.5281/zenodo.1320261
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alberto Cottica; Alberto Cottica
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and code needed to reproduce the results of the paper "Effects of community management on user activity in online communities", available in draft here.

    Instructions:

    1. Unzip the files.
    2. Start with the JSON files obtained from calling the platform APIs: each dataset consists of one file for posts, one for comments, and one for users. In the paper we use two datasets, one referring to Edgeryders, the other to Matera 2019.
    3. Run them through edgesense (https://github.com/edgeryders/edgesense). Edgesense lets you set the length of the observation period. We set it to 1 week and 1 day for Edgeryders data, and to 1 day for Matera 2019 data. Edgesense stores its results in a JSON file called network.min.json, which we then rename to keep track of the data source and observation length.
    4. Launch Jupyter Notebook and run the notebook provided to convert the network.min.json files into CSV flat files, one for each network file.
    5. Launch Stata and open each flat CSV file with it, then save it in Stata format.
    6. Use the provided Stata .do scripts to replicate results.

    Please note: I use both Stata and Jupyter Notebook interactively, running a block with a few lines of code at a time. Expect to have to change directories, file names etc.
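    The JSON-to-CSV conversion in step 4 can be sketched as follows; the snapshot fields and filename shown are assumptions for illustration, not the actual network.min.json schema:

```python
import csv
import json

# Assumed shape: a list of per-period network snapshots.
snapshots = json.loads(json.dumps([
    {"date": "2019-01-01", "nodes": 120, "edges": 310},
    {"date": "2019-01-02", "nodes": 124, "edges": 325},
]))

# Flatten to a CSV file that Stata can then open and save as .dta.
with open("matera_1day.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["date", "nodes", "edges"])
    writer.writeheader()
    writer.writerows(snapshots)
```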

  16. Uniform Crime Reporting Program Data: Offenses Known and Clearances by...

    • datasearch.gesis.org
    Updated Jun 12, 2018
    Cite
    Kaplan, Jacob (2018). Uniform Crime Reporting Program Data: Offenses Known and Clearances by Arrest, 1960-2016 [Dataset]. http://doi.org/10.3886/E100707V3-5862
    Explore at:
    Dataset updated
    Jun 12, 2018
    Dataset provided by
    da|ra (Registration agency for social science and economic data)
    Authors
    Kaplan, Jacob
    Description

    This version (V3) fixes a bug in Version 2 where 1993 data did not properly deal with missing values, leading to enormous counts of crime being reported. This is a collection of Offenses Known and Clearances By Arrest data from 1960 to 2016. The monthly zip files contain one data file per year (57 total, 1960-2016) as well as a codebook for each year. These files have been read into R using the ASCII and setup files from ICPSR (or from the FBI for 2016 data) using the package asciiSetupReader. The end of the zip folder's name says what data type (R, SPSS, SAS, Microsoft Excel CSV, feather, Stata) the data is in. Due to file size limits on open ICPSR, not all file types were included for all the data.

    The files are lightly cleaned. What this means specifically is that column names and value labels are standardized. In the original data, column names differed between years (e.g. the December burglaries cleared column is "DEC_TOT_CLR_BRGLRY_TOT" in 1975 and "DEC_TOT_CLR_BURG_TOTAL" in 1977). The data here have standardized columns so you can compare between years and combine years together. The same thing is done for values inside of columns. For example, the state column gave state names in some years, abbreviations in others. For the code used to clean and read the data, please see my GitHub file here: https://github.com/jacobkap/crime_data/blob/master/R_code/offenses_known.R

    The zip files labeled "yearly" contain yearly data rather than monthly. These also contain far fewer descriptive columns about the agencies in an attempt to decrease file size. Each zip folder contains two files: a data file in whatever format you choose and a codebook. The data file is aggregated yearly and has already combined every year 1960-2016. For the code I used to do this, see here: https://github.com/jacobkap/crime_data/blob/master/R_code/yearly_offenses_known.R. If you find any mistakes in the data or have any suggestions, please email me at jkkaplan6@gmail.com

    As a description of what UCR Offenses Known and Clearances By Arrest data contains, the following is copied from ICPSR's 2015 page for the data: The Uniform Crime Reporting Program Data: Offenses Known and Clearances By Arrest dataset is a compilation of offenses reported to law enforcement agencies in the United States. Due to the vast number of categories of crime committed in the United States, the FBI has limited the type of crimes included in this compilation to those crimes which people are most likely to report to police and those crimes which occur frequently enough to be analyzed across time. Crimes included are criminal homicide, forcible rape, robbery, aggravated assault, burglary, larceny-theft, and motor vehicle theft. Much information about these crimes is provided in this dataset. The number of times an offense has been reported, the number of reported offenses that have been cleared by arrests, and the number of cleared offenses which involved offenders under the age of 18 are the major items of information collected.
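    The column-standardization idea (done in R by the depositor) can be illustrated with a small alias map; the canonical name below is invented for illustration, not the dataset's actual naming scheme:

```python
# Year-specific spellings mapped to one canonical column name so that
# files from different years can be stacked and compared.
ALIASES = {
    "dec_tot_clr_burglary": ["DEC_TOT_CLR_BRGLRY_TOT",
                             "DEC_TOT_CLR_BURG_TOTAL"],
}
ALIAS_TO_CANONICAL = {a: canon
                      for canon, alias_list in ALIASES.items()
                      for a in alias_list}

def standardize(columns):
    # Known aliases are renamed; unknown columns just get lower-cased.
    return [ALIAS_TO_CANONICAL.get(c, c.lower()) for c in columns]

cols_1975 = ["ORI", "DEC_TOT_CLR_BRGLRY_TOT"]
cols_1977 = ["ORI", "DEC_TOT_CLR_BURG_TOTAL"]
```

    After standardization both year files carry identical headers, so they can be concatenated into one multi-year table.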

  17. CAP-2030 Nepal: Dataset on climate-change related hazards

    • rdr.ucl.ac.uk
    Updated Feb 21, 2023
    Cite
    Naomi Saville (2023). CAP-2030 Nepal: Dataset on climate-change related hazards [Dataset]. http://doi.org/10.5522/04/22109603.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 21, 2023
    Dataset provided by
    University College London
    Authors
    Naomi Saville
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Nepal
    Description

    The Stata data file "CAP_Hazard_Kavre_Jumla_varnames.dta" and equivalent excel file of the same name comprises data collected by adolescent secondary school students during a "Citizen Science" project in the district of Kavre in the central hills of Nepal during April 2022 and in the district of Jumla in the remote mountains of West Nepal during June 2022. The project was part of a CIFF-funded Children in All Policies 2030 (CAP2030) project.

    The data were generated by the students using a mobile device data collection form developed using "Open Data Kit (ODK) Collect" electronic data collection platform by Kathmandu Living Labs (KLL) and University College London (UCL) for the purposes of this study. Researchers from KLL and UCL trained the adolescents to record information, geolocation and/or photos about climate-change associated hazards including landslides, floods, extreme weather events and crop pests/failure. The resulting datafile includes the latitude/longitude, name, and category of the type of hazard, date the hazard event was recorded, date it occurred and the district. Links to photographs of the hazards are included but require login to the KLL server. Users of the data may contact KLL (contact@kathmandulivinglabs.org) or UCL (n.saville@ucl.ac.uk) if access to photographs is required. The data were generated as part of a learning exercise for students to raise awareness of the impacts of climate change in their locale. Since the students were using 10 android tablets to record information in a reasonably limited geographical area, the dataset may contain several copies of the same event recorded by different individuals, so cannot be used for calculation of prevalence of hazard events. Rather, the data serve to demonstrate the potential of citizen science methods with Nepali school students to record such information. The app and the process of gathering the data are described in a paper entitled "Citizen science for climate change resilience: engaging adolescents to study climate hazards, biodiversity and nutrition in rural Nepal" submitted to Wellcome Open Research in Feb 2023. The data contributed to Table 4 of this paper.

  18. Dataset for Hepmarc: a 96 week randomised controlled feasibility trial of...

    • datasetcatalog.nlm.nih.gov
    • sussex.figshare.com
    Updated Aug 1, 2023
    Cite
    Orkin, Chloe; Housman, Rosalie; Perry, Nicky; Bradshaw, Daniel; Jennings, Louise; Nelson, Mark; Bremner, Stephen; Kirk, Sarah; Gilleece, Yvonne; Miras, Helena; Robinson, Rachel; Abramowicz, Iga; Clarke, Emily; Fox, Ashini; Curnock, Michael; Gompels, Mark; Verma, Sumita; Lambert, Pauline; Chadwick, David (2023). Dataset for Hepmarc: a 96 week randomised controlled feasibility trial of add-on maraviroc in people with HIV and non-alcoholic fatty liver disease [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000974668
    Explore at:
    Dataset updated
    Aug 1, 2023
    Authors
    Orkin, Chloe; Housman, Rosalie; Perry, Nicky; Bradshaw, Daniel; Jennings, Louise; Nelson, Mark; Bremner, Stephen; Kirk, Sarah; Gilleece, Yvonne; Miras, Helena; Robinson, Rachel; Abramowicz, Iga; Clarke, Emily; Fox, Ashini; Curnock, Michael; Gompels, Mark; Verma, Sumita; Lambert, Pauline; Chadwick, David
    Description

    Data for paper published in PLOS ONE 14.07.2023. These files were used for the statistical analysis of the Hepmarc feasibility trial using Stata software version 17, and are as follows, in both Stata format and .csv format as appropriate. The .do file is a simple text file.

    • hepmarc_data minimum dataset (.csv, .dta): see doi:10.1136/bmjopen-2019-035596 for the study protocol describing all data collected
    • hepmarc Data dictionary (.xls, .dta): description of each data field in the minimum dataset
    • hepmarc AE listing (.csv, .dta): adverse events listing
    • hepmarc SAP v1.0 240322_ (.xls, .dta): description of each data field in the minimum dataset
    • hepmarc data.do: Stata .do file used to perform the analysis

    Notes: Each participant's age has been altered by a random amount to preserve anonymity. There are two rows for two of the participants who each reported two adverse reactions.

    Abstract

    Objectives: Maraviroc may reduce hepatic inflammation in people with HIV and non-alcoholic fatty liver disease (HIV-NAFLD) through CCR5-receptor antagonism, which warrants further exploration.

    Methods: We performed an open-label 96-week randomised-controlled feasibility trial of maraviroc plus optimised background therapy (OBT) versus OBT alone, in a 1:1 ratio, for people with virologically-suppressed HIV-1 and NAFLD without cirrhosis. Dosing followed recommendations for HIV therapy in the Summary of Product Characteristics for maraviroc. The primary outcomes were safety, recruitment and retention rates, adherence and data completeness. Secondary outcomes included the change in Fibroscan-assessed liver stiffness measurements (LSM), controlled attenuation parameter (CAP) and Enhanced Liver Fibrosis (ELF) scores.

    Results: Fifty-three participants (53/60, 88% of target) were recruited; 23 received maraviroc plus OBT; 89% were male; 19% had type 2 diabetes mellitus. The median baseline LSM, CAP & ELF scores were 6.2 (IQR 4.6-7.8) kPa, 325 (IQR 279-351) dB/m and 9.1 (IQR 8.6-9.6) respectively. Primary outcomes: all individuals eligible after screening were randomised; there was 92% (SD 6.6%) adherence to maraviroc [target >90%]; 83% (95%CI 70%-92%) participant retention [target >65%]; 5.5% of data were missing [target <20%]. There were no Serious Adverse Reactions; mild-moderate intensity Adverse Reactions were reported by five participants (5/23, 22% (95%CI 5%-49%)) [target <10%]. All Adverse Reactions resolved. Secondary outcomes: no important differences were seen by treatment group for the change from baseline in LSM, CAP or ELF scores.

    Conclusions: This feasibility study provides preliminary evidence of maraviroc safety amongst people with HIV-NAFLD, and acceptable recruitment, retention, and adherence rates. These data support a definitive randomised-controlled trial assessing maraviroc impact on hepatic steatosis and fibrosis.

    Clinical trial registry: ISRCTN, registration number 31461655.

  19. Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset...

    • datacatalogue.ukdataservice.ac.uk
    Updated Jul 29, 2011
    Cite
    University of Manchester, Cathie Marsh Centre for Census and Survey Research, ESDS Government (2011). Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset [Dataset]. http://doi.org/10.5255/UKDA-SN-6792-1
    Explore at:
    Dataset updated
    Jul 29, 2011
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    University of Manchester, Cathie Marsh Centre for Census and Survey Research, ESDS Government
    Area covered
    England
    Description

    The Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset was prepared as a resource for those interested in learning introductory small area estimation techniques. It was first presented as part of a workshop entitled 'Introducing small area estimation techniques and applying them to the Health Survey for England using Stata'. The data are accompanied by a guide that includes a practical case study enabling users to derive estimates of disability for districts in the absence of survey estimates. This is achieved using various models that combine information from ESDS government surveys with other aggregate data that are reliably available for sub-national areas. Analysis is undertaken using Stata statistical software; all relevant syntax is provided in the accompanying '.do' files.

    The data files included in this teaching resource contain HSE variables and data from the Census and Mid-year population estimates and projections that were developed originally by the National Statistical agencies, as follows:

    • The main data file, 'hse_data.dta', is a reduced version of the HSE for 2000 and 2001. In order to combine data from two years of the HSE in a consistent way some changes have been made to the weights in each year. Additionally, some recoding of the limiting long term illness (LLTI), disability and the age variable has also been undertaken.
    • File 'practical_1_task_5_data.dta' contains population counts and model mobility disability rates (estimated during practical 1) distinguishing single year of age and sex for the six case study districts.
    • File 'practical_2_data.dta' contains the aggregate data required for Practical 2, including age- and sex-specific rates of LLTI (Census) for six UK case study districts, age- and sex-specific rates of mobility disability for England (HSE), and population counts for the six districts.
    • File 'pop_data_practical_3.dta' contains population counts for the six districts (by age, sex and LLTI status) required for Practical 3.
    The original HSEs for 2000 and 2001 are held at the UK Data Archive under SNs 4628 and 4912 respectively. Full details of the recoding of HSE variables and of how the aggregate data were produced can be found in the data documentation.

    This unrestricted access data collection is freely available to download under an Open Government Licence from the UK Data Service. Note that the files should be unzipped/saved to the C: drive of the computer to be used; all syntax assumes files are saved at this location.
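The synthetic-estimation idea behind Practical 2 (national age/sex-specific rates applied to district population counts) can be sketched outside Stata. The following Python sketch uses made-up rates and counts, not values from the HSE files:

```python
# Minimal synthetic small-area estimation sketch (hypothetical numbers,
# NOT taken from the HSE teaching dataset): apply national age/sex-specific
# mobility disability rates to a district's population counts to obtain an
# indirect (synthetic) estimate of the district disability rate.

# national rates of mobility disability by (age band, sex) -- illustrative
national_rates = {
    ("16-44", "m"): 0.02, ("16-44", "f"): 0.03,
    ("45-64", "m"): 0.08, ("45-64", "f"): 0.09,
    ("65+", "m"): 0.22, ("65+", "f"): 0.27,
}

# district population counts for the same cells -- illustrative
district_pop = {
    ("16-44", "m"): 40000, ("16-44", "f"): 41000,
    ("45-64", "m"): 25000, ("45-64", "f"): 26000,
    ("65+", "m"): 9000, ("65+", "f"): 12000,
}

# synthetic estimate: sum of rate * count over cells, divided by total pop
expected_cases = sum(national_rates[c] * district_pop[c] for c in district_pop)
total_pop = sum(district_pop.values())
synthetic_rate = expected_cases / total_pop
print(round(synthetic_rate, 4))  # → 0.0758
```

The real practicals refine this baseline with models that borrow strength from Census LLTI rates, but the core calculation is this rate-times-count aggregation.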

  20. Labor Force Survey, LFS 2006 - Egypt

    • erfdataportal.com
    Updated Feb 5, 2023
    + more versions
    Cite
    Central Agency For Public Mobilization And Statistics (2023). Labor Force Survey, LFS 2006 - Egypt [Dataset]. https://www.erfdataportal.com/index.php/catalog/146
    Explore at:
    Dataset updated
    Feb 5, 2023
    Dataset provided by
    Central Agency for Public Mobilization and Statistics (https://www.capmas.gov.eg/)
    Economic Research Forum
    Time period covered
    2006
    Area covered
    Egypt
    Description

    Abstract

    THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE CENTRAL AGENCY FOR PUBLIC MOBILIZATION AND STATISTICS (CAPMAS)

    In any society, the human element is the basis of the work force that carries out all service and production activities. It is therefore essential to produce labor force statistics and studies related to the growth and distribution of manpower, and to the distribution of the labor force by its different types and characteristics.

    In this context, the Central Agency for Public Mobilization and Statistics conducts "Quarterly Labor Force Survey" which includes data on the size of manpower and labor force (employed and unemployed) and their geographical distribution by their characteristics.

    By the end of each year, CAPMAS issues the annual aggregated labor force bulletin publication that includes the results of the quarterly survey rounds that represent the manpower and labor force characteristics during the year.

    ----> Historical Review of the Labor Force Survey:

    1- The first Labor Force Survey was undertaken in 1957, with the first round conducted in November of that year; the survey has continued in successive rounds (quarterly, bi-annual, or annual) ever since.

    2- Starting with the October 2006 round, the fieldwork of the labor force survey was developed to focus on two points: a. the importance of using a panel sample, as part of the survey sample, to monitor dynamic changes in the labor market; b. improving the questionnaire to include more questions that help better define each household member's relationship to the labor force (employed, unemployed, out of the labor force, etc.), in addition to reordering some existing questions in a more logical way.

    3- Starting with the January 2008 round, the methodology was revised to collect a more representative sample across the survey year. This is done by distributing each governorate's sample into five groups; questionnaires are collected from each group separately every 15 days for 3 months (in the middle and at the end of each month).

    ----> The survey aims at covering the following topics:

    1- Measuring the size of the Egyptian labor force among civilians (for all governorates of the republic) by their different characteristics.
    2- Measuring the employment rate at the national level and in different geographical areas.
    3- Measuring the distribution of employed people by the following characteristics: gender, age, educational status, occupation, economic activity, and sector.
    4- Measuring the unemployment rate in different geographic areas.
    5- Measuring the distribution of unemployed people by the following characteristics: gender, age, educational status, unemployment type (ever employed/never employed), occupation, economic activity, and sector for people who have ever worked.

    The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts were exerted to acquire, clean, harmonize, preserve and disseminate micro data from existing labor force surveys in several Arab countries.

    Geographic coverage

    Covering a sample of urban and rural areas in all the governorates.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey covered a national sample of households and all individuals permanently residing in surveyed households.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure


    ----> Sample Design and Selection

    The sample of the LFS 2006 survey is a simple systematic random sample.

    ----> Sample Size

    The sample size varied by quarter (Q1 = 19,429; Q2 = 19,419; Q3 = 19,119; Q4 = 18,835 households), giving a total of 76,802 households annually. These households are distributed at the governorate level (urban/rural).

    A more detailed description of the different sampling stages and allocation of sample across governorates is provided in the Methodology document available among external resources in Arabic.
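As a quick sanity check on the reported figures, the quarterly household counts do sum to the stated annual total:

```python
# Verify that the LFS 2006 quarterly household counts sum to the
# reported annual total of 76,802 households.
quarters = {"Q1": 19429, "Q2": 19419, "Q3": 19119, "Q4": 18835}
annual_total = sum(quarters.values())
print(annual_total)  # → 76802
```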

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire design follows the latest International Labor Organization (ILO) concepts and definitions of labor force, employment, and unemployment.

    The questionnaire comprises 3 tables in addition to the identification and geographic data of household on the cover page.

    ----> Table 1- Demographic and employment characteristics and basic data for all household individuals

    Including: gender, age, educational status, marital status, residence mobility, and current work status.

    ----> Table 2- Employment characteristics table

    This table is filled by employed individuals at the time of the survey or those who were engaged to work during the reference week, and provided information on: - Relationship to employer: employer, self-employed, waged worker, and unpaid family worker - Economic activity - Sector - Occupation - Effective working hours - Work place - Average monthly wage

    ----> Table 3- Unemployment characteristics table

    This table is filled by all unemployed individuals who satisfied the unemployment criteria, and provided information on: - Type of unemployment (unemployed, unemployed ever worked) - Economic activity and occupation in the last held job before being unemployed - Last unemployment duration in months - Main reason for unemployment

    Cleaning operations

    ----> Raw Data

    Office editing is one of the main stages of the survey. It started once the questionnaires were received from the field and was carried out by the selected work groups. It includes: a- editing for coverage and completeness; b- editing for consistency.

    ----> Harmonized Data

    • Stata is used to clean the datasets and SPSS is used to harmonize them.
    • The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
    • All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/ compute/ recode/ rename/ format/ label harmonized variables.
    • A post-harmonization cleaning process is then conducted on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.
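The merge-then-harmonize steps above can be illustrated in pandas; the file layout and variable names below are hypothetical (the ERF pipeline itself runs in Stata and SPSS):

```python
# Sketch of the merge/recode/rename step of a harmonization pipeline.
# All columns and codings here are invented for illustration -- they are
# NOT the actual ERF or CAPMAS variable names.
import pandas as pd

# household-level file: one row per household
households = pd.DataFrame({
    "hh_id": [1, 2],
    "governorate": ["Cairo", "Giza"],
    "urban": [1, 0],
})

# individual-level file: one row per person
individuals = pd.DataFrame({
    "hh_id": [1, 1, 2],
    "ind_id": [1, 2, 1],
    "sex": [1, 2, 2],                      # raw coding: 1 = male, 2 = female
    "empstat": ["emp", "unemp", "olf"],    # raw labor force status codes
})

# attach household-level information to every person record
merged = individuals.merge(households, on="hh_id", how="left", validate="m:1")

# recode / rename toward a harmonized scheme
merged["sex"] = merged["sex"].map({1: "male", 2: "female"})
merged = merged.rename(columns={"empstat": "labor_force_status"})
print(len(merged), merged.loc[0, "governorate"])  # → 3 Cairo
```

The `validate="m:1"` argument makes the merge fail loudly if a household appears more than once in the household file, which is a cheap safeguard in this kind of pipeline.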
Cite
Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD

Current Population Survey (CPS)

Explore at:
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description

analyze the current population survey (cps) annual social and economic supplement (asec) with r

the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
• download the fixed-width file containing household, family, and person records
• import by separating this file into three tables, then merge 'em together at the person-level
• download the fixed-width file containing the person-level replicate weights
• merge the rectangular person-level file with the replicate weights, then store it in a sql database
• create a new variable - one - in the data table

2012 asec - analysis examples.R
• connect to the sql database created by the 'download all microdata' program
• create the complex sample survey object, using the replicate weights
• perform a boatload of analysis examples

replicate census estimates - 2011.R
• connect to the sql database created by the 'download all microdata' program
• create the complex sample survey object, using the replicate weights
• match the sas output shown in the png file below

2011 asec replicate weight sas output.png
• statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
• the census bureau's current population survey page
• the bureau of labor statistics' current population survey page
• the current population survey's wikipedia article

notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
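for readers outside r, the fixed-width import step can be sketched in python with pandas' read_fwf; the column positions below are invented for illustration - real cps-asec layouts come from the nber sas scripts:

```python
# Python analogue of the fixed-width import step (the repository itself
# uses R's parse.SAScii + RSQLite). The record layout here is made up
# for illustration -- real CPS-ASEC column positions come from the NBER
# SAS importation code.
import io

import pandas as pd

# two fake fixed-width records: a 5-byte record number + a 5-byte value
raw = io.StringIO(
    "0000112345\n"
    "0000254321\n"
)

# colspecs: (start, end) byte offsets per field, 0-indexed, end-exclusive
cols = pd.read_fwf(raw, colspecs=[(0, 5), (5, 10)], names=["recno", "income"])
print(cols["income"].tolist())  # → [12345, 54321]
```

parse.SAScii automates exactly this: it parses the INPUT statement of the SAS script to recover the colspecs, so nobody has to transcribe hundreds of column positions by hand.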
