Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data for meta-analysis of replications project.
The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.
Data collected through the survey helped in achieving the following objectives:
1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
2. Study the consumer expenditure patterns prevailing in society and the impact of demographic and socio-economic variables on those patterns
3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
6. Provide the necessary income data to serve in calculating poverty indices and identifying the characteristics of the poor as well as drawing poverty maps
7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty
National
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
The 2008 Household Expenditure and Income Survey sample was designed using a two-stage stratified cluster sampling method. In the first stage, the primary sampling units (PSUs), the blocks, were drawn with probability proportional to size, taking the number of households in each block as the block size. In the second stage, the household sample (8 households from each PSU) was drawn using the systematic sampling method. Four substitute households from each PSU were also drawn, using the systematic sampling method, to be used on the first visit to the block in case any of the main sample households could not be visited for any reason.
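The two stages described above can be sketched in Python. This is a minimal illustration and not the survey's actual selection program: the function names are hypothetical, the first stage assumes the common cumulative-total (systematic) variant of probability-proportional-to-size selection, and the substitutes are assumed to be drawn from the households left over after the main sample.

```python
import random


def pps_systematic(sizes, n_psu, rng):
    """Stage 1: pick n_psu block indices with probability proportional to size.

    Assumes the cumulative-total method; a block larger than the sampling
    step could be picked more than once.
    """
    step = sum(sizes) / n_psu
    start = rng.random() * step
    targets = [start + k * step for k in range(n_psu)]
    chosen, cumulative, t = [], 0.0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        while t < n_psu and targets[t] < cumulative:
            chosen.append(index)
            t += 1
    return chosen


def systematic_sample(units, n, rng):
    """Pick n units at a fixed interval after a random start."""
    step = len(units) / n
    start = rng.random() * step
    last = len(units) - 1
    return [units[min(int(start + k * step), last)] for k in range(n)]


def select_households(households, rng, n_main=8, n_substitute=4):
    """Stage 2: main households plus substitutes (assumed drawn from the rest)."""
    main = systematic_sample(households, n_main, rng)
    remaining = [h for h in households if h not in main]
    substitutes = systematic_sample(remaining, n_substitute, rng)
    return main, substitutes


rng = random.Random(2008)
block_sizes = [60, 75, 100, 80, 90, 65]  # households per block (illustrative)
selected_blocks = pps_systematic(block_sizes, 3, rng)
main, substitutes = select_households(list(range(80)), rng)
```

Passing a seeded `random.Random` keeps the draw reproducible, which is useful when a selection has to be documented or audited.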
To estimate the sample size, the coefficient of variation and design effect in each sub-district were calculated for the expenditure variable from the data of the 2006 Household Expenditure and Income Survey. These results were used to estimate the sample size at the sub-district level, on the condition that the coefficient of variation of the expenditure variable at the sub-district level did not exceed 10%, with a minimum of 6 clusters at the district level. This ensures good cluster representation in the administrative areas and enables mapping poverty pockets.
Expected non-response, as well as the areas of major cities where poor families are concentrated, were taken into consideration in designing the sample. A larger sample was therefore taken from these areas than from others, to help reach and cover the poverty pockets.
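As a rough illustration of how a coefficient of variation and a design effect can translate into a sample size, the sketch below uses the textbook relation n = deff × (CV_unit / CV_target)². The source does not state the exact formula the survey designers used, so the formula, the function name and the input values are all assumptions; only the 10% target, the 8 households per PSU and the 6-cluster minimum come from the description.

```python
import math


def required_clusters(cv_unit, deff, cv_target=0.10,
                      households_per_cluster=8, min_clusters=6):
    """Clusters needed so the CV of mean expenditure stays at or below cv_target.

    cv_unit: household-level relative standard deviation of expenditure
             (assumed input, e.g. estimated from a previous survey round).
    deff:    design effect of the clustered design.
    """
    # Households needed under simple random sampling, inflated by deff.
    n_households = math.ceil(deff * (cv_unit / cv_target) ** 2)
    clusters = math.ceil(n_households / households_per_cluster)
    # Apply the minimum-cluster floor.
    return max(clusters, min_clusters)


print(required_clusters(cv_unit=0.8, deff=2.0))  # illustrative inputs only
```

With a small `cv_unit` the floor takes over, so an area never gets fewer than `min_clusters` clusters regardless of how homogeneous its expenditure is.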
Face-to-face [f2f]
List of survey questionnaires: (1) General Form (2) Expenditure on food commodities Form (3) Expenditure on non-food commodities Form
Raw Data
The procedures for designing and implementing this survey were:
1. Sample design and selection
2. Design of forms/questionnaires, guidelines to assist in filling out the questionnaires, and preparing instruction manuals
3. Design of the table templates to be used for the dissemination of the survey results
4. Preparation of the fieldwork phase, including printing forms/questionnaires, instruction manuals, data collection instructions, data checking instructions and codebooks
5. Selection and training of survey staff to collect data and run the required data checks
6. Preparation and implementation of the pretest phase for the survey, designed to test and develop forms/questionnaires, instructions and software programs required for data processing and production of survey results
7. Data collection
8. Data checking and coding
9. Data entry
10. Data cleaning using data validation programs
11. Data accuracy and consistency checks
12. Data tabulation and preliminary results
13. Preparation of the final report and dissemination of final results
Harmonized Data
- The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets
- The harmonization process started with cleaning all raw data files received from the Statistical Office
- Cleaned data files were then all merged to produce one data file on the individual level containing all variables subject to harmonization
- A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables
- A post-harmonization cleaning process was run on the data
- Harmonized data was saved on the household as well as the individual level, in SPSS format, and converted to STATA format
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The GAPs Data Repository provides a comprehensive overview of available qualitative and quantitative data on national return regimes, now accessible through an advanced web interface at https://data.returnmigration.eu/.
This updated guideline outlines the complete process, starting from the initial data collection for the return migration data repository to the development of a comprehensive web-based platform. Through iterative development, participatory approaches, and rigorous quality checks, we have ensured a systematic representation of return migration data at both national and comparative levels.
The Repository organizes data into five main categories, covering diverse aspects and offering a holistic view of return regimes: country profiles, legislation, infrastructure, international cooperation, and descriptive statistics. These categories, further divided into subcategories, are based on insights from a literature review, existing datasets, and empirical data collection from 14 countries. The selection of categories prioritizes relevance for understanding return and readmission policies and practices, data accessibility, reliability, clarity, and comparability. Raw data is meticulously collected by the national experts.
The transition to a web-based interface builds upon the Repository’s original structure, which was initially developed using REDCap (Research Electronic Data Capture), a secure web application for building and managing online surveys and databases. REDCap ensures systematic data entry and stores the data on Uppsala University’s servers while significantly improving accessibility and usability as well as data security. It also enables users to export any or all data from the Project when granted full data export privileges. Data can be exported in various ways and formats, including Microsoft Excel, SAS, Stata, R, or SPSS for analysis. At this stage, the Data Repository design team also converted tailored records of available data into public reports accessible to anyone with a unique URL, without the need to log in to REDCap or obtain permission to access the GAPs Project Data Repository. Public reports can be used to share information with stakeholders or external partners without granting them access to the Project or requiring them to set up a personal account. Currently, all public report links inserted in this report are also available on the Repository’s webpage, allowing users to export original data.
This report also includes a detailed codebook to help users understand the structure, variables, and methodologies used in data collection and organization. This addition ensures transparency and provides a comprehensive framework for researchers and practitioners to effectively interpret the data.
The GAPs Data Repository is committed to providing accessible, well-organized, and reliable data by moving to a centralized web platform and incorporating advanced visuals. This Repository aims to contribute inputs for research, policy analysis, and evidence-based decision-making in the return and readmission field.
Explore the GAPs Data Repository at https://data.returnmigration.eu/.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The tasks (called items in the study) are the first 6 histogram tasks and all 6 case-value plot tasks (hence, the first 12 tasks from the data in dataset 1_Raw_Data_Students). This dataset contains all data needed for reproducing the results described in the qualitative article belonging to it, including, for example, the codebook, the coding of transcripts, and the RStudio file for calculating accuracy and precision. It also contains detailed coding results, including second-coder results. Note that the raw data of this project, as well as the design of the project, materials and so on, are in the dataset 1_Raw_Data_Students. The latter dataset is needed for replicating the whole eye-tracking study.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The Open Trade Statistics initiative was developed to ease access to international trade data by providing downloadable SQL database dumps, a public API, a dashboard, and an R package for data retrieval. This project was born out of the recognition that many academic institutions in Latin America lack access to academic subscriptions and comprehensive datasets like the United Nations Commodity Trade Statistics Database. The OTS project not only offers a solution to this problem regarding international trade data but also emphasizes the importance of reproducibility in data processing. Through the use of open-source tools, the project ensures that its datasets are accessible and easy to use for research and analysis.
OTS, based on the official correlation tables, provides a harmonized dataset where the values are converted to HS revision 2012 for the years 1980-2021. This involved transforming some of the reported data to find equivalent codes between the different classifications. For instance, the HS revision 1992 code '271011' (aviation spirit) does not have a direct equivalent in HS revision 2012, so it can be converted to the more general code '271000' (oils petroleum, bituminous, distillates, except crude). The same process was applied to the SITC codes.
Country codes are also standardized in OTS. Missing ISO-3 country codes in the raw data were replaced by the values given in UN COMTRADE documentation. For instance, the numeric code '490' corresponds to 'e-490' but appears as a blank value in the raw data, and UN COMTRADE documentation indicates that 'e-490' corresponds to 'Other Asia, Not Elsewhere Specified (NES)'.
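The two standardization steps described above can be sketched as simple lookups. This is only an illustration and not the OTS pipeline itself: the dictionaries hold just the two examples quoted in the text (the real correlation tables are far larger), and the record layout and field names (`commodity_code`, `reporter_numeric`, `reporter_iso`) are hypothetical.

```python
# Tiny excerpts built from the examples in the text; the real tables are
# the official correlation tables and the UN COMTRADE documentation.
HS_TO_HS2012 = {"271011": "271000"}
MISSING_ISO_BY_NUMERIC = {"490": "e-490"}
COUNTRY_NAMES = {"e-490": "Other Asia, Not Elsewhere Specified (NES)"}


def harmonize(record):
    """Return a copy of a trade record with harmonized codes.

    The field names used here are assumptions made for this sketch.
    """
    out = dict(record)
    # Map the commodity code to its HS 2012 equivalent; codes without an
    # entry in the excerpt are left unchanged.
    code = out["commodity_code"]
    out["commodity_code"] = HS_TO_HS2012.get(code, code)
    # Fill a blank ISO-3 code from the numeric reporter code.
    if not out["reporter_iso"]:
        out["reporter_iso"] = MISSING_ISO_BY_NUMERIC.get(
            out["reporter_numeric"], "")
    out["reporter_name"] = COUNTRY_NAMES.get(out["reporter_iso"], "")
    return out


raw = {"commodity_code": "271011", "reporter_numeric": "490",
       "reporter_iso": ""}
print(harmonize(raw))
```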
Commercial use of this data is strictly prohibited under the UN Comtrade dissemination clauses.
Visit tradestatistics.io to access the dashboard and R package for data retrieval.
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY This dataset provides review time metrics for the San Francisco Planning Department’s application review process. The following metrics are provided: total days to Planning approval, days to finish completeness review, days to first check plan letter, and days to complete resubmission review. Targets for each metric and outcomes relative to these targets are also included. These metrics allow for ongoing tracking for individual planning projects and for the calculation of summary statistics for Planning review timelines. There are both Project level metrics and project event level metrics in this table.
You can see a dashboard which shows the City's current permit processing performance on sf.gov.
B. HOW THE DATASET IS CREATED Planning application review is tracked within Planning’s Project and Permit Tracking System (PPTS). Planners enter review period start and end dates in PPTS when review milestones are reached. Review timeline data is extracted from PPTS and review timelines and outcomes are calculated and consolidated within this dataset. The dataset is generated by a data model that pulls from multiple raw Accela sources and joins them together.
C. UPDATE PROCESS This dataset is updated daily overnight.
D. HOW TO USE THIS DATASET Use this dataset to analyze project level timelines for planning projects or to calculate summary metrics related to the planning review and approval processes. The review metric type is defined in the ‘project stage’ column. Note that multiple rounds of completeness check review and resubmission review may occur for a single Planning project. The ‘potential error’ column flags records where data entry errors are likely present. Filter out rows where a value is entered in this column before building summary statistics.
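A minimal sketch of the filtering step recommended above, using only the Python standard library. The column identifiers (`project_stage`, `days`, `potential_error`) are assumed spellings of the columns mentioned in the description, and the sample rows are made up for illustration; check both against the actual export before use.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up rows for illustration only; column names are assumptions.
SAMPLE = """project_stage,days,potential_error
completeness review,21,
completeness review,35,
completeness review,400,end date before start date
resubmission review,14,
"""


def median_days_by_stage(csv_text):
    """Median review days per stage, skipping rows flagged as potential errors."""
    days = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["potential_error"].strip():
            continue  # any value here marks a likely data entry error
        days[row["project_stage"]].append(float(row["days"]))
    return {stage: statistics.median(values) for stage, values in days.items()}


print(median_days_by_stage(SAMPLE))
```

Because a single project can go through several review rounds, a summary like this counts review events rather than projects.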
E. RELATED DATASETS
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 25% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN
Surveys related to the family budget are considered among the most important types of surveys carried out by the Department of Statistics, since they provide data on household expenditure and income and their relationship with different indicators. Therefore, most countries undertake periodic surveys on household income and expenditure. Since its establishment, the Department of Statistics has conducted a series of Expenditure and Income Surveys during the years 1966, 1980, 1986/1987, 1992, 1997, 2002/2003, 2006/2007, 2008/2009 and 2010/2011. Because of continuous changes in spending patterns, income levels and prices, as well as in internal and external population migration, it was necessary to update data on household income and expenditure over time. Hence the need arose to implement the Household Expenditure and Income Survey for the year 2013.
The survey was then conducted to achieve the following objectives:
1. Provide data on income and expenditure to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps.
2. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index.
3. Provide the necessary data for the national accounts related to overall consumption and income of the household sector.
4. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty.
5. Identify consumer spending patterns prevailing in the society, and the impact of demographic, social and economic variables on those patterns.
6. Calculate the average annual income of the household and the individual, and identify the relationship between income and different socio-economic factors, such as profession and educational level of the head of the household and other indicators.
7. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing household surveys in several Arab countries.
The General Census of Population and Housing in 2004 provided a detailed framework for housing and households at the different administrative levels in the Kingdom. The Kingdom is administratively divided into 12 governorates; each governorate is composed of a number of districts, and each district (Liwa) includes one or more sub-districts (Qada). In each sub-district, there are a number of communities (cities and villages). Each community was divided into a number of blocks, with the number of houses in each block ranging between 60 and 100. Nomads and persons living in collective dwellings such as hotels, hospitals and prisons were excluded from the survey framework.
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
The Household Expenditure and Income Survey sample for the year 2013 was designed to serve the basic objectives of the survey by providing a relatively large sample in each sub-district, to enable drawing a poverty map of Jordan. A two-stage stratified cluster sampling technique was used. In the first stage, a sample of clusters was selected with probability proportional to size, where the number of households in each cluster was considered the weight of the cluster. In the second stage, a sample of 10 households was selected from each cluster using a systematic sampling technique, in addition to another 5 households selected as a backup for the basic sample. Those 5 households were to be used during the first visit to the block in case a visit to an originally selected household was not possible for any reason. For the purposes of this survey, each sub-district was considered a separate stratum, to make it possible to produce results at the sub-district level. In this respect, the survey adopted the framework provided by the General Census of Population and Housing in dividing the sample strata. To estimate the sample size, the coefficient of variation and the design effect of the expenditure variable in the Household Expenditure and Income Survey for the year 2010 were calculated for each sub-district. These results were used to estimate the sample size at the sub-district level so that the coefficient of variation for the expenditure variable in each sub-district is less than 10%, with a minimum of 8 clusters in each sub-district. This is to ensure adequate representation of clusters in different administrative areas, to enable drawing an indicative poverty map. It should be noted that, in addition to the standard non-response rate assumed, higher rates were expected in areas of major cities where poor households are concentrated.
Therefore, those rates were taken into consideration during the sampling design phase, and a higher number of households was selected from those areas, with the aim of covering well all regions where poverty is widespread.
Face-to-face [f2f]
To reach the survey objectives, 3 forms were developed. The forms were finalized after being tested and reviewed by specialists, with the aim of making data entry and validation on the computer as simple as possible.
(1) General Form/Questionnaire
This form includes:
- Housing characteristics, such as geographic location variables, household area, predominant building material of the external walls, type of tenure, monthly rent or lease, main source of water, lighting, heating and cooking fuel, sanitation type and water cycle, and the number of rooms in the dwelling, in addition to the ownership status of some home appliances and a car.
- Characteristics of household members: this part focused on the social characteristics of the family members, such as relation to the head of the family, gender, age, educational status and marital status. It also included economic characteristics such as economic activity, main occupation, employment status and labor sector. In addition, it included questions on whether each individual continued to stay with the family, in order to update the information at the end of each of the four rounds of the survey.
- Income section, which included three parts:
  · Family ownership of assets
  · Productive activities for the family
  · Current income sources
(2) Expenditure on food commodities form/Questionnaire
This form indicates expenditure data on 17 consumption groups. Each group includes a number of food commodities, with the exception of the last group, which was confined to some non-food goods and services because, like food commodities, they are purchased on a frequent, daily basis. For the purposes of the efficient use of results, expenditure data of the last group was moved to the non-food commodities expenditure. The form also includes estimated amounts of own-produced food items and those received as gifts or in kind, as well as food purchases that servants living with the family made for themselves from their own wages.
(3) Expenditure on non-food commodities form/Questionnaire
This form indicates expenditure data on 11 groups of non-food items and 5 sets of spending on services, in addition to a group of consumption expenditure. It also includes an estimate of self-consumption, and of non-food gifts or other in-kind items received or sent by the household, as well as non-food purchases that servants living with the family made for themselves from their own wages.
----> Raw Data
The data collection phase was then followed by the data processing stage, accomplished through the following procedures:
1- Organizing forms/questionnaires
An archive system compatible with the nature of the subsequent operations was used to classify the forms according to the different rounds throughout the year. This makes it possible to retrieve the forms efficiently when they are required for processing. A registry was prepared to record the different stages of data checking, coding and entry until the forms are returned to the archive system.
2- Data office checking
This phase is carried out concurrently with the data collection phase in the field: questionnaires completed in the fieldwork are immediately sent to the data office checking phase.
3- Data coding
A team was trained to work on the data coding phase, which in this survey is limited to education specialization, profession and economic activity. In this respect, international classifications were used, while for the rest of the questions, all codes were predefined
https://spdx.org/licenses/CC0-1.0.html
Normative learning theories dictate that we should preferentially attend to informative sources, but only up to the point that our limited learning systems can process their content. Humans, including infants, show this predicted strategic deployment of attention. Here we demonstrate that rhesus monkeys, much like humans, attend to events of moderate surprisingness over both more and less surprising events. They do this in the absence of any specific goal or contingent reward, indicating that the behavioral pattern is spontaneous. We suggest this U-shaped attentional preference represents an evolutionarily preserved strategy for guiding intelligent organisms toward material that is maximally useful for learning.
Methods
How the data were collected: In this project, we collected gaze data from 5 macaques while they watched sequential visual displays designed to elicit probabilistic expectations. Gaze data were collected using the Eyelink Toolbox and sampled at 1000 Hz by an infrared eye-monitoring camera system.
Dataset:
"csv-combined.csv" is an aggregated dataset that includes one pop-up event per row for all original datasets for each trial. Here are descriptions of each column in the dataset:
subj: subject_ID = {"B":104, "C":102, "H":101, "J":103, "K":203}
trialtime: start time of current trial in seconds
trial: current trial number (each trial featured one of 80 possible visual-event sequences) (in order)
seq current: sequence number (one of 80 sequences)
seq_item: current item number in a seq (in order)
active_item: pop-up item (active box)
pre_active: prior pop-up item (active box) {-1: "the first active object in the sequence/ no active object before the currently active object in the sequence"}
next_active: next pop-up item (active box) {-1: "the last active object in the sequence/ no active object after the currently active object in the sequence"}
firstappear: {0: "not first", 1: "first appear in the seq"}
looks_blank: csv: total amount of time spent looking at blank space for current event (ms); csv_timestamp: {1: "look blank at timestamp", 0: "not look blank at timestamp"}
looks_offscreen: csv: total amount of time spent looking offscreen for current event (ms); csv_timestamp: {1: "look offscreen at timestamp", 0: "not look offscreen at timestamp"}
time till target: time until the subject first starts looking at the target object (ms) {-1: "never look at the target"}
looks target: csv: time spent looking at the target object (ms); csv_timestamp: look at the target or not at current timestamp (1 or 0)
look1,2,3: time spent looking at each object (ms)
location 123X, 123Y: location of each box (the locations of the three boxes for a given sequence were chosen randomly, but remained static throughout the sequence)
item123id: pop-up item ID (remained static throughout a sequence)
event time: total time spent for the whole event (pop-up and go back) (ms)
eyeposX,Y: eye position at current timestamp
"csv-surprisal-prob.csv" is an output file from Monkilock_Data_Processing.ipynb. Surprisal values for each event were calculated and added to the "csv-combined.csv". Here are descriptions of each additional column:
rt: time till target {-1: "never look at the target"}. In data analysis, we included data that have rt > 0.
already_there: {NA: "never look at the target object"}. In data analysis, we included events that are not the first event in a sequence, are not repeats of the previous event, and already_there is not NA.
looks_away: {TRUE: "the subject was looking away from the currently active object at this time point", FALSE: "the subject was not looking away from the currently active object at this time point"}
prob: the probability of the occurrence of object
surprisal: unigram surprisal value
bisurprisal: transitional surprisal value
std_surprisal: standardized unigram surprisal value
std_bisurprisal: standardized transitional surprisal value
binned_surprisal_means: the means of unigram surprisal values binned to three groups of evenly spaced intervals according to surprisal values.
binned_bisurprisal_means: the means of transitional surprisal values binned to three groups of evenly spaced intervals according to surprisal values.
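To make the surprisal columns concrete, the sketch below computes unigram and transitional surprisal as the negative base-2 logarithm of an event's probability, then bins the values into three evenly spaced intervals. It is an illustration under stated assumptions and not the code in Monkilock_Data_Processing.ipynb: probabilities are estimated here by simple relative frequency within one sequence, and the log base and function names are assumptions, since the description does not specify how `prob` was obtained.

```python
import math
from collections import Counter


def unigram_surprisal(seq):
    """-log2 of each item's relative frequency in the sequence (assumed estimator)."""
    counts = Counter(seq)
    return [-math.log2(counts[item] / len(seq)) for item in seq]


def transitional_surprisal(seq):
    """-log2 P(current | previous); None for the first event, which has no predecessor."""
    pair_counts = Counter(zip(seq, seq[1:]))
    prev_counts = Counter(seq[:-1])
    values = [None]
    for prev, cur in zip(seq, seq[1:]):
        values.append(-math.log2(pair_counts[(prev, cur)] / prev_counts[prev]))
    return values


def binned_means(values, n_bins=3):
    """Mean of the values falling in each of n_bins evenly spaced intervals."""
    xs = [v for v in values if v is not None]
    low, high = min(xs), max(xs)
    width = (high - low) / n_bins
    bins = [[] for _ in range(n_bins)]
    for x in xs:
        index = min(int((x - low) / width), n_bins - 1) if width else 0
        bins[index].append(x)
    return [sum(b) / len(b) if b else None for b in bins]


sequence = [1, 2, 1, 3]  # item IDs of successive pop-up events (illustrative)
print(unigram_surprisal(sequence))
print(transitional_surprisal(sequence))
```

The `None` in the first position of the transitional values mirrors the NA that the datasets carry in `bisurprisal` for the first event of each sequence.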
"csv-surprisal-prob_updated.csv" is a ready-for-analysis dataset generated by Analysis_Code_final.Rmd after standardizing controlled variables, changing data types for categorical variables for analysts, etc. "AllSeq.csv" includes event information of all 80 sequences
Empty Values in Datasets:
There are no missing values in the original dataset "csv-combined.csv". Missing values (marked as NA in the datasets) occur in the columns "prev_active", "next_active", "already_there", "bisurprisal", "std_bisurprisal" and "sq_std_bisurprisal" in "csv-surprisal-prob.csv" and "csv-surprisal-prob_updated.csv". NAs in the columns "prev_active" and "next_active" mean that the event is the first or the last active object in the sequence, i.e. there is no active object before or after the currently active object in the sequence. When we analyzed the variable "already_there", we eliminated rows whose "prev_active" value is NA. NAs in the column "already_there" mean that the subject never looks at the target object in the current event. When we analyzed the variable "already_there", we eliminated rows whose "already_there" value is NA. Missing values occur in the columns "bisurprisal", "std_bisurprisal" and "sq_std_bisurprisal" for the first event in a sequence, where the transitional probability of the event cannot be computed because no event precedes it in the sequence. When we fitted models for transitional statistics, we eliminated rows whose "bisurprisal", "std_bisurprisal" and "sq_std_bisurprisal" values are NA.
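The inclusion rules stated in the column descriptions and in this section can be expressed as small row filters. This is a hedged sketch and not the authors' analysis code: rows are assumed to be dictionaries keyed by the column names above, NA is represented as `None`, and the function names are hypothetical.

```python
def keep_for_rt_analysis(row):
    """Keep events where the subject looked at the target (rt > 0)."""
    return row["rt"] > 0


def keep_for_already_there_analysis(row):
    """Keep events that are not first in the sequence, not repeats, and not NA."""
    return (
        row["prev_active"] is not None                # not the first event
        and row["active_item"] != row["prev_active"]  # not a repeat
        and row["already_there"] is not None          # subject looked at target
    )


def keep_for_transitional_models(row):
    """Keep events whose transitional surprisal columns are all available."""
    columns = ("bisurprisal", "std_bisurprisal", "sq_std_bisurprisal")
    return all(row[column] is not None for column in columns)


# Illustrative rows only; values are made up.
rows = [
    {"rt": -1, "prev_active": None, "active_item": 2, "already_there": None,
     "bisurprisal": None, "std_bisurprisal": None, "sq_std_bisurprisal": None},
    {"rt": 310, "prev_active": 2, "active_item": 1, "already_there": False,
     "bisurprisal": 1.0, "std_bisurprisal": 0.4, "sq_std_bisurprisal": 0.16},
]
usable = [r for r in rows if keep_for_transitional_models(r)]
```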
Codes:
In "Monkilock_Data_Processing.ipynb", we processed raw fixation data of 5 macaques and explored the relationship between their fixation patterns and the "surprisal" of events in each sequence. In this notebook we computed the following variables, which are necessary for further analysis, modeling, and visualizations (see above for details): active_item, pre_active, next_active, firstappear, looks_blank, looks_offscreen, time till target, looks target, look1,2,3, prob, surprisal, bisurprisal, std_surprisal, std_bisurprisal, binned_surprisal_means, binned_bisurprisal_means.
"Analysis_Code_final.Rmd" is the main script, in which we further processed the data, built models, and created visualizations. We evaluated the statistical significance of variables using mixed-effect linear and logistic regressions with random intercepts. The raw regression models include standardized linear and quadratic surprisal terms as predictors. The controlled regression models include covariates, such as whether an object is a repeat, the distance between the current and previous pop-up object, and trial number. A generalized additive model (GAM) was used to visualize the relationship between the surprisal estimate from the computational model and the behavioral data.
"helper-lib.R" includes helper functions used in Analysis_Code_final.Rmd
The University of Wisconsin Probabilistic Downscaling (UWPD) is a statistically downscaled dataset based on the Coupled Model Intercomparison Project Phase 5 (CMIP5) climate models. UWPD consists of three variables: daily precipitation, maximum temperature and minimum temperature. The spatial resolution is 0.1°x0.1° for the United States and southern Canada east of the Rocky Mountains.
The downscaling methodology is not deterministic. Instead, to properly capture unexplained variability and extreme events, the methodology predicts a spatially and temporally varying Probability Density Function (PDF) for each variable. Statistics such as the mean, mean PDF and annual maximum statistics can be calculated directly from the daily PDF and these statistics are included in the dataset. In addition, “standard”, “raw” data is created by randomly sampling from the PDFs to create a “realization” of the local scale given the large-scale from the climate model. There are 3 realizations for temperature and 14 realizations for precipitation.
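The idea of creating a "realization" by randomly sampling from a daily distribution can be illustrated with inverse-transform sampling from a discretized CDF. This is only a sketch under assumptions: the description does not say how UWPD stores or samples its PDFs, so the grid of values, the CDF numbers and the function names below are made up for illustration.

```python
import bisect
import random


def sample_from_cdf(values, cdf, rng):
    """Draw one value by inverse-transform sampling from a discrete CDF.

    values: candidate outcomes in increasing order (e.g. precipitation in mm).
    cdf:    cumulative probabilities, non-decreasing and ending at 1.0.
    """
    u = rng.random()                    # uniform draw in [0, 1)
    index = bisect.bisect_left(cdf, u)  # first bin whose CDF reaches u
    return values[min(index, len(values) - 1)]


def realization(daily_cdfs, values, rng):
    """One random draw per day: a single 'realization' for one grid cell."""
    return [sample_from_cdf(values, cdf, rng) for cdf in daily_cdfs]


# Illustrative numbers only: two days with different precipitation CDFs.
precip_mm = [0.0, 1.0, 5.0, 20.0]
daily_cdfs = [
    [0.60, 0.80, 0.95, 1.00],  # mostly dry day
    [0.10, 0.40, 0.80, 1.00],  # wetter day
]
print(realization(daily_cdfs, precip_mm, random.Random(1)))
```

Repeating the draw with different seeds gives different realizations of the same underlying distributions, which is why extreme events can appear in some realizations and not in others.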
The directory structure of the data is as follows:
[cmip_version]/[scenario]/[climate_model]/[ensemble_member]/
The realizations are as follows:
prcp_[realization_number][year].nc
temp_[realization_number][year].nc
The time-mean files, averaged over certain year bounds, are as follows:
prcp_mean_[year_bound_1][year_bound_2].nc
temp_mean_[year_bound_1][year_bound_2].nc
The time-mean Cumulative Distribution Function (CDF) files are as follows:
prcp_cdf_[year_bound_1][year_bound_2].nc
temp_cdf_[year_bound_1][year_bound_2].nc
The CDF of the annual maximum precipitation is given for each year in the record:
prcp_annual_max_cdf_[start_year_of_scenario]_[end_year_of_scenario].nc
https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
Replication pack, FSE2018 submission #164:
------------------------------------------

**Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: A Case Study of the PyPI Ecosystem

**Note:** link to data artifacts is already included in the paper. Link to the code will be included in the Camera Ready version as well.

Content description
===================

- **ghd-0.1.0.zip** - the code archive. This code produces the dataset files described below
- **settings.py** - settings template for the code archive.
- **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset. This dataset only includes stats aggregated by the ecosystem (PyPI)
- **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages themselves, which take around 2TB.
- **build_model.r, helpers.r** - R files to process the survival data (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, `common.cache/survival_data.pypi_2008_2017-12_6.csv` in **dataset_full_Jan_2018.tgz**)
- **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
- LICENSE - text of GPL v3, under which this dataset is published
- INSTALL.md - replication guide (~2 pages)

Replication guide
=================

Step 0 - prerequisites
----------------------

- Unix-compatible OS (Linux or OS X)
- Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
- R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)

Depending on the level of detail (see Step 2 for more details):

- up to 2Tb of disk space (see the levels of detail in Step 2)
- at least 16Gb of RAM (64 preferable)
- a few hours to a few months of processing time

Step 1 - software
----------------

- unpack **ghd-0.1.0.zip**, or clone from gitlab:

        git clone https://gitlab.com/user2589/ghd.git
        cd ghd
        git checkout 0.1.0

  `cd` into the extracted folder. All commands below assume it is the current directory.
- copy `settings.py` into the extracted folder. Edit the file:
    * set `DATASET_PATH` to some newly created folder path
    * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS`
- install docker. For Ubuntu Linux, the command is `sudo apt-get install docker-compose`
- install libarchive and headers: `sudo apt-get install libarchive-dev`
- (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
  Without this dependency, you might get an error on the next step, but it's safe to ignore.
- install Python libraries: `pip install --user -r requirements.txt`.
- disable all APIs except GitHub (Bitbucket and Gitlab support were not yet implemented when this study was in progress): edit `scraper/init.py`, comment out everything except GitHub support in `PROVIDERS`.
Step 2 - obtaining the dataset
-----------------------------

The ultimate goal of this step is to get the output of the Python function `common.utils.survival_data()` and save it into a CSV file:

    # copy and paste into a Python console
    from common import utils
    survival_data = utils.survival_data('pypi', '2008', smoothing=6)
    survival_data.to_csv('survival_data.csv')

Since full replication will take several months, here are some ways to speed up the process:

#### Option 2.a, difficulty level: easiest

Just use the precomputed data. Step 1 is not necessary under this scenario.

- extract **dataset_minimal_Jan_2018.zip**
- get `survival_data.csv`, go to the next step

#### Option 2.b, difficulty level: easy

Use precomputed longitudinal feature values to build the final table. The whole process will take 15..30 minutes.

- create a folder `
We note that we include only do-files and a log file of our work, and not any raw data. This is because, as we note in the online appendix, we use individual-level data from Swedish registers. The data material is located on an encrypted server, which we must log in to through a remote desktop application in order to perform all of our data analyses. Due to the sensitivity of the data, we are under contractual and ethical obligation not to distribute these data to others. For researchers who want to replicate our results, there are two ways to get access to the administrative data. The first way is to order the data directly from Statistics Sweden (SCB). Statistics Sweden presently requires that researchers obtain permission from the Swedish Ethical Review Board before data can be ordered (a description of how to order data from Statistics Sweden is available at: https://www.scb.se/en/services/guidance-for-researchers-and-universities/). We will also make available a complete list of all the variables that we ordered from Statistics Sweden for this project, together with the statistical code used for the analyses. The second way to replicate our analyses is to come to Sweden and reanalyze these data through the same remote server system that we used. Researchers interested in using this option should reach out to us prior to coming to Sweden so that we can apply for approval from the Ethical Review Board for the researcher to be temporarily added to our research team, which is mandatory in order to get access to the remote server system.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
QUORA_ONE_MANY_QA
This dataset is derived from question data on quora.com. Each entry is a question with multiple answers. The project provides fuel for MNBVC.
STATISTICS
Raw data size
100w: 16G
200w: 17G
300w: 15G
400w: 11G
500w: 10G
600w: 9G
700w: 9G
800w: 7.5G
900w: 7G
1000w: 6.5G
Updating...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The School of Education at the University of Cape Town (UCT) investigated children’s learning through digital play. The aim of the study was to explore the intersection between child play, technology, creativity and learning among children aged between 3 and 11 years. The study also identified skills and dispositions children develop through both digital and non-digital play. The data shared emerged from a survey of parents of children in the stated age group, with particular reference to the parents' views on children's play practices, including the time parents spent playing with their children, concerns parents had about the time children spend playing on various technologies, the types of play children in South Africa engaged in, and the concerns of parents when children played with some electronic devices.
The following data files are shared:
SA - Survey - Children, Technology and Play (CTAP) - Google Forms.pdf
Descriptive Stats 2020.1.9 -Children Technology and Play SURVEY.xlsx
Parent Survey RAW PUBLIC DATA 2020.2.29 - Children Technology and Play Project.xlsx
Parent Survey RAW PUBLIC DATA 2020.2.29 - Children Technology and Play Project.csv
Parent Survey REPORT DATA 2020.2.29 - Children Technology and Play Project.xlsx
Parent Survey REPORT DATA 2020.2.29 - Children Technology and Play Project.csv
Parent Survey RAW and REPORT DATA SYNTAX 2020.2.29 - Children Technology and Play Project.sps
NOTE: This survey was adapted from Marsh, J., Stjerne Thomsen, B., Parry, B., Scott, F., Bishop, J.C., Bannister, C., Driscoll, A., Margary, T., Woodgate, A. (2019) Children, Technology and Play. UK Survey Questions. LEGO Foundation.
The basic goal of this survey is to provide the necessary database for formulating national policies at various levels. It represents the contribution of the household sector to the Gross National Product (GNP). Household Surveys help as well in determining the incidence of poverty, and providing weighted data which reflects the relative importance of the consumption items to be employed in determining the benchmark for rates and prices of items and services. Generally, the Household Expenditure and Consumption Survey is a fundamental cornerstone in the process of studying the nutritional status in the Palestinian territory.
The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality. Data is a public good, in the interest of the region, and it is consistent with the Economic Research Forum's mandate to make micro data available, aiding regional research on this important topic.
The survey data covers urban, rural and camp areas in West Bank and Gaza Strip.
1- Household/families. 2- Individuals.
The survey covered all Palestinian households who are usually resident in the Palestinian Territory during 2010.
Sample survey data [ssd]
The sampling frame consists of all enumeration areas that were enumerated in 2007. Each enumeration area consists of buildings and housing units, with an average of about 120 households. These enumeration areas are used as primary sampling units (PSUs) in the first stage of sample selection.
The sample is a stratified cluster systematic random sample with two stages: First stage: selection of a systematic random sample of 192 enumeration areas. Second stage: selection of a systematic random sample of 24 households from each enumeration area selected in the first stage.
Note: in Jerusalem Governorate (J1), 13 enumeration areas were selected; then, in the second phase, a group of households from each enumeration area was chosen using the census-2007 method of delineation and enumeration. This method was adopted to maximize household response, in line with the non-response percentage set in the sample design. Enumeration areas were distributed over twelve months, and the sample for each quarter covers the sample strata (governorate, locality type). Sample strata:
1- Governorate 2- Type of Locality (urban, rural, refugee camps)
The calculated sample size for the Expenditure and Consumption Survey in 2010 is about 3,757 households, 2,574 households in West Bank and 1,183 households in Gaza Strip.
Face-to-face [f2f]
The questionnaire consists of two main parts:
First: Survey's questionnaire
Part of the questionnaire is to be filled in during the visit at the beginning of the month, while the other part is to be filled in at the end of the month. The questionnaire includes:
Control sheet: Includes household’s identification data, date of visit, data on the fieldwork and data processing team, and summary of household’s members by gender.
Household roster: Includes demographic, social, and economic characteristics of household’s members.
Housing characteristics: Includes data like type of housing unit, number of rooms, value of rent, and connection of housing unit to basic services like water, electricity and sewage. In addition, data in this section includes source of energy used for cooking and heating, distance of housing unit from transportation, education, and health centers, and sources of income generation like ownership of farm land or animals.
Food and Non-Food Items: Includes food and non-food items; the household records its expenditure for one month.
Durable Goods Schedule: Includes a list of main goods like washing machine, refrigerator, TV.
Assistance and Poverty: Includes data about cash and in-kind assistance (assistance value, assistance source), as well as data about the household's situation and the procedures used to cover expenses.
Monthly and annual income: Data pertinent to household’s income from different sources is collected at the end of the registration period.
Second: List of goods
The classification of the list of goods is based on the recommendation of the United Nations for the SNA under the name Classification of Personal Consumption by Purpose. The list includes 55 groups of expenditure and consumption, where each is given a sequence number based on its importance to the household, starting with food goods, clothing groups, housing, medical treatment, transportation and communication, and lastly durable goods. Each group consists of important goods. The total number of goods in all groups amounted to 667 items for goods and services. Groups 1-21 include goods pertinent to food, drinks and cigarettes. Group 22 includes goods that are home produced and consumed by the household. Groups 23-45 include all items except food, drinks and cigarettes. Groups 50-55 include durable goods. The data is collected based on different reference periods to represent expenditure during the whole year, except for cars, where data is collected for the last three years.
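The group ranges described above can be encoded as a small lookup. This is a minimal sketch: the function name and category labels are hypothetical, and groups 46-49 are deliberately left unclassified because the description does not say what they contain.

```python
def classify_group(group):
    """Map a goods-group number (1-55) to the broad category described in the text."""
    if not 1 <= group <= 55:
        raise ValueError("group number must be between 1 and 55")
    if group <= 21:
        return "food, drinks and cigarettes"
    if group == 22:
        return "home-produced goods consumed by the household"
    if group <= 45:
        return "non-food items"
    if group >= 50:
        return "durable goods"
    # Groups 46-49 are not described in the source, so no category is assumed.
    return "unspecified"

print(classify_group(7))
print(classify_group(52))
```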
Registration form
The registration form includes instructions and examples on how to record consumption and expenditure items. The form includes the following columns: 1. Monetary (if the good is purchased) or in kind (if the item is self-produced) 2. Title of the service or the good 3. Unit of measurement (kilogram, liter, number) 4. Quantity 5. Value
The pages of the registration form are colored differently for the weeks of the month. The footer of each page includes remarks that encourage households to participate in the survey. The following instructions illustrate the nature of the items that should be recorded: 1. Monetary expenditures during purchases 2. Purchases based on debts 3. Monetary gifts once presented 4. Interest at pay 5. Self-produced food and goods once consumed 6. Food and merchandise from a commercial project once consumed 7. Merchandise once received as a wage or part of a wage from the employer.
Data editing took place through a number of stages, including: 1. Office editing and coding 2. Data entry 3. Structure checking and completeness 4. Structural checking of SPSS data files
The survey sample consisted of 4,767 households: 4,608 households in the original sample plus 159 households in an additional sample. A total of 3,757 households completed the interview: 2,574 households in the West Bank and 1,183 households in the Gaza Strip. Weights were adjusted to account for non-response. The response rate in the Palestinian Territory was 82.1% (82.4% in the West Bank and 81.6% in the Gaza Strip).
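The sample counts quoted for this survey are internally consistent, which a few lines of arithmetic confirm. The variable names below are illustrative; all figures come from the description.

```python
# First stage: 192 enumeration areas; second stage: 24 households per area.
enumeration_areas = 192
households_per_area = 24
original_sample = enumeration_areas * households_per_area

# The additional sample brings the total to the stated sample size.
additional_sample = 159
total_sample = original_sample + additional_sample

# Completed interviews by region.
completed = {"West Bank": 2574, "Gaza Strip": 1183}
total_completed = sum(completed.values())

print(original_sample, total_sample, total_completed)  # 4608 4767 3757
```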
The impact of errors on data quality was reduced to a minimum due to the high efficiency and outstanding selection, training, and performance of the fieldworkers. Procedures adopted during the fieldwork of the survey were considered a necessity to ensure the collection of accurate data, notably: 1) Develop schedules to conduct field visits to households during survey fieldwork. The objectives of the visits and the data collected on each visit were predetermined. 2) Fieldwork editing rules were applied during the data collection to ensure corrections were implemented before the end of fieldwork activities. 3) Fieldworkers were instructed to provide details in cases of extreme expenditure or consumption by the household. 4) Questions on income were postponed until the final visit at the end of the month. 5) Validation rules were embedded in the data processing systems, along with procedures to verify data entry and data edit.
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS
The Palestinian Central Bureau of Statistics (PCBS) carried out the Socio-Economic Survey 2018. The survey round covered a total sample of about 9926 households.
The main objective of collecting data on socio-economic conditions and their components, including demographic characteristics, employment and unemployment, is to provide basic information on the size and structure of Palestinian households, as well as other data on the status of housing and the characteristics of individuals, the family and living conditions.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
The survey covers a representative sample at the region level (West Bank, Gaza Strip), the locality type (urban, rural, camp) and the governorates.
1- Household/family. 2- Individual/person.
The survey covered all Palestinian households who are usual residents of the Palestinian Territory.
Sample survey data [ssd]
The methodology was designed according to the context of the survey, international standards, data processing requirements and comparability of outputs with other related surveys.
---> Target Population: It consists of all Palestinian households and individuals who habitually reside with their households in Palestine for the reference period of the survey, which is 2018, and there was a focus on individuals of the age group 18 years and over to complete an additional form for individuals in this category.
---> Sampling Frame: The sampling frame consists of a comprehensive sample selected from the Population, Housing and Establishments Census 2007: This comprehensive sample consists of geographical areas with an average of 124 households, and these are considered as enumeration areas used in the census and these units were used as primary sampling units (PSUs).
---> Sampling Size: The estimated sample size is 9926 households in 2018.
---> Sample Design The sample is a three-stage stratified cluster sample: First stage: a systematic random sample of 337 enumeration areas was selected for the whole round. Second stage: the same households that were visited in the previous survey round in 2015, about 25 households from each enumeration area, were visited again in 2018. Households that had changed their place of residence, and whose new address was available in the previous database, were tracked to their new location to complete the questionnaire, as were individuals from the previous round who had separated from their household and formed or joined new households. Third stage: one male or female individual aged 18 years and over was selected from each sample household (old and new) in the second stage, using check tables, to complete the form for individuals 18 years and over (quality of life module). A female was chosen from households with an even number in the enumeration area sample, and a male from households with an odd number.
---> Sample strata: The population was divided by: 1. Governorate (16 governorates in the West Bank and the Gaza Strip, including the Jerusalem J1 area, which Israel annexed by force after its occupation of the West Bank in 1967, as a stratum). 2. Type of locality (urban, rural, camp). 3. Area C (class C, non-C) as an implicit stratum.
Face-to-face [f2f]
The questionnaire is the key tool for data collection. It must conform to the technical characteristics of fieldwork to allow for data processing and analysis. The survey questionnaire comprised the following parts: - Part one: Identification data. - Part two: Quality control. - Part three: Data of household members and social data. - Part four: Housing unit data. - Part five: Assistance and coping strategies information. - Part six: Expenditure and consumption. - Part seven: Food variation and facing food shortage. - Part eight: Income. - Part nine: Agricultural and economic activities. - Part ten: Freedom of mobility. - In addition, a questionnaire for individuals (18 years old and above): questions on suffering and life quality, and assessment of health, education, administration (Ministry of the Interior) services and information technology.
---> Raw Data PCBS started collecting data on 27/8/2018 using hand-held devices (HHD) in Palestine, excluding Jerusalem inside the borders (J1) and the Gaza Strip. The HHD program was built with SQL Server and Microsoft .NET and was developed by the General Directorate of Information Systems. Using HHD reduced the number of data processing stages: fieldworkers collect the data and send it directly to the server, and the project manager can retrieve the data at any time. To work in parallel in the Gaza Strip and Jerusalem inside the borders (J1), an office program was developed using the same techniques and the same database as the HHD.
---> Harmonized Data - The SPSS package is used to clean and harmonize the datasets. - The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency. - All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization. - A country-specific program is generated for each dataset to generate/ compute/ recode/ rename/ format/ label harmonized variables. - A post-harmonization cleaning process is then conducted on the data. - Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.
The survey sample consisted of about 11,008 households, of which 9,926 completed the interview: 5,898 households in the West Bank and 4,028 households in the Gaza Strip. Weights were adjusted to account for non-response. The response rate in Palestine reached 90.2%.
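As a quick consistency check, the simple ratio of completed interviews to sampled households reproduces the stated response rate. This is only a sketch: the official rate may have been computed with a different denominator (for example, one that excludes ineligible cases), so the agreement here should not be read as the official formula.

```python
sampled_households = 11008
completed = {"West Bank": 5898, "Gaza Strip": 4028}

total_completed = sum(completed.values())
completion_rate = round(100 * total_completed / sampled_households, 1)

print(total_completed, completion_rate)  # 9926 90.2
```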
---> Sampling Errors These errors result from studying a part (sample) of the population rather than all of its units. Since the Socio-Economic Conditions Survey 2018 was conducted on a sample, sampling errors are expected to occur. To minimize sampling errors, a properly designed probability sample was used, which allows errors to be calculated throughout the process. This means that every unit of the population has a probability of being selected in the sample. The variance was calculated to measure the impact of the sample design for Palestine.
---> Non-Sampling Errors Non-sampling errors are possible at all stages of the project, during data collection or processing. They are referred to as non-response errors, response errors, interviewing errors and data entry errors. To avoid errors and reduce their effects, strenuous efforts were made to train the fieldworkers intensively. They were trained on how to carry out the interview, what to discuss and what to avoid, and received both practical and theoretical training during the training course. Non-sampling errors in the survey resulted from the private nature of the data collected: some households considered this an interference in the details of their private lives and refused to provide data. Several methods were used to convince households to provide answers and to minimize non-response.
The concept of data quality covers many aspects, starting from the initial planning of the survey to the dissemination of the results and how well users understand and use the data. There are seven dimensions of statistical quality: relevance, accuracy, timeliness, accessibility, comparability, coherence, and completeness.
MSZSI: Multi-Scale Zonal Statistics [AgriClimate] Inventory
--------------------------------------------------------------------------------------
MSZSI is a data extraction tool for Google Earth Engine that aggregates time-series remote sensing information to multiple administrative levels using the FAO GAUL data layers. The code at the bottom of this page (metadata) can be pasted into the Google Earth Engine JavaScript code editor and run at https://code.earthengine.google.com/.
Please refer to the associated publication:
Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624.
https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624
Input options:
[1] Country of interest
[2] Start and end year
[3] Start and end month
[4] Option to mask data to a specific land-use/land-cover type
[5] Land-use/land-cover type code from CGLS LULC
[6] Image collection for data aggregation
[7] Desired band from the image collection
[8] Statistics type for the zonal aggregations
[9] Statistic to use for annual aggregation
[10] Scaling options
[11] Export folder and label suffix
Output: Two CSVs containing zonal statistics for each of the FAO GAUL administrative level boundaries
Output fields: system:index, 0-ADM0_CODE, 0-ADM0_NAME, 0-ADM1_CODE, 0-ADM1_NAME, 0-ADMN_CODE, 0-ADMN_NAME, 1-AREA_PERCENT_LULC, 1-AREA_SQM_LULC, 1-AREA_SQM_ZONE, 2-X_2001, 2-X_2002, 2-X_2003, ..., 2-X_2020, .geo
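Once a table has been exported, the yearly columns can be pulled out of each row by their `2-X_` prefix. The sketch below uses only the Python standard library and a made-up two-year excerpt; the province name and values are hypothetical, and the helper name is not part of MSZSI.

```python
import csv
import io

def yearly_values(row):
    """Extract {year: value} from the '2-X_<year>' columns of one CSV row."""
    prefix = "2-X_"
    return {
        int(name[len(prefix):]): float(value)
        for name, value in row.items()
        if name.startswith(prefix) and value != ""
    }

# Hypothetical excerpt; the real tables span 2001-2020 and include more fields.
sample_csv = io.StringIO(
    "0-ADM0_NAME,0-ADM1_NAME,2-X_2001,2-X_2002\n"
    "Vietnam,Example Province,0.61,0.64\n"
)

for row in csv.DictReader(sample_csv):
    print(row["0-ADM1_NAME"], yearly_values(row))
```

Keeping the administrative fields (prefix `0-`) separate from the yearly statistics (prefix `2-`) makes it straightforward to reshape the table into one record per zone and year.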
PREPROCESSED DATA DOWNLOAD
The datasets available for download contain zonal statistics at 2 administrative levels (FAO GAUL levels 1 and 2). Select countries from Southeast Asia and Sub-Saharan Africa (Cambodia, Indonesia, Lao PDR, Myanmar, Philippines, Thailand, Vietnam, Burundi, Kenya, Malawi, Mozambique, Rwanda, Tanzania, Uganda, Zambia, Zimbabwe) are included in the current version, with plans to extend the dataset to contain global metrics. Each zip file is described below and two example NDVI tables are available for preview.
Key: [source, data, units, temporal range, aggregation, masking, zonal statistic, notes]
Currently available:
MSZSI-V2_V-NDVI-MEAN.tar: [NASA-MODIS, NDVI, index, 2001–2020, annual mean, agriculture, mean, n/a]
MSZSI-V2_T-LST-DAY-MEAN.tar: [NASA-MODIS, LST Day, °C, 2001–2020, annual mean, agriculture, mean, n/a]
MSZSI-V2_T-LST-NIGHT-MEAN.tar: [NASA-MODIS, LST Night, °C, 2001–2020, annual mean, agriculture, mean, n/a]
MSZSI-V2_R-PRECIP-SUM.tar: [UCSB-CHG-CHIRPS, Precipitation, mm, 2001–2020, annual sum, agriculture, mean, n/a]
MSZSI-V2_S-BDENS-MEAN.tar: [OpenLandMap, Bulk density, g/cm3, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-ORGC-MEAN.tar: [OpenLandMap, Organic carbon, g/kg, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-PH-MEAN.tar: [OpenLandMap, pH in H2O, pH, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-WATER-MEAN.tar: [OpenLandMap, Soil water, % at 33kPa, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-SAND-MEAN.tar: [OpenLandMap, Sand, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-SILT-MEAN.tar: [OpenLandMap, Silt, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-CLAY-MEAN.tar: [OpenLandMap, Clay, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_E-ELEV-MEAN.tar: [MERIT, [elevation, slope, flowacc, HAND], [m, degrees, km2, m], static, n/a, agriculture, mean, n/a]
Coming soon
MSZSI-V2_C-STAX-MEAN.tar: [OpenLandMap, Soil taxonomy, category, static, n/a, agriculture, area sum, n/a]
MSZSI-V2_C-LULC-MEAN.tar: [CGLS-LC100-V3, LULC, category, 2015–2019, mode, none, area sum, n/a]
Data sources:
/*///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
MSZSI: Multi-Scale Zonal Statistics Inventory
Authors:
Brad G. Peter, Department of Geography, University of Alabama
Joseph Messina, Department of Geography, University of Alabama
Austin Raney, Department of Geography, University of Alabama
Rodrigo E. Principe, AgriCircle AG
Peilei Fan, Department of Geography, Environment, and Spatial Sciences, Michigan State University
Citation: Peter, Brad; Messina, Joseph; Raney, Austin; Principe, Rodrigo; Fan, Peilei, 2021, 'MSZSI: Multi-Scale Zonal Statistics Inventory', https://doi.org/10.7910/DVN/YCUBXS, Harvard Dataverse, V#
SEAGUL: Southeast Asia Globalization, Urbanization, Land and Environment Changes
http://seagul.info/
https://lcluc.umd.edu/projects/divergent-local-responses-globalization-urbanization-land-transition-and-environmental
This project was made possible by the NASA Land-Cover/Land-Use Change Program (Grant #: 80NSSC20K0740)
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS
The Palestinian Central Bureau of Statistics (PCBS) carried out four rounds of the Labor Force Survey 2021 (LFS). The survey rounds covered a total sample of about 25,179 households (about 6,300 households per quarter).
The main objective of collecting data on the labour force and its components, including employment, unemployment and underemployment, is to provide basic information on the size and structure of the Palestinian labour force. Data collected at different points in time provide a basis for monitoring current trends and changes in the labour market and in the employment situation. These data, supported with information on other aspects of the economy, provide a basis for the evaluation and analysis of macro-economic policies.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
The survey covers a representative sample at the region level (West Bank, Gaza Strip), the locality type (urban, rural, camp) and the governorates.
1- Household/family. 2- Individual/person.
The survey covered all Palestinian households who are usual residents of the Palestinian Territory.
Sample survey data [ssd]
The methodology was designed according to the context of the survey, international standards, data processing requirements and comparability of outputs with other related surveys.
---> Target Population: It consists of all individuals aged 10 years and above who normally reside with their households in the State of Palestine during 2020.
---> Sampling Frame: The sampling frame consists of a comprehensive sample selected from the Population, Housing and Establishments Census 2017: This comprehensive sample consists of geographical areas with an average of 150 households, and these are considered as enumeration areas used in the census and these units were used as primary sampling units (PSUs).
---> Sampling Size: The estimated sample size is 8,040 households in each quarter of 2021.
---> Sample Design: The sample is a two-stage stratified cluster sample. First stage: a systematic random sample of 536 enumeration areas is selected for the whole round. Second stage: a systematic random sample of 15 households is selected from each enumeration area selected in the first stage.
---> Sample strata: The population was divided by: 1- Governorate (17 governorates, where Jerusalem was considered as two statistical areas) 2- Type of Locality (urban, rural, refugee camps).
---> Sample Rotation: Each round of the Labor Force Survey covers all of the 536 master sample enumeration areas. Basically, the areas remain fixed over time, but households in 50% of the EAs are replaced in each round. The same households remain in the sample for two consecutive rounds, are left out for the next two rounds, and are then selected again for another two consecutive rounds before being dropped from the sample. An overlap of 50% is thus achieved both between consecutive rounds and between consecutive years (making the sample efficient for monitoring purposes).
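The design and rotation described above can be illustrated with a minimal Python sketch. It is not the PCBS procedure: stratification is omitted, the frame of 3,000 enumeration areas with exactly 150 households each is hypothetical, and the rotation model assumes that an equal-sized panel enters in every round and is interviewed in rounds p, p+1, p+4 and p+5. All function names are illustrative.

```python
import random


def systematic_sample(units, n, rng):
    """Draw a systematic random sample of n units from an ordered list."""
    step = len(units) / n            # sampling interval (may be fractional)
    start = rng.random() * step      # random start within the first interval
    last = len(units) - 1
    return [units[min(int(start + i * step), last)] for i in range(n)]


def two_stage_sample(frame, n_eas, n_households, seed=0):
    """Stage 1: select enumeration areas. Stage 2: select households in each."""
    rng = random.Random(seed)
    selected_eas = systematic_sample(sorted(frame), n_eas, rng)
    return {ea: systematic_sample(frame[ea], n_households, rng)
            for ea in selected_eas}


def panels_in_round(r):
    """Panels interviewed in round r under the assumed 2-(2)-2 pattern.

    A panel is identified by the round in which it first entered the sample.
    """
    return {r - offset for offset in (0, 1, 4, 5)}


def overlap(r1, r2):
    """Share of the panels in round r1 that are also interviewed in round r2."""
    a, b = panels_in_round(r1), panels_in_round(r2)
    return len(a & b) / len(a)


# Hypothetical frame: 3,000 enumeration areas of 150 households each.
frame = {ea: [f"EA{ea}-HH{h}" for h in range(150)] for ea in range(3000)}
sample = two_stage_sample(frame, n_eas=536, n_households=15)

print(sum(len(hh) for hh in sample.values()))   # 536 * 15 households
print(overlap(10, 11))   # consecutive rounds
print(overlap(10, 14))   # same quarter, one year apart
print(overlap(10, 12))   # two rounds apart
```

Under these assumptions the sketch reproduces the quarterly sample size of 536 × 15 = 8,040 households and the 50% overlap between consecutive rounds and between consecutive years; rounds two quarters apart share no households.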
Face-to-face [f2f]
The survey questionnaire was designed according to the International Labour Organization (ILO) recommendations. The questionnaire includes four main parts:
---> 1. Identification Data: The main objective for this part is to record the necessary information to identify the household, such as, cluster code, sector, type of locality, cell, housing number and the cell code.
---> 2. Quality Control: This part involves groups of control standards to monitor field and office operations and to keep the questionnaire stages in sequence (data collection, field and office coding, data entry, editing after entry, and data storage).
---> 3. Household Roster: This part involves demographic characteristics about the household, like number of persons in the household, date of birth, sex, educational level…etc.
---> 4. Employment Part: This part involves the major research indicators, where one questionnaire was answered by every household member aged 15 years and over, in order to explore their labour force status and identify their major characteristics with respect to employment status, economic activity, occupation, place of work, and other employment indicators.
---> Raw Data: PCBS started collecting data in the 1st quarter of 2020 using hand-held devices (HHD) in Palestine, excluding Jerusalem inside the borders (J1) and the Gaza Strip. The program used on the HHD is based on SQL Server and Microsoft .NET and was developed by the General Directorate of Information Systems. From the beginning of March 2020, with the spread of the COVID-19 pandemic and the home quarantine imposed by the government, the personal (face-to-face) interview was replaced by a phone interview for households whose phone numbers were available from previous rounds; households without phone numbers were interviewed in person (face-to-face). Using HHD reduced the data processing stages: fieldworkers collect data and send it directly to the server, and the project manager can then retrieve the data at any time. In order to work in parallel with the Gaza Strip and Jerusalem inside the borders (J1), an office program was developed using the same techniques and the same database as the HHD.
---> Harmonized Data - The SPSS package is used to clean and harmonize the datasets. - The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency. - All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization. - A country-specific program is generated for each dataset to generate/ compute/ recode/ rename/ format/ label harmonized variables. - A post-harmonization cleaning process is then conducted on the data. - Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.
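The harmonization steps listed above (merge the cleaned files, then rename and recode variables) can be sketched as follows. The Economic Research Forum carries out this work in SPSS; the Python below is only an illustration, and every variable name, code and mapping in it is hypothetical.

```python
# Hypothetical mappings from raw variable names and codes to harmonized ones.
RENAME = {"HR03": "age", "HR04": "sex"}
RECODE = {"sex": {1: "male", 2: "female"}}


def merge_on_key(left, right, key):
    """Merge two lists of records on a shared key (left join)."""
    index = {row[key]: row for row in right}
    return [{**row, **index.get(row[key], {})} for row in left]


def harmonize(record):
    """Rename raw variables and recode their values where a mapping exists."""
    out = {}
    for raw_name, value in record.items():
        name = RENAME.get(raw_name, raw_name)
        out[name] = RECODE.get(name, {}).get(value, value)
    return out


# Hypothetical cleaned files: a household roster and an employment file.
roster = [{"person_id": 1, "HR03": 34, "HR04": 2}]
employment = [{"person_id": 1, "employed": 1}]

individual_level = [harmonize(r) for r in merge_on_key(roster, employment, "person_id")]
print(individual_level)
```

Keeping the mappings in plain dictionaries mirrors the idea of a country-specific program: the merge and recode logic stays the same while the mappings change per dataset.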
The survey sample consists of about 32,160 households, of which 25,179 households completed the interview: 16,355 households in the West Bank and 8,824 households in the Gaza Strip. Weights were modified to account for the non-response rate. The response rate in the West Bank reached 79.8% while in the Gaza Strip it reached 90.5%.
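The figures above can be checked with a short Python sketch, which also shows one common way of adjusting weights for non-response. The weighting-class adjustment (dividing by the regional response rate) is an assumption made for illustration; the text does not describe the adjustment PCBS actually applied.

```python
# Figures quoted in the text.
total_sampled = 32_160
completed = {"West Bank": 16_355, "Gaza Strip": 8_824}
response_rate = {"West Bank": 0.798, "Gaza Strip": 0.905}

total_completed = sum(completed.values())
crude_completion = total_completed / total_sampled


def adjustment_factor(rate):
    """Inflate respondent weights so they also represent non-respondents.

    Assumed weighting-class adjustment, not the documented PCBS method.
    """
    return 1.0 / rate


factors = {region: adjustment_factor(rate) for region, rate in response_rate.items()}

print(total_completed)
print(f"{crude_completion:.1%}")
for region, factor in factors.items():
    print(region, round(factor, 3))
```

The regional counts sum to the 25,179 completed interviews, and the crude completion share (completed divided by all sampled households) comes to about 78.3%. This is lower than either published response rate, which suggests the published rates use a different denominator that the text does not spell out.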
---> Sampling Errors Data of this survey may be affected by sampling errors due to use of a sample and not a complete enumeration. Therefore, certain differences can be expected in comparison with the real values obtained through censuses. Variances were calculated for the most important indicators: the variance table is attached with the final report. There is no problem in disseminating results at national or governorate level for the West Bank and Gaza Strip.
---> Non-Sampling Errors: Non-statistical errors are probable in all stages of the project, during data collection or processing. These include non-response errors, response errors, interviewing errors, and data entry errors. To avoid errors and reduce their effects, great efforts were made to train the fieldworkers intensively. They were trained on how to carry out the interview, what to discuss and what to avoid, and took part in a pilot survey as well as practical and theoretical training during the training course. Data entry staff were also trained on the data entry program, which was examined before the data entry process started. To stay in contact with the progress of fieldwork activities and to limit obstacles, there was continuous contact with the fieldwork team through regular visits to the field and regular meetings with them during the different field visits. Problems faced by fieldworkers were discussed to clarify any issues. Non-sampling errors can occur at the various stages of survey implementation, whether in data collection or in data processing. They are generally difficult to evaluate statistically.
They cover a wide range of errors, including errors resulting from non-response, sampling frame coverage, coding and classification, data processing, and survey response (both respondent and interviewer-related). The use of effective training and supervision and the careful design of questions have a direct bearing on limiting the magnitude of non-sampling errors, and hence on enhancing the quality of the resulting data. The implementation of the survey encountered non-response; the cases "household was not present at home during the fieldwork visit" and "housing unit is vacant" accounted for the highest percentage of the non-response cases. The total non-response rate reached 16.7% which is very low once compared to the
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data set contains time series of socio-economic indicators and of the number of terrorist attacks for 16 selected countries. The raw data were collected from the following databases: World Bank Open Data, Polity IV Project, Comparative Political Data Set, ILO, and the Global Terrorism Database of the University of Maryland. The data were pre-processed for the needs of the research carried out on them.
https://github.com/nytimes/covid-19-data/blob/master/LICENSE
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
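Because the files hold cumulative counts, daily new cases have to be derived by differencing within each state. A minimal Python sketch follows, assuming the state-level file layout `date,state,fips,cases,deaths`; the sample rows are made up for illustration and are not actual figures from the repository.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in the assumed state-level layout.
SAMPLE = """date,state,fips,cases,deaths
2020-03-01,Washington,53,10,1
2020-03-02,Washington,53,14,2
2020-03-03,Washington,53,21,2
2020-03-02,Oregon,41,3,0
2020-03-03,Oregon,41,3,0
"""


def daily_new_cases(csv_text):
    """Turn cumulative case counts into daily increments per state."""
    rows = sorted(csv.DictReader(io.StringIO(csv_text)),
                  key=lambda r: (r["state"], r["date"]))
    previous = defaultdict(int)   # last cumulative count seen for each state
    daily = {}
    for row in rows:
        state, cases = row["state"], int(row["cases"])
        daily[(state, row["date"])] = cases - previous[state]
        previous[state] = cases
    return daily


daily = daily_new_cases(SAMPLE)
for key in sorted(daily):
    print(key, daily[key])
```

The first row for each state yields its cumulative count to date, since no earlier value exists to subtract. If a source revises its cumulative totals downward, the difference can be negative, so a real analysis would need to decide how to handle such days.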
This interactive map of Bangladesh highlights the project locations of the Integrated Agricultural Productivity Project (IAPP) and PRAN. Bangladesh is divided into seven administrative divisions, which are broken down into 64 districts, and further divided into 485 upazilas. This map overlays sub-national poverty data, demographic indicators, and other information relevant to the program. IAPP will target the districts of Rangpur, Kurigram, Lalmonirhat, and Nilfamari in the north and the districts of Barisal, Patuakhali, Barguna and Jhalokathi in the south. The project is expected to increase the productivity of major crops like cereals and pulses, increase the productivity of fish and livestock, increase the availability of certified seed, increase the irrigated area, and the income of farmers in all 54 upazilas in these eight districts. The project areas were selected for their high rates of poverty, food insecurity, and their vulnerability to natural shocks such as tidal surge in the south, and flash flood and drought in the north. GAFSP is financing the expansion of food processing and manufacturing capacity of Natore Agro Limited from PRAN group. PRAN group is the largest food and nutrition company in Bangladesh, with more than 40,000 employees and over 200 different products. The enhancement of operations is creating new jobs (over 1,200 expected), in a region severely affected by unemployment and is increasing the opportunities for local producers as raw material suppliers for the company.
Data Sources:
PRAN Project Location. Source: GAFSP Documents.
IAPP Project Areas. Source: Project Appraisal Document (PAD).
Poverty Incidence (Proportion of population below the poverty line) (2010): Proportion of the population living on less than US$1.25 a day, measured at 2005 international prices, adjusted for purchasing power parity (PPP). Source: Bangladesh Bureau of Statistics. “HIES Survey 2010 Chapter 6.”
Malnutrition (Proportion of underweight children under 5 years) (2011): Prevalence of severely underweight children is the percentage of children under age 5 whose weight-for-age is more than 3 standard deviations below the median for the international reference population ages 0-59 months. Source: Measure DHS. “Bangladesh Demographic and Health Survey 2011. Preliminary Report.”
Total Population (2011): Total population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship, except for refugees not permanently settled in the country of asylum, who are generally considered part of the population of their country of origin. Source: Bangladesh Bureau of Statistics. “Population and Housing Census 2011. Preliminary Results.”
Population Density (2011): Population divided by land area in square kilometers. Source: Bangladesh Bureau of Statistics. “Population and Housing Census 2011. Preliminary Results.”
Irrigated Area (2009/10): Total irrigated area in hectares. Source: Bangladesh Bureau of Statistics. 2010 Yearbook of Agricultural Statistics of Bangladesh.
Potato Production (2009-10 and 2010-11): Total production in tons by variety and total production in tons per hectare by variety. Source: Bangladesh Bureau of Statistics. “2012 Yearbook of Agricultural Statistics of Bangladesh.”
Boro Rice (2009-10 and 2010-11): Total production in tons by variety and total production in tons per hectare by variety. Source: Bangladesh Bureau of Statistics. “2012 Yearbook of Agricultural Statistics of Bangladesh.”
Bangladesh Soil Salinity (2009): Saline soils, salinity boundary, and coastlines. Source: Soil Resource Development Institute SRMAF Project – Bangladesh Ministry of Agriculture. “Saline Soils in Bangladesh 2010.”
The maps displayed on this website are for reference only. The boundaries, colors, denominations and any other information shown on these maps do not imply, on the part of GAFSP (and the World Bank Group), any judgment on the legal status of any territory, or any endorsement or acceptance of such boundaries.