43 datasets found

Raw data for meta-analysis of replications project
figshare.com
xlsx
Updated May 31, 2023
Cite
Sonia Lee (2023). Raw data for meta-analysis of replications project [Dataset]. http://doi.org/10.6084/m9.figshare.3081610.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.3081610.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Sonia Lee
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raw data for meta-analysis of replications project.
Household Expenditure and Income Survey 2008, Economic Research Forum (ERF)...
catalog.ihsn.org
Updated Jan 12, 2022
+ more versions
Cite
Department of Statistics (2022). Household Expenditure and Income Survey 2008, Economic Research Forum (ERF) Harmonization Data - Jordan [Dataset]. https://catalog.ihsn.org/catalog/7661
Dataset updated
Jan 12, 2022
Dataset authored and provided by
Department of Statistics
Time period covered
2008 - 2009
Area covered
Jordan
Description
Abstract

The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.

Data collected through the survey helped in achieving the following objectives:
1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index.
2. Study the consumer expenditure patterns prevailing in the society and the impact of demographic and socio-economic variables on those patterns.
3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators.
4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it.
5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector.
6. Provide the necessary income data to serve in calculating poverty indices and identifying the characteristics of the poor, as well as drawing poverty maps.
7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty.

Geographic coverage

National

Analysis unit

Household/families

Individuals

Universe

The survey covered a national sample of households and all individuals permanently residing in surveyed households.

Kind of data

Sample survey data [ssd]

Sampling procedure

The 2008 Household Expenditure and Income Survey sample was designed using a two-stage stratified cluster sampling method. In the first stage, the primary sampling units (PSUs), the blocks, were drawn with probability proportional to size, where the number of households in each block was taken as the block size. In the second stage, the household sample (8 households from each PSU) was drawn using systematic sampling. Four substitute households were also drawn from each PSU, using systematic sampling, to be used on the first visit to the block in case any of the main sample households could not be visited for any reason.

To estimate the sample size, the coefficient of variation and design effect in each sub-district were calculated for the expenditure variable from the data of the 2006 Household Expenditure and Income Survey. These results were used to estimate the sample size at the sub-district level, such that the coefficient of variation of the expenditure variable at the sub-district level did not exceed 10%, with a minimum of 6 clusters at the district level. This ensures good cluster representation in the administrative areas, which makes it possible to map poverty pockets.

It is worth mentioning that the expected non-response in addition to areas where poor families are concentrated in the major cities were taken into consideration in designing the sample. Therefore, a larger sample size was taken from these areas compared to other ones, in order to help in reaching the poverty pockets and covering them.
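The two-stage selection described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Department of Statistics' actual procedure: the block sizes, the number of blocks, the seed, and the function names `pps_systematic` and `systematic_sample` are all hypothetical, and stratification by sub-district is omitted.

```python
import random


def pps_systematic(sizes, n, rng):
    """Stage 1: pick n unit indices with probability proportional to size.

    Uses systematic PPS: lay the sizes end to end and take n equally
    spaced points from a random start. A unit larger than the sampling
    interval could be picked more than once.
    """
    step = sum(sizes) / n
    start = rng.random() * step          # random start in [0, step)
    targets = [start + k * step for k in range(n)]
    chosen, cumulative, t = [], 0.0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        while t < n and targets[t] < cumulative:
            chosen.append(index)
            t += 1
    return chosen


def systematic_sample(items, n, rng):
    """Stage 2: pick n items at a fixed interval from a random start."""
    step = len(items) / n
    start = rng.random() * step
    last = len(items) - 1
    return [items[min(int(start + k * step), last)] for k in range(n)]


rng = random.Random(2008)                                   # fixed seed, for repeatability only
block_sizes = [rng.randint(60, 100) for _ in range(50)]     # hypothetical household counts per block
selected_blocks = pps_systematic(block_sizes, 6, rng)       # 6 blocks (PSUs), hypothetical count
household_sample = {
    block: systematic_sample(list(range(block_sizes[block])), 8, rng)  # 8 households per PSU
    for block in selected_blocks
}
```

Substitute households could be drawn the same way, by calling `systematic_sample` again on the households not already selected in the block.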

Mode of data collection

Face-to-face [f2f]

Research instrument

List of survey questionnaires: (1) General Form (2) Expenditure on food commodities Form (3) Expenditure on non-food commodities Form

Cleaning operations

Raw Data

The design and implementation of the survey involved the following procedures:
1. Sample design and selection
2. Design of forms/questionnaires, guidelines to assist in filling out the questionnaires, and preparation of instruction manuals
3. Design of the table templates to be used for the dissemination of the survey results
4. Preparation of the fieldwork phase, including printing forms/questionnaires, instruction manuals, data collection instructions, data checking instructions and codebooks
5. Selection and training of survey staff to collect data and run the required data checks
6. Preparation and implementation of the pretest phase for the survey, designed to test and develop forms/questionnaires, instructions and software programs required for data processing and production of survey results
7. Data collection
8. Data checking and coding
9. Data entry
10. Data cleaning using data validation programs
11. Data accuracy and consistency checks
12. Data tabulation and preliminary results
13. Preparation of the final report and dissemination of final results

Harmonized Data

- The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets
- The harmonization process started with cleaning all raw data files received from the Statistical Office
- Cleaned data files were then all merged to produce one data file on the individual level containing all variables subject to harmonization
- A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables
- A post-harmonization cleaning process was run on the data
- Harmonized data was saved on the household as well as the individual level, in SPSS and converted to STATA format
GAPs Data Repository on Return: Guideline, Data Samples and Codebook
zenodo.org
data.niaid.nih.gov
Updated Feb 13, 2025
Cite
Zeynep Sahin Mencutek; Fatma Yılmaz-Elmas (2025). GAPs Data Repository on Return: Guideline, Data Samples and Codebook [Dataset]. http://doi.org/10.5281/zenodo.14862490
Unique identifier
https://doi.org/10.5281/zenodo.14862490
Dataset updated
Feb 13, 2025
Dataset provided by
RedCAP
Authors
Zeynep Sahin Mencutek; Fatma Yılmaz-Elmas
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The GAPs Data Repository provides a comprehensive overview of available qualitative and quantitative data on national return regimes, now accessible through an advanced web interface at https://data.returnmigration.eu/.

This updated guideline outlines the complete process, starting from the initial data collection for the return migration data repository to the development of a comprehensive web-based platform. Through iterative development, participatory approaches, and rigorous quality checks, we have ensured a systematic representation of return migration data at both national and comparative levels.

The Repository organizes data into five main categories, covering diverse aspects and offering a holistic view of return regimes: country profiles, legislation, infrastructure, international cooperation, and descriptive statistics. These categories, further divided into subcategories, are based on insights from a literature review, existing datasets, and empirical data collection from 14 countries. The selection of categories prioritizes relevance for understanding return and readmission policies and practices, data accessibility, reliability, clarity, and comparability. Raw data is meticulously collected by the national experts.

The transition to a web-based interface builds upon the Repository’s original structure, which was initially developed using REDCap (Research Electronic Data Capture), a secure web application for building and managing online surveys and databases. REDCap ensures systematic data entry and stores the data on Uppsala University’s servers, while significantly improving accessibility, usability and data security. It also enables users to export any or all data from the Project when granted full data export privileges. Data can be exported in various formats for analysis, including Microsoft Excel, SAS, Stata, R and SPSS. At this stage, the Data Repository design team also converted tailored records of available data into public reports accessible to anyone with a unique URL, without the need to log in to REDCap or obtain permission to access the GAPs Project Data Repository. Public reports can be used to share information with stakeholders or external partners without granting them access to the Project or requiring them to set up a personal account. Currently, all public report links inserted in this report are also available on the Repository’s webpage, allowing users to export the original data.

This report also includes a detailed codebook to help users understand the structure, variables, and methodologies used in data collection and organization. This addition ensures transparency and provides a comprehensive framework for researchers and practitioners to effectively interpret the data.

The GAPs Data Repository is committed to providing accessible, well-organized, and reliable data by moving to a centralized web platform and incorporating advanced visuals. This Repository aims to contribute inputs for research, policy analysis, and evidence-based decision-making in the return and readmission field.

Explore the GAPs Data Repository at https://data.returnmigration.eu/.
3_Processed_Data_Students_12-Tasks
dataverse.nl
docx, mp4, txt, xlsx
Updated Aug 9, 2023
Cite
Lonneke Boels (2023). 3_Processed_Data_Students_12-Tasks [Dataset]. http://doi.org/10.34894/7KNEOH
Explore at:
mp4(75817184), mp4(32240929), mp4(86195988), mp4(63517245), docx(9037), docx(8237), txt(3801), mp4(55548311), mp4(43760988), mp4(53405418), mp4(61577583), mp4(44155395), mp4(73935581), mp4(39170947), docx(8758), mp4(77944000), docx(8563), mp4(155566456), docx(9790), docx(5109), mp4(90177801), mp4(34576156), mp4(60192895), mp4(85829998), mp4(47571316), docx(10540), docx(9810), xlsx(28365), mp4(49332815), mp4(43566895), mp4(79123492), docx(9669), docx(9372), mp4(24922488), xlsx(65504), mp4(49573948), docx(15667), docx(15295), xlsx(27788), docx(19565), docx(12452), mp4(64630983), mp4(56951837), docx(9595), docx(8290), mp4(49076447), mp4(53463861), mp4(46738126), docx(26260), mp4(56646917), docx(10549), docx(21263), mp4(40139116), docx(10497), docx(10093), mp4(70625289), mp4(65924738), mp4(53727106), mp4(32440162), mp4(48805633), mp4(63979232), docx(9127), xlsx(18238), mp4(141469927), docx(10143), docx(9886), mp4(60024446), mp4(47102934), docx(8530), mp4(38611497), mp4(77124916), mp4(84226975), docx(9307), mp4(39813293), mp4(66927556), docx(12229), mp4(43222146), mp4(69262537), mp4(53372385), mp4(48343341), mp4(65437025), mp4(62931400)Available download formats
Unique identifier
https://doi.org/10.34894/7KNEOH
Dataset updated
Aug 9, 2023
Dataset provided by
DataverseNL
Authors
Lonneke Boels
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The tasks (called items in the study) are the first 6 histogram tasks and all 6 case-value plot tasks (hence, the first 12 tasks from the data in dataset 1_Raw_Data_Students). This dataset contains all data needed for reproducing the results described in the qualitative article belonging to it, including, for example, the codebook, the coding of transcripts, and an RStudio file for calculating accuracy and precision. It also contains detailed coding results, including second-coder results. Note that the raw data of this project, as well as the design of the project, materials and so on, are in the dataset 1_Raw_Data_Students. The latter dataset is needed for replicating the whole eye-tracking study.
Open Trade Statistics Database
zenodo.org
bin
Updated Aug 25, 2024
Cite
Mauricio Vargas Sepulveda (2024). Open Trade Statistics Database [Dataset]. http://doi.org/10.5281/zenodo.13370487
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.13370487
Dataset updated
Aug 25, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mauricio Vargas Sepulveda
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The Open Trade Statistics initiative was developed to ease access to international trade data by providing downloadable SQL database dumps, a public API, a dashboard, and an R package for data retrieval. This project was born out of the recognition that many academic institutions in Latin America lack access to academic subscriptions and comprehensive datasets like the United Nations Commodity Trade Statistics Database. The OTS project not only offers a solution to this problem regarding international trade data but also emphasizes the importance of reproducibility in data processing. Through the use of open-source tools, the project ensures that its datasets are accessible and easy to use for research and analysis.

OTS, based on the official correlation tables, provides a harmonized dataset in which the values are converted to HS revision 2012 for the years 1980-2021. This involved transforming some of the reported data to find equivalent codes between the different classifications. For instance, the HS revision 1992 code '271011' (aviation spirit) does not have a direct equivalent in HS revision 2012, so it is converted to the more general code '271000' (oils petroleum, bituminous, distillates, except crude). The same process was applied to the SITC codes.

Country codes are also standardized in OTS. Missing ISO-3 country codes in the raw data were replaced by the values given in the UN COMTRADE documentation. For instance, the numeric code '490' corresponds to 'e-490' but appears as a blank value in the raw data, and the UN COMTRADE documentation indicates that 'e-490' corresponds to 'Other Asia, Not Elsewhere Specified (NES)'.
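As a rough illustration of the two harmonization steps described above, the sketch below applies lookup tables to a single record. It is not the OTS implementation: the record field names (`commodity_code`, `country_numeric`, `country_code`, `country_name`) and the function name `harmonize` are hypothetical, the tables contain only the two examples quoted in the description, and passing unmapped commodity codes through unchanged is an assumption.

```python
# Lookup tables holding only the examples quoted in the description.
HS1992_TO_HS2012 = {"271011": "271000"}     # aviation spirit -> more general petroleum oils code
NUMERIC_TO_COUNTRY_CODE = {"490": "e-490"}  # blank in the raw data
COUNTRY_CODE_LABELS = {"e-490": "Other Asia, Not Elsewhere Specified (NES)"}


def harmonize(record):
    """Return a copy of a trade record with harmonized commodity and country codes.

    Field names are hypothetical. Commodity codes without an entry in the
    table are assumed to carry over unchanged.
    """
    out = dict(record)
    code = record["commodity_code"]
    out["commodity_code_hs2012"] = HS1992_TO_HS2012.get(code, code)
    if not record.get("country_code"):
        # Fill a blank country code from the numeric code, if known.
        out["country_code"] = NUMERIC_TO_COUNTRY_CODE.get(record["country_numeric"], "")
    out["country_name"] = COUNTRY_CODE_LABELS.get(
        out["country_code"], record.get("country_name", "")
    )
    return out


raw = {"commodity_code": "271011", "country_numeric": "490",
       "country_code": "", "country_name": ""}
clean = harmonize(raw)
```

Keeping the original `commodity_code` alongside the converted one makes it possible to trace each harmonized value back to what was reported.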

Commercial purposes are strictly out of the boundaries of what you can do with this data according to UN Comtrade dissemination clauses.

Visit tradestatistics.io to access the dashboard and R package for data retrieval.
Planning Department Project Application Review metrics
data.sfgov.org
catalog.data.gov
application/rdfxml +5
Updated Jul 9, 2025
Cite
(2025). Planning Department Project Application Review metrics [Dataset]. https://data.sfgov.org/City-Infrastructure/Planning-Department-Project-Application-Review-met/d4jk-jw33
Available download formats: csv, json, application/rdfxml, tsv, application/rssxml, xml
Dataset updated
Jul 9, 2025
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Description
A. SUMMARY This dataset provides review time metrics for the San Francisco Planning Department’s application review process. The following metrics are provided: total days to Planning approval, days to finish completeness review, days to first check plan letter, and days to complete resubmission review. Targets for each metric and outcomes relative to these targets are also included. These metrics allow for ongoing tracking for individual planning projects and for the calculation of summary statistics for Planning review timelines. There are both Project level metrics and project event level metrics in this table.

You can see a dashboard which shows the City's current permit processing performance on sf.gov.

B. HOW THE DATASET IS CREATED Planning application review is tracked within Planning’s Project and Permit Tracking System (PPTS). Planners enter review period start and end dates in PPTS when review milestones are reached. Review timeline data is extracted from PPTS and review timelines and outcomes are calculated and consolidated within this dataset. The dataset is generated by a data model that pulls from multiple raw Accela sources and joins them together.

C. UPDATE PROCESS This dataset is updated daily overnight.

D. HOW TO USE THIS DATASET Use this dataset to analyze project level timelines for planning projects or to calculate summary metrics related to the planning review and approval processes. The review metric type is defined in the ‘project stage’ column. Note that multiple rounds of completeness check review and resubmission review may occur for a single Planning project. The ‘potential error’ column flags records where data entry errors are likely present. Filter out rows where a value is entered in this column before building summary statistics.
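The filtering advice above can be sketched as follows, using only the standard library. The column names 'project stage' and 'potential error' are taken from the description; the `days` column, the sample rows, and the function name `median_days_by_stage` are hypothetical, and the actual export may use different field names.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Hypothetical sample rows standing in for a CSV export of the dataset.
SAMPLE = """project stage,days,potential error
completeness review,12,
completeness review,30,
completeness review,400,end date before start date
resubmission review,20,
"""


def median_days_by_stage(fileobj):
    """Median review time per project stage, skipping rows flagged as potential errors."""
    days_by_stage = defaultdict(list)
    for row in csv.DictReader(fileobj):
        if (row["potential error"] or "").strip():
            continue  # any value in this column marks a likely data entry error
        days_by_stage[row["project stage"]].append(float(row["days"]))
    return {stage: median(days) for stage, days in days_by_stage.items()}


summary = median_days_by_stage(io.StringIO(SAMPLE))
```

Because a project can go through several rounds of completeness and resubmission review, a per-project summary would need an additional grouping step before aggregating.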

E. RELATED DATASETS
Planning Department Project Events (coming soon)
Planning Department Projects (coming soon)
Building Permits
Building Permit Application Issuance Metrics
Building Permit Completeness Check Review Metrics
Building Permit Application Review Metrics
Planning Department Project Application Review Metrics

Household Expenditure and Income Survey, HEIS 2013 - Jordan

erfdataportal.com

Updated Oct 12, 2022

+ more versions

Cite

Department of Statistics (2022). Household Expenditure and Income Survey, HEIS 2013 - Jordan [Dataset]. http://erfdataportal.com/index.php/catalog/128

Dataset updated

Oct 12, 2022

Dataset provided by

Department of Statistics
Economic Research Forum

Time period covered

2013 - 2014

Area covered

Jordan

Description

Abstract

THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 25% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN

Surveys related to the family budget are considered among the most important types of surveys carried out by the Department of Statistics, since they provide data on household expenditure and income and their relationship with different indicators. Therefore, most countries undertake periodic surveys on household income and expenditure. Since its establishment, the Department of Statistics has conducted a series of Expenditure and Income Surveys during the years 1966, 1980, 1986/1987, 1992, 1997, 2002/2003, 2006/2007, 2008/2009 and 2010/2011. Because of continuous changes in spending patterns, income levels and prices, as well as in internal and external population migration, it was necessary to update the data on household income and expenditure over time. Hence the need arose to implement the Household Expenditure and Income Survey for the year 2013.

The survey was then conducted to achieve the following objectives: 1. Provide data on income and expenditure to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. 2. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index. 3. Provide the necessary data for the national accounts related to overall consumption and income of the household sector. 4. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty. 5. Identify consumer spending patterns prevailing in the society, and the impact of demographic, social and economic variables on those patterns. 6. Calculate the average annual income of the household and the individual, and identify the relationship between income and different socio-economic factors, such as profession and educational level of the head of the household and other indicators. 7. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it.

The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing household surveys in several Arab countries.

Geographic coverage

The General Census of Population and Housing in 2004 provided a detailed framework of housing and households for the different administrative levels in the Kingdom. The Kingdom is administratively divided into 12 governorates. Each governorate is composed of a number of districts, and each district (Liwa) includes one or more sub-districts (Qada). In each sub-district, there are a number of communities (cities and villages). Each community was divided into a number of blocks, with the number of houses in each block ranging between 60 and 100. Nomads and persons living in collective dwellings such as hotels, hospitals and prisons were excluded from the survey framework.

Analysis unit

1- Household/family. 2- Individual/person.

Universe

The survey covered a national sample of households and all individuals permanently residing in surveyed households.

Kind of data

Sample survey data [ssd]

Sampling procedure

The Household Expenditure and Income Survey sample for the year 2013 was designed to serve the basic objectives of the survey by providing a relatively large sample in each sub-district to enable drawing a poverty map of Jordan. A two-stage stratified cluster sampling technique was used. In the first stage, a cluster sample was uniformly selected with probability proportional to size, where the number of households in each cluster was considered the weight of the cluster. In the second stage, a sample of 10 households was selected from each cluster using a systematic sampling technique, in addition to another 5 households selected as a backup for the basic sample. Those 5 households were to be used during the first visit to the block in case a visit to an originally selected household was not possible for any reason. For the purposes of this survey, each sub-district was considered a separate stratum to ensure the possibility of producing results at the sub-district level. In this respect, the survey adopted the framework provided by the General Census of Population and Housing in dividing the sample strata.

To estimate the sample size, the coefficient of variation and the design effect of the expenditure variable in the Household Expenditure and Income Survey for the year 2010 were calculated for each sub-district. These results were used to estimate the sample size at the sub-district level so that the coefficient of variation for the expenditure variable in each sub-district is less than 10%, with a minimum of 8 clusters in each sub-district. This is to ensure adequate representation of clusters in the different administrative areas to enable drawing an indicative poverty map.

It should be noted that, in addition to the standard non-response rate assumed, higher rates were expected in areas of the major cities where poor households are concentrated. Therefore, those rates were taken into consideration during the sampling design phase, and a higher number of households was selected from those areas, with the aim of covering well all regions where poverty is widespread.

Mode of data collection

Face-to-face [f2f]

Research instrument

To reach the survey objectives, 3 forms have been developed. Those forms were finalized after being tested and reviewed by specialists taking into account making the data entry, and validation, process on the computer as simple as possible.

(1) General Form/Questionnaire
This form includes:
- Housing characteristics, such as geographic location variables, household area, predominant building material for external walls, type of tenure, monthly rent or lease, main source of water, lighting, heating and cooking fuel, sanitation type and water cycle, and the number of rooms in the dwelling, in addition to the ownership status of some home appliances and a car.
- Characteristics of household members: this part focused on the social characteristics of the family members, such as relation to the head of the family, gender, age, educational status and marital status. It also included economic characteristics such as economic activity, main occupation, employment status and labor sector, in addition to questions on whether each individual continued to stay with the family, in order to update the information at the end of each of the four rounds of the survey.
- Income section, which included three parts:
  · Family ownership of assets
  · Productive activities for the family
  · Current income sources

(2) Expenditure on food commodities form/Questionnaire
This form records expenditure data on 17 consumption groups. Each group includes a number of food commodities, with the exception of the last group, which was confined to some non-food goods and services because, like food commodities, they are purchased frequently on a daily basis. For the purposes of the efficient use of results, the expenditure data of the last group was moved to the non-food commodities expenditure. The form also includes estimated amounts of own-produced food items and those received as gifts or in kind, as well as spending on food by servants living with the family out of their own wages.

(3) Expenditure on non-food commodities form/Questionnaire This form indicates expenditure data on 11 groups of non-food items, and 5 sets of spending on services, in addition to a group of consumption expenditure. It also includes an estimate of self-consumption, and non-food gifts or other items in an in-kind form received or sent by the household, as well as servants living with the family spending on themselves from their own wages to buy non-food items.

Cleaning operations

----> Raw Data

The data collection phase was followed by the data processing stage, accomplished through the following procedures:
1- Organizing forms/questionnaires: An archive system compatible with the nature of the subsequent operations was used to classify the forms according to the different rounds throughout the year. This enables the forms to be extracted effectively when required for processing. A registry was prepared to indicate the different stages of data checking, coding and entry until the forms are returned to the archive system.
2- Data office checking: This phase is carried out concurrently with the data collection phase in the field, where questionnaires completed in the fieldwork are immediately sent to the data office checking phase.
3- Data coding: A team was trained to work on the data coding phase, which in this survey is limited to education specialization, profession and economic activity. In this respect, international classifications were used, while for the rest of the questions, all codes were predefined

Data from: Macaques preferentially attend to intermediately surprising...

data.niaid.nih.gov
datadryad.org
+1 more

zip

Updated Apr 26, 2022

Cite

Shengyi Wu; Tommy Blanchard; Emily Meschke; Richard Aslin; Ben Hayden; Celeste Kidd (2022). Macaques preferentially attend to intermediately surprising information [Dataset]. http://doi.org/10.6078/D15Q7Q

Available download formats: zip

Unique identifier

https://doi.org/10.6078/D15Q7Q

Dataset updated

Apr 26, 2022

Dataset provided by

University of Minnesota
Yale University
University of California, Berkeley
Klaviyo

Authors

Shengyi Wu; Tommy Blanchard; Emily Meschke; Richard Aslin; Ben Hayden; Celeste Kidd

License

https://spdx.org/licenses/CC0-1.0.html

Description

Normative learning theories dictate that we should preferentially attend to informative sources, but only up to the point that our limited learning systems can process their content. Humans, including infants, show this predicted strategic deployment of attention. Here we demonstrate that rhesus monkeys, much like humans, attend to events of moderate surprisingness over both more and less surprising events. They do this in the absence of any specific goal or contingent reward, indicating that the behavioral pattern is spontaneous. We suggest this U-shaped attentional preference represents an evolutionarily preserved strategy for guiding intelligent organisms toward material that is maximally useful for learning.

Methods

How the data were collected: In this project, we collected gaze data from 5 macaques while they watched sequential visual displays designed to elicit probabilistic expectations. Gaze data were collected using the Eyelink Toolbox and sampled at 1000 Hz by an infrared eye-monitoring camera system.

Dataset:

"csv-combined.csv" is an aggregated dataset that includes one pop-up event per row for all original datasets for each trial. Here are descriptions of each column in the dataset:

subj: subject_ID = {"B":104, "C":102, "H":101, "J":103, "K":203}
trialtime: start time of current trial in seconds
trial: current trial number (each trial featured one of 80 possible visual-event sequences) (in order)
seq current: sequence number (one of 80 sequences)
seq_item: current item number in a seq (in order)
active_item: pop-up item (active box)
pre_active: prior pop-up item (active box) {-1: "the first active object in the sequence/ no active object before the currently active object in the sequence"}
next_active: next pop-up item (active box) {-1: "the last active object in the sequence/ no active object after the currently active object in the sequence"}
firstappear: {0: "not first", 1: "first appear in the seq"}
looks_blank: csv: total amount of time looking at blank space for current event (ms); csv_timestamp: {1: "look blank at timestamp", 0: "not look blank at timestamp"}
looks_offscreen: csv: total amount of time looking offscreen for current event (ms); csv_timestamp: {1: "look offscreen at timestamp", 0: "not look offscreen at timestamp"}
time till target: time until the first look at the target object (ms) {-1: "never look at the target"}
looks target: csv: time spent looking at the target object (ms); csv_timestamp: look at the target or not at current timestamp (1 or 0)
look1,2,3: time spent looking at each object (ms)
location 123X, 123Y: location of each box (the locations of the three boxes for a given sequence were chosen randomly, but remained static throughout the sequence)
item123id: pop-up item ID (remained static throughout a sequence)
event time: total time spent for the whole event (pop-up and go back) (ms)
eyeposX,Y: eye position at current timestamp

"csv-surprisal-prob.csv" is an output file from Monkilock_Data_Processing.ipynb. Surprisal values for each event were calculated and added to the "csv-combined.csv". Here are descriptions of each additional column:

rt: time till target {-1: "never look at the target"}. In data analysis, we included data that have rt > 0.
already_there: {NA: "never look at the target object"}. In data analysis, we included events that are not the first event in a sequence, are not repeats of the previous event, and already_there is not NA.
looks_away: {TRUE: "the subject was looking away from the currently active object at this time point", FALSE: "the subject was not looking away from the currently active object at this time point"}
prob: the probability of the occurrence of object
surprisal: unigram surprisal value
bisurprisal: transitional surprisal value
std_surprisal: standardized unigram surprisal value
std_bisurprisal: standardized transitional surprisal value
binned_surprisal_means: the means of unigram surprisal values binned to three groups of evenly spaced intervals according to surprisal values.
binned_bisurprisal_means: the means of transitional surprisal values binned to three groups of evenly spaced intervals according to surprisal values.
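The derived columns described here (surprisal, standardized surprisal, binned means) can be illustrated with a minimal Python sketch. It is not the code from Monkilock_Data_Processing.ipynb: the function names are hypothetical, the logarithm base (2) and the use of the population standard deviation are assumptions, and the probability of each event is taken as given rather than computed.

```python
import math

def surprisal(prob, base=2.0):
    """Surprisal of an event with probability `prob` (log base is an assumption)."""
    return -math.log(prob, base)

def standardize(values):
    """z-score the non-missing values; None (NA) entries are passed through."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    sd = math.sqrt(sum((v - mean) ** 2 for v in present) / len(present))
    return [None if v is None else (v - mean) / sd for v in values]

def binned_means(values, n_bins=3):
    """Replace each value by the mean of its bin, using evenly spaced intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    index = []
    for v in values:
        # The maximum value falls into the last bin rather than a bin of its own.
        i = min(int((v - lo) / width), n_bins - 1) if width > 0 else 0
        bins[i].append(v)
        index.append(i)
    means = [sum(b) / len(b) if b else None for b in bins]
    return [means[i] for i in index]

if __name__ == "__main__":
    probs = [0.5, 0.25, 0.125, 0.5]           # hypothetical event probabilities
    s = [surprisal(p) for p in probs]
    print(s, standardize(s), binned_means(s))
```

Passing None through `standardize` mirrors the NA entries that transitional surprisal has for the first event of a sequence.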

"csv-surprisal-prob_updated.csv" is a ready-for-analysis dataset generated by Analysis_Code_final.Rmd after standardizing controlled variables, changing data types for categorical variables for analysts, etc. "AllSeq.csv" includes event information of all 80 sequences

Empty Values in Datasets:

There are no missing values in the original dataset "csv-combined.csv". Missing values (marked as NA) occur in the columns "prev_active", "next_active", "already_there", "bisurprisal", "std_bisurprisal", and "sq_std_bisurprisal" in "csv-surprisal-prob.csv" and "csv-surprisal-prob_updated.csv". NAs in "prev_active" and "next_active" mean that the current event is the first or the last active object in the sequence, so there is no active object before or after it. When we analyzed the variable "already_there", we excluded rows whose "prev_active" is NA. NAs in "already_there" mean that the subject never looked at the target object in the current event; when we analyzed "already_there", we also excluded rows in which it is NA. Missing values occur in "bisurprisal", "std_bisurprisal", and "sq_std_bisurprisal" for the first event in a sequence, where the transitional probability cannot be computed because no event precedes it. When we fitted models for transitional statistics, we excluded rows in which "bisurprisal", "std_bisurprisal", and "sq_std_bisurprisal" are NA.
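The inclusion rules described for the analysis (rt > 0, not the first event in a sequence, not a repeat of the previous event, already_there not NA) can be sketched as a row filter. This is an illustration only, not the authors' code: the helper names are hypothetical, rows are assumed to be dictionaries such as those produced by `csv.DictReader`, and the column is assumed to be named "prev_active" as in this section (it appears as "pre_active" in the column list).

```python
def is_missing(value):
    """Treat None, empty strings and the literal 'NA' as missing."""
    return value is None or value in ("", "NA")

def keep_for_analysis(row):
    """Apply the inclusion rules quoted in the dataset description to one row."""
    if is_missing(row["rt"]) or float(row["rt"]) <= 0:
        return False                      # subject never looked at the target
    if is_missing(row["already_there"]):
        return False                      # already_there is NA
    prev = row["prev_active"]
    if is_missing(prev) or str(prev) == "-1":
        return False                      # first event in the sequence
    if str(prev) == str(row["active_item"]):
        return False                      # repeat of the previous event
    return True

if __name__ == "__main__":
    rows = [  # hypothetical rows, values as strings like csv.DictReader yields
        {"rt": "312", "already_there": "FALSE", "prev_active": "2", "active_item": "1"},
        {"rt": "-1", "already_there": "NA", "prev_active": "1", "active_item": "3"},
        {"rt": "150", "already_there": "TRUE", "prev_active": "NA", "active_item": "2"},
        {"rt": "90", "already_there": "TRUE", "prev_active": "2", "active_item": "2"},
    ]
    print([keep_for_analysis(r) for r in rows])
```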

Codes:

In "Monkilock_Data_Processing.ipynb", we processed raw fixation data of 5 macaques and explored the relationship between their fixation patterns and the "surprisal" of events in each sequence. In this notebook we computed the following variables, which are necessary for further analysis, modeling, and visualization (see above for details): active_item, pre_active, next_active, firstappear, looks_blank, looks_offscreen, time till target, looks target, look1,2,3, prob, surprisal, bisurprisal, std_surprisal, std_bisurprisal, binned_surprisal_means, binned_bisurprisal_means.

"Analysis_Code_final.Rmd" is the main script, in which we further processed the data, built models, and created visualizations. We evaluated the statistical significance of variables using mixed-effects linear and logistic regressions with random intercepts. The raw regression models include standardized linear and quadratic surprisal terms as predictors. The controlled regression models add covariates such as whether an object is a repeat, the distance between the current and previous pop-up objects, and trial number. A generalized additive model (GAM) was used to visualize the relationship between the surprisal estimate from the computational model and the behavioral data.

"helper-lib.R" includes helper functions used in Analysis_Code_final.Rmd.

Coupled Model Intercomparison Project Phase 5 (CMIP5) University of...

registry.opendata.aws

Updated Mar 14, 2022

Cite

NOAA (2022). Coupled Model Intercomparison Project Phase 5 (CMIP5) University of Wisconsin-Madison Probabilistic Downscaling Dataset [Dataset]. https://registry.opendata.aws/noaa-uwpd-cmip5/

Explore at:

Dataset updated

Mar 14, 2022

Dataset provided by

National Oceanic and Atmospheric Administration (http://www.noaa.gov/)

Area covered

Madison, Wisconsin

Description

The University of Wisconsin Probabilistic Downscaling (UWPD) is a statistically downscaled dataset based on the Coupled Model Intercomparison Project Phase 5 (CMIP5) climate models. UWPD consists of three variables: daily precipitation, daily maximum temperature, and daily minimum temperature. The spatial resolution is 0.1° x 0.1°, covering the United States and southern Canada east of the Rocky Mountains.

The downscaling methodology is not deterministic. Instead, to properly capture unexplained variability and extreme events, the methodology predicts a spatially and temporally varying Probability Density Function (PDF) for each variable. Statistics such as the mean, mean PDF and annual maximum statistics can be calculated directly from the daily PDF and these statistics are included in the dataset. In addition, “standard”, “raw” data is created by randomly sampling from the PDFs to create a “realization” of the local scale given the large-scale from the climate model. There are 3 realizations for temperature and 14 realizations for precipitation.

The directory structure of the data is as follows:
[cmip_version]/[scenario]/[climate_model]/[ensemble_member]/
The realizations are as follows:
prcp_[realization_number][year].nc
temp[realization_number][year].nc
The time mean files averaged over certain year bounds are as follows:
prcp_mean[year_bound_1][year_bound_2].nc
temp_mean[year_bound_1][year_bound_2].nc
The time-mean Cumulative Distribution Function (CDF) files are as follows:
prcp_cdf[year_bound_1][year_bound_2].nc
temp_cdf[year_bound_1][year_bound_2].nc
The CDF of the annual maximum precipitation is given for each year in the record:
prcp_annual_max_cdf[start_year_of_scenario]_[end_year_of_scenario].nc
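As a small illustration of the directory layout listed above, the following Python sketch assembles the directory prefix from its four components. The function name and the example values (`cmip5`, `rcp85`, `ACCESS1-0`, `r1i1p1`) are hypothetical placeholders, not names confirmed by the dataset; file names are left to the caller and should follow the patterns listed above.

```python
from pathlib import PurePosixPath

def uwpd_directory(cmip_version, scenario, climate_model, ensemble_member):
    """Build [cmip_version]/[scenario]/[climate_model]/[ensemble_member]."""
    return PurePosixPath(cmip_version) / scenario / climate_model / ensemble_member

if __name__ == "__main__":
    # All four values below are illustrative assumptions.
    directory = uwpd_directory("cmip5", "rcp85", "ACCESS1-0", "r1i1p1")
    print(directory)
```

`PurePosixPath` is used so that the separator stays `/` regardless of the local operating system, which matches how object-store keys are usually written.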

Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

zenodo.org

application/gzip, bin +2

Updated Aug 2, 2024

+ more versions

Cite

Marat Valiev; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788

Explore at:

Available download formats: bin, application/gzip, zip, text/x-python

Unique identifier

https://doi.org/10.5281/zenodo.1419788

Dataset updated

Aug 2, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Marat Valiev; Bogdan Vasilescu; James Herbsleb

License

https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

Description

Replication pack, FSE2018 submission #164:
------------------------------------------

**Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
A Case Study of the PyPI Ecosystem

**Note:** link to data artifacts is already included in the paper. 
Link to the code will be included in the Camera Ready version as well.


Content description
===================

- **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
 described below
- **settings.py** - settings template for the code archive.
- **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
 This dataset only includes stats aggregated by the ecosystem (PyPI)
- **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
 statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
 themselves, which take around 2TB.
- **build_model.r, helpers.r** - R files to process the survival data 
  (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
  `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
  **dataset_full_Jan_2018.tgz**)
- **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
- LICENSE - text of GPL v3, under which this dataset is published
- INSTALL.md - replication guide (~2 pages)

Replication guide
=================

Step 0 - prerequisites
----------------------

- Unix-compatible OS (Linux or OS X)
- Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
- R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)

Depending on the level of detail (see Step 2 for more details):
- up to 2Tb of disk space (see Step 2 for the levels of detail)
- at least 16Gb of RAM (64 preferable)
- a few hours to a few months of processing time

Step 1 - software
----------------

- unpack **ghd-0.1.0.zip**, or clone from gitlab:

   git clone https://gitlab.com/user2589/ghd.git
   cd ghd
   git checkout 0.1.0
 
 `cd` into the extracted folder. 
 All commands below assume it as a current directory.
  
- copy `settings.py` into the extracted folder. Edit the file:
  * set `DATASET_PATH` to some newly created folder path
  * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
- install docker. For Ubuntu Linux, the command is 
  `sudo apt-get install docker-compose`
- install libarchive and headers: `sudo apt-get install libarchive-dev`
- (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
 Without this dependency, you might get an error on the next step, 
 but it's safe to ignore.
- install Python libraries: `pip install --user -r requirements.txt` . 
- disable all APIs except GitHub (Bitbucket and Gitlab support were
 not yet implemented when this study was in progress): edit
 `scraper/init.py`, comment out everything except GitHub support
 in `PROVIDERS`.

Step 2 - obtaining the dataset
-----------------------------

The ultimate goal of this step is to get output of the Python function 
`common.utils.survival_data()` and save it into a CSV file:

  # copy and paste into a Python console
  from common import utils
  survival_data = utils.survival_data('pypi', '2008', smoothing=6)
  survival_data.to_csv('survival_data.csv')

Since full replication will take several months, here are some ways to speed up
the process:

#### Option 2.a, difficulty level: easiest

Just use the precomputed data. Step 1 is not necessary under this scenario.

- extract **dataset_minimal_Jan_2018.zip**
- get `survival_data.csv`, go to the next step

#### Option 2.b, difficulty level: easy

Use precomputed longitudinal feature values to build the final table.
The whole process will take 15..30 minutes.

- create a folder `

Replication Data for: Effects of Settlement into Ethnic Enclaves on...

search.dataone.org
dataverse.harvard.edu

Updated Nov 22, 2023

Cite

Lajevardi, Nazita (2023). Replication Data for: Effects of Settlement into Ethnic Enclaves on Immigrant Voter Turnout [Dataset]. http://doi.org/10.7910/DVN/LQ1VPA

Explore at:

Unique identifier

https://doi.org/10.7910/DVN/LQ1VPA

Dataset updated

Nov 22, 2023

Dataset provided by

Harvard Dataverse

Authors

Lajevardi, Nazita

Description

We note that we include only do files and a log file of our work, not any raw data. This is because, as we note in the online appendix, we use individual-level data from Swedish registers. The data material is located on an encrypted server to which we have to log in through a remote desktop application in order to perform all of our data analyses. Due to the sensitivity of the data, we are under contractual and ethical obligation not to distribute these data to others. For researchers who want to replicate our results, there are two ways to get access to the administrative data. The first way is to order the data directly from Statistics Sweden (SCB). Statistics Sweden presently requires that researchers obtain permission from the Swedish Ethical Review Board before data can be ordered (a description of how to order data from Statistics Sweden is available at: https://www.scb.se/en/services/guidance-for-researchers-and-universities/). We will also make available a complete list of all the variables that we ordered from Statistics Sweden for this project, together with the statistical code used for the analyses. The second way to replicate our analyses is to come to Sweden and reanalyze these data through the same remote server system that we used. Researchers interested in using this option should reach out to us prior to coming to Sweden so that we can apply for approval from the Ethical Review Board for the researcher to be temporarily added to our research team, which is mandatory in order to get access to the remote server system.

quora_qa_raw

huggingface.co

Updated Jun 18, 2024

Cite

Javi Lau (2024). quora_qa_raw [Dataset]. https://huggingface.co/datasets/LxYxvv/quora_qa_raw

Explore at:

Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.

Dataset updated

Jun 18, 2024

Authors

Javi Lau

License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

QUORA_ONE_MANY_QA

This dataset is derived from quora.com question data. Each record is a question with multiple answers. The project provides data ("gas") for mnbvc.

  STATISTICS

Raw data size

100w 16G
200w 17G
300w 15G
400w 11G
500w 10G
600w 9G
700w 9G
800w 7.5G
900w 7G
1000w 6.5G
Updating...

Children, Technology and Play (CTAP) Survey

figshare.com
zivahub.uct.ac.za

pdf

Updated Mar 8, 2020

Cite

Dick Ng'ambi; Karin Murris (2020). Children, Technology and Play (CTAP) Survey [Dataset]. http://doi.org/10.25375/uct.11950107.v1

Explore at:

Available download formats: pdf

Unique identifier

https://doi.org/10.25375/uct.11950107.v1

Dataset updated

Mar 8, 2020

Dataset provided by

University of Cape Town

Authors

Dick Ng'ambi; Karin Murris

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

The School of Education at the University of Cape Town (UCT) investigated children’s learning through digital play. The aim of the study was to explore the intersection between child play, technology, creativity and learning among children aged between 3 and 11 years. The study also identified skills and dispositions children develop through both digital and non-digital play. The data shared emerged from a survey of parents of children in the stated age group, with particular reference to the parents' views on children's play practices, including the time parents spent playing with their children, concerns parents had about the time children spend playing on various technologies, the types of play children in South Africa engaged in, and the concerns of parents when children played with some electronic devices.

The following data files are shared:
SA - Survey - Children, Technology and Play (CTAP) - Google Forms.pdf
Descriptive Stats 2020.1.9 -Children Technology and Play SURVEY.xlsx
Parent Survey RAW PUBLIC DATA 2020.2.29 - Children Technology and Play Project.xlsx
Parent Survey RAW PUBLIC DATA 2020.2.29 - Children Technology and Play Project.csv
Parent Survey REPORT DATA 2020.2.29 - Children Technology and Play Project.xlsx
Parent Survey REPORT DATA 2020.2.29 - Children Technology and Play Project.csv
Parent Survey RAW and REPORT DATA SYNTAX 2020.2.29 - Children Technology and Play Project.sps

NOTE: This survey was adapted from Marsh, J. Stjerne Thomsen, B., Parry, B., Scott, F. Bishop, J.C., Bannister, C., Driscoll, A., Margary, T., Woodgate, A., (2019) Children, Technology and Play. UK Survey Questions. LEGO Foundation.

Expenditure and Consumption Survey, 2010 - West Bank and Gaza

dev.ihsn.org
catalog.ihsn.org
+1 more

Updated Apr 25, 2019

+ more versions

Cite

Palestinian Central Bureau of Statistics (2019). Expenditure and Consumption Survey, 2010 - West Bank and Gaza [Dataset]. https://dev.ihsn.org/nada/catalog/73912

Explore at:

Dataset updated

Apr 25, 2019

Dataset authored and provided by

Palestinian Central Bureau of Statistics (http://pcbs.gov.ps/)

Time period covered

2010 - 2011

Area covered

Gaza Strip, Gaza, West Bank

Description

Abstract

The basic goal of this survey is to provide the necessary database for formulating national policies at various levels. It represents the contribution of the household sector to the Gross National Product (GNP). Household Surveys help as well in determining the incidence of poverty, and providing weighted data which reflects the relative importance of the consumption items to be employed in determining the benchmark for rates and prices of items and services. Generally, the Household Expenditure and Consumption Survey is a fundamental cornerstone in the process of studying the nutritional status in the Palestinian territory.

The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality. Data is a public good, in the interest of the region, and it is consistent with the Economic Research Forum's mandate to make micro data available, aiding regional research on this important topic.

Geographic coverage

The survey data covers urban, rural and camp areas in West Bank and Gaza Strip.

Analysis unit

1- Household/families. 2- Individuals.

Universe

The survey covered all Palestinian households who are usually resident in the Palestinian Territory during 2010.

Kind of data

Sample survey data [ssd]

Sampling procedure

Sample and Frame:

The sampling frame consists of all enumeration areas which were enumerated in 2007. Each enumeration area consists of buildings and housing units, with an average of about 120 households. These enumeration areas are used as primary sampling units (PSUs) in the first stage of the sampling selection.

Sample Design:

The sample is a stratified cluster systematic random sample with two stages: First stage: selection of a systematic random sample of 192 enumeration areas. Second stage: selection of a systematic random sample of 24 households from each enumeration area selected in the first stage.
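The size implied by this two-stage design can be checked with a few lines of Python. This is only an arithmetic sketch with hypothetical names; it assumes every selected enumeration area contributes exactly 24 households, which ignores the different procedure described for Jerusalem (J1). The product matches the 4,608 households reported as the original sample under "Response rate".

```python
def two_stage_sample_size(n_enumeration_areas, households_per_area):
    """Households selected when every sampled area contributes the same number."""
    return n_enumeration_areas * households_per_area

if __name__ == "__main__":
    # 192 enumeration areas in the first stage, 24 households each in the second.
    print(two_stage_sample_size(192, 24))
```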

Note: in Jerusalem Governorate (J1), 13 enumeration areas were selected; then, in the second phase, a group of households from each enumeration area was chosen using the census-2007 method of delineation and enumeration. This method was adopted to maximize household response and comply with the non-response percentage set in the sample design. Enumeration areas were distributed over twelve months, and the sample for each quarter covers the sample strata (governorate, locality type).

Sample strata:

The population was divided by:

1- Governorate 2- Type of Locality (urban, rural, refugee camps)

Sample Size:

The calculated sample size for the Expenditure and Consumption Survey in 2010 is about 3,757 households, 2,574 households in West Bank and 1,183 households in Gaza Strip.

Mode of data collection

Face-to-face [f2f]

Research instrument

The questionnaire consists of two main parts:

First: Survey's questionnaire

Part of the questionnaire is to be filled in during the visit at the beginning of the month, while the other part is to be filled in at the end of the month. The questionnaire includes:

Control sheet: Includes household’s identification data, date of visit, data on the fieldwork and data processing team, and summary of household’s members by gender.

Household roster: Includes demographic, social, and economic characteristics of household’s members.

Housing characteristics: Includes data like type of housing unit, number of rooms, value of rent, and connection of housing unit to basic services like water, electricity and sewage. In addition, data in this section includes source of energy used for cooking and heating, distance of housing unit from transportation, education, and health centers, and sources of income generation like ownership of farm land or animals.

Food and Non-Food Items: Includes food and non-food items; the household records its expenditure for one month.

Durable Goods Schedule: Includes a list of main goods such as washing machine, refrigerator, and TV.

Assistances and Poverty: Includes data about cash and in-kind assistance (assistance value, assistance source), as well as data about the household's situation and the procedures used to cover expenses.

Monthly and annual income: Data pertinent to household’s income from different sources is collected at the end of the registration period.

Second: List of goods

The classification of the list of goods is based on the recommendation of the United Nations for the SNA under the name Classification of Personal Consumption by purpose. The list includes 55 groups of expenditure and consumption, where each is given a sequence number based on its importance to the household, starting with food goods, clothing groups, housing, medical treatment, transportation and communication, and lastly durable goods. Each group consists of important goods. The total number of goods in all groups amounted to 667 items for goods and services. Groups 1-21 include goods pertinent to food, drinks and cigarettes. Group 22 includes goods that are home produced and consumed by the household. Groups 23-45 include all items except food, drinks and cigarettes. Groups 50-55 include durable goods. The data is collected based on different reference periods to represent expenditure during the whole year, except for cars, where data is collected for the last three years.

Registration form

The registration form includes instructions and examples on how to record consumption and expenditure items. The form includes columns:
1. Monetary: if the good is purchased, or in kind: if the item is self produced.
2. Title of the service of the good
3. Unit of measurement (kilogram, liter, number)
4. Quantity
5. Value

The pages of the registration form are colored differently for the weeks of the month. The footer for each page includes remarks that encourage households to participate in the survey. The following are instructions that illustrate the nature of the items that should be recorded:
1. Monetary expenditures during purchases
2. Purchases based on debts
3. Monetary gifts once presented
4. Interest at pay
5. Self produced food and goods once consumed
6. Food and merchandise from commercial project once consumed
7. Merchandises once received as a wage or part of a wage from the employer.

Cleaning operations

Raw Data

Data editing took place through a number of stages, including:
1. Office editing and coding
2. Data entry
3. Structure checking and completeness
4. Structural checking of SPSS data files

Harmonized Data

The Statistical Package for Social Science (SPSS) is used to clean and harmonize the datasets.
The harmonization process starts with cleaning all raw data files received from the Statistical Office.
Cleaned data files are then all merged to produce one data file on the individual level containing all variables subject to harmonization.
A country-specific program is generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
A post-harmonization cleaning process is run on the data.
Harmonized data is saved on the household as well as the individual level, in SPSS and converted to STATA format.

Response rate

The survey sample consisted of 4,767 households: 4,608 households of the original sample plus 159 households as an additional sample. A total of 3,757 households completed the interview: 2,574 households in the West Bank and 1,183 households in the Gaza Strip. Weights were modified to account for the non-response rate. The response rate in the Palestinian Territory was 82.1% (82.4% in the West Bank and 81.6% in the Gaza Strip).
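A quick consistency check on the regional figures: if each regional response rate is read as completed interviews divided by eligible households (an assumption, since the survey documentation does not define the denominator here), the overall rate follows from the regional rates and completed counts. The function name below is hypothetical. By construction the result lies between the two regional rates.

```python
def overall_response_rate(regions):
    """Combine (completed_interviews, response_rate) pairs into one overall rate.

    Assumes rate = completed / eligible, so eligible = completed / rate.
    """
    completed_total = sum(completed for completed, _ in regions)
    eligible_total = sum(completed / rate for completed, rate in regions)
    return completed_total / eligible_total

if __name__ == "__main__":
    west_bank = (2574, 0.824)    # completed interviews, reported response rate
    gaza_strip = (1183, 0.816)
    rate = overall_response_rate([west_bank, gaza_strip])
    print(f"{rate:.1%}")
```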

Sampling error estimates

The impact of errors on data quality was reduced to a minimum due to the high efficiency and outstanding selection, training, and performance of the fieldworkers. Procedures adopted during the fieldwork of the survey were considered a necessity to ensure the collection of accurate data, notably: 1) Develop schedules to conduct field visits to households during survey fieldwork. The objectives of the visits and the data collected on each visit were predetermined. 2) Fieldwork editing rules were applied during the data collection to ensure corrections were implemented before the end of fieldwork activities. 3) Fieldworkers were instructed to provide details in cases of extreme expenditure or consumption by the household. 4) Questions on income were postponed until the final visit at the end of the month. 5) Validation rules were embedded in the data processing systems, along with procedures to verify data entry and data edit.

Socio-Economic Survey, HSOCIOECOS 2018 - Palestine

erfdataportal.com

Updated Nov 5, 2023

Cite

Palestinian Central Bureau of Statistics (2023). Socio-Economic Survey, HSOCIOECOS 2018 - Palestine [Dataset]. https://erfdataportal.com/index.php/catalog/281

Explore at:

Dataset updated

Nov 5, 2023

Dataset authored and provided by

Palestinian Central Bureau of Statistics (http://pcbs.gov.ps/)

Time period covered

2018

Area covered

Palestine

Description

Abstract

THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS

The Palestinian Central Bureau of Statistics (PCBS) carried out the Socio-Economic Survey 2018. The survey round covered a total sample of about 9926 households.

The main objective of collecting data on socio-economic conditions and their components, including demographic characteristics, employment, and unemployment, is to provide basic information on the size and structure of Palestinian households, as well as other data on housing status, the characteristics of individuals, and family and living conditions.

Geographic coverage

Covering a representative sample on the region level (West Bank, Gaza Strip), the locality type (urban, rural, camp) and the governorates.

Analysis unit

1- Household/family. 2- Individual/person.

Universe

The survey covered all Palestinian households who are usually resident in the Palestinian Territory.

Kind of data

Sample survey data [ssd]

Sampling procedure

The methodology was designed according to the context of the survey, international standards, data processing requirements and comparability of outputs with other related surveys.

---> Target Population: It consists of all Palestinian households and individuals who habitually reside with their households in Palestine for the reference period of the survey, which is 2018, and there was a focus on individuals of the age group 18 years and over to complete an additional form for individuals in this category.

---> Sampling Frame: The sampling frame consists of a comprehensive sample selected from the Population, Housing and Establishments Census 2007: This comprehensive sample consists of geographical areas with an average of 124 households, and these are considered as enumeration areas used in the census and these units were used as primary sampling units (PSUs).

---> Sampling Size: The estimated sample size is 9926 households in 2018.

---> Sample Design: The sample is a stratified cluster sample with three stages. First stage: a systematic random sample of 337 enumeration areas was selected for the whole round. Second stage: the same households that were visited in the previous survey session in 2015, about 25 households from each enumeration area, were visited in 2018. Households that had moved from the address recorded in the previous database were tracked to their new location to complete the questionnaire, as were individuals from the previous session who had separated from their household and formed or joined new households. Third stage: one individual, male or female, aged 18 years or over was selected from each sample household (old and new) of the second stage, using check tables, to complete the form for individuals 18 years and over (quality of life model). A female was chosen from households with an even number in the enumeration area sample, and a male from households with an odd number.
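The third-stage parity rule can be written as a one-line function. This is a minimal sketch with a hypothetical name; it covers only the even/odd assignment of respondent sex and not the check-table selection of the specific individual within the household.

```python
def respondent_sex(household_number):
    """Even household numbers select a female respondent, odd ones a male."""
    return "female" if household_number % 2 == 0 else "male"

if __name__ == "__main__":
    for number in (1, 2, 3, 4):
        print(number, respondent_sex(number))
```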

---> Sample strata: The population was divided by: 1. The governorate (16 governorates in the West Bank, including Jerusalem J1 governorate, which the Israeli occupation annexed to it by force after its occupation of the West Bank in 1967 as Tabqa and the Gaza Strip). 2. Type of settlement (urban, rural, camp). 3. Area C (class C, non-C) as an implicit class.

Mode of data collection

Face-to-face [f2f]

Research instrument

The questionnaire is the key tool for data collection. It must conform to the technical characteristics of fieldwork to allow for data processing and analysis. The survey questionnaire comprised the following parts:
- Part one: Identification data.
- Part two: Quality control
- Part three: Data of households’ members and social data.
- Part four: Housing unit data
- Part five: Assistance and Coping Strategies Information
- Part six: Expenditure and Consumption
- Part seven: Food Variation and Facing Food Shortage
- Part eight: Income
- Part nine: Agricultural and economic activities.
- Part ten: Freedom of mobility
- In addition to a questionnaire for individuals (18 years old and above): Questions on suffering and life quality, assessment of health, education, administration (Ministry of the Interior) services and information technology.

Cleaning operations

---> Raw Data: PCBS started collecting data on 27/8/2018 using hand-held devices (HHD) in Palestine, excluding Jerusalem inside borders (J1) and the Gaza Strip. The HHD program, built on SQL Server and Microsoft .NET, was developed by the General Directorate of Information Systems. Using HHD reduced the data processing stages: fieldworkers collect data and send it directly to the server, and the project manager can retrieve the data at any time. To work in parallel in the Gaza Strip and Jerusalem inside borders (J1), an office program was developed using the same techniques and the same database as the HHD.

---> Harmonized Data
- The SPSS package is used to clean and harmonize the datasets.
- The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
- All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
- A country-specific program is generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
- A post-harmonization cleaning process is then conducted on the data.
- Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.

Response rate

The survey sample consisted of about 11,008 households, of which 9,926 completed the interview: 5,898 in the West Bank and 4,028 in the Gaza Strip. Weights were adjusted to account for non-response. The response rate in Palestine reached 90.2%.
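As a quick arithmetic check, the reported figures can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, and the uniform adjustment factor (sampled divided by completed) is an assumed, simplified illustration of a non-response weight adjustment, not the procedure PCBS actually applied.

```python
def response_rate(completed: int, sampled: int) -> float:
    """Completed interviews as a percentage of sampled households."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    return 100.0 * completed / sampled


def nonresponse_adjustment(sampled: int, completed: int) -> float:
    """Assumed uniform factor that scales respondent weights up to the full sample."""
    if completed <= 0:
        raise ValueError("completed must be positive")
    return sampled / completed


# Figures taken from the survey description.
west_bank = 5898
gaza_strip = 4028
sampled = 11008

completed = west_bank + gaza_strip
rate = response_rate(completed, sampled)
factor = nonresponse_adjustment(sampled, completed)

print(f"completed={completed}, response rate={rate:.1f}%, adjustment factor={factor:.3f}")
```

The regional counts sum to the 9,926 completed interviews, and the ratio to the 11,008 sampled households rounds to the stated 90.2%.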

Sampling error estimates

---> Sampling Errors
These errors result from studying a part (sample) of the society rather than all of its units. Since the socio-economic conditions survey 2018 was conducted on a sample, sampling errors are expected to occur. To minimize sampling errors, a properly designed probability sample was used, which allows errors to be calculated throughout the process. This means that every unit of the society has a probability of being selected in the sample. The variance was calculated to measure the impact of the sample design for Palestine.
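One common way to express the impact of a sample design on variance is the design effect: the ratio of the variance under the actual design to the variance a simple random sample of the same size would give. The sketch below assumes that this is the kind of measure meant here, which the source does not state; the function names and the input values are hypothetical and are not survey results.

```python
def srs_variance_of_proportion(p: float, n: int) -> float:
    """Variance of an estimated proportion under simple random sampling."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return p * (1.0 - p) / n


def design_effect(design_variance: float, p: float, n: int) -> float:
    """Ratio of the design-based variance to the simple-random-sampling variance."""
    return design_variance / srs_variance_of_proportion(p, n)


# Hypothetical inputs, for illustration only.
example_deff = design_effect(design_variance=0.0004, p=0.3, n=1000)
print(f"design effect = {example_deff:.2f}")
```

A value above 1 would indicate that the clustered or stratified design is less precise than a simple random sample of equal size for that estimate.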

---> Non-Sampling Errors
Non-sampling errors are possible at all stages of the project, during data collection or processing. They include non-response errors, response errors, interviewing errors and data entry errors. To avoid errors and reduce their effects, strenuous efforts were made to train the fieldworkers intensively. They were trained on how to carry out the interview, what to discuss and what to avoid, and received both practical and theoretical training during the training course. Non-sampling errors in the survey arose because it collected private data, which some households regarded as interference in the details of their private lives; these households refused to cooperate. Several methods were used to persuade households to provide answers and to minimize non-response.

Data appraisal

The concept of data quality covers many aspects, from the initial planning of the survey to the dissemination of the results and how well users understand and use the data. There are seven dimensions of statistical quality: relevance, accuracy, timeliness, accessibility, comparability, coherence, and completeness.

MSZSI: Multi-Scale Zonal Statistics [AgriClimate] Inventory

repository.soilwise-he.eu
dataverse.harvard.edu

Cite

MSZSI: Multi-Scale Zonal Statistics [AgriClimate] Inventory [Dataset]. http://doi.org/10.7910/DVN/M4ZGXP

Unique identifier

https://doi.org/10.7910/DVN/M4ZGXP

Description

MSZSI: Multi-Scale Zonal Statistics [AgriClimate] Inventory

--------------------------------------------------------------------------------------
MSZSI is a data extraction tool for Google Earth Engine that aggregates time-series remote sensing information to multiple administrative levels using the FAO GAUL data layers. The code at the bottom of this page (metadata) can be pasted into the Google Earth Engine JavaScript code editor and run at https://code.earthengine.google.com/.

Please refer to the associated publication:
Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624.
https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

Input options:
[1] Country of interest
[2] Start and end year
[3] Start and end month
[4] Option to mask data to a specific land-use/land-cover type
[5] Land-use/land-cover type code from CGLS LULC
[6] Image collection for data aggregation
[7] Desired band from the image collection
[8] Statistics type for the zonal aggregations
[9] Statistic to use for annual aggregation
[10] Scaling options
[11] Export folder and label suffix

Output: Two CSVs containing zonal statistics for each of the FAO GAUL administrative level boundaries
Output fields: system:index, 0-ADM0_CODE, 0-ADM0_NAME, 0-ADM1_CODE, 0-ADM1_NAME, 0-ADMN_CODE, 0-ADMN_NAME, 1-AREA_PERCENT_LULC, 1-AREA_SQM_LULC, 1-AREA_SQM_ZONE, 2-X_2001, 2-X_2002, 2-X_2003, ..., 2-X_2020, .geo
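For downstream analysis, the annual columns of an exported table can be reshaped into one record per zone and year. The following is a minimal sketch that assumes the CSV uses the field names listed above and that annual values sit in columns prefixed with `2-X_`; the function name `wide_to_long` and the sample rows (including the place names and values) are hypothetical, and the `.geo` column is omitted from the sample for brevity.

```python
import csv
import io

YEAR_PREFIX = "2-X_"  # annual value columns, e.g. 2-X_2001


def wide_to_long(csv_text: str) -> list:
    """Return (ADM0 name, ADM1 name, year, value) tuples, one per non-empty annual column."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field, raw in row.items():
            # Skip non-annual fields and cells with no value.
            if not field or not field.startswith(YEAR_PREFIX) or raw in ("", None):
                continue
            year = int(field[len(YEAR_PREFIX):])
            records.append((row["0-ADM0_NAME"], row["0-ADM1_NAME"], year, float(raw)))
    return records


# Hypothetical two-zone sample; real exports contain more columns and rows.
sample = (
    "system:index,0-ADM0_CODE,0-ADM0_NAME,0-ADM1_CODE,0-ADM1_NAME,2-X_2001,2-X_2002\n"
    "0,1,Exampleland,10,North,0.41,0.43\n"
    "1,1,Exampleland,11,South,0.55,\n"
)

long_rows = wide_to_long(sample)
for record in long_rows:
    print(record)
```

Matching on the column prefix rather than a fixed list of years keeps the sketch independent of the temporal range chosen in the input options.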

PREPROCESSED DATA DOWNLOAD

The datasets available for download contain zonal statistics at 2 administrative levels (FAO GAUL levels 1 and 2). Select countries from Southeast Asia and Sub-Saharan Africa (Cambodia, Indonesia, Lao PDR, Myanmar, Philippines, Thailand, Vietnam, Burundi, Kenya, Malawi, Mozambique, Rwanda, Tanzania, Uganda, Zambia, Zimbabwe) are included in the current version, with plans to extend the dataset to contain global metrics. Each zip file is described below and two example NDVI tables are available for preview.

Key: [source, data, units, temporal range, aggregation, masking, zonal statistic, notes]

Currently available:
MSZSI-V2_V-NDVI-MEAN.tar: [NASA-MODIS, NDVI, index, 2001–2020, annual mean, agriculture, mean, n/a]
MSZSI-V2_T-LST-DAY-MEAN.tar: [NASA-MODIS, LST Day, °C, 2001–2020, annual mean, agriculture, mean, n/a]
MSZSI-V2_T-LST-NIGHT-MEAN.tar: [NASA-MODIS, LST Night, °C, 2001–2020, annual mean, agriculture, mean, n/a]
MSZSI-V2_R-PRECIP-SUM.tar: [UCSB-CHG-CHIRPS, Precipitation, mm, 2001–2020, annual sum, agriculture, mean, n/a]
MSZSI-V2_S-BDENS-MEAN.tar: [OpenLandMap, Bulk density, g/cm3, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-ORGC-MEAN.tar: [OpenLandMap, Organic carbon, g/kg, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-PH-MEAN.tar: [OpenLandMap, pH in H2O, pH, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-WATER-MEAN.tar: [OpenLandMap, Soil water, % at 33kPa, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-SAND-MEAN.tar: [OpenLandMap, Sand, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-SILT-MEAN.tar: [OpenLandMap, Silt, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-CLAY-MEAN.tar: [OpenLandMap, Clay, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_E-ELEV-MEAN.tar: [MERIT, [elevation, slope, flowacc, HAND], [m, degrees, km², m], static, n/a, agriculture, mean, n/a]

Coming soon
MSZSI-V2_C-STAX-MEAN.tar: [OpenLandMap, Soil taxonomy, category, static, n/a, agriculture, area sum, n/a]
MSZSI-V2_C-LULC-MEAN.tar: [CGLS-LC100-V3, LULC, category, 2015–2019, mode, none, area sum, n/a]

Data sources:

https://developers.google.com/earth-engine/datasets/catalog/MODIS_006_MOD13Q1

https://developers.google.com/earth-engine/datasets/catalog/MODIS_006_MOD11A2

https://developers.google.com/earth-engine/datasets/catalog/UCSB-CHG_CHIRPS_PENTAD

https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_BULKDENS-FINEEARTH_USDA-4A1H_M_v02

https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_ORGANIC-CARBON_USDA-6A1C_M_v02

https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_PH-H2O_USDA-4C1A2A_M_v02

https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_WATERCONTENT-33KPA_USDA-4B1C_M_v01

https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_CLAY-WFRACTION_USDA-3A1A1A_M_v02

https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_SAND-WFRACTION_USDA-3A1A1A_M_v02

https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_GRTGROUP_USDA-SOILTAX_C_v01

https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_Landcover_100m_Proba-V-C3_Global

https://developers.google.com/earth-engine/datasets/catalog/MERIT_Hydro_v1_0_1

https://developers.google.com/earth-engine/datasets/catalog/FAO_GAUL_2015_level0

https://developers.google.com/earth-engine/datasets/catalog/FAO_GAUL_2015_level1

https://developers.google.com/earth-engine/datasets/catalog/FAO_GAUL_2015_level2

Project information:
SEAGUL: Southeast Asia Globalization, Urbanization, Land and Environment Changes
http://seagul.info/; https://lcluc.umd.edu/projects/divergent-local-responses-globalization-urbanization-land-transition-and-environmental
This project was made possible by the NASA Land-Cover/Land-Use Change Program (Grant #: 80NSSC20K0740)

For an additional interactive visualization, visit: https://cartoscience.users.earthengine.app/view/maup-mapper-multi-scale-modis-ndvi

Google Earth Engine code

/*///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
MSZSI: Multi-Scale Zonal Statistics Inventory

Authors:
Brad G. Peter, Department of Geography, University of Alabama
Joseph Messina, Department of Geography, University of Alabama
Austin Raney, Department of Geography, University of Alabama
Rodrigo E. Principe, AgriCircle AG
Peilei Fan, Department of Geography, Environment, and Spatial Sciences, Michigan State University

Citation:
Peter, Brad; Messina, Joseph; Raney, Austin; Principe, Rodrigo; Fan, Peilei, 2021, 'MSZSI: Multi-Scale Zonal Statistics Inventory', https://doi.org/10.7910/DVN/YCUBXS, Harvard Dataverse, V#

SEAGUL: Southeast Asia Globalization, Urbanization, Land and Environment Changes
http://seagul.info/
https://lcluc.umd.edu/projects/divergent-local-responses-globalization-urbanization-land-transition-and-environmental

This project was made possible by the NASA Land-Cover/Land-Use Change Program (Grant #: 80NSSC20K0740)
