Facebook
TwitterA random sample of households were invited to participate in this survey. In the dataset, you will find the respondent level data in each row with the questions in each column. The numbers represent a scale option from the survey, such as 1=Excellent, 2=Good, 3=Fair, 4=Poor. The question stem, response option, and scale information for each field can be found in the var "variable labels" and "value labels" sheets. VERY IMPORTANT NOTE: The scientific survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to the Bloomington. The easiest way to replicate these results is likely to create pivot tables, and use the sum of the "wt" field rather than a count of responses.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Big data, with N × P dimension where N is extremely large, has created new challenges for data analysis, particularly in the realm of creating meaningful clusters of data. Clustering techniques, such as K-means or hierarchical clustering are popular methods for performing exploratory analysis on large datasets. Unfortunately, these methods are not always possible to apply to big data due to memory or time constraints generated by calculations of order P*N(N−1)2. To circumvent this problem, typically the clustering technique is applied to a random sample drawn from the dataset; however, a weakness is that the structure of the dataset, particularly at the edges, is not necessarily maintained. We propose a new solution through the concept of “data nuggets”, which reduces a large dataset into a small collection of nuggets of data, each containing a center, weight, and scale parameter. The data nuggets are then input into algorithms that compute methods such as principal components analysis and clustering in a more computationally efficient manner. We show the consistency of the data nuggets based covariance estimator and apply the methodology of data nuggets to perform exploratory analysis of a flow cytometry dataset containing over one million observations using PCA and K-means clustering for weighted observations. Supplementary materials for this article are available online.
Facebook
TwitterDescription and PurposeThese data include the individual responses for the City of Tempe Annual Community Survey conducted by ETC Institute. These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Community Survey results are used as indicators for several city performance measures. The summary data for each performance measure is provided as an open dataset for that measure (separate from this dataset). The performance measures with indicators from the survey include the following (as of 2022):1. Safe and Secure Communities1.04 Fire Services Satisfaction1.06 Crime Reporting1.07 Police Services Satisfaction1.09 Victim of Crime1.10 Worry About Being a Victim1.11 Feeling Safe in City Facilities1.23 Feeling of Safety in Parks2. Strong Community Connections2.02 Customer Service Satisfaction2.04 City Website Satisfaction2.05 Online Services Satisfaction Rate2.15 Feeling Invited to Participate in City Decisions2.21 Satisfaction with Availability of City Information3. Quality of Life3.16 City Recreation, Arts, and Cultural Centers3.17 Community Services Programs3.19 Value of Special Events3.23 Right of Way Landscape Maintenance3.36 Quality of City Services4. Sustainable Growth & DevelopmentNo Performance Measures in this category presently relate directly to the Community Survey5. Financial Stability & VitalityNo Performance Measures in this category presently relate directly to the Community SurveyMethodsThe survey is mailed to a random sample of households in the City of Tempe. Follow up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city. Additionally, demographic data were used to monitor the distribution of responses to ensure the responding population of each survey is representative of city population. Processing and LimitationsThe location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When they data are shared with the city only the latitude/longitude of the block level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. The result of these two adjustments ensure that they are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city. This data is the weighted data provided by the ETC Institute, which is used in the final published PDF report.The 2022 Annual Community Survey report is available on data.tempe.gov. The individual survey questions as well as the definition of the response scale (for example, 1 means “very dissatisfied” and 5 means “very satisfied”) are provided in the data dictionary.Additional InformationSource: Community Attitude SurveyContact (author): Wydale HolmesContact E-Mail (author): wydale_holmes@tempe.govContact (maintainer): Wydale HolmesContact E-Mail (maintainer): wydale_holmes@tempe.govData Source Type: Excel tablePreparation Method: Data received from vendor after report is completedPublish Frequency: AnnualPublish Method: ManualData Dictionary
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context: This data set originates from a practice-relevant degradation process, which is representative for Prognostics and Health Management (PHM) applications. The observed degradation process is the clogging of filters when separating of solid particles from gas. A test bench is used for this purpose, which performs automated life testing of filter media by loading them. For testing, dust complying with ISO standard 12103-1 and with a known particle size distribution is employed. The employed filter media is made of randomly oriented non-woven fibre material. Further data sets are generated for various practice-relevant data situations which do not correspond to the ideal conditions of full data coverage. These data sets are uploaded to Kaggle by the user "Prognostics @ HSE" in a continuous process. In order to avoid the carryover between two data sets, a different configuration of the filter tests is used for each uploaded practice-relevant data situation, for example by selecting a different filter media.
Detailed specification: For more information about the general operation and the components used, see the provided description file Random Recording Condition Data Data Set.pdf
Given data situation: In order to implement a predictive maintenance policy, knowledge about the time of failure respectively about the remaining useful life (RUL) of the technical system is necessary. The time of failure or the RUL can be predicted on the basis of condition data that indicate the damage progression of a technical system over time. However, the collection of condition data in typical industrial PHM applications is often only possible in an incomplete manner. An example is the collection of data during defined test cycles with specific loads, carried at intervals. For instance, this approach is often used with machining centers, where test cycles are only carried out between finished machining jobs or work shifts. Due to different work pieces, the machining time varies and the test cycle with the recording of condition data is not performed equidistantly. This results in a data characteristic that is comparable to a random sample of continuously recorded condition data. Another example that may result in such a data characteristic comes from the effort to reduce data volumes when recording condition data. Attempts can be made to keep the amount of data with unchanged damage as small as possible. One possible measure is not to transmit and store the continuous sensor readings, but rather sections of them, which also leads to gaps in the data available for prognosis. In the present data set, the life cycle of filters or rather their condition data, represented by the differential pressure, is considered. Failure of the filter occurs when the differential pressure across the filter exceeds 600 Pa. The time until a filter failure occurs depends especially on the amount of dust supplied per time, which is constant within a run-to-failure cycle. The previously explained data characteristics are addressed by means of corresponding training and test data. The training data is structured as follows: A run-to-failure cycle contains n batches of data. The number n varies between the cycles and depends on the duration of the batches and the time interval between the individual batches. The duration and time interval of the batches are random variables. A data batch includes the sensor readings of differential pressure and flow rate for the filter, the start and end time of the batch, and RUL information related to the end time of the batch. The sensor readings of the differential pressure and flow rate are recorded at a constant sampling rate. Figure 6 shows an illustrative run-to-failure cycle with multiple batches. The test data are randomly right-censored. They are also made of batches with a random duration and time interval between the batches. For each batch contained, the start and end time are given, as well as the sensor readings within the batch. The RUL is not given for each batch but only for the last data point of the right-censored run-to-failure cycle.
Task: The aim is to predict the RUL of the censored filter test cycles given in the test data. In order to predict the RUL, training and test data are given, each consisting of 60 and 40 run-to-failure cycles. The test data contains random right-censored run-to-failure cycles and the respective RUL for the prediction task. The main challenge is to make the best use of the incompletely recorded training and test data to provide the most accurate prediction possible. Due to the detailed description of the setup and the various physical filter models described in literature, it is possible to support the actual data-driven models by integrating physical knowledge respectively models in the sense of theory-guided data science or informed machi...
Facebook
TwitterThese data include the individual responses for the City of Tempe Annual Business Survey conducted by ETC Institute. These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Business Survey results are used as indicators for city performance measures. The performance measures with indicators from the Business Survey include the following (as of 2023):1. Financial Stability and Vitality5.01 Quality of Business ServicesThe location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When they data are shared with the city only the latitude/longitude of the block level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. The result of these two adjustments ensure that they are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city.Additional InformationSource: Business SurveyContact (author): Adam SamuelsContact E-Mail (author): Adam_Samuels@tempe.govContact (maintainer): Contact E-Mail (maintainer): Data Source Type: Excel tablePreparation Method: Data received from vendor after report is completedPublish Frequency: AnnualPublish Method: ManualData DictionaryMethods:The survey is mailed to a random sample of businesses in the City of Tempe. Follow up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used.To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city.Processing and Limitations:The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When they data are shared with the city only the latitude/longitude of the block level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. The result of these two adjustments ensure that they are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city.The data are used by the ETC Institute in the final published PDF report.
Facebook
TwitterThe document dataset covers the Enterprise Survey (ES) panel data collected in North Macedonia in 2009, 2013 and 2019.
Macedonia ES 2009 was conducted in 2008 and 2009, while Macedonia ES 2013 was conducted between November 2012 and May 2013, and North Macedonia ES 2019 was conducted between December 2018 and October 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms’ experiences and enterprises’ perception of the environment in which they operate.
National
Regions covered are selected based on the number of establishments, contribution to employment, and value added. In most cases these regions are metropolitan areas and reflect the largest centers of economic activity in a country.
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors according to the group classification of ISIC Revision 3.1: (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities-sectors.
Sample survey data [ssd]
The sample for Macedonia 2009 ES, Macedonia 2013 ES and of 2019 North Macedonia ES were selected using stratified random sampling, following the methodology explained in the Sampling Manual for Macedonia 2009 ES and for Macedonia 2013 ES, and in the Sampling Note for 2019 North Macedonia ES. Stratified random sampling was preferred over simple random sampling for several reasons:
a. To obtain unbiased estimates for different subdivisions of the population with some known level of precision. b. To obtain unbiased estimates for the whole population. The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors according to the group classification of ISIC Revision 3.1: (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities-sectors. c. To make sure that the final total sample includes establishments from all different sectors and that it is not concentrated in one or two of industries/sizes/regions. d. To exploit the benefits of stratified sampling where population estimates, in most cases, will be more precise than using a simple random sampling method (i.e., lower standard errors, other things being equal.) e. Stratification may produce a smaller bound on the error of estimation than would be produced by a simple random sample of the same size. This result is particularly true if measurements within strata are homogeneous. f. The cost per observation in the survey may be reduced by stratification of the population elements into convenient groupings.
Three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in Appendix C of the North Macedonia 2019 ES Implementation Report and in Appendix E of the Macedonia 2013 Implementation Report.
Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 3.1 codes 15-37), Retail (ISIC 52), and Other Services (ISIC 45, 50, 51, 55, 60-64, 72).
As it is standard for the ES, the North Macedonia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).
Regional stratification for North Macedonia ES 2019 was done across three regions: Skopje; Eastern Macedonia comprising Northeastern, Eastern, Southeastern, and Vardar regions; and Western Macedonia comprising Polog, Southwestern and Pelagonia regions. For Macedonia 2013 ES, regional stratification was defined in 4 regions (city and the surrounding business area) throughout Macedonia. And for Macedonia ES 2009, regional stratification was defined in 4 regions which are Eastern, North- West & West, Skopje, and South.
Computer Assisted Personal Interview [capi]
Questionnaires have common questions (core module) and respectfully additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies:
a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond (-8) as a different option from don’t know (-9).
b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response. The following graph shows non-response rates for the sales variable, d2, by sector. Please, note that for this specific question, refusals were not separately identified from “Don’t know” responses.
Facebook
TwitterChina Living Standards Survey (LSS) consists of one household survey and one community (village) survey, conducted in Hebei and Liaoning Provinces (northern and northeast China) in July 1995 and July 1997 respectively. Five villages from each three sample counties of each province were selected (six were selected in Liaoyang County of Liaoning Province because of administrative area change). About 880 farm households were selected from total thirty-one sample villages for the household survey. The same thirty-one villages formed the samples of community survey. This document provides information on the content of different questionnaires, the survey design and implementation, data processing activities, and the different available data sets.
Regional
Households
Sample survey data [ssd]
The China LSS sample is not a rigorous random sample drawn from a well-defined population. Instead it is only a rough approximation of the rural population in Hebei and Liaoning provinces in North-eastern China. The reason for this is that part of the motivation for the survey was to compare the current conditions with conditions that existed in Hebei and Liaoning in the 1930's. Because of this, three counties in Hebei and three counties in Liaoning were selected as "primary sampling units" because data had been collected from those six counties by the Japanese occupation government in the 1930's. Within each of these six counties (xian) five villages (cun) were selected, for an overall total of 30 villages (in fact, an administrative change in one village led to 31 villages being selected). In each county a "main village" was selected that was in fact a village that had been surveyed in the 1930s. Because of the interest in these villages 50 households were selected from each of these six villages (one for each of the six counties). In addition, four other villages were selected in each county. These other villages were not drawn randomly but were selected so as to "represent" variation within the county. Within each of these villages 20 households were selected for interviews. Thus, the intended sample size was 780 households, 130 from each county. Unlike county and village selection, the selection of households within each village was done according to standard sample selection procedures. In each village, a list of all households in the village was obtained from village leaders. An "interval" was calculated as the number of the households in the village divided by the number of households desired for the sample (50 for main villages and 20 for other villages). For the list of households, a random number was drawn between 1 and the interval number. This was used as a starting point. The interval was then added to this number to get a second number, then the interval was added to this second number to get a third number, and so on. The set of numbers produced were the numbers used to select the households, in terms of their order on the list. In fact, the number of households in the sample is 785, as opposed to 780. Most of this difference is due to a village in which 24 households were interviewed, as opposed to the goal of 20 households
Face-to-face [f2f]
(a) DATA ENTRY All responses obtained from the household interviews were recorded in the household questionnaires. These were then entered into the computer, in the field, using data entry programs written in BASIC. The data produced by the data entry program were in the form of household files, i.e. one data file for all of the data in one household/community questionnaire. Thus, for the household there were about 880 data files. These data files were processed at the University of Toronto and the World Bank to produce datasets in statistical software formats, each of which contained information for all households for a subset of variables. The subset of variables chosen corresponded to data entry screens, so these files are hereafter referred to as "screen files". For the household survey component 66 data files were created. Members of the survey team checked and corrected data by checking the questionnaires for original recorded information. We would like to emphasize that correction here refers to checking questionnaires, in case of errors in skip patterns, incorrect values, or outlying values, and changing values if and only if data in the computer were different from those in the questionnaires. The personnel in charge of data preparation were given specific instructions not to change data even if values in the questionnaires were clearly incorrect. We have no reason to believe that these instructions were not followed, and every reason to believe that the data resulting from these checks and corrections are accurate and of the highest quality possible.
(b) DATA EDITING The screen files were then brought to World Bank headquarters in Washington, D.C. and uploaded to a mainframe computer, where they were converted to "standard" LSMS formats by merging datasets to produce separate datasets for each section with variable names corresponding to the questionnaires. In some cases, this has meant a single dataset for a section, while in others it has meant retaining "screen" datasets with just the variable names changed. Linking Parts of the Household Survey Each household has a unique identification number which is contained in the variable HID. Values for this variable range from 10101 to 60520. The first number is the code for the six counties in which data were collected, the second and third digits are for the villages within each county. Finally, the last two digits of HID contain the household number within the village. Data for households from different parts of the survey can be merged by using the HID variable which appears in each dataset of the household survey. To link information for an individual use should be made of both the household identification number, HID, and the person identification number, PID. A child in the household can be linked to the parents, if the parents are household members, through the parents' id codes in Section 01B. For parents who are not in the household, information is collected on the parent's schooling, main occupation and whether he/she is currently alive. Household members can be linked with their non-resident children through the parents' id codes in Section 01C. Linking the Household to the Community Data The community data have a somewhat different set of identifying variables than the household data. Each community dataset has four identifying variables: province (code 7 for Hebei and code 8 for Liaoning); county (six two digit codes, of which the first digit represents province and the second digit represents the three counties in each province); township (3 digit code, first digit is county, second digit is county and third digit is township); and village (4 digit code, first digit is county, second digit is county, third digit is township, and third fourth digit is village). Constructed Data Set Researchers at the World Bank and the University of Toronto have created a data set with information on annual household expenditures, region codes, etc. This constructed data set is made available for general use with the understanding that the description below is the only documentation that will be provided. Any manipulation of the data requires assumptions to be made and, as much as possible, those assumptions are explained below. Except where noted, the data set has been created using only the original (raw) data sets. A researcher could construct a somewhat different data set by incorporating different assumptions. Aggregate Expenditure, TOTEXP. The dataset TOTEXP contains variables for total household annual expenditures (for the year 1994) and variables for the different components of total household expenditures: food expenditures, non-food expenditures, use value of consumer durables, etc. These, along with the algorithm used to calculate household expenditures are detailed in Appendix D. The dataset also contains the variable HID, which can be used to match this dataset to the household level data set. Note that all of the expenditure variables are totals for the household. That is, they are not in per capita terms. Researchers will have to divide these variables by household size to get per capita numbers. The household size variable is included in the data set.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data include the individual responses for the City of Tempe Annual Community Survey conducted by ETC Institute. This dataset has two layers and includes both the weighted data and unweighted data. Weighting data is a statistical method in which datasets are adjusted through calculations in order to more accurately represent the population being studied. The weighted data are used in the final published PDF report.These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Community Survey results are used as indicators for several city performance measures. The summary data for each performance measure is provided as an open dataset for that measure (separate from this dataset). The performance measures with indicators from the survey include the following (as of 2023):1. Safe and Secure Communities1.04 Fire Services Satisfaction1.06 Crime Reporting1.07 Police Services Satisfaction1.09 Victim of Crime1.10 Worry About Being a Victim1.11 Feeling Safe in City Facilities1.23 Feeling of Safety in Parks2. Strong Community Connections2.02 Customer Service Satisfaction2.04 City Website Satisfaction2.05 Online Services Satisfaction Rate2.15 Feeling Invited to Participate in City Decisions2.21 Satisfaction with Availability of City Information3. Quality of Life3.16 City Recreation, Arts, and Cultural Centers3.17 Community Services Programs3.19 Value of Special Events3.23 Right of Way Landscape Maintenance3.36 Quality of City Services4. Sustainable Growth & DevelopmentNo Performance Measures in this category presently relate directly to the Community Survey5. Financial Stability & VitalityNo Performance Measures in this category presently relate directly to the Community SurveyMethods:The survey is mailed to a random sample of households in the City of Tempe. Follow up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city. Additionally, demographic data were used to monitor the distribution of responses to ensure the responding population of each survey is representative of city population. Processing and Limitations:The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When they data are shared with the city only the latitude/longitude of the block level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. The result of these two adjustments ensure that they are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city. The weighted data are used by the ETC Institute, in the final published PDF report.The 2023 Annual Community Survey report is available on data.tempe.gov or by visiting https://www.tempe.gov/government/strategic-management-and-innovation/signature-surveys-research-and-dataThe individual survey questions as well as the definition of the response scale (for example, 1 means “very dissatisfied” and 5 means “very satisfied”) are provided in the data dictionary.Additional InformationSource: Community Attitude SurveyContact (author): Adam SamuelsContact E-Mail (author): Adam_Samuels@tempe.govContact (maintainer): Contact E-Mail (maintainer): Data Source Type: Excel tablePreparation Method: Data received from vendor after report is completedPublish Frequency: AnnualPublish Method: ManualData Dictionary
Facebook
TwitterThe harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.
----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:
Iraq is considered a leader in household expenditure and income surveys where the first was conducted in 1946 followed by surveys in 1954 and 1961. After the establishment of Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years in (1971/ 1972, 1976, 1979, 1984/ 1985, 1988, 1993, 2002 / 2007). Implementing the cooperation between CSO and WB, Central Statistical Organization (CSO) and Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year covering all governorates including those in Kurdistan Region.
The survey has six main objectives. These objectives are:
The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.
National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.
1- Household/family. 2- Individual/person.
The survey was carried out over a full year covering all governorates including those in Kurdistan Region.
Sample survey data [ssd]
----> Design:
Sample size was (25488) household for the whole Iraq, 216 households for each district of 118 districts, 2832 clusters each of which includes 9 households distributed on districts and governorates for rural and urban.
----> Sample frame:
Listing and numbering results of 2009-2010 Population and Housing Survey were adopted in all the governorates including Kurdistan Region as a frame to select households, the sample was selected in two stages: Stage 1: Primary sampling unit (blocks) within each stratum (district) for urban and rural were systematically selected with probability proportional to size to reach 2832 units (cluster). Stage two: 9 households from each primary sampling unit were selected to create a cluster, thus the sample size of total survey clusters was 25488 households distributed on the governorates, 216 households in each district.
----> Sampling Stages:
In each district, the sample was selected in two stages: Stage 1: based on 2010 listing and numbering frame 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit breakdown urban and rural and geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal probability sampling. Sampling frames of each stages can be developed based on 2010 building listing and numbering without updating household lists. In some small districts, random selection processes of primary sampling may lead to select less than 24 units therefore a sampling unit is selected more than once , the selection may reach two cluster or more from the same enumeration unit when it is necessary.
Face-to-face [f2f]
----> Preparation:
The questionnaire of 2006 survey was adopted in designing the questionnaire of 2012 survey on which many revisions were made. Two rounds of pre-test were carried out. Revision were made based on the feedback of field work team, World Bank consultants and others, other revisions were made before final version was implemented in a pilot survey in September 2011. After the pilot survey implemented, other revisions were made in based on the challenges and feedbacks emerged during the implementation to implement the final version in the actual survey.
----> Questionnaire Parts:
The questionnaire consists of four parts each with several sections: Part 1: Socio – Economic Data: - Section 1: Household Roster - Section 2: Emigration - Section 3: Food Rations - Section 4: housing - Section 5: education - Section 6: health - Section 7: Physical measurements - Section 8: job seeking and previous job
Part 2: Monthly, Quarterly and Annual Expenditures: - Section 9: Expenditures on Non – Food Commodities and Services (past 30 days). - Section 10 : Expenditures on Non – Food Commodities and Services (past 90 days). - Section 11: Expenditures on Non – Food Commodities and Services (past 12 months). - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days). - Section 12, Table 1: Meals Had Within the Residential Unit. - Section 12, table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.
Part 3: Income and Other Data: - Section 13: Job - Section 14: paid jobs - Section 15: Agriculture, forestry and fishing - Section 16: Household non – agricultural projects - Section 17: Income from ownership and transfers - Section 18: Durable goods - Section 19: Loans, advances and subsidies - Section 20: Shocks and strategy of dealing in the households - Section 21: Time use - Section 22: Justice - Section 23: Satisfaction in life - Section 24: Food consumption during past 7 days
Part 4: Diary of Daily Expenditures: Diary of expenditure is an essential component of this survey. It is left at the household to record all the daily purchases such as expenditures on food and frequent non-food items such as gasoline, newspapers…etc. during 7 days. Two pages were allocated for recording the expenditures of each day, thus the roster will be consists of 14 pages.
----> Raw Data:
Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages: 1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct. 2. Local Supervisor: Checks to make sure that questions has been correctly completed. 3. Statistical analysis: After exporting data files from excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values in addition to auditing some variables. 4. World Bank consultants in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.
----> Harmonized Data:
Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. Number of households refused to response was 305, response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%) while the lowest rates were in Sulaimaniya (92%).
Facebook
TwitterSurvey based Harmonized Indicators (SHIP) files are harmonized data files from household surveys that are conducted by countries in Africa. To ensure the quality and transparency of the data, it is critical to document the procedures of compiling consumption aggregation and other indicators so that the results can be duplicated with ease. This process enables consistency and continuity that make temporal and cross-country comparisons consistent and more reliable.
Four harmonized data files are prepared for each survey to generate a set of harmonized variables that have the same variable names. Invariably, in each survey, questions are asked in a slightly different way, which poses challenges on consistent definition of harmonized variables. The harmonized household survey data present the best available variables with harmonized definitions, but not identical variables. The four harmonized data files are
a) Individual level file (Labor force indicators in a separate file): This file has information on basic characteristics of individuals such as age and sex, literacy, education, health, anthropometry and child survival. b) Labor force file: This file has information on labor force including employment/unemployment, earnings, sectors of employment, etc. c) Household level file: This file has information on household expenditure, household head characteristics (age and sex, level of education, employment), housing amenities, assets, and access to infrastructure and services. d) Household Expenditure file: This file has consumption/expenditure aggregates by consumption groups according to Purpose (COICOP) of Household Consumption of the UN.
National
The survey covered all de jure household members (usual residents).
Sample survey data [ssd]
Sampling Frame and Units As in all probability sample surveys, it is important that each sampling unit in the surveyed population has a known, non-zero probability of selection. To achieve this, there has to be an appropriate list, or sampling frame of the primary sampling units (PSUs).The universe defined for the GLSS 5 is the population living within private households in Ghana. The institutional population (such as schools, hospitals etc), which represents a very small percentage in the 2000 Population and Housing Census (PHC), is excluded from the frame for the GLSS 5.
The Ghana Statistical Service (GSS) maintains a complete list of census EAs, together with their respective population and number of households as well as maps, with well defined boundaries, of the EAs. . This information was used as the sampling frame for the GLSS 5. Specifically, the EAs were defined as the primary sampling units (PSUs), while the households within each EA constituted the secondary sampling units (SSUs).
Stratification In order to take advantage of possible gains in precision and reliability of the survey estimates from stratification, the EAs were first stratified into the ten administrative regions. Within each region, the EAs were further sub-divided according to their rural and urban areas of location. The EAs were also classified according to ecological zones and inclusion of Accra (GAMA) so that the survey results could be presented according to the three ecological zones, namely 1) Coastal, 2) Forest, and 3) Northern Savannah, and for Accra.
Sample size and allocation The number and allocation of sample EAs for the GLSS 5 depend on the type of estimates to be obtained from the survey and the corresponding precision required. It was decided to select a total sample of around 8000 households nationwide.
To ensure adequate numbers of complete interviews that will allow for reliable estimates at the various domains of interest, the GLSS 5 sample was designed to ensure that at least 400 households were selected from each region.
A two-stage stratified random sampling design was adopted. Initially, a total sample of 550 EAs was considered at the first stage of sampling, followed by a fixed take of 15 households per EA. The distribution of the selected EAs into the ten regions or strata was based on proportionate allocation using the population.
For example, the number of selected EAs allocated to the Western Region was obtained as: 1924577/18912079*550 = 56
Under this sampling scheme, it was observed that the 400 households minimum requirement per region could be achieved in all the regions but not the Upper West Region. The proportionate allocation formula assigned only 17 EAs out of the 550 EAs nationwide and selecting 15 households per EA would have yielded only 255 households for the region. In order to surmount this problem, two options were considered: retaining the 17 EAs in the Upper West Region and increasing the number of selected households per EA from 15 to about 25, or increasing the number of selected EAs in the region from 17 to 27 and retaining the second stage sample of 15 households per EA.
The second option was adopted in view of the fact that it was more likely to provide smaller sampling errors for the separate domains of analysis. Based on this, the number of EAs in Upper East and the Upper West were adjusted from 27 and 17 to 40 and 34 respectively, bringing the total number of EAs to 580 and the number of households to 8,700.
A complete household listing exercise was carried out between May and June 2005 in all the selected EAs to provide the sampling frame for the second stage selection of households. At the second stage of sampling, a fixed number of 15 households per EA was selected in all the regions. In addition, five households per EA were selected as replacement samples.The overall sample size therefore came to 8,700 households nationwide.
Face-to-face [f2f]
Facebook
TwitterThe documented dataset covers Enterprise Survey (ES) panel data collected in Peru in 2006, 2010 and 2017, as part of the Enterprise Survey initiative of the World Bank. An Indicator Survey is similar to an Enterprise Survey; it is implemented for smaller economies where the sampling strategies inherent in an Enterprise Survey are often not applicable due to the limited universe of firms.
The objective of the 2006-2017 Enterprise Survey is to obtain feedback from enterprises in client countries on the state of the private sector as well as to build a panel of enterprise data that will make it possible to track changes in the business environment over time and allow, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the Indicator Survey data provides information on the constraints to private sector growth and is used to create statistically significant business environment indicators that are comparable across countries.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or the universe, covered in the Enterprise Surveys is the non-agricultural economy. It comprises: all manufacturing sectors according to the ISIC Revision 3.1 group classification (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this population definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities-sectors.
Sample survey data [ssd]
The sample for the 2006-2017 Peru Enterprise Survey (ES) was selected using stratified random sampling, following the methodology explained in the Sampling Manual. Stratified random sampling was preferred over simple random sampling for several reasons: - To obtain unbiased estimates for different subdivisions of the population with some known level of precision. - To obtain unbiased estimates for the whole population. The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors (group D), construction (group F), services (groups G and H), and transport, storage, and communications (group I). Groups are defined following ISIC revision 3.1. Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, excluding sub-sector 72, IT, which was added to the population under study), and all public or utilities-sectors. - To make sure that the final total sample includes establishments from all different sectors and that it is not concentrated in one or two of industries/sizes/regions. - To exploit the benefits of stratified sampling where population estimates, in most cases, will be more precise than using a simple random sampling method (i.e., lower standard errors, other things being equal.)
Three levels of stratification were used in every country: industry, establishment size, and region.
Industry stratification was designed in the following way: In small economies the population was stratified into 3 manufacturing industries, one services industry - retail-, and one residual sector as defined in the sampling manual. Each industry had a target of 120 interviews. In middle size economies the population was stratified into 4 manufacturing industries, 2 services industries -retail and IT-, and one residual sector. For the manufacturing industries sample sizes were inflated by 25% to account for potential non-response in the financing data.
For the Peru ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposed, the number of employees was defined on the basis of reported permanent full-time workers. This resulted in some difficulties in certain countries where seasonal/casual/part-time labor is common.
Face-to-face [f2f]
The current survey instruments are available: - Core Questionnaire + Manufacturing Module [ISIC Rev.3.1: 15-37] - Core Questionnaire + Retail Module [ISIC Rev.3.1: 52] - Core Questionnaire [ISIC Rev.3.1: 45, 50, 51, 55, 60-64, 72] - Screener Questionnaire.
The "Core Questionnaire" is the heart of the Enterprise Survey and contains the survey questions asked of all firms across the world. There are also two other survey instruments - the "Core Questionnaire + Manufacturing Module" and the "Core Questionnaire + Retail Module." The survey is fielded via three instruments in order to not ask questions that are irrelevant to specific types of firms, e.g. a question that relates to production and nonproduction workers should not be asked of a retail firm. In addition to questions that are asked across countries, all surveys are customized and contain country-specific questions. An example of customization would be including tourism-related questions that are asked in certain countries when tourism is an existing or potential sector of economic growth.
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures.
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies:
a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond (-8) as a different option from don’t know (-9).
b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response. The following graph shows non-response rates for the sales variable, d2, by sector. Please, note that for this specific question, refusals were not separately identified from “Don’t know” responses.
Survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals; whenever this was done, strict rules were followed to ensure replacements were randomly selected within the same stratum. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.
Facebook
TwitterThis is an integration of 10 independent multi-country, multi-region, multi-cultural social surveys fielded by Gallup International between 2000 and 2013. The integrated data file contains responses from 535,159 adults living in 103 countries. In total, the harmonization project combined 571 social surveys.
These data have value in a number of longitudinal multi-country, multi-regional, and multi-cultural (L3M) research designs. Understood as independent, though non-random, L3M samples containing a number of multiple indicator ASQ (ask same questions) and ADQ (ask different questions) measures of human development, the environment, international relations, gender equality, security, international organizations, and democracy, to name a few [see full list below].
The data can be used for exploratory and descriptive analysis, with greatest utility at low levels of resolution (e.g. nation-states, supranational groupings). Level of resolution in analysis of these data should be sufficiently low to approximate confidence intervals.
These data can be used for teaching 3M methods, including data harmonization in L3M, 3M research design, survey design, 3M measurement invariance, analysis, and visualization, and reporting. Opportunities to teach about para data, meta data, and data management in L3M designs.
The country units are an unbalanced panel derived from non-probability samples of countries and respondents> Panels (countries) have left and right censorship and are thusly unbalanced. This design limitation can be overcome to the extent that VOTP panels are harmonized with public measurements from other 3M surveys to establish balance in terms of panels and occasions of measurement. Should L3M harmonization occur, these data can be assigned confidence weights to reflect the amount of error in these surveys.
Pooled public opinion surveys (country means), when combine with higher quality country measurements of the same concepts (ASQ, ADQ), can be leveraged to increase the statistical power of pooled publics opinion research designs (multiple L3M datasets)…that is, in studies of public, rather than personal, beliefs.
The Gallup Voice of the People survey data are based on uncertain sampling methods based on underspecified methods. Country sampling is non-random. The sampling method appears be primarily probability and quota sampling, with occasional oversample of urban populations in difficult to survey populations. The sampling units (countries and individuals) are poorly defined, suggesting these data have more value in research designs calling for independent samples replication and repeated-measures frameworks.
**The Voice of the People Survey Series is WIN/Gallup International Association's End of Year survey and is a global study that collects the public's view on the challenges that the world faces today. Ongoing since 1977, the purpose of WIN/Gallup International's End of Year survey is to provide a platform for respondents to speak out concerning government and corporate policies. The Voice of the People, End of Year Surveys for 2012, fielded June 2012 to February 2013, were conducted in 56 countries to solicit public opinion on social and political issues. Respondents were asked whether their country was governed by the will of the people, as well as their attitudes about their society. Additional questions addressed respondents' living conditions and feelings of safety around their living area, as well as personal happiness. Respondents' opinions were also gathered in relation to business development and their views on the effectiveness of the World Health Organization. Respondents were also surveyed on ownership and use of mobile devices. Demographic information includes sex, age, income, education level, employment status, and type of living area.
Facebook
TwitterThis survey was conducted in Hungary between February 2013 and August 2013 as part of the fifth round of the Business Environment and Enterprise Performance Survey (BEEPS V), a joint initiative of the World Bank Group and the European Bank for Reconstruction and Development. The objective of the survey is to obtain feedback from enterprises on the state of the private sector as well as to help in building a panel of enterprise data that will make it possible to track changes in the business environment over time, thus allowing, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the survey assesses the constraints to private sector growth and creates statistically significant business environment indicators that are comparable across countries.
Data from 310 establishments was analyzed. Stratified random sampling was used to select the surveyed businesses.
The survey topics include firm characteristics, information about sales and suppliers, competition, infrastructure services, judiciary and law enforcement collaboration, security, government policies, laws and regulations, financing, overall business environment, bribery, capacity utilization, performance and investment activities, and workforce composition.
In 2011, the innovation module was added to the standard set of Enterprise Surveys questionnaires to examine in detail how introduction of new products and practices influence firms' performance and management.
National
The primary sampling unit of the study is an establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors according to the group classification of ISIC Revision 3.1: (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities-sectors.
Sample survey data [ssd]
The sample was selected using stratified random sampling. Three levels of stratification were used in this country: industry, establishment size, and region.
Industry stratification was designed in the way that follows: the universe was stratified into one manufacturing industry, and two service industries (retail, and other services).
Size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not common practice, apart from the construction and agriculture sectors which are not included in the survey.
Regional stratification was defined in 3 regions (city and the surrounding business area) throughout Hungary.
The database from Hungarian Central Statistical Office was used as the frame for the selection of a sample with the aim of obtaining interviews at 270 establishments with five or more employees.
Given the impact that non-eligible units included in the sample universe may have on the results, adjustments may be needed when computing the appropriate weights for individual observations. The percentage of confirmed non-eligible units as a proportion of the total number of sampled establishments contacted for the survey was 9.2% (102 out of 1,106 establishments).
In the dataset, the variables a2 (sampling region), a6a (sampling establishment's size), and a4a (sampling sector) contain the establishment's classification into the strata chosen for each country using information from the sample frame. Variable a4a is coded using ISIC Rev 3.1 codes for the chosen industries for stratification. These codes include most manufacturing industries (15 to 37), retail (52), and (45, 50, 51, 55, 60-64, 72) for other services.
Face-to-face [f2f]
Three different versions of the questionnaire were used. The basic questionnaire, the Core Module, includes all common questions asked to all establishments from all sectors. The second expanded variation, the Manufacturing Questionnaire, is built upon the Core Module and adds some specific questions relevant to manufacturing sectors. The third expanded variation, the Retail Questionnaire, is also built upon the Core Module and adds to the core specific questions.
The innovation module was added to the standard set of Enterprise Surveys questionnaires to examine how introduction of new products and practices influence firms' performance and management.
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether, while the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as a different option from don’t know. b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary.
Survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals.
The number of realized interviews per contacted establishment was 0.26. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 0.28.
Facebook
TwitterThe documented dataset covers Enterprise Survey (ES) panel data collected in Argentina in 2006, 2010 and 2017, as part of the Enterprise Survey initiative of the World Bank. An Indicator Survey is similar to an Enterprise Survey; it is implemented for smaller economies where the sampling strategies inherent in an Enterprise Survey are often not applicable due to the limited universe of firms.
The objective of the 2006-2017 Enterprise Survey is to obtain feedback from enterprises in client countries on the state of the private sector as well as to build a panel of enterprise data that will make it possible to track changes in the business environment over time and allow, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the Indicator Survey data provides information on the constraints to private sector growth and is used to create statistically significant business environment indicators that are comparable across countries.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or the universe, covered in the Enterprise Surveys is the non-agricultural economy. It comprises: all manufacturing sectors according to the ISIC Revision 3.1 group classification (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this population definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities-sectors.
Sample survey data [ssd]
The sample for the 2006-2017 Argentina Enterprise Survey (ES) was selected using stratified random sampling, following the methodology explained in the Sampling Manual. Stratified random sampling was preferred over simple random sampling for several reasons: - To obtain unbiased estimates for different subdivisions of the population with some known level of precision. - To obtain unbiased estimates for the whole population. The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors (group D), construction (group F), services (groups G and H), and transport, storage, and communications (group I). Groups are defined following ISIC revision 3.1. Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, excluding sub-sector 72, IT, which was added to the population under study), and all public or utilities-sectors. - To make sure that the final total sample includes establishments from all different sectors and that it is not concentrated in one or two of industries/sizes/regions. - To exploit the benefits of stratified sampling where population estimates, in most cases, will be more precise than using a simple random sampling method (i.e., lower standard errors, other things being equal.)
Three levels of stratification were used in every country: industry, establishment size, and region.
Industry stratification was designed in the following way: In small economies the population was stratified into 3 manufacturing industries, one services industry - retail-, and one residual sector as defined in the sampling manual. Each industry had a target of 120 interviews. In middle size economies the population was stratified into 4 manufacturing industries, 2 services industries -retail and IT-, and one residual sector. For the manufacturing industries sample sizes were inflated by 25% to account for potential non-response in the financing data.
For the Argentina ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposed, the number of employees was defined on the basis of reported permanent full-time workers. This resulted in some difficulties in certain countries where seasonal/casual/part-time labor is common.
Face-to-face [f2f]
The current survey instruments are available: - Core Questionnaire + Manufacturing Module [ISIC Rev.3.1: 15-37] - Core Questionnaire + Retail Module [ISIC Rev.3.1: 52] - Core Questionnaire [ISIC Rev.3.1: 45, 50, 51, 55, 60-64, 72] - Screener Questionnaire.
The "Core Questionnaire" is the heart of the Enterprise Survey and contains the survey questions asked of all firms across the world. There are also two other survey instruments - the "Core Questionnaire + Manufacturing Module" and the "Core Questionnaire + Retail Module." The survey is fielded via three instruments in order to not ask questions that are irrelevant to specific types of firms, e.g. a question that relates to production and nonproduction workers should not be asked of a retail firm. In addition to questions that are asked across countries, all surveys are customized and contain country-specific questions. An example of customization would be including tourism-related questions that are asked in certain countries when tourism is an existing or potential sector of economic growth.
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures.
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies:
a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond (-8) as a different option from don't know (-9).
b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response. The following graph shows non-response rates for the sales variable, d2, by sector. Please, note that for this specific question, refusals were not separately identified from "Don't know" responses.
Survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals; whenever this was done, strict rules were followed to ensure replacements were randomly selected within the same stratum. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.
Facebook
TwitterThe documented dataset covers Enterprise Survey (ES) panel data collected in Liberia in 2009 and 2017, as part of the Enterprise Survey initiative of the World Bank. An Indicator Survey is similar to an Enterprise Survey; it is implemented for smaller economies where the sampling strategies inherent in an Enterprise Survey are often not applicable due to the limited universe of firms.
The objective of the 2009-2017 Enterprise Survey is to obtain feedback from enterprises in client countries on the state of the private sector as well as to build a panel of enterprise data that will make it possible to track changes in the business environment over time and allow, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the Indicator Survey data provides information on the constraints to private sector growth and is used to create statistically significant business environment indicators that are comparable across countries.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or the universe, covered in the Enterprise Surveys is the non-agricultural economy. It comprises: all manufacturing sectors according to the ISIC Revision 3.1 group classification (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this population definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities sectors.
Sample survey data [ssd]
The sample for the 2009-2017 Liberia Enterprise Survey (ES) was selected using stratified random sampling, following the methodology explained in the Sampling Note. Stratified random was preferred over simple random sampling for several reasons: - To obtain unbiased estimates for different subdivisions of the population with some known level of precision. - To obtain unbiased estimates for the whole population. The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors according to the group classification of ISIC Revision 3.1: (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except subsector 72, IT, which was added to the population under study), and all public or utilities sectors.
The cost per observation in the survey may be reduced by stratification of the population elements into convenient groupings.
Three levels of stratification were used in this country: industry, establishment size, and region. Industry stratification was designed as follows: the universe was stratified as into manufacturing and services industries. Manufacturing (ISIC Rev. 3.1 codes 15 - 37), and Services (ISIC codes 45, 50-52, 55, 60-64, and 72).
For the Liberia ES, size stratification was defined as follows: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).
Regional stratification for the Liberia ES was done across three regions: Montserrado, Margibi, and Nimba.
Face-to-face [f2f]
The current survey instruments are available: - Services and Manufacturing Questionnaire - Screener Questionnaire.
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures. Over 90% of the questions objectively ascertain characteristics of a country's business environment. The remaining questions assess the survey respondents' opinions on what are the obstacles to firm growth and performance.
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
There was a high response rate especially as a result of positive attitude towards the international community in collaboration with the government in their reconstruction efforts after a period of civil strife.There was also very positive attitude towards World Bank initiatives.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data include the individual responses for the City of Tempe Annual Community Survey conducted by ETC Institute. This dataset has two layers and includes both the weighted data and unweighted data. Weighting data is a statistical method in which datasets are adjusted through calculations in order to more accurately represent the population being studied. The weighted data are used in the final published PDF report.These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Community Survey results are used as indicators for several city performance measures. The summary data for each performance measure is provided as an open dataset for that measure (separate from this dataset). The performance measures with indicators from the survey include the following (as of 2023):1. Safe and Secure Communities1.04 Fire Services Satisfaction1.06 Crime Reporting1.07 Police Services Satisfaction1.09 Victim of Crime1.10 Worry About Being a Victim1.11 Feeling Safe in City Facilities1.23 Feeling of Safety in Parks2. Strong Community Connections2.02 Customer Service Satisfaction2.04 City Website Satisfaction2.05 Online Services Satisfaction Rate2.15 Feeling Invited to Participate in City Decisions2.21 Satisfaction with Availability of City Information3. Quality of Life3.16 City Recreation, Arts, and Cultural Centers3.17 Community Services Programs3.19 Value of Special Events3.23 Right of Way Landscape Maintenance3.36 Quality of City Services4. Sustainable Growth & DevelopmentNo Performance Measures in this category presently relate directly to the Community Survey5. Financial Stability & VitalityNo Performance Measures in this category presently relate directly to the Community SurveyMethods:The survey is mailed to a random sample of households in the City of Tempe. Follow up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city. Additionally, demographic data were used to monitor the distribution of responses to ensure the responding population of each survey is representative of city population. Processing and Limitations:The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When they data are shared with the city only the latitude/longitude of the block level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. The result of these two adjustments ensure that they are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city. The weighted data are used by the ETC Institute, in the final published PDF report.The 2023 Annual Community Survey report is available on data.tempe.gov or by visiting https://www.tempe.gov/government/strategic-management-and-innovation/signature-surveys-research-and-dataThe individual survey questions as well as the definition of the response scale (for example, 1 means “very dissatisfied” and 5 means “very satisfied”) are provided in the data dictionary.Additional InformationSource: Community Attitude SurveyContact (author): Adam SamuelsContact E-Mail (author): Adam_Samuels@tempe.govContact (maintainer): Contact E-Mail (maintainer): Data Source Type: Excel tablePreparation Method: Data received from vendor after report is completedPublish Frequency: AnnualPublish Method: ManualData Dictionary
Facebook
TwitterThis is a source dataset for a Let's Get Healthy California indicator at https://letsgethealthy.ca.gov/. Adult smoking prevalence in California, males and females aged 18+, starting in 2012. Caution must be used when comparing the percentages of smokers over time as the definition of ‘current smoker’ was broadened in 1996, and the survey methods were changed in 2012. Current cigarette smoking is defined as having smoked at least 100 cigarettes in lifetime and now smoking every day or some days. Due to the methodology change in 2012, the Centers for Disease Control and Prevention (CDC) recommend not conducting analyses where estimates from 1984 – 2011 are compared with analyses using the new methodology, beginning in 2012. This includes analyses examining trends and changes over time. (For more information, please see the narrative description.) The California Behavioral Risk Factor Surveillance System (BRFSS) is an on-going telephone survey of randomly selected adults, which collects information on a wide variety of health-related behaviors and preventive health practices related to the leading causes of death and disability such as cardiovascular disease, cancer, diabetes and injuries. Data are collected monthly from a random sample of the California population aged 18 years and older. The BRFSS is conducted by Public Health Survey Research Program of California State University, Sacramento under contract from CDPH. The survey has been conducted since 1984 by the California Department of Public Health in collaboration with the Centers for Disease Control and Prevention (CDC). In 2012, the survey methodology of the California BRFSS changed significantly so that the survey would be more representative of the general population. Several changes were implemented: 1) the survey became dual-frame, with both cell and landline random-digit dial components, 2) residents of college housing were eligible to complete the BRFSS, and 3) raking or iterative proportional fitting was used to calculate the survey weights. Due to these changes, estimates from 1984 – 2011 are not comparable to estimates from 2012 and beyond. Center for Disease Control and Policy (CDC) and recommend not conducting analyses where estimates from 1984 – 2011 are compared with analyses using the new methodology, beginning in 2012. This includes analyses examining trends and changes over time.Current cigarette smoking was defined as having smoked at least 100 cigarettes in lifetime and now smoking every day or some days. Prior to 1996, the definition of current cigarettes smoking was having smoked at least 100 cigarettes in lifetime and smoking now.
Facebook
TwitterThis dataset comes from the Annual Community Survey questions about satisfaction with the Quality of City Services. The Community Survey question that relates to this dataset is: “Quality of services provided by City of Tempe.” Respondents are asked to rate their satisfaction level using a scale of 1 to 5, where 1 means "Very Dissatisfied" and 5 means "Very Satisfied".The survey is mailed to a random sample of households in the City of Tempe and has a 95% confidence level.This page provides data for the Quality of City Services performance measure.The performance measure dashboard is available at 3.36 Quality of City Services.Additional InformationSource: Community Attitude Survey (Vendor: ETC Institute) Contact: Wydale HolmesContact E-Mail: wydale_holmes@tempe.govData Source Type: Excel and PDF ReportPreparation Method: Extracted from Annual Community Survey resultsPublish Frequency: AnnualPublish Method: ManualData Dictionary
Facebook
TwitterThis dataset comes from the Community Survey questions relating to the Community Health & Well-Being performance measure: "With “10” representing the best possible life for you and “0” representing the worst, how would you say you personally feel you stand at this time?" and "With “10” representing the best possible life for you and “0” representing the worst, how do you think you will stand about five years from now?" – the results of both scores are then used to assess a Cantril Scale which is a way of assessing general life satisfaction. As per the Cantril Self-Anchoring Striving Scale, the three categories of identification are as follows: Thriving – Respondents rate their current life as a 7 or higher AND their future life as an 8 or higher. Suffering – Respondents rate their current life negatively (0 to 4) AND their future life negatively (0 to 4). Struggling – Respondents who do not meet the criteria for Thriving or Suffering. The survey is mailed to a random sample of households in the City of Tempe and has a 95% confidence level. Note on Methodology Update: In 2025, the Cantril classification method was revised to align with Gallup’s official Life Evaluation Index methodology. This change affects only a small number of respondents whose answers did not fit cleanly into the previous custom definition of “Struggling,” which classified respondents who rated their current life moderately (5 or 6) or their future life moderately or negatively (0 to 7). Under the updated approach, respondents who previously fell outside that definition are now appropriately included in the Struggling category. The overall distribution of Thriving, Struggling, and Suffering changed only minimally, and the updated methodology has been applied consistently to all prior years.This page provides data for the Community Health and Well-Being performance measure.The performance measure dashboard is available at 3.34 Community Health and Well-Being.Data Dictionary Additional InformationSource: Community Attitude Survey (Vendor: ETC Institute)Contact: Amber AsburryContact email: amber_asburry@tempe.govPreparation Method: Survey results from two questions are calculated to create a Cantril Scale value that falls into the categories of Thriving, Struggling, and Suffering.Publish Frequency: AnnuallyPublish Method: Manual
Facebook
TwitterThis dataset comes from the Annual Community Survey question about satisfaction with the Value of Special Events. The Community Survey question relating to the Value of Special Events performance measure: "Please rate your level of satisfaction with each of the following: a) Value & benefits received by City from special events." Respondents are asked to rate their satisfaction level on a scale of 5 to 1, where 5 means "Very Satisfied" and 1 means "Very Dissatisfied" (without "don't know" as an option).The survey is mailed to a random sample of households in the City of Tempe and has a 95% confidence level.This page provides data for the Value of Special Events performance measure. The performance measure dashboard is available at 3.19 Value of Special Events.Additional InformationSource: Community Attitude Survey ( Vendor: ETC Institute)Contact: Wydale HolmesContact E-Mail: wydale_holmes@tempe.govData Source Type: Excel and PDFPreparation Method: Extracted from Annual Community Survey resultsPublish Frequency: AnnualPublish Method: Manual Data Dictionary
Facebook
TwitterA random sample of households were invited to participate in this survey. In the dataset, you will find the respondent level data in each row with the questions in each column. The numbers represent a scale option from the survey, such as 1=Excellent, 2=Good, 3=Fair, 4=Poor. The question stem, response option, and scale information for each field can be found in the var "variable labels" and "value labels" sheets. VERY IMPORTANT NOTE: The scientific survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to the Bloomington. The easiest way to replicate these results is likely to create pivot tables, and use the sum of the "wt" field rather than a count of responses.