The harmonized health data set, created and published by the ERF, is a subset of the Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules collected as part of that survey. This subset was then used to create a harmonized health survey comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.
----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:
Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between the CSO and the World Bank (WB), the Central Statistical Organization (CSO) and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on the IHSES on 1 January 2012. The survey was carried out over a full year, covering all governorates including those in Kurdistan Region.
The survey has six main objectives. These objectives are:
The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum to create a version comparable with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variable names, labels and some definitions. See "Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf", provided in the external resources, for further information on how the original variables map onto the harmonized ones, on the availability of each variable in both survey years, and for relevant comments.
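As a rough illustration of what unifying variable names and labels involves, the Python sketch below renames the keys of one record through a name mapping and looks up value labels. Every original name (`q0101`, `q0205`, ...), harmonized name and code here is a hypothetical placeholder; the real correspondence is the one documented in the ERF mapping matrix, which is not reproduced.

```python
# Minimal sketch of name/label harmonization. All variable names and codes
# below are HYPOTHETICAL placeholders, not the real IHSES or ERF names.

# Original 2012 name -> harmonized name (illustrative only).
NAME_MAP_2012 = {
    "q0101": "hhid",
    "q0205": "age",
    "q0206": "sex",
}

# Harmonized variable -> {code: label} (illustrative only).
VALUE_LABELS = {
    "sex": {1: "male", 2: "female"},
}


def harmonize_record(record, name_map):
    """Return a copy of record with keys renamed; unmapped keys are kept as-is."""
    return {name_map.get(key, key): value for key, value in record.items()}


def label_value(variable, code, labels=VALUE_LABELS):
    """Look up the harmonized label for a coded value, or None if unlabeled."""
    return labels.get(variable, {}).get(code)


if __name__ == "__main__":
    raw = {"q0101": 1001, "q0205": 34, "q0206": 2}
    harmonized = harmonize_record(raw, NAME_MAP_2012)
    print(harmonized)
    print(label_value("sex", harmonized["sex"]))
```

Keeping the mapping as data rather than code means the same function can be reused for the 2007 round with a different dictionary.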
National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.
1- Household/family. 2- Individual/person.
The survey was carried out over a full year covering all governorates including those in Kurdistan Region.
Sample survey data [ssd]
----> Design:
The sample size was 25488 households for the whole of Iraq: 216 households in each of 118 districts, organized into 2832 clusters of 9 households each, distributed across districts and governorates for rural and urban areas.
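The design figures are internally consistent: 118 districts × 216 households and 2832 clusters × 9 households both give 25488, which implies 24 clusters per district. A small Python check of that arithmetic, using only the numbers stated above:

```python
# Sample design figures as stated in the IHSES 2012 description.
DISTRICTS = 118
HOUSEHOLDS_PER_DISTRICT = 216
CLUSTERS_TOTAL = 2832
HOUSEHOLDS_PER_CLUSTER = 9


def check_design():
    """Return (total via districts, total via clusters, clusters per district)."""
    total_by_district = DISTRICTS * HOUSEHOLDS_PER_DISTRICT
    total_by_cluster = CLUSTERS_TOTAL * HOUSEHOLDS_PER_CLUSTER
    clusters_per_district = HOUSEHOLDS_PER_DISTRICT // HOUSEHOLDS_PER_CLUSTER
    return total_by_district, total_by_cluster, clusters_per_district


if __name__ == "__main__":
    print(check_design())  # (25488, 25488, 24)
```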
----> Sample frame:
The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted as the frame for selecting households in all governorates, including Kurdistan Region. The sample was selected in two stages. Stage 1: primary sampling units (blocks) within each stratum (district), for urban and rural areas, were selected systematically with probability proportional to size, yielding 2832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to form a cluster, giving a total sample of 25488 households distributed across the governorates, with 216 households in each district.
----> Sampling Stages:
In each district, the sample was selected in two stages. Stage 1: based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum by systematic sampling with probability proportional to size, with implicit stratification by urban/rural and by geography (sub-district, quarter, street, county, village and block). Stage 2: using households as secondary sampling units, 9 households were selected from each sample point by systematic equal-probability sampling. The sampling frames for each stage can be developed from the 2010 building listing and numbering without updating household lists. In some small districts, the random selection of primary sampling units may yield fewer than 24 distinct units; a sampling unit may therefore be selected more than once, so that two or more clusters can come from the same enumeration unit when necessary.
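A Python sketch of the two-stage selection described above: systematic PPS for 24 sample points per district, then systematic equal-probability selection of 9 households per point. Assumptions not stated in the source: the measure of size for each block is its number of listed households, and the frame is already sorted by urban/rural and geography so that systematic selection provides the implicit stratification. The function names and block sizes are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def systematic_pps(sizes, n, rng):
    """Systematic PPS selection of n units from a frame of unit sizes.

    A unit whose size exceeds the sampling interval can be hit more than once,
    so one enumeration unit may supply two or more clusters.
    """
    cumulative = list(accumulate(sizes))   # cumulative[j] = sizes[0] + ... + sizes[j]
    interval = cumulative[-1] / n
    start = rng.uniform(0, interval)       # random start within the first interval
    selected = []
    for k in range(n):
        point = start + k * interval
        # First unit whose cumulative size exceeds the selection point;
        # min() guards against floating-point overshoot at the end of the frame.
        selected.append(min(bisect_right(cumulative, point), len(sizes) - 1))
    return selected


def systematic_equal(num_households, m, rng):
    """Systematic equal-probability selection of m household positions (0-based)."""
    if num_households < m:
        raise ValueError("unit has fewer listed households than the cluster size")
    interval = num_households / m
    start = rng.uniform(0, interval)
    return [min(int(start + k * interval), num_households - 1) for k in range(m)]


def two_stage_sample(block_sizes, n_psu=24, m=9, seed=0):
    """Return (cluster_no, block_index, household_index) tuples for one district."""
    rng = random.Random(seed)
    sample = []
    for cluster_no, block in enumerate(systematic_pps(block_sizes, n_psu, rng)):
        for hh in systematic_equal(block_sizes[block], m, rng):
            sample.append((cluster_no, block, hh))
    return sample


if __name__ == "__main__":
    # Hypothetical district frame: listed households per block.
    frame = [50, 120, 30, 400, 75, 60] * 5
    print(len(two_stage_sample(frame)))   # 24 clusters x 9 households
```

If a large block is hit twice, this sketch draws its households independently for each hit, so the same household could be chosen twice; a production design would need a rule for drawing distinct households in that case.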
Face-to-face [f2f]
----> Preparation:
The 2006 survey questionnaire was used as the basis for the 2012 questionnaire, with many revisions. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, additional revisions were made based on the challenges and feedback that emerged during implementation, producing the final version used in the actual survey.
----> Questionnaire Parts:
The questionnaire consists of four parts, each with several sections:
Part 1: Socio-Economic Data:
- Section 1: Household Roster
- Section 2: Emigration
- Section 3: Food Rations
- Section 4: Housing
- Section 5: Education
- Section 6: Health
- Section 7: Physical Measurements
- Section 8: Job Seeking and Previous Job
Part 2: Monthly, Quarterly and Annual Expenditures:
- Section 9: Expenditures on Non-Food Commodities and Services (past 30 days)
- Section 10: Expenditures on Non-Food Commodities and Services (past 90 days)
- Section 11: Expenditures on Non-Food Commodities and Services (past 12 months)
- Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days)
- Section 12, Table 1: Meals Had Within the Residential Unit
- Section 12, Table 2: Number of Persons Other Than Household Members Participating in Meals Within Household Expenditure
Part 3: Income and Other Data:
- Section 13: Job
- Section 14: Paid Jobs
- Section 15: Agriculture, Forestry and Fishing
- Section 16: Household Non-Agricultural Projects
- Section 17: Income from Ownership and Transfers
- Section 18: Durable Goods
- Section 19: Loans, Advances and Subsidies
- Section 20: Shocks and Household Coping Strategies
- Section 21: Time Use
- Section 22: Justice
- Section 23: Life Satisfaction
- Section 24: Food Consumption During the Past 7 Days
Part 4: Diary of Daily Expenditures: The expenditure diary is an essential component of this survey. It is left with the household to record all daily purchases over 7 days, including food and frequent non-food items such as gasoline and newspapers. Two pages are allocated to each day's expenditures, so the diary consists of 14 pages.
----> Raw Data:
Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
1. Interviewer: checks all answers on the household questionnaire, confirming that they are clear and correct.
2. Local supervisor: checks that the questions have been correctly completed.
3. Statistical analysis: after exporting the data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or illogical values, in addition to auditing some variables.
4. World Bank consultants, in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and Stata to examine and correct remaining inconsistencies within the data files. The software detects errors by checking questionnaire items against the expected parameters for each variable.
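The checks in stages 3 and 4 amount to comparing each variable against its expected parameters. The source describes SPSS and Stata programs; the Python sketch below is only an analogue of a simple range check, and the variables and bounds in it are hypothetical.

```python
# Minimal sketch of per-variable range checks. The variables and bounds are
# HYPOTHETICAL; the actual CSO / World Bank edit parameters are not given here.
EXPECTED_RANGES = {
    "age": (0, 110),
    "household_size": (1, 40),
    "rooms": (1, 30),
}


def find_irregular(records, ranges=EXPECTED_RANGES):
    """Return (row_number, variable, problem) for missing or out-of-range values."""
    problems = []
    for row_no, record in enumerate(records):
        for variable, (low, high) in ranges.items():
            value = record.get(variable)
            if value is None:
                problems.append((row_no, variable, "missing"))
            elif not low <= value <= high:
                problems.append((row_no, variable, "out of range"))
    return problems


if __name__ == "__main__":
    data = [
        {"age": 30, "household_size": 5, "rooms": 3},
        {"age": 150, "household_size": 5},   # implausible age, rooms missing
    ]
    for problem in find_irregular(data):
        print(problem)
```

Flagged rows would then go back to a reviewer rather than being corrected automatically.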
----> Harmonized Data:
The Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. 305 households refused to respond, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest was in Sulaimaniya (92%).
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE NATIONAL INSTITUTE OF STATISTICS (INS) - TUNISIA
The survey aims to estimate the demographic and educational characteristics of the population. It also calculates economic indicators for the population, such as the number of active individuals, the additional demand for jobs, the number of employed individuals and their characteristics, the number of jobs created, the characteristics of the unemployed, and the unemployment rate. Furthermore, the survey estimates these indicators at the household level, together with households' living conditions.
The results of this survey were compared with the results of the second quarter of the 2011 national survey on population and employment. The National Institute of Statistics - Tunisia uses the unemployment definition and concepts adopted by the International Labour Organization. Under this definition, an individual is unemployed if they did not work during the week preceding the interview, were looking for a job in the month preceding the interview, and are available to work within two weeks after the interview.
In 2010, the National Institute of Statistics adopted a strict ILO definition of unemployment, requiring that the person have taken effective steps to search for a job in the month preceding the interview.
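The ILO criteria above, together with the stricter 2010 search condition, can be written as a small classification rule. This Python sketch is illustrative only: the `Respondent` fields are hypothetical stand-ins for questionnaire answers, not INS variable names, and it ignores other labour-force categories.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Field names are hypothetical; they stand for the questionnaire answers.
    worked_last_week: bool            # worked during the week before the interview
    searched_last_month: bool         # looked for a job in the month before the interview
    took_active_steps: bool           # made effective job-search efforts in that month
    available_within_two_weeks: bool  # could start work within two weeks


def is_unemployed(r: Respondent, strict: bool = True) -> bool:
    """Apply the three ILO criteria; strict=True adds the 2010 active-search condition."""
    if r.worked_last_week:
        return False
    searched = r.searched_last_month and (r.took_active_steps or not strict)
    return searched and r.available_within_two_weeks
```

Under the strict rule, a person who reports looking for work but took no concrete steps is not counted as unemployed, whereas the looser rule would count them.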
Covering a representative sample at the national and regional level (governorates).
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
The sample is drawn from the frame of the 2004 General Census of Population and Housing.
Face-to-face [f2f]
Three modules were designed for data collection:
Household Questionnaire (Module 1): Includes questions regarding household characteristics, living conditions, individuals and their demographic, educational and economic characteristics. This module also provides information on internal and external migration.
Active Employed Questionnaire (Module 2): Includes questions on the characteristics of employed individuals, such as occupation, industry and, for employees, wages.
Active Unemployed Questionnaire (Module 3): Includes questions on the characteristics of the unemployed, such as unemployment duration, last occupation, activity, and the number of days worked during the last year.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The Current Population Survey Food Security Supplement (CPS-FSS) is the source of national and State-level statistics on food insecurity used in USDA's annual reports on household food security. The CPS is a monthly labor force survey of about 50,000 households conducted by the Census Bureau for the Bureau of Labor Statistics. Once each year, after answering the labor force questions, the same households are asked a series of questions (the Food Security Supplement) about food security, food expenditures, and use of food and nutrition assistance programs. Food security data have been collected by the CPS-FSS each year since 1995. Four data sets that complement those available from the Census Bureau are available for download on the ERS website. These are available as ASCII uncompressed or zipped files. The purpose and appropriate use of these additional data files are described below: 1) CPS 1995 Revised Food Security Status data--This file provides household food security scores and food security status categories that are consistent with procedures and variable naming conventions introduced in 1996. This includes the "common screen" variables to facilitate comparisons of prevalence rates across years. This file must be matched to the 1995 CPS Food Security Supplement public-use data file. 2) CPS 1998 Children's and 30-day Food Security data--Subsequent to the release of the April 1999 CPS-FSS public-use data file, USDA developed two additional food security scales to describe aspects of food security conditions in interviewed households not captured by the 12-month household food security scale. This file provides three food security variables (categorical, raw score, and scale score) for each of these scales along with household identification variables to allow the user to match this supplementary data file to the CPS-FSS April 1998 data file. 
3) CPS 1999 Children's and 30-day Food Security data--Subsequent to the release of the April 1999 CPS-FSS public-use data file, USDA developed two additional food security scales to describe aspects of food security conditions in interviewed households not captured by the 12-month household food security scale. This file provides three food security variables (categorical, raw score, and scale score) for each of these scales along with household identification variables to allow the user to match this supplementary data file to the CPS-FSS April 1999 data file. 4) CPS 2000 30-day Food Security data--Subsequent to the release of the September 2000 CPS-FSS public-use data file, USDA developed a revised 30-day CPS Food Security Scale. This file provides three food security variables (categorical, raw score, and scale score) for the 30-day scale along with household identification variables to allow the user to match this supplementary data file to the CPS-FSS September 2000 data file. Food security is measured at the household level in three categories: food secure, low food security and very low food security. Each category is reported as a total count and as a percentage of the total population. Categories and measurements are further broken down by the following demographic characteristics: household composition, race/ethnicity, metro/nonmetro area of residence, and geographic region. The food security scale includes questions about households' ability to purchase enough food and balanced meals; questions about adults' meals, including their size, how often meals were skipped, weight loss, and days without eating; and questions about children's meals, including diversity, balanced meals, size of meals, skipped meals and hunger. Questions are also asked about the use of public assistance and supplemental food assistance. The food security scale consists of 18 items that measure food insecurity.
A score of 0-2 means a household is food secure, 3-7 indicates low food security, and 8-18 means very low food security. The scale and the data also report how frequently each item is experienced. Data are available as .dat files, which may be processed in statistical software or through the United States Census Bureau's DataFerrett http://dataferrett.census.gov/. Data from 2010 onwards are available below and online; data from 1995-2009 must be accessed through DataFerrett. DataFerrett is a data analysis and extraction tool for customizing federal, state, and local data to suit your requirements. Through DataFerrett, you can develop an unlimited array of customized spreadsheets, as versatile and complex as your usage demands, and then turn those spreadsheets into graphs and maps without any additional software.
Resources in this dataset:
Resource Title: December 2014 Food Security CPS Supplement. File Name: dec14pub.zip
Resource Title: December 2013 Food Security CPS Supplement. File Name: dec13pub.zip
Resource Title: December 2012 Food Security CPS Supplement. File Name: dec12pub.zip
Resource Title: December 2011 Food Security CPS Supplement. File Name: dec11pub.zip
Resource Title: December 2010 Food Security CPS Supplement. File Name: dec10pub.zip
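The score thresholds given in this entry for the 18-item scale (0-2, 3-7, 8-18) can be expressed as a small lookup. This Python sketch applies only those stated cut points; households that answer fewer items (for example, households without children) are classified with different cut points in USDA's documentation, which this sketch does not handle.

```python
def food_security_status(raw_score: int) -> str:
    """Map a raw score on the 18-item scale to the categories described above."""
    if not 0 <= raw_score <= 18:
        raise ValueError("raw score must be between 0 and 18 on the 18-item scale")
    if raw_score <= 2:
        return "food secure"
    if raw_score <= 7:
        return "low food security"
    return "very low food security"


if __name__ == "__main__":
    for score in (0, 3, 8, 18):
        print(score, food_security_status(score))
```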
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
By Dataquest [source]
This comprehensive dataset provides a historical record of Major League Baseball (MLB) games dating back to its inception. It offers an in-depth look into the game's significant aspects, encompassing detailed statistics, player performance information, and game outcomes across multiple seasons.
The MLB Game Logs dataset is a rich repository of structured records. Sourced from Retrosheet, the data were originally provided as 127 separate CSV files, which have been combined into a single file to simplify analysis.
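Combining many CSV files into one, as described above, can be done with Python's standard `csv` module. This is a sketch under an assumption the description does not confirm: that every input file starts with the same header row. `combine_csvs` is a hypothetical helper, not part of the dataset's tooling.

```python
import csv


def combine_csvs(paths, out_path):
    """Concatenate CSV files that share one header row into out_path.

    Assumes every input file starts with the same header line.
    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:          # skip empty files
                    continue
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, `combine_csvs(sorted(glob.glob("logs/*.csv")), "game_log.csv")` would merge every file in a hypothetical `logs/` directory; raising on a header mismatch avoids silently misaligning columns.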
The records range from basic game information, such as the date and venue, team names and IDs, to finer attributes such as whether a game was played during the day or at night and completion information. More granular fields, such as game length measured in outs and attendance figures, add further detail.
For player performance, the dataset is equally thorough, with data on hits, home runs, stolen bases, sacrifices, extra-base hits and runs batted in (RBIs), as well as winning, losing and saving pitchers, all listed alongside their player IDs for easy cross-referencing.
Beyond the raw data, the dataset uses detailed column names based on Retrosheet's field explanations, to make each field clearer and minimize misinterpretation.
Although the data dictionary included with the files explains the columns in detail, we recommend consulting Retrosheet's field explanations directly for complete details on specific fields.
In accordance with Retrosheet's copyright terms, we include the following notice:
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at www.retrosheet.org.
We hope that enthusiasts, researchers, statisticians and other users find value in this rich resource of baseball history.
This dataset is a treasure trove of history and statistics, offering detailed player and game information for MLB games from 1871 to 2016. The following how-to-use guide should help beginners and others who are not familiar with the dataset format.
Understand columns: Familiarize yourself with the numerous columns in this data set. Each column offers distinct information about each game, such as player performance, location of the match, number of spectators and more. It's okay if you do not understand everything right away.
Read documentation: You'll find a 'Retrosheet field explanation' link in the description provided above which explains each column in detail. Do make sure to go through it to get a better understanding.
Define your objective: Are you trying to predict future game outcomes, or to find patterns between attendance and team performance? A defined objective helps you focus on the relevant columns and avoid needless exploration.
Cleaning Data: Some data points may have missing values or illegible entries; identifying them helps ensure accurate insights from the analysis.
Perform initial EDA (Exploratory Data Analysis): EDA involves inspecting, cleaning, transforming and visualizing raw data to understand its underlying structure, which can then inform the selection or creation of statistical models later on. Useful techniques include:
Histograms: show frequency distributions of numeric variables.
Box plots: A good way of quickly visualizing where most data points lie.
Pivot tables: aggregating by specific groups can summarize large datasets.
Statistical Analysis & Machine Learning Models: With a clear objective and a prepared dataset, try machine learning models suited to the task, for example logistic regression for a binary outcome (win/lose), multiple linear regression for a numerical outcome (score), decision trees for data segmentation, and so on.
Visualize: It always helps to build charts, graphs or tables to present final insights to others.
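The cleaning and pivot-table steps above can be sketched with Python's standard library. The column names used in the usage example (`h_name`, `attendance`) are assumptions about the consolidated file, not confirmed by this description; check them against the data dictionary first.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read the consolidated game log CSV into a list of dicts (one per game)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def missing_counts(rows, columns):
    """Count blank or absent values per column, a first step in data cleaning."""
    return {c: sum(1 for r in rows if not (r.get(c) or "").strip()) for c in columns}


def mean_by_group(rows, group_col, value_col):
    """Pivot-table-style mean of value_col per group, skipping blank/non-numeric cells."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        try:
            value = float((r.get(value_col) or "").strip())
        except ValueError:
            continue
        key = r.get(group_col)
        totals[key] += value
        counts[key] += 1
    return {k: totals[k] / counts[k] for k in totals}
```

A possible session: `rows = load_rows("game_log.csv")`, then `missing_counts(rows, ["attendance"])` and `mean_by_group(rows, "h_name", "attendance")` for average home attendance per team. Skipping non-numeric cells rather than treating them as zero keeps missing attendance from dragging averages down.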
This datas...
The University of Wyoming (UW) King Air atmospheric boundary layer measurement missions were flown in 1987 during IFCs 3 and 4. This Raw Boundary Layer Fluxes data set contains parameters that describe the environment in which the flux data were collected and the flux data itself. The fluctuations in all variables were calculated with three different methods (the arithmetic means removed, the linear trends removed, or filtered with a high-pass recursive filter) prior to the eddy correlation calculations. This data set contains the data with the arithmetic means removed (i.e., RAW). All the flux measurements were obtained with the eddy-correlation method, wherein the aircraft is equipped with an inertial platform, accelerometers, and a gust probe for measurement of earth-relative gusts in the x, y, and z directions. Gusts in these dimensions are then correlated with each other for momentum fluxes and with fluctuations in other variables to obtain the various scalar fluxes, such as temperature (for sensible heat flux) and water vapor mixing ratio (for latent heat flux). The summary of data calculated from each aircraft pass includes various statistics, correlations, and fluxes calculated after the time series for each variable with the arithmetic means removed.
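To make the eddy-correlation step concrete, the Python sketch below computes fluctuations with the three kinds of method named above and then the covariance of vertical gust and temperature. It is an illustration, not the UW processing code: the exact form of the recursive filter used by UW is not given here, so `high_pass_recursive` (an exponential running mean subtracted from the signal, with a hypothetical `alpha`) is one simple choice, and `rho`/`cp` are typical assumed values for air. Linear detrending assumes uniformly spaced samples and at least two of them.

```python
from statistics import fmean


def remove_mean(x):
    """Fluctuations with the arithmetic mean removed (the RAW method)."""
    m = fmean(x)
    return [v - m for v in x]


def remove_linear_trend(x):
    """Fluctuations about a least-squares straight line in sample index."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = fmean(x)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x)) / sxx
    return [v - (x_mean + slope * (t - t_mean)) for t, v in enumerate(x)]


def high_pass_recursive(x, alpha=0.99):
    """One simple recursive high-pass: subtract an exponential running mean."""
    running = x[0]
    out = []
    for v in x:
        running = alpha * running + (1 - alpha) * v
        out.append(v - running)
    return out


def kinematic_flux(w, s, detrend=remove_mean):
    """Eddy covariance: mean product of the fluctuations w' and s'."""
    return fmean(p * q for p, q in zip(detrend(w), detrend(s)))


def sensible_heat_flux(w, temperature, detrend=remove_mean, rho=1.2, cp=1004.0):
    """H = rho * cp * w'T' in W m^-2 (rho and cp are assumed typical values)."""
    return rho * cp * kinematic_flux(w, temperature, detrend)
```

Passing a different `detrend` function reproduces the idea that the same pass can yield slightly different fluxes depending on how fluctuations were defined.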
This dataset contains the number of crimes filed under each category of the Indian Penal Code (IPC), the number of victims of those crimes, and the average crime rate. The data are presented separately by IPC category and sub-category, and are available at the state/UT level for 2018.
● 7060_source_data.csv: The raw data from the source, with the original administrative dimensions. This dataset may have been restructured (by scraping PDFs, combining files, or pivoting tables) to fit the tabular format used by NDAP, but the actual data values remain unchanged.
● NDAP_REPORT_7060.csv: The final standardised data using LGD geographic dimensions, as seen on NDAP.
● 7060_metadata.csv: Variable-level metadata, including the following fields:
❖ VariableName: The full variable name as it appears in the data
❖ VariableCode: A unique variable code used as a short name for the variable during internal processing; it can also be used for simplicity if desired
❖ Type_Of_Variable: The classification of the column, whether it is a dimension or a variable (i.e. indicator)
❖ Unit_Of_Measure:
❖ Aggregation_Type: The default aggregation function to be used when aggregating each variable
❖ Weighing_Variable_Name: The weight assigned to each variable that is used by default when aggregating
❖ Weighing_Variable_ID: The weighting variable ID corresponding to the weighing variable name
❖ Long_Description: A more descriptive definition of the variable
❖ Scaling_factor: Scaling factor from source
● 7060_KEYS.csv: The key that maps source administrative units to the standardised Local Government Directory (LGD) dimensions. This file also contains pre-calculated weights for every constituent unit mapped from the source dimensions into the LGD. Each row describes what fraction of a source unit is mapped to a corresponding LGD unit. This file includes the following fields:
❖ src[Unit]Name: The administrative unit name as it appears in the source data. Depending on the dataset, this may include State, District, Subdistrict, Block, Village/Town, etc.
❖ [Unit]Name: The standardised administrative unit name as it appears in the LGD. Depending on the dataset, this may include State, District, Subdistrict, Block, Village/Town, etc.
❖ [Unit]Code: The standardised administrative unit code corresponding to the unit name in the LGD.
❖ Year: The year in which the data was collected or reported. Depending on the dataset, other temporal variables may also be present (Quarter, Month, Calendar Day, etc.)
❖ Number_Of_Children: The number of LGD units associated with the mapping described by an individual row. Source units that have undergone a split will have multiple children.
❖ Number_Of_Parents: The number of source units associated with the mapping described by an individual row. Source units that have undergone a merge will have multiple parents.
❖ Weighing_Variables: Households, Population, Male Population, Female Population, Land Area (Total, Rural, and Urban versions of each). Each weighing variable has the following associated fields:
■ Count: the total count of households, population, or land area mapped from the source unit to the LGD unit for that row (NumberOfHouseholds, TotalPopulation, LandArea).
■ Mapping_Error: the percentage error due to missing villages in the base data, i.e. the fraction of the weighing variable that is dropped because the microdata could not be mapped to the LGD.
■ Weighing_Ratio: the weighing ratio for that constituent match of source unit to LGD unit in each row. This is the fraction applied to the source data to obtain the LGD-standardised final data.
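The Weighing_Ratio description suggests how an additive value (such as a count of crimes) is carried from source units to LGD units: each KEYS row contributes source value × ratio to its LGD unit, so splits spread a value across children and merges sum several parents. The Python sketch below assumes the KEYS rows have already been reduced to `(source_unit, lgd_unit, weighing_ratio)` tuples; that reduction, the helper name `to_lgd` and the example units are hypothetical. Rates and averages would follow each variable's Aggregation_Type instead, which this sketch does not cover.

```python
from collections import defaultdict


def to_lgd(source_values, key_rows):
    """Apportion additive source values to LGD units.

    source_values: {source_unit: value}
    key_rows: iterable of (source_unit, lgd_unit, weighing_ratio) tuples,
    one per row of a KEYS-style file.
    """
    result = defaultdict(float)
    for src_unit, lgd_unit, ratio in key_rows:
        if src_unit in source_values:
            result[lgd_unit] += source_values[src_unit] * ratio
    return dict(result)


if __name__ == "__main__":
    keys = [
        ("A", "L1", 0.75), ("A", "L2", 0.25),   # source unit A split into two LGD units
        ("B", "L3", 1.0), ("C", "L3", 1.0),     # source units B and C merged into L3
    ]
    print(to_lgd({"A": 100, "B": 50, "C": 30}, keys))
```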
Censuses are the principal means of collecting the basic population and housing statistics required for social and economic development, policy interventions, and their implementation and evaluation. The census plays an essential role in public administration. The results are used for:
• ensuring equity in the distribution of government services
• distributing and allocating government funds among various regions and districts for education and health services
• delineating electoral districts at national and local levels, and
• measuring the impact of industrial development, to name a few.
The census also provides the benchmark for all surveys conducted by the national statistical office. Without the sampling frame derived from the census, the national statistical system would face difficulties in providing reliable official statistics for use by government and the public. The census also provides information on small areas and population groups with minimal sampling error. This is important, for example, in planning the location of a school or clinic. Census information is also invaluable in the private sector for activities such as business planning and market analysis, and it is used as a benchmark in research and analysis.
Census 2011 was the third democratic census to be conducted in South Africa. Its specific objectives included:
- To provide statistics on population, demographic, social, economic and housing characteristics;
- To provide a base for the selection of a new sampling frame;
- To provide data at the lowest geographical level; and
- To provide a primary base for the mid-year projections.
National
Households, Individuals
Census/enumeration data [cen]
Face-to-face [f2f]
About the Questionnaire: Much emphasis has been placed on the need for a population census to help government direct its development programmes, but less has been written about how the census questionnaire is compiled. The main focus of a population and housing census is to take stock and produce a total count of the population without omission or duplication. Another major focus is to provide accurate demographic and socio-economic characteristics for each individual enumerated. Apart from individuals, the focus is on collecting accurate data on housing characteristics and services. A population and housing census provides the data needed for informed decision-making on policy formulation and implementation, and for monitoring and evaluating programmes at the smallest possible area level. It is therefore important that Statistics South Africa collects statistical data that comply with United Nations recommendations and other relevant stakeholder needs.
The United Nations underscores the following factors in determining the selection of topics to be investigated in population censuses: a) The needs of a broad range of data users in the country; b) Achievement of the maximum degree of international comparability, both within regions and on a worldwide basis; c) The probable willingness and ability of the public to give adequate information on the topics; and d) The total national resources available for conducting a census.
In addition, the UN stipulates that census-takers should avoid collecting information that is no longer required simply because it was traditionally collected in the past, and should instead focus on key demographic, social and socio-economic variables. It is therefore necessary, in consultation with a broad range of census data users, to review periodically the topics traditionally investigated and to re-evaluate the need for the series to which they contribute, particularly in light of new data needs and alternative data sources that may have become available for topics formerly covered in the population census. It was against this background that Statistics South Africa conducted user consultations in 2008, after the release of some of the Community Survey products. Some groundwork has also been done on the core questions recommended for all countries in Africa. In line with users' meetings, the crucial demands of the Millennium Development Goals (MDGs) should also be met. It is also imperative that Stats SA meet the demands of users who require small area data.
Accuracy of data depends on a well-designed questionnaire that is short and to the point. The interview to complete the questionnaire should not take longer than 18 minutes per household. Accuracy also depends on the diligence of the enumerator and the honesty of the respondent. Disadvantaged populations, owing to their small numbers, are best covered in the census rather than in household sample surveys. On the other hand, variables such as employment/unemployment, religion, income and language are more accurately covered in household surveys than in censuses. Users' and stakeholders' input during the planning phase is crucial to the success of the census; however, the information requested should be within the scope of the census.
Individual particulars:
Section A: Demographics
Section B: Migration
Section C: General Health and Functioning
Section D: Parental Survival and Income
Section E: Education
Section F: Employment
Section G: Fertility (Women 12-50 Years Listed)
Section H: Housing, Household Goods and Services and Agricultural Activities
Section I: Mortality in the Last 12 Months
The Household Questionnaire is available in Afrikaans, English, isiZulu, IsiNdebele, Sepedi, SeSotho, SiSwati, Tshivenda and Xitsonga.
The Transient and Tourist Hotel Questionnaire (English) is divided into the following sections:
Name, Age, Gender, Date of Birth, Marital Status, Population Group, Country of birth, Citizenship, Province.
The Questionnaire for Institutions (English) is divided into the following sections:
Particulars of the institution
Availability of piped water for the institution
Main source of water for domestic use
Main type of toilet facility
Type of energy/fuel used for cooking, heating and lighting at the institution
Disposal of refuse or rubbish
Asset ownership (TV, Radio, Landline telephone, Refrigerator, Internet facilities)
List of persons in the institution on census night (name, date of birth, sex, population group, marital status, barcode number)
The Post Enumeration Survey Questionnaire (English)
These questionnaires are provided as external resources.
Data editing and validation system
The execution of each phase of Census operations introduces some errors into Census data. Despite the quality assurance methodologies embedded in all phases (data collection, manual and automated data capturing, coding and editing), a number of errors creep in and distort the collected information. To promote consistency and improve data quality, editing is a paramount phase for identifying and minimising errors such as invalid values, inconsistent entries or unknown/missing values. The editing process for Census 2011 was based on defined rules (specifications).
The editing of Census 2011 data involved a number of sequential processes: selection of members of the editing team, review of Census 2001 and 2007 Community Survey editing specifications, development of editing specifications for the Census 2011 pre-tests (2009 pilot and 2010 Dress Rehearsal), development of firewall editing specifications and finalisation of specifications for the main Census.
Editing team
The Census 2011 editing team was drawn from various divisions of the organisation based on skills and experience in data editing. The team was thus composed of subject matter specialists (demographers and programmers), managers and data processors.
The Census 2011 questionnaire was very complex, characterised by many sections, interlinked questions and skip instructions. Editing such complex, interlinked data items required a combination of editing techniques. Structural errors were resolved using Structured Query Language (SQL) on the Oracle dataset, and CSPro software was used to resolve content-related errors. The strategy for Census 2011 data editing was automated error detection and correction with minimal changes. A combination of logical and dynamic imputation/editing was used. Logical imputation was preferred, and in many cases substantial effort was made to deduce a consistent value from the rest of the household's information. To profile the extent of changes in the dataset and assess the effects of imputation, a set of imputation flags is included in the edited dataset. The imputation flag values are:
0 = no imputation was performed; raw data were preserved
1 = logical editing was performed; raw data were blank
2 = logical editing was performed; raw data were not blank
3 = hot-deck imputation was performed; raw data were blank
4 = hot-deck imputation was performed; raw data were not blank
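The flag codes above make it possible to profile how much of a variable was changed during editing. The following is a minimal Python sketch, assuming the flags for one variable have already been read into a list of integers; the function name and input layout are hypothetical, and only the flag meanings come from the list above.

```python
from collections import Counter

# Meaning of each Census 2011 imputation flag value, as listed above.
FLAG_MEANINGS = {
    0: "no imputation; raw data preserved",
    1: "logical editing; raw data blank",
    2: "logical editing; raw data not blank",
    3: "hot-deck imputation; raw data blank",
    4: "hot-deck imputation; raw data not blank",
}


def imputation_profile(flags):
    """Profile the flag values for one variable (one int per record).

    Returns (shares, changed): the share of records in each flag category and
    the overall share of records that were edited or imputed (flag != 0).
    """
    counts = Counter(flags)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no flag values supplied")
    unknown = set(counts) - set(FLAG_MEANINGS)
    if unknown:
        raise ValueError(f"unexpected flag values: {sorted(unknown)}")
    shares = {meaning: counts[code] / total for code, meaning in FLAG_MEANINGS.items()}
    changed = 1 - counts[0] / total
    return shares, changed


shares, changed = imputation_profile([0, 0, 0, 1, 2, 3, 4, 0])
print(f"records edited or imputed: {changed:.1%}")  # records edited or imputed: 50.0%
```

Splitting flags 1/2 from 3/4 in the same way would show how much the results lean on logical versus hot-deck imputation.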
Independent monitoring and evaluation of Census field activities
Independent monitoring of the Census 2011 field activities was carried out by a team of 31 professionals and 381 Monitoring
Each day, Backblaze takes a snapshot of each operational hard drive that includes basic hard drive information (e.g., capacity, failure) and S.M.A.R.T. statistics reported by each drive. This dataset contains data from the first two quarters of 2016.
This dataset contains basic hard drive information and 90 columns of raw and normalized values for 45 different S.M.A.R.T. statistics. Each row represents a daily snapshot of one hard drive.
date: Date in yyyy-mm-dd format
serial_number: Manufacturer-assigned serial number of the drive
model: Manufacturer-assigned model number of the drive
capacity_bytes: Drive capacity in bytes
failure: Contains a “0” if the drive is OK. Contains a “1” if this is the last day the drive was operational before failing.
90 variables that begin with 'smart': Raw and Normalized values for 45 different SMART stats as reported by the given drive
Some items to keep in mind as you process the data:
S.M.A.R.T. statistics can vary in meaning depending on the manufacturer and model. It may be more informative to compare drives that are similar in model and manufacturer.
Some S.M.A.R.T. columns can have out-of-bound values
When a drive fails, the 'failure' column is set to 1 on the day of failure, and from the following day the drive is removed from the dataset. New drives are also added each day, so the total number of drives may vary from day to day.
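Because a failed drive appears for the last time on its failure day and drives join on different days, it is often convenient to collapse the daily rows into one record per drive. A minimal sketch using only the standard library, assuming a CSV with the `date`, `serial_number`, `model` and `failure` columns described above; the function name and the sample rows are hypothetical. Drives that never report `failure == 1` are treated as censored.

```python
import csv
import io
from datetime import date


def drive_lifetimes(rows):
    """Collapse daily snapshot rows into one record per drive.

    `rows` is any iterable of dicts with the columns described above
    (date, serial_number, model, failure). Returns
    {serial_number: (model, first_day, last_day, failed)}; failed is False for
    drives that never report failure == 1 (censored observations).
    """
    drives = {}
    for row in rows:
        day = date.fromisoformat(row["date"])
        failed = row["failure"].strip() == "1"
        serial = row["serial_number"]
        if serial in drives:
            model, first, last, was_failed = drives[serial]
            drives[serial] = (model, min(first, day), max(last, day), was_failed or failed)
        else:
            drives[serial] = (row["model"], day, day, failed)
    return drives


# Tiny hypothetical example in the column layout described above.
SAMPLE = """date,serial_number,model,capacity_bytes,failure
2016-01-01,A1,M1,4000787030016,0
2016-01-02,A1,M1,4000787030016,1
2016-01-01,B2,M2,4000787030016,0
2016-01-02,B2,M2,4000787030016,0
"""
lifetimes = drive_lifetimes(csv.DictReader(io.StringIO(SAMPLE)))
for serial, (model, first, last, failed) in lifetimes.items():
    print(serial, model, (last - first).days + 1, "days observed", "failed" if failed else "censored")
```

For a real file, `csv.DictReader(open(path, newline=""))` can replace the inline sample.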
S.M.A.R.T. 9 is the number of hours a drive has been in service. To calculate a drive's age in days, divide this number by 24.
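The age conversion is a single division. A small sketch, assuming the raw S.M.A.R.T. 9 value is stored in a column named `smart_9_raw` (the column name is an assumption based on the "begins with 'smart'" convention above) and that blank cells mean the drive did not report the statistic:

```python
def drive_age_days(smart_9_raw):
    """Convert S.M.A.R.T. 9 (power-on hours) into an age in days.

    Assumes the raw value is in hours, as stated above. Because S.M.A.R.T.
    meanings can vary by manufacturer and model, check values per model first.
    """
    if smart_9_raw in (None, ""):
        return None  # statistic not reported by this drive
    return float(smart_9_raw) / 24


row = {"serial_number": "A1", "smart_9_raw": "8760"}  # hypothetical row
print(drive_age_days(row["smart_9_raw"]))  # 365.0
```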
Given the hints above, below are a couple of questions to help you explore the dataset:
What is the median survival time of a hard drive? How does this differ by model/manufacturer?
Can you calculate the probability that a hard drive will fail given the hard drive information and statistics in the dataset?
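For the median-survival question, one standard approach (our choice; the dataset does not prescribe a method) is the Kaplan-Meier estimator, which handles drives that have not failed by treating them as censored. A minimal sketch, assuming each drive has already been reduced to a hypothetical `(duration, failed)` pair, with duration taken for example from S.M.A.R.T. 9 hours divided by 24. It ignores left truncation (drives already in service before the first snapshot), which a fuller analysis would need to address.

```python
from itertools import groupby


def kaplan_meier(observations):
    """Kaplan-Meier estimate of the survival function.

    observations: (duration, failed) pairs, one per drive; failed=False marks a
    censored drive (still running, or otherwise not seen to fail).
    Returns [(time, survival_probability), ...] at each time with a failure.
    """
    obs = sorted(observations)
    at_risk = len(obs)
    survival = 1.0
    curve = []
    for time, group in groupby(obs, key=lambda o: o[0]):
        group = list(group)
        failures = sum(1 for _, failed in group if failed)
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((time, survival))
        at_risk -= len(group)  # failed and censored drives leave the risk set
    return curve


def median_survival(curve):
    """Earliest time at which estimated survival falls to 0.5 or below, else None."""
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None


# Hypothetical durations in days.
drives = [(10, True), (20, False), (30, True), (40, True), (50, False)]
curve = kaplan_meier(drives)
print(median_survival(curve))  # 40
```

Running the same functions separately for each model or manufacturer gives the per-model comparison the first question asks about; a `None` median means fewer than half of that group's drives were observed to fail.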
The original collection of data can be found here. When using this data, Backblaze asks that you cite Backblaze as the source; you accept that you are solely responsible for how you use the data; and you do not sell this data to anyone.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: AST was not measured in male twins.
These Economic Estimates are National Statistics providing an estimate of the contribution of DCMS Sectors to the UK economy, measured by the number of businesses.
We have experimented with using a different, more timely data source to calculate this year’s Business Demographics statistics. As a result, they are not comparable with earlier DCMS Sector Business Demographics publications. More information is provided in these published documents and in the “Call for Feedback” section below.
These statistics cover the contributions of the following DCMS sectors to the UK economy:
Users should note that there is overlap between DCMS Sector definitions and that the Telecoms sector sits wholly within the Digital sector. Estimates are not available for the Civil Society sector, because it is not identifiable in the data source used for this release.
The release also includes estimates for the Audio Visual sector, which is not a DCMS Sector but is “adjacent” to it and includes some industries also common to DCMS Sectors.
A definition for each sector is available in the published data tables.
These statistics were first published on 8 December 2022
In this publication we have experimented with using a snapshot of the Inter-Departmental Business Register (IDBR) to generate estimates of DCMS Business Demographics, rather than the Annual Business Survey (ABS) used in previous releases. This has the advantage of being more timely, and covers most of the tables included in previous Business Demographics publications. We have used the March 2019, March 2020, March 2021 and March 2022 snapshots from the ONS UK business: activity, size and location release (https://www.ons.gov.uk/businessindustryandtrade/business/activitysizeandlocation/datasets/ukbusinessactivitysizeandlocation), rather than raw data from the IDBR.
We are looking for feedback on this approach. We particularly welcome views on:
Please contact evidence@dcms.gov.uk before Thursday 9th February 2023 with any feedback.
Hard copy feedback can be sent to:
DCMS Economic Estimates Team
Department for Digital, Culture, Media & Sport
4th Floor - area 4/34
100 Parliament Street
London
SW1A 2BQ
This release is published in accordance with the Code of Practice for Statistics (2018) produced by the UK Statistics Authority (UKSA). The UKSA has the overall objective of promoting and safeguarding the production and publication of official statistics that serve the public good. It monitors and reports on all official statistics, and promotes good practice in this area.
The accompanying pre-release access document lists ministers and officials who have received privileged early access to this release. In line with best practice, the list has been kept to a minimum and those given access for briefing purposes had a maximum of 24 hours.
Responsible analyst: Eri Hutchinson
For any queries or feedback, please contact evidence@dcms.gov.uk.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each individual result graph is associated with 4 different comma-separated files: (i) Raw—the (anonymised) raw data behind the means and standard deviations used for a particular result graph; (ii) Paired—the paired statistical significance results; (iii) Successive Male—the statistical significance results to compare successive groups (age and ability) for male runners; and (iv) Successive Female—the corresponding results for the statistical significance tests to compare successive groups (age and ability) of female runners. (ZIP)
https://www.statsndata.org/how-to-order
The PCI and PCIe Image Capture Card market plays a crucial role in various industries, ranging from gaming and streaming to medical imaging and security surveillance. These cards are integral for converting raw video and imaging data from high-definition sources into digital formats that can be processed, recorded,
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
OpenStreetMap contains roughly 85.0 thousand km of roads in this region. Based on AI-mapped estimates, this is approximately 34% of the total road length in the dataset region. The average age of data for the region is 4 years (last edited 8 days ago), and 10% of roads were added or updated in the last 6 months. Read about what this summary means: indicators, metrics.
This theme includes all OpenStreetMap features in this area matching (learn what tags mean here):
tags['highway'] IS NOT NULL
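The filter `tags['highway'] IS NOT NULL` keeps every feature that carries a `highway` tag, whatever its value. A minimal Python sketch of that filter, together with a haversine sum showing how a total road length could be computed (not necessarily how the HDX indicators compute it). The in-memory feature layout is hypothetical, reading the actual export files is not shown, and the spherical-Earth haversine formula is an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def road_length_km(features):
    """Sum the length of features matching tags['highway'] IS NOT NULL.

    Each feature is a hypothetical dict: {"tags": {...}, "coords": [(lon, lat), ...]}.
    """
    total = 0.0
    for feature in features:
        if feature.get("tags", {}).get("highway") is None:
            continue  # no highway tag: excluded by the theme filter
        coords = feature["coords"]
        total += sum(haversine_km(*a, *b) for a, b in zip(coords, coords[1:]))
    return total


features = [
    {"tags": {"highway": "residential"}, "coords": [(0.0, 0.0), (0.0, 1.0)]},
    {"tags": {"building": "yes"}, "coords": [(0.0, 0.0), (1.0, 0.0)]},
]
print(round(road_length_km(features), 1))  # 111.2 (one degree of latitude)
```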
Features may have these attributes:
This dataset is one of many OpenStreetMap exports on HDX (https://data.humdata.org/organization/hot). See the Humanitarian OpenStreetMap Team website for more information.
The GIS layer "Census_sum_15" provides a standardized tool for examining spatial patterns in abundance and demographic trends of the southern sea otter (Enhydra lutris nereis), based on data collected during the spring 2015 range-wide census. The USGS range-wide sea otter census has been undertaken twice a year since 1982, once in May and once in October, using consistent methodology involving both ground-based and aerial-based counts. The spring census is considered more accurate than the fall count, and provides the primary basis for gauging population trends by State and Federal management agencies. This shapefile includes a series of summary statistics derived from the raw census data, including sea otter density (otters per square km of habitat), linear density (otters per km of coastline), relative pup abundance (ratio of pups to independent animals) and 5-year population trend (calculated as exponential rate of change). All statistics are calculated and plotted for small sections of habitat in order to illustrate local variation in these statistics across the entire mainland distribution of sea otters in California (as of 2015). Sea otter habitat is considered to extend offshore from the mean low tide line out to the 60 m isobath: this depth range includes over 99% of sea otter feeding dives, based on dive-depth data from radio-tagged sea otters (Tinker et al. 2006, 2007). Sea otter distribution in California (the mainland range) is considered to comprise this band of potential habitat stretching along the coast of California, bounded to the north and south by range limits defined as "the points farthest from the range center at which 5 or more otters are counted within a 10km contiguous stretch of coastline (as measured along the 10m bathymetric contour) during the two most recent spring censuses, or at which these same criteria were met in the previous year".
The polygon corresponding to the range definition was then sub-divided into onshore/offshore strips roughly 500 meters in width. The boundaries between these strips correspond to ATOS (As-The-Otter-Swims) points, which are arbitrary locations established approximately every 500 meters along a smoothed 5 fathom bathymetric contour (line) offshore of the State of California.
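The 5-year population trend is described above only as an exponential rate of change. One common way to estimate such a rate, assumed here rather than taken from the USGS documentation, is the least-squares slope r of ln(count) against year, so that N(t) ≈ N0·exp(r·t). A minimal sketch, with hypothetical counts for one habitat section over five spring censuses:

```python
import math


def exponential_trend(years, counts):
    """Least-squares slope r of ln(count) on year, so that N(t) ~ N0 * exp(r * t).

    An assumed fitting method: the source says only that the trend is an
    exponential rate of change, not how it is estimated.
    """
    if len(years) != len(counts) or len(years) < 2:
        raise ValueError("need at least two (year, count) pairs")
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive to take logarithms")
    logs = [math.log(c) for c in counts]
    n = len(years)
    mean_t = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, logs))
    sxx = sum((t - mean_t) ** 2 for t in years)
    return sxy / sxx


# Hypothetical counts growing by exactly 5% a year.
r = exponential_trend([2011, 2012, 2013, 2014, 2015], [100, 105, 110.25, 115.7625, 121.550625])
print(f"r = {r:.4f} per year, i.e. about {math.expm1(r):.1%} annual change")
```

A positive r indicates growth and a negative r decline; `math.expm1(r)` converts it to a per-year percentage change.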
The GIS shapefile Census_sum_2019 provides a standardized tool for examining spatial patterns in abundance and demographic trends of the southern sea otter (Enhydra lutris nereis), based on data collected during the spring 2019 range-wide census. The USGS spring range-wide sea otter census has been undertaken each year since 1982, using consistent methodology involving both ground-based and aerial-based counts. The spring census provides the primary basis for gauging population trends by State and Federal management agencies. This shapefile includes a series of summary statistics derived from the raw census data, including sea otter density (otters per square kilometer of habitat), linear density (otters per kilometer of coastline), relative pup abundance (ratio of pups to independent animals) and 5-year population trend (calculated as exponential rate of change). All statistics are calculated and plotted for small sections of habitat in order to illustrate local variation in these statistics across the entire mainland distribution of sea otters in California (as of 2019). Sea otter habitat is considered to extend offshore from the mean low tide line out to the 60 meter isobath: this depth range includes over 99 percent of sea otter feeding dives, based on dive-depth data from radio-tagged sea otters (Tinker et al. 2006, 2007). Sea otter distribution in California (the mainland range) is considered to comprise this band of potential habitat stretching along the coast of California, bounded to the north and south by range limits. These limits are defined by combining independent otters within a moving window of 10-kilometer stretches of coastline (as measured along the 10-meter bathymetric contour; 20 contiguous ATOS intervals each) and taking the northern and southern ATOS values, respectively, of the northernmost and southernmost stretches in which at least five otters were counted for at least 2 consecutive spring surveys during the last 3 years.
The polygon corresponding to the range definition was then sub-divided into onshore/offshore strips roughly 500 meters in width. The boundaries between these strips correspond to ATOS (As-The-Otter-Swims) points, which are arbitrary locations established approximately every 500 meters along a smoothed 5-fathom bathymetric contour (line) offshore of the State of California.
References:
Tinker, M. T., Doak, D. F., Estes, J. A., Hatfield, B. B., Staedler, M. M., and Bodkin, J. L. (2006). Incorporating diverse data and realistic complexity into demographic estimation procedures for sea otters. Ecological Applications, 16: 2293–2312. https://doi.org/10.1890/1051-0761(2006)016[2293:IDDARC]2.0.CO;2
Tinker, M. T., Costa, D. P., Estes, J. A., and Wieringa, N. (2007). Individual dietary specialization and dive behaviour in the California sea otter: using archival time–depth data to detect alternative foraging strategies. Deep Sea Research II, 54: 330–342. https://doi.org/10.1016/j.dsr2.2006.11.012
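The 2019 range-limit rule is essentially a moving-window search over ATOS intervals. A minimal sketch, assuming independent-otter counts per ATOS interval are available for each of the last three spring surveys as lists in the same along-coast order; the data layout and function name are hypothetical, and which end of the index runs north is left to the caller. "Consecutive" is taken to mean adjacent years in the supplied data.

```python
WINDOW = 20     # 20 contiguous ATOS intervals, about 10 km of coastline
MIN_OTTERS = 5  # independent otters a window must contain


def range_limits(counts_by_year, window=WINDOW, min_otters=MIN_OTTERS):
    """Return (low_edge, high_edge) ATOS-interval indices bounding the range.

    counts_by_year maps each of the last three spring survey years to a list of
    independent-otter counts per ATOS interval, all lists in the same
    along-coast order. A window qualifies if it holds at least `min_otters` in
    two consecutive surveys. Returns None if no window qualifies.
    """
    years = sorted(counts_by_year)
    n_intervals = len(counts_by_year[years[0]])
    qualifying = []
    for start in range(n_intervals - window + 1):
        totals = [sum(counts_by_year[y][start:start + window]) for y in years]
        if any(a >= min_otters and b >= min_otters for a, b in zip(totals, totals[1:])):
            qualifying.append(start)
    if not qualifying:
        return None
    # Outer edges of the outermost qualifying windows at each end of the range.
    return qualifying[0], qualifying[-1] + window - 1


# Hypothetical coast of 60 intervals with 5 otters in interval 25 in 2018 and 2019.
counts = {year: [0] * 60 for year in (2017, 2018, 2019)}
for year in (2018, 2019):
    counts[year][25] = 5
print(range_limits(counts))  # (6, 44)
```

Because the limits are the outer edges of whole qualifying windows, the resulting range extends beyond the outermost otters that were actually counted.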
The GIS shapefile "Census summary of southern sea otter 2017" provides a standardized tool for examining spatial patterns in abundance and demographic trends of the southern sea otter (Enhydra lutris nereis), based on data collected during the spring 2017 range-wide census. The USGS range-wide sea otter census has been undertaken twice a year since 1982, once in May and once in October, using consistent methodology involving both ground-based and aerial-based counts. The spring census is considered more accurate than the fall count, and provides the primary basis for gauging population trends by State and Federal management agencies. This shapefile includes a series of summary statistics derived from the raw census data, including sea otter density (otters per square km of habitat), linear density (otters per km of coastline), relative pup abundance (ratio of pups to independent animals) and 5-year population trend (calculated as exponential rate of change). All statistics are calculated and plotted for small sections of habitat in order to illustrate local variation in these statistics across the entire mainland distribution of sea otters in California (as of 2017). Sea otter habitat is considered to extend offshore from the mean low tide line out to the 60 m isobath: this depth range includes over 99% of sea otter feeding dives, based on dive-depth data from radio-tagged sea otters (Tinker et al. 2006, 2007). Sea otter distribution in California (the mainland range) is considered to comprise this band of potential habitat stretching along the coast of California, bounded to the north and south by range limits defined as "the points farthest from the range center at which 5 or more otters are counted within a 10km contiguous stretch of coastline (as measured along the 10m bathymetric contour) during the two most recent spring censuses, or at which these same criteria were met in the previous year".
The polygon corresponding to the range definition was then sub-divided into onshore/offshore strips roughly 500 meters in width. The boundaries between these strips correspond to ATOS (As-The-Otter-Swims) points, which are arbitrary locations established approximately every 500 meters along a smoothed 5-fathom bathymetric contour (line) offshore of the State of California.
References:
Tinker, M. T., Doak, D. F., Estes, J. A., Hatfield, B. B., Staedler, M. M., and Bodkin, J. L. (2006). Incorporating diverse data and realistic complexity into demographic estimation procedures for sea otters. Ecological Applications, 16: 2293–2312. https://doi.org/10.1890/1051-0761(2006)016[2293:IDDARC]2.0.CO;2
Tinker, M. T., Costa, D. P., Estes, J. A., and Wieringa, N. (2007). Individual dietary specialization and dive behaviour in the California sea otter: using archival time–depth data to detect alternative foraging strategies. Deep Sea Research II, 54: 330–342. https://doi.org/10.1016/j.dsr2.2006.11.012
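The per-section summary statistics are simple ratios. A minimal sketch, assuming a hypothetical record layout for one habitat section; whether pups are included in the density numerators is an assumption here, since the text does not say.

```python
from dataclasses import dataclass


@dataclass
class HabitatSection:
    """One small section of habitat (hypothetical record layout)."""
    independents: int    # independent (non-pup) otters counted
    pups: int            # pups counted
    area_km2: float      # habitat area, mean low tide line to the 60 m isobath
    coastline_km: float  # length of coastline bounding the section


def section_statistics(s: HabitatSection):
    """Density, linear density and relative pup abundance for one section."""
    total = s.independents + s.pups  # assumption: pups counted in densities
    return {
        "density_per_km2": total / s.area_km2,
        "linear_density_per_km": total / s.coastline_km,
        "pup_ratio": s.pups / s.independents if s.independents else None,
    }


stats = section_statistics(HabitatSection(independents=40, pups=10, area_km2=2.5, coastline_km=0.5))
print(stats)  # {'density_per_km2': 20.0, 'linear_density_per_km': 100.0, 'pup_ratio': 0.25}
```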
https://data.go.kr/ugs/selectPortalPolicyView.do
These are import and export trade statistics by province and city, compiled from trade data collected through customs clearance procedures and categorized by type. Products are classified using 10-digit HS codes; the HS Code is an international commodity classification system that assigns a unique number to every product and groups similar products together based on their characteristics. Imports are broadly divided into three categories: consumer goods, raw materials and capital goods. Exports are broadly divided into four categories: food and direct consumer goods, raw materials and fuel, light industrial products, and heavy and chemical industrial products. The data cover 17 cities and provinces, including Seoul, Daegu, Busan and Gwangju. Import and export trade statistics, by definition, refer to the exchange of goods between a national economy and other countries. They record all goods that are brought into (imports) or taken out of (exports) a country's economic territory and thereby increase or decrease the country's material resources. Goods simply passing through a country (transit goods), or goods temporarily imported or exported, do not increase or decrease the country's material resources and are therefore not included in trade statistics. The trade statistics area that serves as the basis for compiling these statistics is determined by geographical, administrative and national economic considerations. In most countries, including Korea, the customs territory (the area to which customs law comprehensively applies) and the trade statistics area are conceptually identical. Trade statistics therefore cover goods crossing the boundary of the customs territory. Korea compiles imports on a CIF (cost, insurance and freight) basis and exports on an FOB (free on board) basis.
Data for the previous month are updated around the 15th of each month to reflect changes such as corrections or cancellations of export and import declarations. The data are updated monthly. Weight is net weight (kg); amounts are the taxable value in USD for imports and the declared value in USD for exports.
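The HS hierarchy can be read directly from the code digits: in the international Harmonized System the first two digits give the chapter, the first four the heading and the first six the subheading, while further digits are national subdivisions (Korea uses 10 digits). A minimal sketch for grouping records at a coarser level; the function name and the sample code string are illustrative only.

```python
def hs_levels(code: str) -> dict:
    """Split a 10-digit HS code into its hierarchical levels.

    Digits 1-2 are the chapter, 1-4 the heading and 1-6 the internationally
    harmonized subheading; the remaining four digits are national subdivisions.
    """
    digits = code.replace(".", "").replace("-", "").strip()
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError(f"expected a 10-digit HS code, got {code!r}")
    return {
        "chapter": digits[:2],
        "heading": digits[:4],
        "subheading": digits[:6],
        "national": digits,
    }


print(hs_levels("8542.31-1000"))
```

Aggregating net weight or USD amounts by the `chapter` or `heading` key gives coarser commodity groupings than the full 10-digit code.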
This publication covers annual estimates for waste collected by local authorities in England and the regions. These statistics are based on data submitted by all local authorities in England to WasteDataFlow on the waste they collect and manage.
The methodology and recycling explainer documents give background and context to this statistical notice, accompanying datasets and the waste and recycling measures they present.
There is also a further historical note on the definition of local authority collected waste relating to earlier releases.
The entire raw dataset is available in CSV format from WasteDataFlow - Local Authority waste management - data.gov.uk: https://www.data.gov.uk/dataset/0e0c12d8-24f6-461f-b4bc-f6d6a5bf2de5/wastedataflow-local-authority-waste-management
2015-2016 (https://webarchive.nationalarchives.gov.uk/ukgwa/20170418015547/https://www.gov.uk/government/statistics/local-authority-collected-waste-management-annual-results): this includes the ad hoc release entitled “Provisional 2016/17 local authority data on waste collection and treatment for England (April to June and July to September 2016)”.
Defra statistics: Waste and Recycling
Email: WasteStatistics@defra.gov.uk
The harmonized data set on health, created and published by the ERF, is a subset of the Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules collected in the context of the above-mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.
----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:
Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization (CSO), household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between the CSO and the World Bank (WB), the CSO and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on the IHSES on 1/1/2012. The survey was carried out over a full year, covering all governorates including those in the Kurdistan Region.
The survey has six main objectives. These objectives are:
The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum to create a version comparable with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only involved unifying variable names, labels and some definitions. See Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf, provided in the external resources, for further information on the mapping of the original variables to the harmonized ones, as well as on the variables' availability in both survey years and relevant comments.
National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.
1- Household/family. 2- Individual/person.
The survey was carried out over a full year covering all governorates including those in Kurdistan Region.
Sample survey data [ssd]
----> Design:
The sample size was 25,488 households for the whole of Iraq: 216 households in each of 118 districts, organized into 2,832 clusters of 9 households each, distributed across districts and governorates and across rural and urban areas.
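The design figures are mutually consistent, as a quick check of the arithmetic shows:

```latex
\frac{216}{9} = 24 \text{ clusters per district}, \qquad
118 \times 24 = 2{,}832 \text{ clusters}, \qquad
2{,}832 \times 9 = 118 \times 216 = 25{,}488 \text{ households}.
```

The 24 clusters per district match the 24 sample points per stratum selected in the first sampling stage.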
----> Sample frame:
The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all governorates, including the Kurdistan Region, as the frame for selecting households. The sample was selected in two stages. Stage 1: primary sampling units (blocks) within each stratum (district), for urban and rural areas, were selected systematically with probability proportional to size, for a total of 2,832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to form a cluster, giving a total survey sample of 25,488 households distributed across the governorates, 216 households in each district.
----> Sampling Stages:
In each district, the sample was selected in two stages. Stage 1: based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, with implicit stratification by urban/rural and by geography (sub-district, quarter, street, county, village and block). Stage 2: using households as secondary sampling units, 9 households were selected from each sample point using systematic equal-probability sampling. The sampling frames for each stage can be developed from the 2010 building listing and numbering without updating household lists. In some small districts, the random selection of primary sampling units may yield fewer than 24 distinct units; in that case a sampling unit is selected more than once, so two or more clusters may be drawn from the same enumeration unit when necessary.
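The two stages can be sketched with textbook systematic PPS and systematic equal-probability selection. This is a minimal Python sketch, not the CSO's actual procedure: the frame is assumed to be already sorted by the implicit stratification variables, and the function names and block sizes are hypothetical. It also shows why a large unit can yield more than one cluster.

```python
import random


def systematic_pps(sizes, n, rng=random):
    """Systematic PPS selection of n units from a frame in its sorted order.

    sizes: measure of size for each unit (e.g. listed households per block).
    Returns selected unit indices; a unit larger than the sampling interval
    can be hit more than once, giving several clusters from one unit.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    selected, cumulative, unit = [], 0.0, 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulative-size range contains the point.
        while unit < len(sizes) - 1 and cumulative + sizes[unit] <= point:
            cumulative += sizes[unit]
            unit += 1
        selected.append(unit)
    return selected


def systematic_equal_probability(units, n, rng=random):
    """Systematic equal-probability selection of n items (e.g. households)."""
    interval = len(units) / n
    start = rng.random() * interval
    return [units[min(int(start + k * interval), len(units) - 1)] for k in range(n)]


rng = random.Random(2012)
blocks = [10, 50, 5, 35]  # hypothetical block sizes
print(systematic_pps(blocks, 4, rng))  # block 1 (size 50 > interval 25) is hit twice
households = [f"hh{i:03d}" for i in range(90)]
print(systematic_equal_probability(households, 9, rng))
```

In the survey's terms, stage 1 would call the PPS function with n = 24 per district and stage 2 the equal-probability function with n = 9 per selected sample point.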
Face-to-face [f2f]
----> Preparation:
The questionnaire of the 2006 survey was adopted as the basis for the 2012 questionnaire, with many revisions. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, further revisions were made based on the challenges and feedback that emerged during implementation, producing the final version used in the actual survey.
----> Questionnaire Parts:
The questionnaire consists of four parts, each with several sections:
Part 1: Socio-Economic Data:
- Section 1: Household Roster
- Section 2: Emigration
- Section 3: Food Rations
- Section 4: Housing
- Section 5: Education
- Section 6: Health
- Section 7: Physical Measurements
- Section 8: Job Seeking and Previous Job
Part 2: Monthly, Quarterly and Annual Expenditures:
- Section 9: Expenditures on Non-Food Commodities and Services (past 30 days)
- Section 10: Expenditures on Non-Food Commodities and Services (past 90 days)
- Section 11: Expenditures on Non-Food Commodities and Services (past 12 months)
- Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days)
- Section 12, Table 1: Meals Had Within the Residential Unit
- Section 12, Table 2: Number of Persons Other Than Household Members Participating in Meals Within Household Expenditure
Part 3: Income and Other Data:
- Section 13: Job
- Section 14: Paid Jobs
- Section 15: Agriculture, Forestry and Fishing
- Section 16: Household Non-Agricultural Projects
- Section 17: Income from Ownership and Transfers
- Section 18: Durable Goods
- Section 19: Loans, Advances and Subsidies
- Section 20: Shocks and Household Coping Strategies
- Section 21: Time Use
- Section 22: Justice
- Section 23: Life Satisfaction
- Section 24: Food Consumption During the Past 7 Days
Part 4: Diary of Daily Expenditures: The expenditure diary is an essential component of this survey. It is left with the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.), over 7 days. Two pages are allocated to each day's expenditures, so the diary consists of 14 pages.
----> Raw Data:
Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
1. Interviewer: checks all answers on the household questionnaire, confirming that they are clear and correct.
2. Local Supervisor: checks that the questions have been completed correctly.
3. Statistical analysis: after exporting the data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or illogical values, in addition to auditing some variables.
4. World Bank consultants, in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and Stata to examine and correct remaining inconsistencies within the data files. The software detects errors by checking questionnaire items against the expected parameters for each variable.
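Checks of this kind typically combine range rules per variable with cross-variable consistency rules. The sketch below illustrates the idea in Python rather than SPSS or Stata; the variable names, ranges and the age threshold are all hypothetical and are not the survey's actual edit specifications.

```python
# Hypothetical expected ranges for a few roster variables.
RANGE_RULES = {
    "age": (0, 110),
    "household_size": (1, 40),
}


def check_record(record):
    """Return a list of problems found in one household-member record."""
    problems = []
    for var, (low, high) in RANGE_RULES.items():
        value = record.get(var)
        if value is None:
            problems.append(f"{var}: missing")
        elif not low <= value <= high:
            problems.append(f"{var}: {value} outside [{low}, {high}]")
    # Hypothetical consistency rule: very young members should not be married.
    age = record.get("age")
    if age is not None and age < 12 and record.get("marital_status") == "married":
        problems.append("marital_status inconsistent with age")
    return problems


print(check_record({"age": 8, "household_size": 5, "marital_status": "married"}))
```

Records with a non-empty problem list would then be reviewed and corrected, as in stages 3 and 4 above.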
----> Harmonized Data:
The Iraq Household Socio Economic Survey (IHSES) reached a total of 25,488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest was in Sulaimaniya (92%).
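If the 305 refusals were the only form of non-response, the response rate would be slightly above the reported 98.6%. The reported rate implies roughly 357 non-responding households, so it presumably also counts non-response other than refusals, which the text does not break down. The arithmetic:

```python
SAMPLED_HOUSEHOLDS = 25488
REFUSALS = 305
REPORTED_RATE = 0.986

# Response rate if refusals were the only form of non-response.
rate_refusals_only = (SAMPLED_HOUSEHOLDS - REFUSALS) / SAMPLED_HOUSEHOLDS
# Number of non-responding households implied by the reported rate.
implied_nonresponse = round(SAMPLED_HOUSEHOLDS * (1 - REPORTED_RATE))

print(f"{rate_refusals_only:.1%}")  # 98.8%
print(implied_nonresponse)          # 357
```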