The harmonized health dataset, created and published by the Economic Research Forum (ERF), is a subset of the Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules collected as part of that survey. The sample was then used to create a harmonized health survey comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.
----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:
Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization (CSO), household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between the CSO and the World Bank (WB), the CSO and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on the IHSES on 1 January 2012. The survey was carried out over a full year, covering all governorates including those in the Kurdistan Region.
The survey has six main objectives. These objectives are:
The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum to create a version comparable with the 2006/2007 Household Socio Economic Survey in Iraq. At this stage, harmonization only involved unifying variable names, labels and some definitions. See "Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf" in the external resources for more information on how the original variables map onto the harmonized ones, on the availability of variables in both survey years, and for relevant comments.
National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.
1- Household/family. 2- Individual/person.
Sample survey data [ssd]
----> Design:
The sample size was 25,488 households for the whole of Iraq: 216 households in each of the 118 districts, organized into 2,832 clusters of 9 households each and distributed across districts and governorates, in both rural and urban areas.
----> Sample frame:
The listing and numbering results of the 2009-2010 Population and Housing Survey were used as the frame for selecting households in all governorates, including the Kurdistan Region. The sample was selected in two stages. Stage 1: primary sampling units (blocks) within each stratum (district) were selected systematically, separately for urban and rural areas, with probability proportional to size, giving 2,832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to form a cluster, so the total sample comprised 25,488 households distributed across the governorates, 216 households in each district.
----> Sampling Stages:
In each district, the sample was selected in two stages. Stage 1: using the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, with implicit stratification by urban/rural area and by geography (sub-district, quarter, street, county, village and block). Stage 2: using households as secondary sampling units, 9 households were selected from each sample point by systematic equal-probability sampling. The sampling frames for each stage can be built from the 2010 building listing and numbering without updating household lists. In some small districts, the random selection of primary sampling units may yield fewer than 24 distinct units; in that case a sampling unit is selected more than once, so two or more clusters may come from the same enumeration unit when necessary.
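The two-stage selection can be sketched in Python as below. This is a minimal illustration, not the CSO's selection program: the function names are hypothetical, block sizes are assumed to be household counts from the 2010 listing, and the random start is drawn with Python's `random` module. The PPS step shows how a block larger than the sampling interval can be hit more than once, which is the situation described for small districts.

```python
import random

# Design constants stated in the text: 118 districts x 24 clusters x 9 households.
DISTRICTS, CLUSTERS_PER_DISTRICT, HOUSEHOLDS_PER_CLUSTER = 118, 24, 9
assert DISTRICTS * CLUSTERS_PER_DISTRICT == 2832
assert DISTRICTS * CLUSTERS_PER_DISTRICT * HOUSEHOLDS_PER_CLUSTER == 25488


def systematic_pps(sizes, n, start=None):
    """Stage 1 (sketch): pick n units with probability proportional to size.

    Units are laid end to end on a line of length sum(sizes); selection points
    are start, start + k, start + 2k, ... with interval k = total / n. A unit
    whose size exceeds k can contain several points and is then selected more
    than once.
    """
    total = sum(sizes)
    interval = total / n
    if start is None:
        start = random.random() * interval  # random start in [0, interval)
    selected, cumulative, unit = [], 0.0, 0
    for i in range(n):
        point = start + i * interval
        # Move forward until the current unit's range contains the point
        # (the index guard protects against floating-point rounding at the end).
        while unit < len(sizes) - 1 and cumulative + sizes[unit] <= point:
            cumulative += sizes[unit]
            unit += 1
        selected.append(unit)
    return selected


def systematic_equal(num_households, m, start=None):
    """Stage 2 (sketch): pick m of num_households listed households with equal probability."""
    interval = num_households / m
    if start is None:
        start = random.random() * interval
    return [int(start + i * interval) for i in range(m)]


# Usage with made-up block sizes: 24 sample points in one district, 9 households each.
block_sizes = [random.randint(20, 300) for _ in range(150)]
clusters = [
    (block, systematic_equal(block_sizes[block], HOUSEHOLDS_PER_CLUSTER))
    for block in systematic_pps(block_sizes, CLUSTERS_PER_DISTRICT)
]
```

Because the same block index may appear twice in `clusters`, the sketch reproduces the "two or more clusters from the same enumeration unit" case without any special handling.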
Face-to-face [f2f]
----> Preparation:
The 2006 survey questionnaire was used as the basis for the 2012 questionnaire, with many revisions. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was tested in a pilot survey in September 2011. After the pilot, additional revisions were made to address the challenges and feedback that emerged during its implementation, producing the final version used in the actual survey.
----> Questionnaire Parts:
The questionnaire consists of four parts, each with several sections:
Part 1: Socio-Economic Data:
- Section 1: Household Roster
- Section 2: Emigration
- Section 3: Food Rations
- Section 4: Housing
- Section 5: Education
- Section 6: Health
- Section 7: Physical Measurements
- Section 8: Job Seeking and Previous Job
Part 2: Monthly, Quarterly and Annual Expenditures:
- Section 9: Expenditures on Non-Food Commodities and Services (past 30 days)
- Section 10: Expenditures on Non-Food Commodities and Services (past 90 days)
- Section 11: Expenditures on Non-Food Commodities and Services (past 12 months)
- Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days)
- Section 12, Table 1: Meals Had Within the Residential Unit
- Section 12, Table 2: Number of Persons Other Than Household Members Participating in Meals Within Household Expenditure
Part 3: Income and Other Data:
- Section 13: Job
- Section 14: Paid Jobs
- Section 15: Agriculture, Forestry and Fishing
- Section 16: Household Non-Agricultural Projects
- Section 17: Income from Ownership and Transfers
- Section 18: Durable Goods
- Section 19: Loans, Advances and Subsidies
- Section 20: Shocks and Household Coping Strategies
- Section 21: Time Use
- Section 22: Justice
- Section 23: Satisfaction in Life
- Section 24: Food Consumption During the Past 7 Days
Part 4: Diary of Daily Expenditures: The expenditure diary is an essential component of this survey. It is left with the household to record all daily purchases over 7 days, such as expenditures on food and on frequent non-food items (gasoline, newspapers, etc.). Two pages are allocated to each day's expenditures, so the diary consists of 14 pages.
----> Raw Data:
Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
1. Interviewer: checks all answers on the household questionnaire, confirming that they are clear and correct.
2. Local supervisor: checks that the questions have been correctly completed.
3. Statistical analysis: after exporting the data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or illogical values, in addition to auditing some variables.
4. World Bank consultants, in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and Stata to examine and correct remaining inconsistencies within the data files. The software detects errors by checking questionnaire items against the expected parameters for each variable.
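The kind of automated check described in stages 3 and 4 can be sketched as a table of expected ranges plus a few cross-variable rules. The variable names, limits and rule below are hypothetical examples, not the actual IHSES edit rules, and the sketch is written in Python rather than the SPSS or Stata programs the survey team used.

```python
# Hypothetical expected ranges: variable -> (minimum, maximum).
EXPECTED_RANGES = {
    "age": (0, 110),
    "household_size": (1, 40),
}

# Hypothetical cross-variable rules: (description, predicate that flags a record).
CONSISTENCY_RULES = [
    ("head of household younger than 12",
     lambda r: r.get("is_head") and r.get("age") is not None and r["age"] < 12),
]


def find_problems(records):
    """Return (record index, problem description) for every failed check."""
    problems = []
    for i, rec in enumerate(records):
        # Range checks: missing values and values outside the expected interval.
        for var, (low, high) in EXPECTED_RANGES.items():
            value = rec.get(var)
            if value is None:
                problems.append((i, f"{var} missing"))
            elif not low <= value <= high:
                problems.append((i, f"{var}={value} outside [{low}, {high}]"))
        # Logical consistency checks across variables.
        for description, flags in CONSISTENCY_RULES:
            if flags(rec):
                problems.append((i, description))
    return problems
```

Flagged records would then go back for review rather than being corrected automatically, which matches the division of labour between the statistical unit and the consultants described above.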
----> Harmonized Data:
The Iraq Household Socio Economic Survey (IHSES) reached a total of 25,488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest were in Sulaimaniya (92%).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Adequate knowledge and proper practices, coupled with knowledge of the burden of disease, are necessary for the eradication of Schistosoma infection. This study assessed knowledge, attitude, and practice (KAP) as well as health outcomes related to Schistosoma haematobium infection in Kwahu Afram Plains North District (KAPND).
Methods
A cross-sectional survey using a structured questionnaire was carried out among 140 participants from four local communities in KAPND in August 2021. From each participant, 10 mL of urine was collected to determine the presence of S. haematobium and for routine urine examination. In addition, 4 mL of blood was collected for haematological examination. Descriptive statistics and logistic regression analysis in IBM SPSS were used to describe and analyse the data.
Results
The study reports a gap in knowledge about schistosomiasis in the study area: 60.7% of participants had not heard of schistosomiasis, 49.3% did not know its mode of transmission, and 51.5% did not know how the disease could be spread. The overall prevalence of urinary schistosomiasis was 52.9%. Infection was associated with age, occupation, perceived mode of Schistosoma transmission, knowledge of Schistosoma prevention, awareness that schistosomiasis can be treated, frequency of visits to water bodies, and water usage patterns. In multivariate analysis, the factors that remained significantly associated with S. haematobium infection were age 21–40 (OR = 0.21, 95% CI: 0.06–0.76), 41–60 (OR = 0.01, 95% CI: 0.01–0.52) and ≥ 60 (OR = 0.02, 95% CI: 0.02–0.87), informal employment (OR = 0.01, 95% CI: 0.01–0.69), and awareness of transmission by drinking water from a river (OR = 0.03, 95% CI: 0.03–0.92). Schistosoma-infected participants showed reduced haemoglobin, haematocrit, mean corpuscular volume, mean corpuscular haemoglobin, lymphocytes and eosinophils, while white blood cells, neutrophils, and monocytes were significantly elevated. Urine analysis revealed high pus cell and red blood cell counts among Schistosoma-positive participants.
Conclusion
Schistosoma infection is endemic among inhabitants of KAPND and is associated with gaps in knowledge, awareness, and practice, possibly due to inadequate education in the area. Poor clinical outcomes associated with Schistosoma infection have been demonstrated in the area. Well-structured public education, nutritional intervention, and mass drug administration will be necessary to eradicate this menace.
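The odds ratios and 95% confidence intervals reported in the Results are standard logistic regression output. As a minimal sketch of how such figures are usually obtained from a fitted coefficient $\beta$ (on the log-odds scale) and its standard error, assuming a Wald interval $\exp(\beta \pm 1.96\,\mathrm{SE})$ (the abstract does not state the exact SPSS settings), in Python:

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Turn a logistic regression coefficient (log-odds) and its standard error
    into an odds ratio and a Wald confidence interval (z = 1.96 gives 95%)."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
print(odds_ratio_with_ci(beta=-1.56, se=0.65))
```

An odds ratio below 1 means lower odds of infection in that group than in the reference category, and because the interval is symmetric on the log scale, the odds ratio is the geometric mean of its two bounds.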
The cleaned and harmonized version of the survey data produced and published by the Economic Research Forum represents 100% of the original survey data collected by the National Institute of Statistics (INS) - Tunisia.
The survey aims to estimate the demographic and educational characteristics of the population. It also estimates the population's economic indicators, such as the number of active individuals, the additional demand for jobs, the number of employed persons and their characteristics, the number of jobs created, the characteristics of the unemployed, and the unemployment rate. Furthermore, it estimates these indicators at the household level, together with households' living conditions.
The results of this survey were compared with those of the second quarter of the 2011 national survey on population and employment. Note also that the National Institute of Statistics - Tunisia uses the unemployment definition and concepts adopted by the International Labour Organization (ILO). Under this definition, a person is unemployed if they did not work during the week preceding the interview, were looking for a job during the month preceding the interview, and are available to work within two weeks after the interview.
In 2010, the National Institute of Statistics adopted a strict ILO definition of unemployment, requiring that the person must have taken effective steps to search for a job during the month preceding the interview.
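The criteria above can be written as a small classification function. This is a sketch under stated assumptions: the argument names are hypothetical survey flags, the person is assumed to be of working age, and `strict=True` models the 2010 requirement that the job search involve effective steps.

```python
def is_unemployed(worked_last_week, searched_last_month, available_within_two_weeks,
                  took_effective_search_steps=True, strict=True):
    """ILO-style unemployment test as described for the INS survey (sketch).

    A person counts as unemployed only if all three conditions hold:
    no work in the reference week, job search in the preceding month,
    and availability to start work within two weeks. Under the strict
    definition (strict=True), the search only counts if effective steps were taken.
    """
    searched = searched_last_month and (took_effective_search_steps or not strict)
    return (not worked_last_week) and searched and available_within_two_weeks


print(is_unemployed(False, True, True))                                      # True
print(is_unemployed(False, True, True, took_effective_search_steps=False))  # False
```

The second call shows the effect of the 2010 change: a person who looked for work without taking effective steps would have been unemployed under the looser rule (`strict=False`) but is not under the strict one.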
Covering a representative sample at the national and regional level (governorates).
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
The sample is drawn from the frame of the 2004 General Census of Population and Housing.
Face-to-face [f2f]
Three modules were designed for data collection:
Household Questionnaire (Module 1): Includes questions regarding household characteristics, living conditions, individuals and their demographic, educational and economic characteristics. This module also provides information on internal and external migration.
Active Employed Questionnaire (Module 2): Includes questions on the characteristics of employed individuals, such as occupation, industry, and wages for employees.
Active Unemployed Questionnaire (Module 3): Includes questions on the characteristics of the unemployed, such as unemployment duration, last occupation, activity, and the number of days worked during the last year.
This dataset contains the number of crimes filed under each category of the Indian Penal Code (IPC), the number of victims of those crimes, and the average crime rate. The data are presented separately by IPC category and sub-category and are available at the state/UT level for 2018.
● 7060_source_data.csv: The raw data from the source with the original administrative dimensions. This dataset may already have been restructured by scraping PDFs, combining files, or pivoting tables to fit the tabular format used by NDAP, but the data values themselves are unchanged.
● NDAP_REPORT_7060.csv: The final standardised data using LGD geographic dimensions, as seen on NDAP.
● 7060_metadata.csv: Variable-level metadata, including the following fields:
❖ VariableName: The full variable name as it appears in the data
❖ VariableCode: A unique variable code used as a short name for the variable during internal processing; it can be used for simplicity if desired
❖ Type_Of_Variable: The classification of the column, i.e. whether it is a dimension or a variable (indicator)
❖ Unit_Of_Measure:
❖ Aggregation_Type: The default aggregation function used when aggregating each variable
❖ Weighing_Variable_Name: The weight assigned to each variable that is used by default when aggregating
❖ Weighing_Variable_ID: The ID of the weighting variable corresponding to the weighing variable name
❖ Long_Description: A more descriptive definition of the variable
❖ Scaling_factor: Scaling factor from the source
● 7060_KEYS.csv: The key that maps source administrative units to the standardised Local Government Directory (LGD) dimensions. This file also contains pre-calculated weights for every constituent unit mapped from the source dimensions into the LGD. Each row describes what fraction of a source unit is mapped to a corresponding LGD unit. This file includes the following fields:
❖ src[Unit]Name: The administrative unit name as it appears in the source data. Depending on the dataset, this may include State, District, Subdistrict, Block, Village/Town, etc.
❖ [Unit]Name: The standardised administrative unit name as it appears in the LGD. Depending on the dataset, this may include State, District, Subdistrict, Block, Village/Town, etc.
❖ [Unit]Code: The standardised administrative unit code corresponding to the unit name in the LGD.
❖ Year: The year in which the data was collected or reported. Depending on the dataset, other temporal variables may also be present (Quarter, Month, Calendar Day, etc.)
❖ Number_Of_Children: The number of LGD units associated with the mapping described by a row. Source units that have undergone a split will have multiple children.
❖ Number_Of_Parents: The number of source units associated with the mapping described by a row. Source units that have undergone a merge will have multiple parents.
❖ Weighing_Variables: Households, Population, Male Population, Female Population, Land Area (Total, Rural, and Urban versions of each). Each weighing variable has the following associated fields:
■ Count: the total count of households, population, or land area mapped from the source unit to the LGD unit for that row (NumberOfHouseholds, TotalPopulation, LandArea).
■ Mapping_Error: the percentage error due to villages missing from the base data, i.e. the fraction of the weighing variable that is dropped because the microdata could not be mapped to the LGD.
■ Weighing_Ratio: the weighing ratio for that constituent match of source unit to LGD unit in the row. This is the fraction applied to the source data to obtain the LGD-standardised final data.
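The Weighing_Ratio field is what turns source-unit values into LGD-unit values. A minimal sketch of that step for an additive variable (such as a crime count), assuming one weighing ratio per KEYS row has already been chosen (for example the population-based ratio); the function name and the in-memory layout are hypothetical, not NDAP's code:

```python
from collections import defaultdict


def apportion_to_lgd(source_values, key_rows):
    """Map additive source values onto LGD units.

    source_values: {source unit name: value}
    key_rows: iterable of (source unit name, LGD unit name, weighing ratio),
              one tuple per row of the KEYS file.
    A split source unit appears in several rows (one per child); a merged
    LGD unit receives contributions from several source units (its parents).
    """
    lgd_values = defaultdict(float)
    for src_unit, lgd_unit, ratio in key_rows:
        lgd_values[lgd_unit] += source_values[src_unit] * ratio
    return dict(lgd_values)


# Hypothetical example: district A was split into X and Y; B and C merged into Z.
source = {"A": 100, "B": 30, "C": 20}
keys = [("A", "X", 0.6), ("A", "Y", 0.4), ("B", "Z", 1.0), ("C", "Z", 1.0)]
print(apportion_to_lgd(source, keys))
```

When the ratios for each source unit sum to 1 (no mapping error), the total is preserved. Rates such as the average crime rate are not additive, so they would presumably be aggregated according to Aggregation_Type and the weighing variable listed in the metadata instead.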
https://datacatalog.worldbank.org/public-licenses?fragment=cc
This dataset contains metadata (title, abstract, date of publication, field, etc) for around 1 million academic articles. Each record contains additional information on the country of study and whether the article makes use of data. Machine learning tools were used to classify the country of study and data use.
Our data source of academic articles is the Semantic Scholar Open Research Corpus (S2ORC) (Lo et al. 2020). The corpus contains more than 130 million English language academic papers across multiple disciplines. The papers included in the Semantic Scholar corpus are gathered directly from publishers, from open archives such as arXiv or PubMed, and crawled from the internet.
We placed some restrictions on the articles to make them usable and relevant for our purposes. First, only articles with an abstract and a parsed PDF or LaTeX file are included in the analysis. The full text of the abstract is necessary to classify the country of study and whether the article uses data. The parsed PDF and LaTeX files are important for extracting key information such as the date of publication and field of study. This restriction eliminated a large number of articles in the original corpus: around 30 million articles remain after keeping only those with a parsable (i.e., suitable for digital processing) PDF, and around 26% of those 30 million are eliminated when removing articles without an abstract. Second, only articles published from 2000 to 2020 were considered. This restriction eliminated an additional 9% of the remaining articles. Finally, articles from the following fields of study were excluded, as we aim to focus on fields that are likely to use data produced by countries' national statistical systems: Biology, Chemistry, Engineering, Physics, Materials Science, Environmental Science, Geology, History, Philosophy, Math, Computer Science, and Art. The fields that are included are Economics, Political Science, Business, Sociology, Medicine, and Psychology. This third restriction eliminated around 34% of the remaining articles. From an initial corpus of 136 million articles, this resulted in a final corpus of around 10 million articles.
Because of the intensive computing resources required, a set of 1,037,748 articles was randomly selected from the 10 million articles in our restricted corpus as a convenience sample.
The empirical approach employed in this project utilizes text mining with Natural Language Processing (NLP). The goal of NLP is to extract structured information from raw, unstructured text. In this project, NLP is used to extract the country of study and whether the paper makes use of data. We will discuss each of these in turn.
To determine the country or countries of study in each academic article, two approaches are employed, based on information found in the title, abstract, or topic fields. The first approach uses regular expression searches based on the presence of ISO 3166 country names. A defined set of country names is compiled, and the relevant fields are checked for the presence of these names. This approach is transparent, widely used in social science research, and easily extended to other languages. However, it can produce exclusion errors if a country's name is spelled non-standardly.
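A minimal sketch of the regular-expression approach, assuming a hand-picked list of country names (a real run would use the full ISO 3166 list plus whatever name variants the authors compiled). It also shows the exclusion error mentioned above: an adjectival form such as "Nigerian" is not matched.

```python
import re

# Illustrative subset only; the full list would contain every ISO 3166 country name.
COUNTRY_NAMES = ["Ghana", "Guinea", "Guinea-Bissau", "India", "Iraq", "Niger", "Nigeria", "Tunisia"]

# Longest names first, so "Guinea-Bissau" is matched whole instead of as "Guinea";
# \b word boundaries stop "Niger" from matching inside "Nigeria".
COUNTRY_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(COUNTRY_NAMES, key=len, reverse=True))
    + r")\b"
)


def countries_by_regex(*fields):
    """Return the set of country names found in any of the given text fields."""
    found = set()
    for text in fields:
        found.update(COUNTRY_PATTERN.findall(text))
    return found


title = "Household poverty in Nigeria and Guinea-Bissau"
abstract = "We use survey data from Ghana."
print(countries_by_regex(title, abstract))
```

The pattern is case-sensitive and only knows the listed spellings, which is exactly the limitation that motivates adding a second, model-based approach.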
The second approach is based on Named Entity Recognition (NER), which uses machine learning to identify named entities in text; it is implemented with the spaCy Python library. In this project, NER is used to identify the countries of study in the academic articles. spaCy supports multiple languages and has been trained on multiple spellings of countries, overcoming some of the limitations of the regular expression approach. If a country is identified by either the regular expression search or NER, it is linked to the article. Note that one article can be linked to more than one country.
The second task is to classify whether the paper uses data. A supervised machine learning approach is employed, in which 3,500 publications were first randomly selected and manually labeled by human raters using the Mechanical Turk service (Paszke et al. 2019).[1] To make sure the human raters had a similar and appropriate definition of data in mind, they were given the following instructions before seeing their first paper:
Each of these documents is an academic article. The goal of this study is to measure whether a specific academic article is using data and from which country the data came.
There are two classification tasks in this exercise:
1. Identifying whether an academic article is using data from any country
2. Identifying from which country that data came.
For task 1, we are looking specifically at the use of data. Data is any information that has been collected, observed, generated or created to produce research findings. For example, a study that reports findings or analysis using survey data uses data. Clues that a study uses data include whether a survey or census is described, a statistical model is estimated, or a table of means or summary statistics is reported.
After an article is classified as using data, please note the type of data used. The options are population or business census, survey data, administrative data, geospatial data, private sector data, and other data. If no data is used, then mark "Not applicable". In cases where multiple data types are used, please click multiple options.[2]
For task 2, we are looking at the country or countries studied in the article. In some cases, no country may be applicable, for instance if the research is theoretical and has no specific country application. In other cases, the article may involve multiple countries; select all countries that are discussed in the paper.
We expect between 10 and 35 percent of all articles to use data.
The median time a worker spent on an article, measured from when the worker accepted the article for classification to when the classification was submitted, was 25.4 minutes. If only human raters were used rather than machine learning tools, the corpus of 1,037,748 articles examined in this study would take around 50 years of human work time to review, at a cost of $3,113,244, assuming a cost of $3 per article as was paid to MTurk workers.
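The cost and time figures above follow from simple arithmetic. A quick check in Python; note that the roughly 50-year figure is reached only if the 25.4-minute median is applied to every article and work continues around the clock (24 hours a day, 365 days a year), which is our reading rather than something the text states:

```python
ARTICLES = 1_037_748
MEDIAN_MINUTES_PER_ARTICLE = 25.4
COST_PER_ARTICLE_USD = 3  # as paid to MTurk workers

total_cost = ARTICLES * COST_PER_ARTICLE_USD
total_hours = ARTICLES * MEDIAN_MINUTES_PER_ARTICLE / 60

# Years of continuous (24/7) work versus years at about 2,000 working hours per year.
years_round_the_clock = total_hours / (24 * 365)
years_at_2000_hours = total_hours / 2000

print(f"cost: ${total_cost:,}")  # cost: $3,113,244
print(f"hours: {total_hours:,.0f}")
print(f"years (24/7): {years_round_the_clock:.1f}")
print(f"years (2,000 h/yr): {years_at_2000_hours:.0f}")
```

At a conventional 2,000 working hours per year, the same workload would come to roughly 220 years.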
A model is next trained on the 3,500 labeled articles. We use a distilled version of the BERT (Bidirectional Encoder Representations from Transformers) model to encode raw text into a numeric format suitable for prediction (Devlin et al. 2018). BERT is pre-trained on a large corpus comprising the Toronto Book Corpus and Wikipedia. The distilled version (DistilBERT) is a compressed model that is 60% the size of BERT, retains 97% of its language understanding capabilities, and is 60% faster (Sanh, Debut, Chaumond, and Wolf 2019). We use PyTorch to produce a model that classifies articles based on the labeled data. Of the 3,500 articles hand-coded by the MTurk workers, 900 are fed to the machine learning model; this number was chosen because of computational limitations in training the NLP model. An article was classified as "uses data" if the model predicted that it used data with at least 90% confidence.
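The 90% confidence rule can be illustrated on the two output scores (logits) of a binary classifier. This is a sketch, assuming that "confidence" means the softmax probability of the "uses data" class and that this class sits at index 1; the actual DistilBERT fine-tuning code is not shown in the text.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 (numerically stable)."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_uses_data(logits, threshold=0.90):
    """Label an article 'uses data' only if P(uses data) >= threshold.

    Assumes logits = [score for 'no data', score for 'uses data'].
    """
    return softmax(logits)[1] >= threshold


print(classify_uses_data([0.0, 3.0]))  # P(uses data) is about 0.95
print(classify_uses_data([0.0, 2.0]))  # P(uses data) is about 0.88
```

A high threshold like this typically trades some recall for precision: articles the model is only moderately sure about are counted as not using data.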
The performance of the models that classify articles by country and by data use can be compared to the classifications made by the human raters, which we treat as the ground truth. This may underestimate model performance if the workers at times got the allocation wrong in a way that would not apply to the model; for instance, a human rater could mistake the Republic of Korea for the Democratic People's Republic of Korea. If humans and the model make the same kinds of errors, the performance reported here will be overestimated.
The model predicted whether an article made use of data with 87% accuracy on the articles held out from model training. The correlation between the number of articles written about each country using data estimated under the two approaches is given in the figure below. The number of articles represents an aggregate total of
https://www.statsndata.org/how-to-order
The PCI and PCIe Image Capture Card market plays a crucial role in various industries, ranging from gaming and streaming to medical imaging and security surveillance. These cards are integral for converting raw video and imaging data from high-definition sources into digital formats that can be processed, recorded,
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The Free-living Food Intake Cycle (FreeFIC) dataset was created by the Multimedia Understanding Group to investigate in-the-wild eating behavior. This is achieved by recording the subjects' meals as a small part of their everyday, unscripted activities. The FreeFIC dataset contains the 3D acceleration and angular velocity signals (6 DoF) from 22 in-the-wild sessions provided by 12 unique subjects. All sessions were recorded using a commercial smartwatch (6 using the Huawei Watch 2™ and the rest using the MobVoi TicWatch™) while the participants performed their everyday activities. In addition, FreeFIC contains the start and end moments of each meal session as reported by the participants.
Description
FreeFIC includes 22 in-the-wild sessions that belong to 12 unique subjects. Participants were instructed to wear the smartwatch on the hand of their preference well before any meal and to continue wearing it throughout the day until the battery was depleted. In addition, we followed a self-report labeling model, meaning that the ground truth is provided by the participants, who documented the start and end moments of their meals to the best of their abilities, as well as the hand they wore the smartwatch on. The total duration of the 22 recordings is 112.71 hours, with a mean duration of 5.12 hours. Additional data statistics can be obtained by executing the provided Python script stats_dataset.py. Furthermore, the accompanying Python script viz_dataset.py visualizes the IMU signals and ground truth intervals for each recording. Information on how to execute the Python scripts can be found below.
$ python stats_dataset.py
$ python viz_dataset.py
FreeFIC is also tightly related to Food Intake Cycle (FIC), a dataset we created in order to investigate the in-meal eating behavior. More information about FIC can be found here and here.
Publications
If you plan to use the FreeFIC dataset or any of the resources found in this page, please cite our work:
@article{kyritsis2020data,
title={A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches},
author={Kyritsis, Konstantinos and Diou, Christos and Delopoulos, Anastasios},
journal={IEEE Journal of Biomedical and Health Informatics},
year={2020},
publisher={IEEE}}
@inproceedings{kyritsis2017automated,
title={Detecting Meals In the Wild Using the Inertial Data of a Typical Smartwatch},
author={Kyritsis, Konstantinos and Diou, Christos and Delopoulos, Anastasios},
booktitle={2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)},
year={2019},
organization={IEEE}}
Technical details
We provide the FreeFIC dataset as a pickle. The file can be loaded using Python in the following way:
import pickle as pkl
import numpy as np

with open('./FreeFIC_FreeFIC-heldout.pkl', 'rb') as fh:
    dataset = pkl.load(fh)
The dataset variable in the snippet above is a dictionary with 5 keys, namely:
'subject_id'
'session_id'
'signals_raw'
'signals_proc'
'meal_gt'
The contents under a specific key can be obtained by:
sub = dataset['subject_id']  # the subject ids
ses = dataset['session_id']  # the session ids
raw = dataset['signals_raw']  # the raw IMU signals
proc = dataset['signals_proc']  # the processed IMU signals
gt = dataset['meal_gt']  # the meal ground truth
The sub, ses, raw, proc and gt variables in the snippet above are lists of length 22. Elements are aligned across all lists; e.g., the 3rd element of the list under the 'session_id' key corresponds to the 3rd element of the list under the 'signals_proc' key.
sub: list Each element of the sub list is a scalar (integer) that corresponds to the unique identifier of the subject and can take the following values: [1, 2, 3, 4, 13, 14, 15, 16, 17, 18, 19, 20]. Note that the subjects with ids 15, 16, 17, 18, 19 and 20 belong to the held-out part of the FreeFIC dataset (for more information, see the publication titled "A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches" by Kyritsis et al.). Moreover, the subject identifiers in FreeFIC are consistent with those in the FIC dataset (more info here and here); i.e., FIC's subject with id 2 is the same person as FreeFIC's subject with id 2.
ses: list Each element of this list is a scalar (integer) that corresponds to the unique identifier of the session and ranges from 1 to 5. Note that not all subjects have the same number of sessions.
raw: list Each element of this list is a dictionary with the 'acc' and 'gyr' keys. The data under the 'acc' key is an $N_{acc} \times 4$ numpy.ndarray that contains the timestamps in seconds (first column) and the 3D raw accelerometer measurements in g (second, third and fourth columns, representing the x, y and z axes, respectively). The data under the 'gyr' key is an $N_{gyr} \times 4$ numpy.ndarray that contains the timestamps in seconds (first column) and the 3D raw gyroscope measurements in degrees/second (second, third and fourth columns, representing the x, y and z axes, respectively). All sensor streams are transformed so that they reflect all participants wearing the smartwatch on the same hand with the same orientation, thus achieving data uniformity. This transformation is consistent with the signals in the FIC dataset (more info here and here). Finally, the raw accelerometer and gyroscope arrays have different lengths ($N_{acc} \neq N_{gyr}$). This behavior is expected and is caused by the Android platform.
proc: list Each element of this list is an $M \times 7$ numpy.ndarray that contains the timestamps and the 3D accelerometer and gyroscope measurements for each meal. Specifically, the first column contains the timestamps in seconds, the second, third and fourth columns contain the x, y and z accelerometer values in g, and the fifth, sixth and seventh columns contain the x, y and z gyroscope values in degrees/second. Unlike the elements of the raw list, the processed measurements (in the proc list) have a constant sampling rate of 100 Hz, and the accelerometer and gyroscope measurements are aligned with each other. In addition, all sensor streams are transformed so that they reflect all participants wearing the smartwatch on the same hand with the same orientation, thus achieving data uniformity. This transformation is consistent with the signals in the FIC dataset (more info here and here). No other preprocessing is performed on the data; e.g., the acceleration component due to the Earth's gravitational field is present in the processed acceleration measurements. Researchers can consult the article "A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches" by Kyritsis et al. on how to further preprocess the IMU signals (i.e., smooth them and remove the gravitational component).
meal_gt: list Each element of this list is a \(K \times 2\) matrix. Each row represents a meal interval for the specific in-the-wild session. The first column contains the timestamps of the meal start moments, whereas the second contains the timestamps of the meal end moments. All timestamps are in seconds. The number of meals \(K\) varies across recordings (e.g., there is a recording in which a participant consumed two meals).
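As a minimal sketch (not the authors' pipeline), the Python code below assumes that `proc[i]` and `meal_gt[i]` belong to the same session and labels every processed sample as inside or outside a meal interval. It also shows one simple way, linear interpolation with `numpy.interp`, to place a raw accelerometer or gyroscope stream on a uniform 100 Hz grid; the processed arrays may have been produced differently. The function names and the synthetic arrays are hypothetical.

```python
import numpy as np

def label_meal_samples(proc_session, meal_intervals):
    """Return a boolean mask marking samples that fall inside any meal.

    proc_session:   (M, 7) array; column 0 holds timestamps in seconds.
    meal_intervals: (K, 2) array of [start, end] timestamps in seconds.
    """
    t = proc_session[:, 0]
    mask = np.zeros(t.shape, dtype=bool)
    for start, end in np.asarray(meal_intervals).reshape(-1, 2):
        mask |= (t >= start) & (t <= end)  # inclusive bounds (assumption)
    return mask

def resample_raw_stream(stream, fs=100.0):
    """Linearly interpolate an (N, 4) raw stream [t, x, y, z] onto a uniform grid.

    Illustrative alignment step only (hypothetical helper).
    """
    t = stream[:, 0]
    grid = np.arange(t[0], t[-1], 1.0 / fs)
    axes = [np.interp(grid, t, stream[:, k]) for k in range(1, 4)]
    return np.column_stack([grid, *axes])

# Synthetic example: 10 s of processed data at 100 Hz, one meal from 2 s to 4 s.
t = np.arange(1000) / 100.0
proc_session = np.column_stack([t] + [np.zeros_like(t)] * 6)
mask = label_meal_samples(proc_session, np.array([[2.0, 4.0]]))
print(int(mask.sum()))
```

Treating the interval bounds as inclusive is a choice made here for illustration; the dataset description does not specify it.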
Ethics and funding
Informed consent, including permission for third-party access to anonymised data, was obtained from all subjects prior to their engagement in the study. The work has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No 727688 - BigO: Big data against childhood obesity.
Contact
Any inquiries regarding the FreeFIC dataset should be addressed to:
Dr. Konstantinos KYRITSIS
Multimedia Understanding Group (MUG) Department of Electrical & Computer Engineering Aristotle University of Thessaloniki University Campus, Building C, 3rd floor Thessaloniki, Greece, GR54124
Tel: +30 2310 996359, 996365 Fax: +30 2310 996398 E-mail: kokirits [at] mug [dot] ee [dot] auth [dot] gr
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains the Raw and Final data for "A NASA GISTEMPv4 Observational Uncertainty Ensemble" as accepted at JGR:Atmospheres (August 2024).
FinalEnsembleOutput/ Contains the official GISTEMPv4 uncertainty ensemble from 1880-2020 with anomalies relative to a 1951-1980 climatology. The ensemble is organized into three subdirectories:
FinalEnsembleOutput/FullEnsemble Contains a netCDF file [lon,lat,month] for each of the 200 members on a 2x2 grid.
FinalEnsembleOutput/GriddedSummary Contains netCDF files [lon,lat,month] of statistics summarizing the 200-member ensemble. The statistics contained are the ensemble mean, ensemble sd, quantiles, and sample size (can be less than 200 due to differences in the homogenization in data-sparse regions). The quantiles provided are (0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975).
FinalEnsembleOutput/KeySeries Contains 200-member ensembles of key, large-scale time series at monthly resolution. Global, hemispheric, and zonal mean series are provided for land, ocean, and combined (land and ocean) mean temperature. Use these series directly rather than calculating them yourself from the full ensemble, as the method used to create them incorporates uncertainty due to regions without coverage.
Raw/ Contains the raw, source data for the GISTEMP ensemble. This directory is not very large, but contains ~50k files as the GHCN product is distributed as an individual text file for every station in the record.
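The GriddedSummary statistics (mean, SD, quantiles, sample size) can be reproduced for any subset of members with a few NumPy reductions. The sketch below assumes the ensemble has already been loaded (for example with a netCDF reader, not shown) into an array whose first axis indexes members, with NaN where a member has no value; the function name and the synthetic data are hypothetical, and using the sample SD (`ddof=1`) is an assumed convention. For the global, hemispheric, and zonal series, the KeySeries files remain the recommended source.

```python
import numpy as np

QUANTILES = (0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975)

def summarize_ensemble(members, quantiles=QUANTILES):
    """Summarize an ensemble array whose first axis indexes members.

    NaN marks a member with no value at that position, so the per-position
    sample size can fall below the number of members.
    """
    members = np.asarray(members, dtype=float)
    return {
        "mean": np.nanmean(members, axis=0),
        "sd": np.nanstd(members, axis=0, ddof=1),  # sample SD (assumed convention)
        "quantiles": np.nanquantile(members, quantiles, axis=0),
        "n": np.sum(~np.isnan(members), axis=0),
    }

# Synthetic stand-in for a 200-member monthly series (12 months).
rng = np.random.default_rng(0)
members = rng.normal(0.5, 0.1, size=(200, 12))
members[:3, 0] = np.nan  # pretend three members lack a value in month 1
summary = summarize_ensemble(members)
print(summary["n"][:2], summary["quantiles"].shape)
```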
Intermediate data products from the analysis can be found here: https://doi.org/10.5281/zenodo.13344579
MSZSI: Multi-Scale Zonal Statistics [AgriClimate] Inventory
--------------------------------------------------------------------------------------
MSZSI is a data extraction tool for Google Earth Engine that aggregates time-series remote sensing information to multiple administrative levels using the FAO GAUL data layers. The code at the bottom of this page (metadata) can be pasted into the Google Earth Engine JavaScript code editor and run at https://code.earthengine.google.com/.
Please refer to the associated publication:
Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624.
https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624
Input options:
[1] Country of interest
[2] Start and end year
[3] Start and end month
[4] Option to mask data to a specific land-use/land-cover type
[5] Land-use/land-cover type code from CGLS LULC
[6] Image collection for data aggregation
[7] Desired band from the image collection
[8] Statistics type for the zonal aggregations
[9] Statistic to use for annual aggregation
[10] Scaling options
[11] Export folder and label suffix
Output: Two CSVs containing zonal statistics for each of the FAO GAUL administrative level boundaries
Output fields: system:index, 0-ADM0_CODE, 0-ADM0_NAME, 0-ADM1_CODE, 0-ADM1_NAME, 0-ADMN_CODE, 0-ADMN_NAME, 1-AREA_PERCENT_LULC, 1-AREA_SQM_LULC, 1-AREA_SQM_ZONE, 2-X_2001, 2-X_2002, 2-X_2003, ..., 2-X_2020, .geo
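The exported CSVs are wide, with one `2-X_<year>` column per year. A minimal Python sketch for reshaping them into one record per zone and year is shown below; it uses only column names from the output field list above, treats empty cells as missing (an assumption), and the sample rows and function name are hypothetical.

```python
import csv
import io

def wide_to_long(csv_text):
    """Convert MSZSI-style wide rows (one '2-X_<year>' column per year) to long records."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            # Only the yearly statistic columns are reshaped; empty cells are skipped.
            if column.startswith("2-X_") and value not in ("", None):
                records.append({
                    "ADMN_CODE": row["0-ADMN_CODE"],
                    "ADMN_NAME": row["0-ADMN_NAME"],
                    "year": int(column[len("2-X_"):]),
                    "value": float(value),
                })
    return records

# Hypothetical two-zone, two-year extract.
sample = (
    "0-ADMN_CODE,0-ADMN_NAME,2-X_2001,2-X_2002\n"
    "101,Example A,0.61,0.64\n"
    "102,Example B,0.55,\n"
)
for rec in wide_to_long(sample):
    print(rec)
```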
PREPROCESSED DATA DOWNLOAD
The datasets available for download contain zonal statistics at 2 administrative levels (FAO GAUL levels 1 and 2). Select countries from Southeast Asia and Sub-Saharan Africa (Cambodia, Indonesia, Lao PDR, Myanmar, Philippines, Thailand, Vietnam, Burundi, Kenya, Malawi, Mozambique, Rwanda, Tanzania, Uganda, Zambia, Zimbabwe) are included in the current version, with plans to extend the dataset to contain global metrics. Each archive is described below, and two example NDVI tables are available for preview.
Key: [source, data, units, temporal range, aggregation, masking, zonal statistic, notes]
Currently available:
MSZSI-V2_V-NDVI-MEAN.tar: [NASA-MODIS, NDVI, index, 2001–2020, annual mean, agriculture, mean, n/a]
MSZSI-V2_T-LST-DAY-MEAN.tar: [NASA-MODIS, LST Day, °C, 2001–2020, annual mean, agriculture, mean, n/a]
MSZSI-V2_T-LST-NIGHT-MEAN.tar: [NASA-MODIS, LST Night, °C, 2001–2020, annual mean, agriculture, mean, n/a]
MSZSI-V2_R-PRECIP-SUM.tar: [UCSB-CHG-CHIRPS, Precipitation, mm, 2001–2020, annual sum, agriculture, mean, n/a]
MSZSI-V2_S-BDENS-MEAN.tar: [OpenLandMap, Bulk density, g/cm3, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-ORGC-MEAN.tar: [OpenLandMap, Organic carbon, g/kg, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-PH-MEAN.tar: [OpenLandMap, pH in H2O, pH, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-WATER-MEAN.tar: [OpenLandMap, Soil water, % at 33kPa, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-SAND-MEAN.tar: [OpenLandMap, Sand, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-SILT-MEAN.tar: [OpenLandMap, Silt, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-CLAY-MEAN.tar: [OpenLandMap, Clay, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_E-ELEV-MEAN.tar: [MERIT, [elevation, slope, flowacc, HAND], [m, degrees, km2, m], static, n/a, agriculture, mean, n/a]
Coming soon
MSZSI-V2_C-STAX-MEAN.tar: [OpenLandMap, Soil taxonomy, category, static, n/a, agriculture, area sum, n/a]
MSZSI-V2_C-LULC-MEAN.tar: [CGLS-LC100-V3, LULC, category, 2015–2019, mode, none, area sum, n/a]
Data sources:
/*///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
MSZSI: Multi-Scale Zonal Statistics Inventory
Authors:
Brad G. Peter, Department of Geography, University of Alabama
Joseph Messina, Department of Geography, University of Alabama
Austin Raney, Department of Geography, University of Alabama
Rodrigo E. Principe, AgriCircle AG
Peilei Fan, Department of Geography, Environment, and Spatial Sciences, Michigan State University
Citation: Peter, Brad; Messina, Joseph; Raney, Austin; Principe, Rodrigo; Fan, Peilei, 2021, 'MSZSI: Multi-Scale Zonal Statistics Inventory', https://doi.org/10.7910/DVN/YCUBXS, Harvard Dataverse, V#
SEAGUL: Southeast Asia Globalization, Urbanization, Land and Environment Changes
http://seagul.info/
https://lcluc.umd.edu/projects/divergent-local-responses-globalization-urbanization-land-transition-and-environmental
This project was made possible by the NASA Land-Cover/Land-Use Change Program (Grant #: 80NSSC20K0740)
The Household Income, Expenditure and Consumption Survey (HIECS) is of great importance among other household surveys conducted by statistical agencies in various countries around the world. This survey provides a large amount of data to rely on in measuring the living standards of households and individuals, as well as establishing databases that serve in measuring poverty, designing social assistance programs, and providing necessary weights to compile consumer price indices, considered to be an important indicator to assess inflation. The HIECS 2008/2009 is the tenth Household Income, Expenditure and Consumption Survey that was carried out in 2008/2009, among a long series of similar surveys that started back in 1955.
Survey Objectives:
1- To identify expenditure levels and patterns of the population, as well as socio-economic and demographic differentials.
2- To estimate the quantities and values of commodities and services consumed by households during the survey period, to determine levels of consumption and estimate current demand, which is an important input for national planning. Current and past demand estimates are used to predict future demand.
3- To measure mean household and per-capita expenditure for various expenditure items, along with socio-economic correlates.
4- To define the percentage distribution of expenditure for various items used in compiling consumer price indices, which are considered an important indicator for measuring inflation.
5- To define mean household and per-capita income from different sources.
6- To provide the data necessary to measure the standard of living of households and individuals. Poverty analysis and setting up a basis for social welfare assistance are highly dependent on the results of this survey.
7- To provide essential data to measure elasticity, which reflects the percentage change in expenditure for various commodity and service groups against the percentage change in total expenditure, for the purpose of predicting the levels of expenditure and consumption for different commodity and service items in urban and rural areas.
8- To provide the data essential for comparing change in expenditure against change in income, to measure the income elasticity of expenditure.
9- To study the relationships between demographic, geographical and housing characteristics of households and their income and expenditure on commodities and services.
10- To provide the data necessary for national accounts, especially for compiling input-output tables.
11- To identify changes in consumer behavior among socio-economic groups in urban and rural areas.
12- To identify per-capita food consumption and its main components of calories, proteins and fats according to source and level of expenditure in both urban and rural areas.
13- To identify the value of food expenditure by source, whether from household production or not, in addition to household expenditure on non-food commodities and services.
14- To identify the distribution of households according to the possession of appliances and equipment (cars, satellites, mobiles …) in urban and rural areas.
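Objectives 7 and 8 describe elasticity as the ratio of a percentage change in item expenditure to a percentage change in total expenditure (or income). A minimal sketch of that ratio using the midpoint (arc) formula is shown below; the survey documentation does not say how elasticities are actually estimated (household-level regressions are also common), so the formula choice, function name and numbers are assumptions for illustration.

```python
def arc_elasticity(item_before, item_after, total_before, total_after):
    """Midpoint (arc) elasticity: % change in item spending / % change in total spending."""
    pct_item = (item_after - item_before) / ((item_after + item_before) / 2)
    pct_total = (total_after - total_before) / ((total_after + total_before) / 2)
    return pct_item / pct_total

# Hypothetical example: food spending rises 400 -> 440 while total spending rises 1000 -> 1200.
e = arc_elasticity(400, 440, 1000, 1200)
print(round(e, 3))  # below 1: food spending grows more slowly than total spending
```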
National
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
The sample of HIECS, 2008-2009 is a two-stage stratified cluster sample, approximately self-weighted, of nearly 48000 households. The main elements of the sampling design are described in the following.
Sample Size It has been deemed important to retain the same sample size as in the previous two HIECS rounds. Thus, a sample of about 48000 households has been considered. The justification for maintaining the sample size at this level is to obtain estimates with levels of precision similar to those of the previous two rounds, so that trend analysis with the previous two surveys is not distorted by substantial changes in sampling errors from one round to another. In addition, this relatively large national sample implies proportional samples of reasonable sizes for smaller governorates. Nonetheless, over-sampling has been introduced to raise the sample size of small governorates to about 1000 households. As a result, reasonably precise estimates could be extracted for those governorates. The over-sampling has resulted in a slight increase in the national sample to 48658 households.
Cluster size An important lesson learned from the previous two HIECS rounds is that the cluster size applied in both surveys was too large to yield acceptable design effect estimates. The cluster size was 40 households in the 2004-2005 round, down from 80 households in the 1999-2000 round. The estimates of the design effect (deft) for most survey measures of the latest round were extraordinarily large. As a result, it was decided to decrease the cluster size to only 19 households (20 households in urban governorates to account for anticipated non-response in those governorates; based on past experience, non-response is almost nil in rural governorates).
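Why smaller clusters help can be seen from Kish's standard approximation for equal-size clusters, \(\mathrm{deff} \approx 1 + (m - 1)\rho\), where \(m\) is the cluster size and \(\rho\) the intra-cluster correlation; deft is \(\sqrt{\mathrm{deff}}\). The sketch below evaluates it for the cluster sizes mentioned above. The value of \(\rho\) is hypothetical and chosen only for illustration, and the survey's actual deft values were estimated with CENVAR, not with this formula.

```python
def kish_deff(cluster_size, icc):
    """Kish's approximation of the design effect for equal-size clusters."""
    return 1.0 + (cluster_size - 1) * icc

ICC = 0.05  # hypothetical intra-cluster correlation, for illustration only
for m in (80, 40, 20, 19):
    deff = kish_deff(m, ICC)
    print(f"cluster size {m:>2}: deff ~ {deff:.2f}, deft ~ {deff ** 0.5:.2f}")
```

For a fixed \(\rho\), halving the cluster size roughly halves the excess variance \((m - 1)\rho\), which is the motivation for the move from 40 to 19 or 20 households.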
Computer Assisted Telephone Interview [cati]
Three different questionnaires were designed, as follows: 1- Expenditure and consumption questionnaire. 2- Diary questionnaire for expenditure and consumption. 3- Income questionnaire.
Office Editing: This is one of the main stages of the survey. It started as soon as the questionnaires were received from the field and was carried out by selected work groups. It includes: a- editing of coverage and completeness; b- editing of consistency; c- arithmetic editing of quantities and values.
Data Coding: Specialized staff coded the data on industry, occupation and geographical identification.
Data Processing and Preparing Final Results: This included machine data entry, data validation and tabulation, and preparing the final survey volumes.
Harmonized Data:
- The Statistical Package for the Social Sciences (SPSS) is used to clean and harmonize the datasets.
- The harmonization process starts with cleaning all raw data files received from the Statistical Office.
- The cleaned data files are then merged to produce one data file at the individual level containing all variables subject to harmonization.
- A country-specific program is generated for each dataset to generate, compute, recode, rename, format and label the harmonized variables.
- A post-harmonization cleaning process is run on the data.
- The harmonized data are saved at both the household and the individual level in SPSS format and converted to STATA format.
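The harmonization itself is done in SPSS. Purely to illustrate the merge, rename and recode logic of the steps listed above, here is a small Python sketch; every variable name, code and mapping in it is hypothetical and does not come from the ERF programs.

```python
# Hypothetical raw variable names mapped to hypothetical harmonized names.
RENAME = {"hh_id": "hhid", "q101_sex": "sex", "q102_age": "age"}
SEX_CODES = {1: "male", 2: "female"}  # hypothetical recode table

def harmonize(individuals, households):
    """Attach household variables to each person, then rename and recode."""
    by_hh = {h["hh_id"]: h for h in households}
    harmonized = []
    for person in individuals:
        # Merge to the individual level; person values win on key clashes.
        merged = {**by_hh.get(person["hh_id"], {}), **person}
        row = {RENAME.get(k, k): v for k, v in merged.items()}
        if "sex" in row:
            row["sex"] = SEX_CODES.get(row["sex"], row["sex"])
        harmonized.append(row)
    return harmonized

households = [{"hh_id": 1, "urban": 1}]
individuals = [{"hh_id": 1, "q101_sex": 2, "q102_age": 34}]
print(harmonize(individuals, households))
```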
For the total sample, the response rate was 96.3% (93.95% in urban areas and 98.4% in rural areas). Response rates on the governorate level at each sampling stage are presented in the methodology document attached to the external resources in both Arabic and English.
The sampling error of major survey estimates has been derived using the Ultimate Cluster Method as applied in the CENVAR Module of the Integrated Microcomputer Processing System (IMPS) Package. In addition to the estimate of sampling error, the output includes estimates of coefficient of variation, design effect (deff) and 95% confidence intervals.
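The idea behind the ultimate cluster method is to compute variances from weighted totals of the primary sampling units (PSUs) within each stratum, treating PSUs as if drawn with replacement. The Python sketch below is an independent illustration, not CENVAR: it assumes a weighted mean estimated as a ratio \(\hat{Y}/\hat{X}\), uses Taylor linearization for its variance, and takes the simple-random-sampling variance of the unweighted mean as the deff baseline (reasonable only for an approximately self-weighted sample). The function name and records are hypothetical.

```python
import math
from collections import defaultdict

def ultimate_cluster_mean(records):
    """Weighted mean with an ultimate-cluster (with-replacement) standard error.

    records: iterable of (stratum, psu, weight, y) tuples.
    Returns (mean, se, cv, deff, ci95).
    """
    y_tot = defaultdict(float)  # weighted sum of y per (stratum, psu)
    w_tot = defaultdict(float)  # sum of weights per (stratum, psu)
    psus = defaultdict(set)
    ys = []
    for stratum, psu, w, y in records:
        y_tot[(stratum, psu)] += w * y
        w_tot[(stratum, psu)] += w
        psus[stratum].add(psu)
        ys.append(y)
    Y, X = sum(y_tot.values()), sum(w_tot.values())
    mean = Y / X
    var = 0.0
    for stratum, ids in psus.items():
        n_h = len(ids)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        # Linearized PSU scores for the ratio estimator Y / X.
        z = [(y_tot[(stratum, i)] - mean * w_tot[(stratum, i)]) / X for i in ids]
        z_bar = sum(z) / n_h
        var += n_h / (n_h - 1) * sum((zi - z_bar) ** 2 for zi in z)
    se = math.sqrt(var)
    # SRS variance of the unweighted mean, used as the deff baseline (assumption).
    n = len(ys)
    y_bar = sum(ys) / n
    var_srs = sum((y - y_bar) ** 2 for y in ys) / (n - 1) / n
    deff = var / var_srs if var_srs > 0 else float("nan")
    return mean, se, se / mean, deff, (mean - 1.96 * se, mean + 1.96 * se)

# Hypothetical, strongly clustered data: two strata with two PSUs each.
data = [("A", 1, 1.0, 1.0), ("A", 1, 1.0, 1.0), ("A", 2, 1.0, 3.0), ("A", 2, 1.0, 3.0),
        ("B", 3, 1.0, 1.0), ("B", 3, 1.0, 1.0), ("B", 4, 1.0, 3.0), ("B", 4, 1.0, 3.0)]
print(ultimate_cluster_mean(data))
```

Because all households within a PSU are identical in this toy example, the design effect comes out well above 1, which is the clustering penalty the cluster-size discussion above is concerned with.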
Quality Control Procedures:
The precision of survey results depends to a large extent on how well the survey has been prepared. As such, it was deemed crucial to exert much effort and to take the necessary actions towards rigorous preparation for the present survey. The preparatory activities, which extended over 3 months, included forming a Technical Committee. The Committee set up the general framework of survey implementation, such as:
1- Applying recent international recommendations on the concepts and definitions of income and expenditure, while maintaining consistency with previous surveys in order to compare and study changes in pertinent indicators.
2- Evaluating the quality of data at all implementation stages to avoid or minimize errors as far as possible, through:
- Implementing field editing after finishing data collection for households in the governorates, so that errors could be corrected in a timely manner.
- Setting up a program for Survey Technical Committee members and survey staff to visit field work in all governorates (every 15 days) to solve any problems promptly.
- Re-interviewing a sample of households by the Quality Control Department and examining the differences from the original responses.
- For the purpose of quality assurance, generating tables for each survey round in which internal consistency checks were performed to study the plausibility of mean household expenditure on major expenditure commodity groups and its variability over major geographic regions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains statistics for significant wave height (Hs), mean wave period (Tm), peak wave period (Tp) and mean wave direction (θm) for the hindcast (1979–2005) and a multi-model ensemble of 17 EURO-CORDEX GCM-RCM projections for the following periods: baseline (1979–2005), mid-century (2034–2060) under RCP8.5, and end-of-century (2074–2100) under RCP8.5. The following statistics are included, providing seasonal and monthly means:
Significant wave height (for the hindcast, raw GCM-RCMs and bias-adjusted GCM-RCMs):
- average significant wave height
- 10th, 50th, 90th, 95th and 99th percentiles
- maximum significant wave height
- average Hs when Hs exceeds the 90th and 95th percentiles of the hindcast
- number of sea states when Hs exceeds the 90th and 95th percentiles of the hindcast
- percentage of sea states when Hs exceeds the 90th and 95th percentiles of the hindcast
- average Hs when Hs > 1.25, 2.5 and 4 m
- number of sea states when Hs > 1.25, 2.5 and 4 m
- percentage of sea states when Hs > 1.25, 2.5 and 4 m
- number of days with at least two consecutive days when the daily maximum Hs exceeds the 90th and 95th percentiles of the hindcast
- percentage of days with at least two consecutive days when the daily maximum Hs exceeds the 90th and 95th percentiles of the hindcast
Mean wave period (for the hindcast and raw GCM-RCMs):
- average mean wave period
- 10th, 50th, 90th, 95th and 99th percentiles
- maximum mean wave period
Peak wave period (for the hindcast and raw GCM-RCMs):
- average peak wave period
- 10th, 50th, 90th, 95th and 99th percentiles
- maximum peak wave period
Mean wave direction (for the hindcast and raw GCM-RCMs):
- circular mean
- circular standard deviation
Time span:
- hindcast/baseline: 1979-01-01 to 2005-12-31
- mid-century: 2034-01-01 to 2060-12-31
- end-of-century: 2074-01-01 to 2100-12-31
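Several of these statistics follow directly from their names. The NumPy sketch below shows one way to compute exceedance statistics against a hindcast percentile, a persistence count, and the circular mean and circular standard deviation of wave direction (\(\sqrt{-2 \ln R}\), with \(R\) the mean resultant length). It is not the dataset's processing code: the function names and synthetic data are hypothetical, thresholds are treated as strict, and "days with at least two consecutive days" is read here as days belonging to runs of two or more consecutive exceedance days, which is an assumption.

```python
import numpy as np

def exceedance_stats(hs, threshold):
    """Mean Hs above a threshold, and the number and percentage of sea states above it."""
    hs = np.asarray(hs, dtype=float)
    above = hs > threshold
    count = int(above.sum())
    mean_above = float(hs[above].mean()) if count else float("nan")
    return mean_above, count, 100.0 * count / hs.size

def persistent_days(daily_max_hs, threshold, min_run=2):
    """Count days in runs of at least `min_run` consecutive days above threshold (assumed reading)."""
    total, run = 0, 0
    for value in list(daily_max_hs) + [-np.inf]:  # sentinel closes the last run
        if value > threshold:
            run += 1
        else:
            if run >= min_run:
                total += run
            run = 0
    return total

def circular_mean_std(directions_deg):
    """Circular mean and circular standard deviation, both in degrees."""
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    s, c = np.sin(theta).mean(), np.cos(theta).mean()
    mean = np.rad2deg(np.arctan2(s, c)) % 360.0
    r = np.hypot(s, c)  # mean resultant length
    std = np.rad2deg(np.sqrt(-2.0 * np.log(r)))
    return mean, std

# Synthetic hourly Hs for one year; the threshold is the 90th percentile of this "hindcast".
rng = np.random.default_rng(42)
hindcast_hs = rng.gamma(2.0, 0.6, size=24 * 365)
p90 = np.percentile(hindcast_hs, 90)
print(exceedance_stats(hindcast_hs, p90))
print(persistent_days([3, 3, 1, 3, 3, 3, 1, 3], 2))
print(circular_mean_std([350, 10]))
```

Averaging angles on the unit circle avoids the wrap-around error of an arithmetic mean: 350° and 10° average to about 0°, not 180°.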
These Economic Estimates are National Statistics providing an estimate of the contribution of DCMS Sectors to the UK economy, measured by the number of businesses.
We have experimented with using a different, more timely data source to calculate this year’s Business Demographics statistics. As a result, they are not comparable with earlier DCMS Sector Business Demographics publications. More information is provided in these published documents and in the “Call for Feedback” section below.
These statistics cover the contributions of the following DCMS sectors to the UK economy:
Users should note that there is overlap between DCMS Sector definitions and that the Telecoms sector sits wholly within the Digital sector. Estimates are not available for the Civil Society sector, because it is not identifiable in the data source used for this release.
The release also includes estimates for the Audio Visual sector, which is not a DCMS Sector but is “adjacent” to it and includes some industries also common to DCMS Sectors.
A definition for each sector is available in the published data tables.
These statistics were first published on 8 December 2022.
In this publication we have experimented with using a snapshot of the Inter-Departmental Business Register (IDBR) to generate estimates of DCMS Business Demographics, rather than the Annual Business Survey (ABS) as in previous releases. This has the advantage of being more timely, and still allows us to produce most of the tables included in previous Business Demographics publications. We have used the March 2019, March 2020, March 2021 and March 2022 snapshots from the ONS UK business: activity, size and location release (https://www.ons.gov.uk/businessindustryandtrade/business/activitysizeandlocation/datasets/ukbusinessactivitysizeandlocation) rather than raw data from the IDBR.
We are looking for feedback on this approach. We particularly welcome views on:
Please contact evidence@dcms.gov.uk before Thursday 9th February 2023 with any feedback.
Hard copy feedback can be sent to:
DCMS Economic Estimates Team
Department for Digital, Culture, Media & Sport
4th Floor - area 4/34
100 Parliament Street
London
SW1A 2BQ
This release is published in accordance with the Code of Practice for Statistics (2018) produced by the UK Statistics Authority (UKSA). The UKSA has the overall objective of promoting and safeguarding the production and publication of official statistics that serve the public good. It monitors and reports on all official statistics, and promotes good practice in this area.
The accompanying pre-release access document lists ministers and officials who have received privileged early access to this release. In line with best practice, the list has been kept to a minimum and those given access for briefing purposes had a maximum of 24 hours.
Responsible analyst: Eri Hutchinson
For any queries or feedback, please contact evidence@dcms.gov.uk.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Rows: 50,570; Columns: 13
This dataset contains raw time-series telemetry from a small hydroponics setup. It includes water chemistry and environment sensors (pH, TDS, temperature, humidity, water level) alongside actuator states (pH reducer pump, water add pump, nutrient adder, humidifier, exhaust fan). The goal is to support research on forecasting, control, anomaly detection, and resource-efficient sensing in indoor agriculture.
Data were collected from an Arduino/ESP-class IoT node connected to pH, TDS, water-level, and DHT sensors, with relay-controlled actuators for maintaining optimal growth conditions. Timestamps are device-recorded at variable intervals during daily operation.
IoTData --Raw--.csv: one row per timestamped reading/action.
- id (int): Row identifier.
- timestamp (ISO 8601 string): Local device timestamp of the record.
- pH (float): Acidity/alkalinity of the nutrient solution (typically 5.5–6.8 for leafy greens).
- TDS (float, ppm): Total Dissolved Solids of the solution (proxy for nutrient concentration).
- water_level (int, 0–3): Discrete level indicator (0 = empty/low … 3 = high).
- DHT_temp (°C): Ambient/air temperature measured near the reservoir.
- DHT_humidity (% RH): Ambient relative humidity.
- water_temp (°C): Water temperature of the reservoir.
- pH_reducer (ON/OFF): Acid dosing pump state.
- add_water (ON/OFF): Top-up pump state.
- nutrients_adder (ON/OFF): Nutrient dosing pump state.
- humidifier (ON/OFF): Ultrasonic/cool-mist actuator state.
- ex_fan (ON/OFF): Exhaust fan state.
Heads-up: the raw export contains some anomalies; see “Known Issues.”
Summary statistics:
- pH: mean ≈ 6.00, min = 0.27, max = 11.57
- TDS (ppm): mean ≈ 1154.1, min = −283.91, max = 2278.35
- DHT_temp (°C): mean ≈ 24.32, range 12.3–70.0
- DHT_humidity (%): mean ≈ 71.71, range 25.0–3312.6
Actuator sparsity (ON counts):
- pH_reducer: 652
- add_water: 2,260
- nutrients_adder: 3,057
- humidifier: 2,458
- ex_fan: 52
Known Issues: … TDS, extremely high DHT_humidity). Please clip/outlier-filter as appropriate.
… OFF, which matters for classification or event prediction tasks.
Units: TDS in ppm; temperatures in °C; humidity in %RH; water level is discrete (0–3).
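A minimal outlier-filtering sketch in Python is shown below. It uses column names from the list above, marks readings outside assumed physical bounds (pH scale 0–14, non-negative TDS, relative humidity 0–100 %) as missing rather than clipping them, maps ON/OFF to booleans, and reports the ON fraction of an actuator, which is useful given the sparse actuator activity. The bounds, helper names and sample rows are hypothetical; to process the real file, pass its text (for example from `open("IoTData --Raw--.csv").read()`).

```python
import csv
import io

# Plausibility bounds (assumptions): pH scale, non-negative TDS, relative humidity in percent.
BOUNDS = {"pH": (0.0, 14.0), "TDS": (0.0, None), "DHT_humidity": (0.0, 100.0)}
ACTUATORS = ["pH_reducer", "add_water", "nutrients_adder", "humidifier", "ex_fan"]

def clean_rows(csv_text):
    """Parse rows, mark out-of-bounds readings as None, and map ON/OFF to booleans."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, (low, high) in BOUNDS.items():
            value = float(row[column])
            if (low is not None and value < low) or (high is not None and value > high):
                value = None  # treat as missing instead of clipping (a design choice)
            row[column] = value
        for column in ACTUATORS:
            if column in row:
                row[column] = row[column].strip().upper() == "ON"
        cleaned.append(row)
    return cleaned

def on_fraction(rows, column):
    """Share of rows in which an actuator is ON."""
    values = [r[column] for r in rows if column in r]
    return sum(values) / len(values) if values else float("nan")

# Hypothetical two-row extract; the second row has implausible TDS and humidity.
sample = (
    "id,timestamp,pH,TDS,DHT_humidity,ex_fan\n"
    "1,2024-01-01T00:00:00,6.1,1150.0,70.5,OFF\n"
    "2,2024-01-01T00:05:00,6.0,-283.91,3312.6,ON\n"
)
rows = clean_rows(sample)
print(rows[1]["TDS"], rows[1]["DHT_humidity"], on_fraction(rows, "ex_fan"))
```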
The GIS layer "Census_sum_15" provides a standardized tool for examining spatial patterns in abundance and demographic trends of the southern sea otter (Enhydra lutris nereis), based on data collected during the spring 2015 range-wide census. The USGS range-wide sea otter census has been undertaken twice a year since 1982, once in May and once in October, using consistent methodology involving both ground-based and aerial-based counts. The spring census is considered more accurate than the fall count, and provides the primary basis for gauging population trends by State and Federal management agencies. This shapefile includes a series of summary statistics derived from the raw census data, including sea otter density (otters per square km of habitat), linear density (otters per km of coastline), relative pup abundance (ratio of pups to independent animals) and 5-year population trend (calculated as exponential rate of change). All statistics are calculated and plotted for small sections of habitat in order to illustrate local variation in these statistics across the entire mainland distribution of sea otters in California (as of 2015). Sea otter habitat is considered to extend offshore from the mean low tide line out to the 60 m isobath: this depth range includes over 99% of sea otter feeding dives, based on dive-depth data from radio-tagged sea otters (Tinker et al. 2006, 2007). Sea otter distribution in California (the mainland range) is considered to comprise this band of potential habitat stretching along the coast of California, bounded to the north and south by range limits defined as "the points farthest from the range center at which 5 or more otters are counted within a 10km contiguous stretch of coastline (as measured along the 10m bathymetric contour) during the two most recent spring censuses, or at which these same criteria were met in the previous year".
The polygon corresponding to the range definition was then sub-divided into onshore/offshore strips roughly 500 meters in width. The boundaries between these strips correspond to ATOS (As-The-Otter-Swims) points, which are arbitrary locations established approximately every 500 meters along a smoothed 5 fathom bathymetric contour (line) offshore of the State of California.
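The per-section statistics are simple ratios, and an exponential rate of change can be estimated as the least-squares slope of ln(count) against year. The description does not say how the 5-year trend was fitted, so the log-linear regression below, like the function names and the synthetic counts, is an assumption for illustration.

```python
import math

def linear_density(otters, coastline_km):
    """Otters per km of coastline."""
    return otters / coastline_km

def areal_density(otters, habitat_km2):
    """Otters per square km of habitat."""
    return otters / habitat_km2

def pup_ratio(pups, independents):
    """Ratio of pups to independent animals."""
    return pups / independents

def exponential_trend(years, counts):
    """Least-squares slope of ln(count) on year, i.e. an exponential growth rate r."""
    logs = [math.log(c) for c in counts]  # counts must be positive
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(logs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, logs))
    sxx = sum((x - x_bar) ** 2 for x in years)
    return sxy / sxx

# Synthetic 5-year series growing at exactly 5% per year (continuous rate).
years = [2011, 2012, 2013, 2014, 2015]
counts = [100 * math.exp(0.05 * (y - 2011)) for y in years]
r = exponential_trend(years, counts)
print(round(r, 4), round(math.exp(r), 4))  # rate r and annual multiplier exp(r)
```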
The GIS shapefile "Census summary of southern sea otter 2017" provides a standardized tool for examining spatial patterns in abundance and demographic trends of the southern sea otter (Enhydra lutris nereis), based on data collected during the spring 2017 range-wide census. The USGS range-wide sea otter census has been undertaken twice a year since 1982, once in May and once in October, using consistent methodology involving both ground-based and aerial-based counts. The spring census is considered more accurate than the fall count, and provides the primary basis for gauging population trends by State and Federal management agencies. This shapefile includes a series of summary statistics derived from the raw census data, including sea otter density (otters per square km of habitat), linear density (otters per km of coastline), relative pup abundance (ratio of pups to independent animals) and 5-year population trend (calculated as exponential rate of change). All statistics are calculated and plotted for small sections of habitat in order to illustrate local variation in these statistics across the entire mainland distribution of sea otters in California (as of 2017). Sea otter habitat is considered to extend offshore from the mean low tide line out to the 60 m isobath: this depth range includes over 99% of sea otter feeding dives, based on dive-depth data from radio-tagged sea otters (Tinker et al. 2006, 2007). Sea otter distribution in California (the mainland range) is considered to comprise this band of potential habitat stretching along the coast of California, bounded to the north and south by range limits defined as "the points farthest from the range center at which 5 or more otters are counted within a 10km contiguous stretch of coastline (as measured along the 10m bathymetric contour) during the two most recent spring censuses, or at which these same criteria were met in the previous year".
The polygon corresponding to the range definition was then sub-divided into onshore/offshore strips roughly 500 meters in width. The boundaries between these strips correspond to ATOS (As-The-Otter-Swims) points, which are arbitrary locations established approximately every 500 meters along a smoothed 5 fathom bathymetric contour (line) offshore of the State of California.
References:
Tinker, M. T., Doak, D. F., Estes, J. A., Hatfield, B. B., Staedler, M. M. and Bodkin, J. L. (2006). Incorporating diverse data and realistic complexity into demographic estimation procedures for sea otters. Ecological Applications, 16: 2293–2312. https://doi.org/10.1890/1051-0761(2006)016[2293:IDDARC]2.0.CO;2
Tinker, M. T., Costa, D. P., Estes, J. A. and Wieringa, N. (2007). Individual dietary specialization and dive behaviour in the California sea otter: using archival time–depth data to detect alternative foraging strategies. Deep Sea Research II, 54: 330–342. https://doi.org/10.1016/j.dsr2.2006.11.012
The GIS shapefile Census_sum_2019 provides a standardized tool for examining spatial patterns in abundance and demographic trends of the southern sea otter (Enhydra lutris nereis), based on data collected during the spring 2019 range-wide census. The USGS spring range-wide sea otter census has been undertaken each year since 1982, using consistent methodology involving both ground-based and aerial-based counts. The spring census provides the primary basis for gauging population trends by State and Federal management agencies. This shapefile includes a series of summary statistics derived from the raw census data, including sea otter density (otters per square kilometer of habitat), linear density (otters per kilometer of coastline), relative pup abundance (ratio of pups to independent animals) and 5-year population trend (calculated as exponential rate of change). All statistics are calculated and plotted for small sections of habitat in order to illustrate local variation in these statistics across the entire mainland distribution of sea otters in California (as of 2019). Sea otter habitat is considered to extend offshore from the mean low tide line out to the 60 meter isobath: this depth range includes over 99 percent of sea otter feeding dives, based on dive-depth data from radio-tagged sea otters (Tinker et al. 2006, 2007). Sea otter distribution in California (the mainland range) is considered to comprise this band of potential habitat stretching along the coast of California, bounded to the north and south by range limits defined by combining independent otters within a moving window of 10-kilometer stretches of coastline (as measured along the 10-meter bathymetric contour; 20 contiguous ATOS intervals each) and taking the northern and southern ATOS values, respectively, of the northernmost and southernmost stretches in which at least five otters were counted for at least 2 consecutive spring surveys during the last 3 years.
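The 2019 range-limit rule can be sketched as a moving-window sum. The NumPy code below assumes a hypothetical array of independent-otter counts per ATOS interval for the last three spring surveys (in chronological order), sums each run of 20 contiguous intervals, keeps windows reaching at least five otters in two consecutive surveys, and returns the outer interval indices of the outermost qualifying windows. Which end is north depends on how ATOS intervals are ordered, which is left to the caller; this is an illustration of the stated rule, not the USGS code.

```python
import numpy as np

def range_limits(counts, window=20, min_otters=5):
    """Locate range-limit intervals from per-interval counts of independent otters.

    counts: (n_surveys, n_intervals) array for the last three spring surveys in
            chronological order; column j is the j-th ATOS interval along the coast.
    Returns (first_interval, last_interval) covered by the outermost qualifying
    windows, or None if no window qualifies.
    """
    counts = np.asarray(counts, dtype=float)
    kernel = np.ones(window)
    # Moving sum over every run of `window` contiguous intervals, per survey.
    sums = np.array([np.convolve(row, kernel, mode="valid") for row in counts])
    meets = sums >= min_otters
    # A window qualifies if the criterion holds in two consecutive surveys.
    qualifies = np.any(meets[:-1] & meets[1:], axis=0)
    starts = np.flatnonzero(qualifies)
    if starts.size == 0:
        return None
    return int(starts[0]), int(starts[-1] + window - 1)

# Hypothetical example: five otters at interval 30 in the two most recent surveys.
counts = np.zeros((3, 100))
counts[1, 30] = 5
counts[2, 30] = 5
print(range_limits(counts))
```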
The polygon corresponding to the range definition was then sub-divided into onshore/offshore strips roughly 500 meters in width. The boundaries between these strips correspond to ATOS (As-The-Otter-Swims) points, which are arbitrary locations established approximately every 500 meters along a smoothed 5 fathom bathymetric contour (line) offshore of the State of California.
References:
Tinker, M. T., Doak, D. F., Estes, J. A., Hatfield, B. B., Staedler, M. M. and Bodkin, J. L. (2006). Incorporating diverse data and realistic complexity into demographic estimation procedures for sea otters. Ecological Applications, 16: 2293–2312. https://doi.org/10.1890/1051-0761(2006)016[2293:IDDARC]2.0.CO;2
Tinker, M. T., Costa, D. P., Estes, J. A. and Wieringa, N. (2007). Individual dietary specialization and dive behaviour in the California sea otter: using archival time–depth data to detect alternative foraging strategies. Deep Sea Research II, 54: 330–342. https://doi.org/10.1016/j.dsr2.2006.11.012
This publication covers annual estimates for waste collected by local authorities in England and the regions. These statistics are based on data submitted by all local authorities in England to WasteDataFlow on the waste they collect and manage.
The methodology and recycling explainer documents give background and context to this statistical notice, accompanying datasets and the waste and recycling measures they present.
There is also a further historical note on the definition of local authority collected waste relating to earlier releases.
The entire raw dataset is available in CSV format from WasteDataFlow - Local Authority waste management - data.gov.uk: https://www.data.gov.uk/dataset/0e0c12d8-24f6-461f-b4bc-f6d6a5bf2de5/wastedataflow-local-authority-waste-management
2015 - 2016 (archived release): https://webarchive.nationalarchives.gov.uk/ukgwa/20170418015547/https://www.gov.uk/government/statistics/local-authority-collected-waste-management-annual-results. This includes the ad hoc release entitled “Provisional 2016/17 local authority data on waste collection and treatment for England (April to June and July to September 2016)”.
Defra statistics: Waste and Recycling
Email: WasteStatistics@defra.gov.uk
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 50% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE CENTRAL AGENCY FOR PUBLIC MOBILIZATION AND STATISTICS (CAPMAS)
The Household Income, Expenditure and Consumption Survey (HIECS) is among the most important household surveys conducted by statistical agencies around the world. It provides a large amount of data for measuring the living standards of households and individuals, and for building databases used to measure poverty, design social assistance programs, and provide the weights needed to compile consumer price indices, an important indicator for assessing inflation.
The first survey covering all governorates of the country was carried out in 1958/1959, followed by a long series of similar surveys. The current survey, HIECS 2012/2013, is the eleventh in this series.
Starting in 2008/2009, the Household Income, Expenditure and Consumption Surveys have been conducted every two years instead of every five years. This enables better tracking of rapid changes in the living standards of Egyptian households.
In 2010/2011, CAPMAS started following a panel sample of around 40% of the total household sample. The current survey is the second to follow a panel sample. This procedure provides the data needed to derive accurate indicators on the status of society. CAPMAS is also pleased to disseminate the results of this survey to policy makers, researchers and scholars to support policy making and development-related research and studies.
The main objectives of the survey are:
To identify expenditure levels and patterns of the population, as well as socio-economic and demographic differentials.
To measure average household and per-capita expenditure for various expenditure items along with socio-economic correlates.
To measure changes in the living standards, expenditure patterns and behavior of the individuals and households in the panel sample (previously surveyed in 2008/2009), measured for the first time over the full 12-month survey period.
To define the percentage distribution of expenditure on the various items used in compiling consumer price indices, an important indicator for measuring inflation.
To estimate the quantities, values of commodities and services consumed by households during the survey period to determine the levels of consumption and estimate the current demand which is important to predict future demands.
To define average household and per-capita income from different sources.
To provide data necessary to measure standard of living for households and individuals. Poverty analysis and setting up a basis for social welfare assistance are highly dependent on the results of this survey.
To provide essential data to measure elasticity which reflects the percentage change in expenditure for various commodity and service groups against the percentage change in total expenditure for the purpose of predicting the levels of expenditure and consumption for different commodity and service items in urban and rural areas.
To provide data essential for comparing change in expenditure against change in income to measure income elasticity of expenditure.
To study the relationships between demographic, geographical, housing characteristics of households and their income.
To provide data needed for the national accounts, especially for compiling input-output tables.
To identify changes in consumer behavior among socio-economic groups in urban and rural areas.
To identify per capita food consumption and its main nutritional components (calories, proteins and fats) by level of expenditure in both urban and rural areas.
To identify the value of expenditure for food according to its sources, either from household production or not, in addition to household expenditure for non-food commodities and services.
To identify the distribution of households by ownership of appliances and equipment, such as cars, satellite dishes and mobile phones, in urban and rural areas, which enables measuring a household wealth index.
To identify the percentage distribution of income earners according to some background variables such as housing conditions, size of household and characteristics of head of household.
To provide a time series of the most important data on prevailing living standards from an economic and social perspective, enabling comparisons over time as well as geographical comparisons.
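The elasticity objectives above rest on a simple relationship: the percentage change in spending on a group of items per 1% change in total expenditure (or income). A minimal sketch, assuming a constant-elasticity (log-log) model fitted by ordinary least squares; the estimation method CAPMAS actually uses is not described here, and the data below are synthetic.

```python
import numpy as np


def expenditure_elasticity(total_exp, group_exp):
    """Constant elasticity b from the log-log fit ln(group) = a + b * ln(total).

    b is the % change in spending on the group per 1% change in total
    expenditure. Passing income instead of total expenditure gives an
    income elasticity of expenditure in the same way.
    """
    x = np.log(np.asarray(total_exp, dtype=float))
    y = np.log(np.asarray(group_exp, dtype=float))
    b, _a = np.polyfit(x, y, 1)
    return float(b)


# Hypothetical households whose food spending follows 2 * total**0.6 exactly.
total = np.array([1000.0, 2000.0, 4000.0, 8000.0])
food = 2.0 * total ** 0.6
print(round(expenditure_elasticity(total, food), 3))  # 0.6
```

An elasticity below 1, as here, means the group's budget share falls as households get richer (the typical pattern for food); above 1 means it rises.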
Compared to previous surveys, the current survey has certain distinctive features, including:
1- The total sample of the current survey (24.9 thousand households) is divided into two sections:
a- A new sample of 16.1 thousand households. This sample was used to study geographic differences between urban governorates, urban and rural areas, and frontier governorates, as well as other differences related to household characteristics such as household size, the education status of the head of household, etc.
b- A panel sample of around 8.8 thousand households from the 2008/2009 survey, selected to accurately study the changes in the households' living standards between the two surveys, and over time in the future, since CAPMAS will continue to collect panel data for HIECS in the coming years.
2- Some additional questions, shown by the results of previous surveys to be important, were added to the questionnaire, such as:
a- The extent of health services provided, to monitor the level of services available in Egyptian society; and information on the in-kind transfers the household received during the year, to monitor the assistance the household received from different sources (government, associations, etc.).
b- Identifying the main outlets for fabrics, clothes and footwear, to determine the household's standard of living.
3- Quality control procedures, especially for fieldwork, were increased to ensure data accuracy and to correct errors in a timely manner, and all necessary measures were taken to ensure that mistakes are not repeated, applying the principle of reward and punishment.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum in the context of a major project started in 2009, during which extensive efforts have been made to acquire, clean, harmonize, preserve and disseminate micro data from existing household surveys in several Arab countries.
Covering a sample of urban and rural areas in all the governorates.
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
The sample of HIECS 2012/2013 is a self-weighted two-stage stratified cluster sample of around 24.9 thousand households. The main elements of the sampling design are described below.
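Why such a design can be self-weighting: a standard textbook derivation, sketched under the assumption (not stated in the source for HIECS) that within each stratum a PSUs are drawn with probability proportional to their listed size M_i out of a stratum total M, and a fixed number b of households is then drawn from each selected PSU. The inclusion probability pi is then the same for every household in the stratum:

```latex
\pi = \underbrace{\frac{a\,M_i}{M}}_{\text{stage 1 (PPS)}} \times \underbrace{\frac{b}{M_i}}_{\text{stage 2}} = \frac{a\,b}{M},
\qquad
w = \frac{1}{\pi} = \frac{M}{a\,b}
```

Because M_i cancels, the design weight w does not depend on which PSU a household falls in. If the sampling fraction ab/M is also the same in every stratum (proportional allocation), every household in the sample carries the same weight. In practice this holds only approximately, since listed sizes can differ from the number of households found at fieldwork.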
1- Sample size: The sample was distributed proportionally between urban and rural areas at the governorate level, so that it is representative even for small governorates. A sample of about 24863 households was selected, distributed between urban and rural areas at 45.4% and 54.6%, respectively. The sample is divided into two parts:
a- A new sample of 16094 households selected from main enumeration areas.
b- A panel sample of 8769 households (selected from HIECS 2010/2011 and the preceding survey in 2008/2009).
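Proportional allocation of a total sample across strata (for example governorate by urban/rural) can be sketched as follows. This is an illustration only: it assumes largest-remainder rounding so the stratum allocations add up exactly to the total, which the source does not specify, and the stratum sizes in the example are hypothetical.

```python
import math


def proportional_allocation(stratum_sizes, n):
    """Split a total sample n across strata in proportion to their sizes.

    stratum_sizes: dict mapping stratum name -> number of households.
    Largest-remainder rounding keeps the allocations summing exactly to n.
    """
    total = sum(stratum_sizes.values())
    quotas = {s: n * size / total for s, size in stratum_sizes.items()}
    alloc = {s: math.floor(q) for s, q in quotas.items()}
    shortfall = n - sum(alloc.values())
    # Hand the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


# Hypothetical frame sizes for the urban and rural strata of one governorate.
print(proportional_allocation({"urban": 454_000, "rural": 546_000}, 1000))
```

With these sizes the 1000 households split 454 urban and 546 rural, mirroring the 45.4% / 54.6% split above.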
2- Cluster size: The cluster size was already reduced in the previous survey compared with older surveys, because the large clusters used before were found to produce unacceptably high design effects (DEFT). As a result, it was decided to use a cluster size of only 8 households (in HIECS 2011/2012, a cluster size of 16 households was used), while the cluster size for the panel sample was 4 households.
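Why smaller clusters help can be seen from Kish's standard approximation for equal-sized clusters, DEFF = 1 + (m - 1) * rho, where m is the cluster size and rho the intra-cluster correlation of the variable studied; DEFT is the square root of DEFF, the factor by which standard errors grow relative to simple random sampling. A minimal sketch; the value rho = 0.05 is a hypothetical illustration, not an HIECS estimate. With it, DEFF falls from 1.75 at m = 16 to 1.35 at m = 8 and 1.15 at m = 4.

```python
import math


def deff(cluster_size, icc):
    """Kish approximation DEFF = 1 + (m - 1) * rho for equal clusters of size m."""
    return 1.0 + (cluster_size - 1) * icc


def deft(cluster_size, icc):
    """DEFT = sqrt(DEFF): inflation of standard errors relative to SRS."""
    return math.sqrt(deff(cluster_size, icc))


# Compare the cluster sizes mentioned above under a hypothetical rho of 0.05.
for m in (16, 8, 4):
    print(m, round(deff(m, 0.05), 2), round(deft(m, 0.05), 3))
```

The trade-off is cost: smaller clusters need more sample areas for the same number of households.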
3- Core sample: The core sample is the master sample from which any household sample needed to study the characteristics of individuals and families is drawn. It is a large sample distributed across the urban and rural areas of all governorates and is representative of the individual characteristics of Egyptian society. It was implemented in January 2012 and contains more than 1 million households (1004800 households) selected from 5024 enumeration areas distributed across all governorates (urban/rural) in proportion to the sample size (each enumeration area contains around 200 households). The core sample is the sampling frame from which the samples for CAPMAS surveys are drawn, such as the Labor Force Surveys, the Income, Expenditure and Consumption Survey and the Household Urban Migration Survey, in addition to other samples that may be requested by external parties.
New households sample: 1000 sample areas were selected across all governorates (urban/rural) in proportion to the sample size. The number
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset seeks to provide insights into what has changed due to policies aimed at combating COVID-19, and to evaluate changes in community activities and their relation to reductions in confirmed COVID-19 cases. The reports chart movement trends over time (from 2020/02/15 to 2020/02/05), compared to an expected baseline, by geography (across 133 countries), along with other country-level statistics that may help explain the evolution of the disease.
Sources:
Bing COVID-19 data. Available at: https://github.com/microsoft/Bing-COVID-19-Data
COVID-19 Community Mobility Report. Available at: https://www.google.com/covid19/mobility/
COVID-19: Government Response Stringency Index. Available at: https://ourworldindata.org/grapher/covid-stringency-index
Coronavirus (COVID-19) Testing. Available at: https://github.com/owid/covid-19-data/blob/master/public/data/testing/covid-testing-all-observations.csv
Coronavirus (COVID-19) Vaccination. Available at: https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv
List of countries and dependencies by population. Available at: https://www.kaggle.com/tanuprabhu/population-by-country-2020
List of countries and dependencies by population density. Available at: https://www.kaggle.com/tanuprabhu/population-by-country-2020
List of countries by Human Development Index. Available at: http://hdr.undp.org/en/data
Measuring Overall Health System Performance. Available at: https://www.who.int/healthinfo/paper30.pdf?ua=1
List of countries by GDP (PPP) per capita. Available at: https://data.worldbank.org/indicator/NY.GDP.PCAP.PP.CD
List of countries by age structure (65+). Available at: https://data.worldbank.org/indicator/SP.POP.65UP.TO.ZS
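The "expected baseline" in Google's COVID-19 Community Mobility Reports is defined by Google as the median value for the same day of the week during the 5-week period from 3 January to 6 February 2020, and values are reported as percent change from that baseline. A minimal sketch of that comparison for one region and category, with hypothetical function names and synthetic data; it does not reproduce Google's underlying aggregation.

```python
from datetime import date, timedelta
from statistics import median

# Baseline window as defined in Google's Community Mobility Reports.
BASELINE_START = date(2020, 1, 3)
BASELINE_END = date(2020, 2, 6)


def weekday_baselines(daily):
    """Median value for each weekday (0 = Monday) within the baseline window.

    daily: dict mapping datetime.date -> activity value for one region/category.
    """
    by_weekday = {}
    for day, value in daily.items():
        if BASELINE_START <= day <= BASELINE_END:
            by_weekday.setdefault(day.weekday(), []).append(value)
    return {wd: median(values) for wd, values in by_weekday.items()}


def percent_change_from_baseline(daily, day):
    """Percent change of the value on `day` relative to its weekday baseline."""
    baseline = weekday_baselines(daily)[day.weekday()]
    return 100.0 * (daily[day] - baseline) / baseline


# Synthetic series: constant activity of 100 throughout the baseline window.
n_days = (BASELINE_END - BASELINE_START).days + 1
series = {BASELINE_START + timedelta(days=i): 100 for i in range(n_days)}
series[date(2020, 4, 1)] = 60  # a later day with reduced activity
print(percent_change_from_baseline(series, date(2020, 4, 1)))  # -40.0
```

Using a per-weekday median rather than one overall baseline keeps normal weekly rhythms (e.g. lower workplace activity at weekends) from showing up as change.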
The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above-mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.
----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:
Iraq is considered a pioneer in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization (CSO), household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between the CSO and the World Bank (WB), the CSO and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on the IHSES on 1/1/2012. The survey was carried out over a full year, covering all governorates including those in Kurdistan Region.
The survey has six main objectives. These objectives are:
The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum to create a version comparable with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variable names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf, provided in the external resources, for further information on the mapping of the original variables onto the harmonized ones, as well as on the availability of variables in both survey years, with relevant comments.
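Renaming variables through a mapping table, as described above, can be sketched as follows. The variable codes and labels here are hypothetical placeholders, not entries from the ERF mapping matrix; the actual mapping, including variables available in only one survey year, is documented in the PDF named above.

```python
# Hypothetical mapping rows: (original name, harmonized name, harmonized label).
MAPPING_2012 = [
    ("q0105", "age", "Age in completed years"),
    ("q0104", "sex", "Sex of individual"),
]


def harmonize(record, mapping):
    """Rename a record's variables to harmonized names.

    Variables missing from the mapping are dropped, so every harmonized
    file exposes only the agreed common variable set.
    """
    out = {}
    for original, harmonized, _label in mapping:
        if original in record:
            out[harmonized] = record[original]
    return out


def labels(mapping):
    """Harmonized variable name -> label, for attaching to the output file."""
    return {harmonized: label for _orig, harmonized, label in mapping}


print(harmonize({"q0105": 34, "q0104": 2, "x99": 9}, MAPPING_2012))
```

Keeping one mapping table per survey year lets both years be converted to the same harmonized names with the same function.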
National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.
1- Household/family. 2- Individual/person.
The survey was carried out over a full year covering all governorates including those in Kurdistan Region.
Sample survey data [ssd]
----> Design:
The sample size was 25488 households for the whole of Iraq: 216 households in each of 118 districts, in 2832 clusters of 9 households each, distributed across districts and governorates for rural and urban areas.
----> Sample frame:
The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted as the frame for selecting households in all governorates, including Kurdistan Region. The sample was selected in two stages. Stage 1: Primary sampling units (blocks) within each stratum (district), for urban and rural areas, were selected systematically with probability proportional to size, giving 2832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to form a cluster, so the total sample was 25488 households distributed across the governorates, 216 households in each district.
----> Sampling Stages:
In each district, the sample was selected in two stages. Stage 1: Based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum by systematic sampling with probability proportional to size, with implicit stratification by urban/rural and by geography (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point by systematic equal-probability sampling. The sampling frames for each stage can be developed from the 2010 building listing and numbering without updating household lists. In some small districts, random selection of primary sampling units may yield fewer than 24 distinct units; in that case a sampling unit is selected more than once, so two or more clusters may come from the same enumeration unit when necessary.
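The two selection stages above can be sketched in Python. This assumes the usual cumulative-size method for systematic PPS selection and is an illustration, not the CSO's implementation: implicit stratification would be applied by sorting the frame (urban/rural, then geography) before calling `systematic_pps`, and the function names are hypothetical. The sketch also shows why a large unit can be hit more than once in a small district: any unit whose size exceeds the sampling interval is crossed by at least one selection point.

```python
import random


def systematic_pps(sizes, n, rng=random):
    """Stage 1: systematic PPS selection of n units from listed sizes.

    Returns the index of the unit hit by each of the n selection points;
    a unit larger than the sampling interval can appear more than once.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    selected, cum, i = [], 0.0, 0
    for k in range(n):
        point = start + k * interval
        # Advance until `point` lies inside unit i's cumulative-size range.
        while i < len(sizes) - 1 and cum + sizes[i] <= point:
            cum += sizes[i]
            i += 1
        selected.append(i)
    return selected


def systematic_equal(n_households, k, rng=random):
    """Stage 2: systematic equal-probability selection of k household indices."""
    interval = n_households / k
    start = rng.random() * interval
    return [int(start + j * interval) for j in range(k)]


# One hypothetical district: 24 PSUs of 9 households each gives 216 households.
psus = systematic_pps([random.randint(50, 300) for _ in range(120)], 24)
households = systematic_equal(200, 9)
print(len(psus) * len(households))  # 216
```

Across 118 districts, 24 × 9 = 216 households per district gives 118 × 216 = 25488 households in 2832 clusters, matching the design figures above.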
Face-to-face [f2f]
----> Preparation:
The 2006 survey questionnaire was used as the basis for designing the 2012 questionnaire, with many revisions. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the field work team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, further revisions were made based on the challenges and feedback that emerged during implementation, and the final version was used in the actual survey.
----> Questionnaire Parts:
The questionnaire consists of four parts, each with several sections:
Part 1: Socio-Economic Data:
- Section 1: Household Roster
- Section 2: Emigration
- Section 3: Food Rations
- Section 4: Housing
- Section 5: Education
- Section 6: Health
- Section 7: Physical Measurements
- Section 8: Job Seeking and Previous Job
Part 2: Monthly, Quarterly and Annual Expenditures:
- Section 9: Expenditures on Non-Food Commodities and Services (past 30 days)
- Section 10: Expenditures on Non-Food Commodities and Services (past 90 days)
- Section 11: Expenditures on Non-Food Commodities and Services (past 12 months)
- Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days)
- Section 12, Table 1: Meals Had Within the Residential Unit
- Section 12, Table 2: Number of Persons Other Than Household Members Participating in Meals Within Household Expenditure
Part 3: Income and Other Data:
- Section 13: Job
- Section 14: Paid Jobs
- Section 15: Agriculture, Forestry and Fishing
- Section 16: Household Non-Agricultural Projects
- Section 17: Income from Ownership and Transfers
- Section 18: Durable Goods
- Section 19: Loans, Advances and Subsidies
- Section 20: Shocks and Household Coping Strategies
- Section 21: Time Use
- Section 22: Justice
- Section 23: Life Satisfaction
- Section 24: Food Consumption During Past 7 Days
Part 4: Diary of Daily Expenditures: The expenditure diary is an essential component of this survey. It is left with the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.), over 7 days. Two pages are allocated to each day's expenditures, so the diary consists of 14 pages.
----> Raw Data:
Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
1. Interviewer: checks all answers on the household questionnaire, confirming that they are clear and correct.
2. Local supervisor: checks that the questions have been completed correctly.
3. Statistical analysis: after exporting the data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or illogical values, in addition to auditing some variables.
4. World Bank consultants, in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and Stata to examine and correct remaining inconsistencies within the data files. The software detects errors by checking questionnaire items against the expected parameters for each variable.
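The range and consistency checks described in stages 3 and 4 can be sketched in Python. The source says these checks were run as program commands in statistical software; this is only an illustration of the idea, and every variable name, bound and rule below is hypothetical.

```python
# Hypothetical range rules: variable -> (minimum, maximum) allowed value.
RANGE_RULES = {
    "age": (0, 110),            # years
    "household_size": (1, 50),  # persons
}


def logic_checks(person):
    """Return messages for cross-variable inconsistencies (hypothetical rules)."""
    messages = []
    age = person.get("age")
    if age is not None:
        if person.get("marital_status") == "married" and age < 12:
            messages.append("married but younger than 12")
        if person.get("years_of_schooling", 0) > age:
            messages.append("years of schooling exceed age")
    return messages


def find_errors(records):
    """Return (record index, message) pairs for out-of-range or illogical values."""
    errors = []
    for idx, rec in enumerate(records):
        for var, (low, high) in RANGE_RULES.items():
            value = rec.get(var)
            if value is not None and not low <= value <= high:
                errors.append((idx, f"{var}={value} outside [{low}, {high}]"))
        errors.extend((idx, msg) for msg in logic_checks(rec))
    return errors


records = [
    {"age": 30, "household_size": 5, "years_of_schooling": 12},
    {"age": 150, "household_size": 3},
    {"age": 8, "marital_status": "married", "years_of_schooling": 10},
]
for error in find_errors(records):
    print(error)
```

Flagged records would then be returned to the questionnaire or field team for correction rather than changed automatically.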
----> Harmonized Data:
The Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest was in Sulaimaniya (92%).