https://spdx.org/licenses/CC0-1.0.html
Normative learning theories dictate that we should preferentially attend to informative sources, but only up to the point that our limited learning systems can process their content. Humans, including infants, show this predicted strategic deployment of attention. Here we demonstrate that rhesus monkeys, much like humans, attend to events of moderate surprisingness over both more and less surprising events. They do this in the absence of any specific goal or contingent reward, indicating that the behavioral pattern is spontaneous. We suggest this U-shaped attentional preference represents an evolutionarily preserved strategy for guiding intelligent organisms toward material that is maximally useful for learning.

Methods

How the data were collected: In this project, we collected gaze data from 5 macaques while they watched sequential visual displays designed to elicit probabilistic expectations. Gaze was recorded with the Eyelink Toolbox and sampled at 1000 Hz by an infrared eye-monitoring camera system.

Dataset:
"csv-combined.csv" is an aggregated dataset that includes one pop-up event per row for all original datasets for each trial. Here are descriptions of each column in the dataset:
- subj: subject ID = {"B": 104, "C": 102, "H": 101, "J": 103, "K": 203}
- trialtime: start time of the current trial, in seconds
- trial: current trial number (each trial featured one of 80 possible visual-event sequences) (in order)
- seq current: sequence number (one of 80 sequences)
- seq_item: current item number within a sequence (in order)
- active_item: pop-up item (active box)
- pre_active: prior pop-up item (active box) {-1: "the first active object in the sequence / no active object before the currently active object in the sequence"}
- next_active: next pop-up item (active box) {-1: "the last active object in the sequence / no active object after the currently active object in the sequence"}
- firstappear: {0: "not first", 1: "first appearance in the sequence"}
- looks_blank: csv: total time spent looking at blank space during the current event (ms); csv_timestamp: {1: "looking at blank space at this timestamp", 0: "not looking at blank space at this timestamp"}
- looks_offscreen: csv: total time spent looking offscreen during the current event (ms); csv_timestamp: {1: "looking offscreen at this timestamp", 0: "not looking offscreen at this timestamp"}
- time till target: time until the subject first looked at the target object (ms) {-1: "never looked at the target"}
- looks target: csv: time spent looking at the target object (ms); csv_timestamp: whether the subject is looking at the target at the current timestamp (1 or 0)
- look1,2,3: time spent looking at each object (ms)
- location 123X, 123Y: location of each box (the locations of the three boxes for a given sequence were chosen randomly but remained static throughout the sequence)
- item123id: pop-up item ID (remained static throughout a sequence)
- event time: total time for the whole event (pop-up and go back) (ms)
- eyeposX,Y: eye position at the current timestamp
"csv-surprisal-prob.csv" is an output file from Monkilock_Data_Processing.ipynb. Surprisal values for each event were calculated and added to the "csv-combined.csv". Here are descriptions of each additional column:
- rt: time till target {-1: "never looked at the target"}. In data analysis, we included only data with rt > 0.
- already_there: {NA: "never looked at the target object"}. In data analysis, we included events that are not the first event in a sequence, are not repeats of the previous event, and have a non-NA already_there value.
- looks_away: {TRUE: "the subject was looking away from the currently active object at this time point", FALSE: "the subject was not looking away from the currently active object at this time point"}
- prob: probability of the occurrence of the object
- surprisal: unigram surprisal value
- bisurprisal: transitional surprisal value
- std_surprisal: standardized unigram surprisal value
- std_bisurprisal: standardized transitional surprisal value
- binned_surprisal_means: means of unigram surprisal values binned into three groups of evenly spaced intervals according to surprisal value
- binned_bisurprisal_means: means of transitional surprisal values binned into three groups of evenly spaced intervals according to surprisal value
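For concreteness, the sketch below shows one way columns like these could be derived from per-event probabilities with pandas. It is not the notebook's actual implementation: the log base (2), the placeholder probabilities, and the binning details are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical per-event probabilities for illustration; in the real pipeline
# these come from the sequence statistics computed in Monkilock_Data_Processing.ipynb.
events = pd.DataFrame({
    "prob":       [0.6, 0.3, 0.1, 0.6],    # unigram probability of the pop-up item
    "trans_prob": [np.nan, 0.5, 0.2, 0.7], # transitional probability (NaN for a sequence's first event)
})

# Surprisal = -log2(probability); the log base is an assumption here.
events["surprisal"] = -np.log2(events["prob"])
events["bisurprisal"] = -np.log2(events["trans_prob"])

# Standardized (z-scored) versions.
for col in ["surprisal", "bisurprisal"]:
    events["std_" + col] = (events[col] - events[col].mean()) / events[col].std()

# Bin unigram surprisal into three evenly spaced intervals and record the bin means.
bins = pd.cut(events["surprisal"], bins=3)
events["binned_surprisal_means"] = events.groupby(bins, observed=False)["surprisal"].transform("mean")
```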
"csv-surprisal-prob_updated.csv" is a ready-for-analysis dataset generated by Analysis_Code_final.Rmd after standardizing controlled variables, changing data types for categorical variables for analysts, etc. "AllSeq.csv" includes event information of all 80 sequences
Empty Values in Datasets:
There are no missing values in the original dataset "csv-combined.csv". Missing values (marked as NA) occur in the columns "prev_active", "next_active", "already_there", "bisurprisal", "std_bisurprisal", and "sq_std_bisurprisal" of "csv-surprisal-prob.csv" and "csv-surprisal-prob_updated.csv". NAs in "prev_active" and "next_active" indicate that the current event is the first or last in the sequence, i.e., there is no active object before or after the currently active object. When we analyzed the variable "already_there", we excluded rows whose "prev_active" value is NA. NAs in "already_there" mean that the subject never looked at the target object during the current event; when analyzing "already_there", we also excluded rows whose "already_there" value is NA. Missing values occur in "bisurprisal", "std_bisurprisal", and "sq_std_bisurprisal" when an event is the first in its sequence, because no preceding event exists and the transitional probability cannot be computed. When fitting models for transitional statistics, we excluded rows whose "bisurprisal", "std_bisurprisal", or "sq_std_bisurprisal" values are NA.
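The exclusion rules above can be expressed as simple pandas filters. The sketch below is illustrative only; column names follow the data dictionary above, and treating a repeat as active_item == pre_active is an assumption.

```python
import pandas as pd

df = pd.read_csv("csv-surprisal-prob.csv")

# Looking-time analyses: keep events where the target was eventually fixated
# (rt > 0 drops the -1 "never looked at the target" code).
rt_data = df[df["rt"] > 0]

# already_there analyses: drop first events in a sequence, repeats of the
# previous event (assumed here to mean active_item == pre_active), and rows
# where already_there is NA.
at_data = df[
    (df["firstappear"] == 0)
    & (df["active_item"] != df["pre_active"])
    & df["already_there"].notna()
]

# Models of transitional statistics: drop rows whose transitional surprisal
# columns are NA (i.e., the first event of each sequence).
trans_data = df.dropna(subset=["bisurprisal", "std_bisurprisal"])
```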
Codes:
In "Monkilock_Data_Processing.ipynb", we processed raw fixation data of 5 macaques and explored the relationship between their fixation patterns and the "surprisal" of events in each sequence. We computed the following variables which are necessary for further analysis, modeling, and visualizations in this notebook (see above for details): active_item, pre_active, next_active, firstappear ,looks_blank, looks_offscreen, time till target, looks target, look1,2,3, prob, surprisal, bisurprisal, std_surprisal, std_bisurprisal, binned_surprisal_means, binned_bisurprisal_means. "Analysis_Code_final.Rmd" is the main scripts that we further processed the data, built models, and created visualizations for data. We evaluated the statistical significance of variables using mixed effect linear and logistic regressions with random intercepts. The raw regression models include standardized linear and quadratic surprisal terms as predictors. The controlled regression models include covariate factors, such as whether an object is a repeat, the distance between the current and previous pop up object, trial number. A generalized additive model (GAM) was used to visualize the relationship between the surprisal estimate from the computational model and the behavioral data. "helper-lib.R" includes helper functions used in Analysis_Code_final.Rmd
Survey-based Harmonized Indicators (SHIP) files are harmonized data files from household surveys conducted by countries in Africa. To ensure the quality and transparency of the data, it is critical to document the procedures for compiling consumption aggregates and other indicators so that the results can be replicated with ease. This process ensures consistency and continuity, making temporal and cross-country comparisons more reliable.
Four harmonized data files are prepared for each survey to generate a set of harmonized variables that have the same variable names. Invariably, in each survey, questions are asked in slightly different ways, which poses challenges for defining harmonized variables consistently. The harmonized household survey data therefore present the best available variables with harmonized definitions, but not identical variables. The four harmonized data files are:
a) Individual level file (labor force indicators in a separate file): This file has information on basic characteristics of individuals such as age and sex, literacy, education, health, anthropometry and child survival.
b) Labor force file: This file has information on the labor force, including employment/unemployment, earnings, sectors of employment, etc.
c) Household level file: This file has information on household expenditure, household head characteristics (age and sex, level of education, employment), housing amenities, assets, and access to infrastructure and services.
d) Household Expenditure file: This file has consumption/expenditure aggregates by consumption group according to the UN Classification of Individual Consumption According to Purpose (COICOP).
National
The survey covered all de jure household members (usual residents).
Sample survey data [ssd]
A multi-stage sampling technique was used in selecting the GLSS sample. Initially, 4565 households were selected for GLSS3, spread around the country in 407 small clusters; in general, 15 households were taken in an urban cluster and 10 households in a rural cluster. The actual achieved sample was 4552 households. Because of the sample design used, and the very high response rate achieved, the sample can be considered self-weighting, though in the case of expenditure data, weighting of the expenditure values is required.
Face-to-face [f2f]
Overview: The Essential Climate Variables for assessment of climate variability from 1979 to present dataset contains a selection of climatologies, monthly anomalies and monthly mean fields of Essential Climate Variables (ECVs) suitable for monitoring and assessment of climate variability and change. Selection criteria are based on accuracy and temporal consistency on monthly to decadal time scales. The ECV data products in this set have been estimated from the climate reanalyses ERA-Interim and ERA5, and, depending on the source, may have been adjusted to account for biases and other known deficiencies. Data sources and adjustment methods used are described in the Product User Guide, as are various particulars such as the baseline periods used to calculate monthly climatologies and the corresponding anomalies.

Sum of monthly precipitation: This variable is the accumulated liquid and frozen water, including rain and snow, that falls to the Earth's surface. It is the sum of large-scale precipitation (that precipitation which is generated by large-scale weather patterns, such as troughs and cold fronts) and convective precipitation (generated by convection, which occurs when air at lower levels in the atmosphere is warmer and less dense than the air above, so it rises). Precipitation variables do not include fog, dew or the precipitation that evaporates in the atmosphere before it lands at the surface of the Earth.

Spatial resolution: 0:15:00 (0.25°)
Temporal resolution: monthly
Temporal extent: 1979 - present
Data unit: mm * 10
Data type: UInt32
CRS as EPSG: EPSG:4326
Processing time delay: one month
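Because the values are stored as UInt32 in units of mm * 10, a consumer needs to rescale them before use. A minimal sketch (the array contents are placeholders):

```python
import numpy as np

# Values are stored as UInt32 in units of mm * 10; convert to floating point
# millimetres of monthly precipitation before analysis.
raw = np.array([1234, 0, 57], dtype=np.uint32)   # placeholder values read from the product
precip_mm = raw.astype(np.float64) / 10.0        # -> [123.4, 0.0, 5.7] mm
```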
Occupation data for 2021 and 2022 data files
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).
Latest edition information
For the fourth edition (September 2023), the variables NSECM20, NSECMJ20, SC2010M, SC20SMJ, SC20SMN and SOC20M have been replaced with new versions. Further information on the SOC revisions can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COVID-19 prediction has been essential in aiding the prevention and control of the disease. The motivation of this case study is to develop predictive models for COVID-19 cases and deaths based on a cross-sectional data set with a total of 28,955 observations and 18 variables, compiled from 5 data sources from Kaggle. A two-part modeling framework, in which the first part is a logistic classifier and the second part includes machine learning or statistical smoothing methods, is introduced to model the highly skewed distribution of COVID-19 cases and deaths. We also aim to understand what factors are most relevant to COVID-19's occurrence and fatality. Evaluation criteria such as root mean squared error (RMSE) and mean absolute error (MAE) are used. We find that the two-part XGBoost model performs best at predicting the entire distribution of COVID-19 cases and deaths. The most important factors relevant to either COVID-19 cases or deaths include population and the rate of primary care physicians.
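As a rough illustration of the two-part idea only (not the authors' exact pipeline or data), the sketch below classifies zero versus non-zero outcomes first and then models the magnitude of the positive part, using scikit-learn gradient boosting as a stand-in for XGBoost and fully synthetic data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# Generic two-part (hurdle-style) sketch: part 1 classifies whether any cases
# occur, part 2 models the (log) count for units predicted to be positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 18))                     # 18 predictors (synthetic)
latent = X[:, 0] + 0.5 * X[:, 1]
y = np.where(latent > 0, np.expm1(2 + latent + rng.normal(0, 0.3, 2000)), 0.0)

clf = GradientBoostingClassifier().fit(X, (y > 0).astype(int))        # part 1: any cases?
reg = GradientBoostingRegressor().fit(X[y > 0], np.log1p(y[y > 0]))   # part 2: how many?

y_hat = np.where(clf.predict(X) == 1, np.expm1(reg.predict(X)), 0.0)
rmse = np.sqrt(np.mean((y_hat - y) ** 2))
mae = np.mean(np.abs(y_hat - y))
```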
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis.

Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product.

ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system, which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days. If serious flaws are detected in this early release (called ERA5T), the data could differ from the final release 2 to 3 months later. If this occurs, users are notified.

The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main subsets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities). The present entry is "ERA5 hourly data on single levels from 1940 to present".
The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.
Data collected through the survey helped in achieving the following objectives:
1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
6. Provide the necessary income data to serve in calculating poverty indices and identifying the poor characteristics as well as drawing poverty maps
7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty
National
Sample survey data [ssd]
The Household Expenditure and Income Survey sample for 2010 was designed to serve the basic objectives of the survey by providing a relatively large sample in each sub-district to enable drawing a poverty map of Jordan. The General Census of Population and Housing in 2004 provided a detailed framework for housing and households at different administrative levels in the country. Jordan is administratively divided into 12 governorates; each governorate is composed of a number of districts, and each district (Liwa) includes one or more sub-districts (Qada). In each sub-district there are a number of communities (cities and villages), and each community was divided into a number of blocks, where the number of houses in each block ranged between 60 and 100. Nomads and persons living in collective dwellings such as hotels, hospitals and prisons were excluded from the survey framework.
A two-stage stratified cluster sampling technique was used. In the first stage, clusters were selected with probability proportional to size, where the number of households in each cluster was considered the weight of the cluster. At the second stage, a sample of 8 households was selected from each cluster, together with another 4 households selected as a backup for the basic sample, using a systematic sampling technique. Those 4 backup households were to be used during the first visit to the block in case a visit to an originally selected household was not possible for any reason. For the purposes of this survey, each sub-district was treated as a separate stratum to ensure that results could be produced at the sub-district level. In this respect, the survey adopted the framework provided by the General Census of Population and Housing in dividing the sample strata. To estimate the sample size, the coefficient of variation and the design effect of the expenditure variable from the Household Expenditure and Income Survey for the year 2008 were calculated for each sub-district. These results were used to estimate the sample size at the sub-district level so that the coefficient of variation of the expenditure variable in each sub-district is less than 10%, with a minimum number of clusters per sub-district (6 clusters). This is to ensure adequate representation of clusters in different administrative areas to enable drawing an indicative poverty map.
It should be noted that, in addition to the standard non-response rate assumed, higher rates were expected in areas where poor households are concentrated in major cities. These were therefore taken into consideration during the sampling design phase, and a higher number of households was selected from those areas, aiming at good coverage of all regions where poverty is widespread.
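As a purely illustrative sketch of the second-stage systematic selection described above (8 basic plus 4 backup households per cluster), consider the following; the sampling interval, the ordering of the household list, and the rule that the last 4 picks serve as backups are all assumptions, not the Statistical Office's procedure:

```python
import random

def systematic_sample(household_ids, n_basic=8, n_backup=4, seed=None):
    """Illustrative second-stage draw for one cluster: a systematic sample of
    n_basic + n_backup households with a fixed interval and a random start."""
    assert len(household_ids) >= n_basic + n_backup
    rng = random.Random(seed)
    n_total = n_basic + n_backup
    interval = len(household_ids) / n_total      # fractional sampling interval
    start = rng.uniform(0, interval)             # random start in [0, interval)
    picks = [household_ids[int(start + i * interval)] for i in range(n_total)]
    return picks[:n_basic], picks[n_basic:]      # basic sample, backup households

basic, backup = systematic_sample(list(range(1, 81)), seed=42)  # e.g. a cluster of 80 households
```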
Face-to-face [f2f]
Raw Data:
- Organizing forms/questionnaires: A compatible archive system was used to classify the forms according to the different rounds throughout the year. A registry was prepared to track the different stages of the process of data checking, coding and entry until forms were returned to the archive system.
- Data office checking: This phase was carried out concurrently with the data collection phase in the field, where questionnaires completed in the field were immediately sent to the data office checking phase.
- Data coding: A team was trained to work on the data coding phase, which in this survey was limited to education specialization, profession and economic activity. For these, international classifications were used, while for the rest of the questions, coding was predefined during the design phase.
- Data entry/validation: A team consisting of system analysts, programmers and data entry personnel worked on the data at this stage. System analysts and programmers started by identifying the survey framework and questionnaire fields to help build computerized data entry forms. A set of validation rules was added to the entry forms to ensure the accuracy of the data entered. A team was then trained to complete the data entry process. Forms prepared for data entry were provided by the archive department to ensure forms were correctly extracted and put back in the archive system. A data validation process was run on the data to ensure the data entered was free of errors.
- Results tabulation and dissemination: After the completion of all data processing operations, ORACLE was used to tabulate the survey's final results. Those results were further checked against similar outputs from SPSS to ensure that the tabulations produced were correct. A check was also run on each table to guarantee consistency of the figures presented, together with the required editing of table titles and report formatting.
Harmonized Data:
- The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets.
- The harmonization process started with cleaning all raw data files received from the Statistical Office.
- Cleaned data files were then merged to produce one data file on the individual level containing all variables subject to harmonization.
- A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
- A post-harmonization cleaning process was run on the data.
- Harmonized data was saved on the household as well as the individual level, in SPSS and converted to STATA format.
DS126.0 represents a dataset implemented and computed by NCAR's Data Support Section, and forms an essential part of efforts undertaken in late 2004 and early 2005 to produce an archive of selected segments of ERA-40 on a standard transformation grid.
In this case, forty seven ERA-40 monthly mean surface and single level analysis variables were transformed from a reduced N80 Gaussian grid to a 256 by 128 regular Gaussian grid. All fields were transformed using routines from the ECMWF EMOS library, including 10 meter winds which were treated as scalars because of a lack of 10 meter spectral vorticity and divergence. A missing value occurs in the sea surface temperature and sea ice fields to mask grid points occurring over land. Fields formerly archived as whole integers, such as vegetation indices and cloud cover, occur as integers plus a fractional part in the T85 version due to interpolation.
Twenty-seven ERA-40 monthly mean surface and single level 6-hour forecast variables were transformed from a reduced N80 Gaussian grid to a 256 by 128 regular Gaussian grid. Four of the variables are "instantaneous" variables, and the remaining twenty-three variables are "accumulated" over the 6-hour forecast time. Divide the accumulated variables by 21600 seconds to obtain instantaneous values. (Multiplication by minus one may also be necessary to match the sign convention one is accustomed to.) All fields were transformed using routines from the ECMWF EMOS library, including three pairs of stresses which were treated as scalars because of a lack of spectral precursors.
In addition, all corresponding 00Z, 06Z, 12Z, and 18Z monthly mean surface and single level analysis variables and 6-hour forecast variables were also transformed to a T85 Gaussian grid.
All forecast variables are valid 6 hours after the forecast was initiated. Thus, the 00Z 6-hour forecast evaporation is valid at 06Z. Divide the accumulated variables by 21600 seconds to obtain instantaneous values. (Multiplication by minus one may also be necessary to match the sign convention one is accustomed to.)
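The conversion described above can be wrapped in a small helper; the function name and the example value (an accumulated flux in J m-2 over 6 hours) are illustrative:

```python
import numpy as np

SECONDS_PER_6H = 6 * 3600  # 21600 s accumulation period

def accumulated_to_mean_rate(field, flip_sign=False):
    """Convert a 6-hour accumulated field to a mean rate over the forecast period.
    flip_sign applies the optional factor of minus one mentioned above when the
    archived sign convention is opposite to the desired one."""
    rate = np.asarray(field, dtype=np.float64) / SECONDS_PER_6H
    return -rate if flip_sign else rate

# e.g. 4.32e6 J m-2 accumulated over 6 hours -> 200.0 W m-2 mean flux
mean_flux = accumulated_to_mean_rate([4.32e6])
```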
The choice of a T85 Gaussian grid was based on considerations of limiting the volume of new data generated to a moderate level and of matching the horizontal resolution of the Community Atmosphere Model (CAM) [https://www.cesm.ucar.edu/models/atm-cam/] component of NCAR's Community Climate System Model (CCSM).
The ERA-Interim data from ECMWF is an update to the ERA-40 project. The ERA-Interim data starts in 1989 and has a higher horizontal resolution (T255, N128 nominally 0.703125 degrees) than the ERA-40 data (T159, N80 nominally 1.125 degrees). ERA-Interim is based on a more current model than ERA-40 and uses 4DVAR (as opposed to 3DVAR in ERA-40). ECMWF will continue to run the ERA-Interim model in near real time through at least 2010, and possibly longer. This data is available in ds627.0 [https://rda.ucar.edu/datasets/ds627.0/].
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf
ERA5-Land is a reanalysis dataset providing a consistent view of the evolution of land variables over several decades at an enhanced resolution compared to ERA5. ERA5-Land has been produced by replaying the land component of the ECMWF ERA5 climate reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. Reanalysis produces data that goes several decades back in time, providing an accurate description of the climate of the past. ERA5-Land provides a consistent view of the water and energy cycles at surface level during several decades. It contains a detailed record from 1950 onwards, with a temporal resolution of 1 hour. The native spatial resolution of the ERA5-Land reanalysis dataset is 9km on a reduced Gaussian grid (TCo1279). The data in the CDS has been regridded to a regular lat-lon grid of 0.1x0.1 degrees. The data presented here is a post-processed subset of the full ERA5-Land dataset. Monthly-mean averages have been pre-calculated to facilitate many applications requiring easy and fast access to the data, when sub-monthly fields are not required.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The BS Filled Total Column Ozone (TCO) database version 3.5.1 provides a gap-free extension of the NIWA-BS Total Column Ozone database version 3.5.1 (doi:10.5281/zenodo.4535293), which combines TCO data from multiple satellite-based instruments to create a single near-global daily time series of ozone fields at 1.25° longitude by 1° latitude spanning the period 31 October 1978 to 31 December 2019. The NIWA-BS TCO database was initially maintained by the National Institute of Water and Atmospheric Research (NIWA) and is now maintained by Bodeker Scientific (BS). The latter also developed the BS Filled TCO database published here.
While the BS Filled TCO database has the same resolution and covers the same period as NIWA-BS TCO, any missing data have been filled using a machine learning-based method that regresses the NIWA-BS TCO database against NCEP CFSR reanalysis tropopause height fields and potential vorticity (PV) fields on the 550 K surface.
Uncertainties in the filled TCO fields are provided for every datum, and these uncertainties reflect the data availability, i.e. the uncertainty is smaller where measurements are available than in regions where filling of the data was required.
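As a toy stand-in for the filling approach described above (not the operational BS method, its feature set, or its uncertainty model), the sketch below regresses synthetic TCO values on tropopause height and 550 K PV and applies the fit only where the "satellite" values are missing:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in: predict TCO from tropopause height and 550 K PV, and use
# the regression only where the "satellite" TCO values are missing.
rng = np.random.default_rng(0)
tropopause_height = rng.uniform(8e3, 18e3, size=5000)   # m (synthetic)
pv_550k = rng.normal(0.0, 40.0, size=5000)              # PV on the 550 K surface (synthetic)
tco = 300 - 0.004 * (tropopause_height - 12e3) + 0.3 * np.abs(pv_550k) + rng.normal(0, 5, 5000)  # DU

X = np.column_stack([tropopause_height, pv_550k])
observed = rng.random(5000) > 0.2                       # pretend 20% of points are missing

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[observed], tco[observed])

tco_filled = tco.copy()
tco_filled[~observed] = model.predict(X[~observed])     # fill only the gaps
```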
Please note: For the reasons detailed in this document, versions 3.5.x of the BS Filled TCO database and of the NIWA-BS TCO database should not be used henceforth for trend analysis. As such, we have updated version 3.4 of the NIWA-BS Combined TCO database to the end of 2019 (now referred to as version 3.4.1 of the database) as a replacement. You will find a link to version 3.4.1 under the link provided above.
The data are available in daily, monthly or annual resolution. Please note the following for the data in annual resolution: There have to be at least 12 valid monthly values within a year to calculate a valid annual mean. Therefore, no annual mean is available for the year 1978 as the record only starts on 31 October 1978.
Please email greg@bodekerscientific.com and let us know which data set you downloaded and what your intended purpose for the use of the data is. You will then receive updates if an improved version becomes available.
In the dataset 'GLES Cross-Section 2013-2021, Sensitive Regional Data', the recoded or deleted variables of the GLES Cross-Section Scientific Use Files, which refer to the respondents' place of residence, are made available for research purposes. The basis for the assignment of the small-scale regional units is the addresses of the respondents. After geocoding, i.e. the calculation of geocoordinates based on the addresses, the point coordinates were linked to regional units (e.g. INSPIRE grid cells, municipality and district IDs, postal codes). The regional variables of this dataset can be linked to the survey data of the pre- and post-election cross-sections of the GLES.
This data set contains the following sensitive regional variables (both IDs and, where applicable, names):
- 3-digit key for the administrative governmental district (Regierungsbezirk) (since 2013)
- 3-digit key for the spatial planning region (since 2013)
- 5-digit key for (city-) districts (since 2013)
- 9-digit key for municipalities (since 2021)
- 8-digit general municipality key (AGS) (since 2013)
- 12-digit regional key (Regionalschlüssel) (since 2021)
- Zip code (since 2013)
- Constituencies (since 2013)
- NUTS-3 code (since 2013)
- INSPIRE ID (1km) (since 2013)
- Municipality size (since 2013)
- BIK type of municipality (since 2013)
This sensitive data is subject to special access restrictions and can only be used on-site in the Secure Data Center in Cologne. Further information and contact persons can be found on our website: https://www.gesis.org/en/secdc
In order to take into account changes in the territorial status of the regional units (e.g. district reforms, municipality incorporations), the regional variables are offered as time-harmonized variables as of December 31, 2015, in addition to the status as of January 1 of the survey year.
If you want to use the regional variables to add additional context characteristics (regional attributes such as the unemployment rate or election turnout, for example), you have to send us these data before your visit. In addition, we require a reference and documentation (description of variables) for the data. Note that this context data may be as sensitive as the regional variables if direct assignment is possible. For data protection reasons, it is problematic if individual characteristics can be assigned to specific regional units – and therefore ultimately to the individual respondents – even without the ALLBUS dataset by means of a table of correspondence. Accordingly, the publication of (descriptive) analysis results based on such contextual data is only possible in a coarsened form.
Please contact the GLES User Service first and send us the completed GLES regional data form (see 'Data & Documents'), specifying exactly which GLES datasets and regional variables you need. Contact: gles@gesis.org
As soon as you have clarified with the GLES User Service exactly which regional features are to be made available for on-site use, the data use agreement for the use of the data at a guest workstation in our Secure Data Center (Safe Room) in Cologne will be sent to you. Please specify all data sets you need, i.e. both the 'GLES Sensitive Regional Data (ZA6828)' and the Scientific Use Files to which the regional variables are to be assigned. Furthermore, under 'Specific variables', please name all the regional variables you need (see GLES regional data form).
Occupation data for 2021 and 2022 data files
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).
Latest edition information
For the third edition (September 2023), the variables NSECM20, NSECMJ20, SC2010M, SC20SMJ, SC20SMN and SOC20M have been replaced with new versions. Further information on the SOC revisions can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Total column ozone (TCO) data from multiple satellite-based instruments have been combined to create a single near-global daily time series of ozone fields at 1.25° longitude by 1° latitude spanning the period 31 October 1978 to 31 December 2019. Comparisons against TCO measurements from the ground-based Dobson and Brewer spectrophotometer networks are used to remove offsets and drifts against the ground-based measurements in a subset of the satellite-based instruments. The corrected subset is then used as a basis for homogenizing the remaining data sets. The intention is that this data set serves as a climate data record for TCO and, to this end, the requirements for constructing climate data records, as detailed by GCOS (Global Climate Observing System), have been followed as closely as possible. The construction of this database improves on earlier versions of the database maintained first by the National Institute of Water and Atmospheric Research (NIWA) and now by Bodeker Scientific (BS).
Please note: For the reasons detailed in this document, version 3.5.1 of the NIWA-BS TCO database should not be used henceforth for trend analysis. As such, we have updated the version 3.4 database to the end of 2019 (now referred to as version 3.4.1 of the database, available in this record) as a replacement.
This version 3.4.1 is produced using the same method as version 3.4, but the dataset has been extended in time to the end of 2019. This means that all fits were recalculated and, thus, this version is slightly different from version 3.4 in all years. A filled (gap-free) version of version 3.4.1 of this dataset is available under doi:10.5281/zenodo.7447757.
The data are available in daily, monthly or annual resolution. Please note the following for the data in monthly and annual resolution:
Monthly: There have to be at least 25 valid values in a gridbox within a month to calculate a monthly mean.
Annual: There have to be at least 12 valid monthly values within a year to calculate an annual mean.
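These validity thresholds can be applied as in the following sketch (function names are illustrative, not part of the released data or tools):

```python
import numpy as np

def monthly_mean(daily_values, min_valid=25):
    """Mean of the daily values in one gridbox for one month, or NaN when fewer
    than min_valid valid days are available (threshold as stated above)."""
    vals = np.asarray(daily_values, dtype=float)
    valid = vals[~np.isnan(vals)]
    return valid.mean() if valid.size >= min_valid else np.nan

def annual_mean(monthly_values):
    """Annual mean only when all 12 monthly means are valid."""
    vals = np.asarray(monthly_values, dtype=float)
    return vals.mean() if np.isfinite(vals).sum() >= 12 else np.nan
```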
Please email greg@bodekerscientific.com and let us know which data set you downloaded and what your intended purpose for the use of the data is. You will then receive updates if an improved version becomes available.
DS126.2 represents a dataset implemented and computed by NCAR's Data Support Section, and forms an essential part of efforts undertaken in late 2004 and early 2005 to produce an archive of selected segments of ERA-40 on a standard transformation grid. In this case, ERA-40 monthly mean upper air variables on 60 model levels were transformed from either spherical harmonics (surface geopotential, temperature, vertical pressure velocity, vorticity, logarithm of surface pressure, divergence), or a reduced N80 Gaussian grid (specific humidity, ozone mass mixing ratio, cloud liquid water content, cloud ice water content, cloud cover), to a 256 by 128 regular Gaussian grid at T85 spectral truncation. In addition, horizontal wind components were derived from spectral vorticity and divergence and also archived on a T85 Gaussian grid. All scalar fields were transformed using routines from the ECMWF EMOS library, whereas the horizontal winds were obtained using NCAR's SPHEREPACK library. All corresponding 00Z, 06Z, 12Z, and 18Z monthly mean upper air variables on 60 model levels were also transformed to a T85 Gaussian grid. The choice of a T85 Gaussian grid was based on considerations of limiting the volume of new data generated to a moderate level and of matching the horizontal resolution of the Community Atmosphere Model (CAM) [https://www.cesm.ucar.edu/models/atm-cam/] component of NCAR's Community Climate System Model (CCSM).
The ERA-Interim data from ECMWF is an update to the ERA-40 project. The ERA-Interim data starts in 1989 and has a higher horizontal resolution (T255, N128 nominally 0.703125 degrees) than the ERA-40 data (T159, N80 nominally 1.125 degrees). ERA-Interim is based on a more current model than ERA-40 and uses 4DVAR (as opposed to 3DVAR in ERA-40). ECMWF will continue to run the ERA-Interim model in near real time through at least 2010, and possibly longer. This data is available in ds627.0 [https://rda.ucar.edu/datasets/ds627.0/].
https://vocab.nerc.ac.uk/collection/L08/current/LI/
The data sources of the dataset are outputs from CMIP6 simulations. The effect of the Southern Ocean on global climate change is assessed using Earth system model projections following an idealised 1% annual rise in atmospheric CO2. The model simulations run over 150 years and were obtained from the Earth System Grid Federation at the CMIP6 archive (https://esgf-node.llnl.gov/search/cmip6, World Climate Research Programme, 2021). The reported derived data sets are based on the output of a subset of CMIP6 models providing all necessary variables for the diagnostics and analysis published in: Williams, R.G., P. Ceppi, V. Roussenov, A. Katavouta and A. Meijers, 2022. The role of the Southern Ocean in the global climate response to carbon emissions. Philosophical Transactions A, Royal Society, in press. The dataset contains 3 types of variables: (1) time-averaged 2D fields: model mean and standard deviation (STD) of the surface warming, ocean heat uptake and storage, radiative response, climate feedback parameter, ocean carbon uptake and storage, and cumulative top-of-the-atmosphere heat uptake, with examples for 2 models - GFDL-ESM4 and UKESM1-0-LL; (2) time series of Southern Ocean or globally averaged (or globally integrated) variables for each model together with the model mean and STD: surface warming, ocean heat uptake and storage, radiative forcing and radiative response, ocean carbon uptake and storage; (3) single values for the Southern Ocean and planetary physical climate feedback parameter and the Transient Climate Response to Emissions (TCRE), together with their components. This dataset was created under the project Southern Ocean carbon indices and metrics (SARDINE), NERC Grant reference NE/T010657/1, by scientists from the University of Liverpool, National Oceanography Centre (Liverpool), Imperial College London and the British Antarctic Survey.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
subject to appropriate attribution.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Existing missing value imputation methods focus on imputing data with respect to the actual values, toward completing datasets as input for machine learning tasks. This work proposes imputing missing values with the goal of improving classification accuracy. The proposed method is based on a bee algorithm and uses k-nearest neighbours with linear regression to guide the search toward an appropriate solution and prevent randomness. GINI importance scores are utilized in selecting values for imputation. The imputed values are thus intended to improve discriminative power in classification tasks rather than to replicate the actual values from the original dataset. In this study, we evaluated the proposed method against frequently used imputation methods such as k-nearest neighbours, principal component analysis, and nonlinear principal component analysis, comparing root mean square error (RMSE) and the accuracy of using the imputed datasets in a classification task. The experimental results indicated that our proposed method obtained the best accuracy on all datasets compared to the other methods. In comparison to the original datasets, the classification models built from imputed datasets yielded 15-25% higher accuracy in class prediction. The analysis showed that the feature ranking used in the classification process was affected, leading to a noticeable change in informativeness, as the imputed data from the proposed method helped boost discriminative power.
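The evaluation protocol can be illustrated with the baseline k-nearest-neighbour imputation mentioned above: mask entries, impute them, and report RMSE on the masked entries plus downstream classification accuracy. The sketch below does not reproduce the proposed bee-algorithm method, and the dataset and classifier are stand-ins.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Baseline protocol only: hide 10% of entries, impute with k-nearest neighbours,
# then measure (a) RMSE against the held-out true values and (b) downstream accuracy.
X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
mask = rng.random(X.shape) < 0.1
X_missing = X.copy()
X_missing[mask] = np.nan

X_imputed = KNNImputer(n_neighbors=5).fit_transform(X_missing)
rmse = np.sqrt(np.mean((X_imputed[mask] - X[mask]) ** 2))

X_tr, X_te, y_tr, y_te = train_test_split(X_imputed, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"RMSE on masked entries: {rmse:.3f}, downstream accuracy: {acc:.3f}")
```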
The new version of the Hamburg Ocean Atmosphere Parameters and Fluxes from Satellite Data set - HOAPS II - contains improved global fields of precipitation and evaporation over the oceans and all basic state variables needed for the derivation of the turbulent fluxes. Except for the NOAA Pathfinder SST data set, all variables are derived from SSM/I satellite data over the ice-free oceans between 1987 and 2002. The earlier HOAPS version was improved and now includes the use of multi-satellite averages with proper inter-satellite calibration, improved algorithms and a new ice detection procedure, resulting in more homogeneous and reliable spatial and temporal fields than before. The spatial resolution of 0.5 degrees makes the data ideally suited for studies of climate variability over the global oceans. Pentad and climatological means are also publicly available via the CERA database system. Further information under: https://www.cmsaf.eu/EN/Overview/OurProducts/Hoaps/Hoaps_node.html
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This simulation data belongs to the article:
Retrieving pulsatility in ultrasound localization microscopy
DOI: 10.1109/OJUFFC.2022.3221354
This information is also available in README.txt, included in this repository.
The scripts that should be used to process this data can be found at: https://github.com/qnano/ulm-pulsatility
The simulation data in this repository is contained in several .zip files:
The .zip folders contain the following:
Download the desired .zip file and see the documentation at https://github.com/qnano/ulm-pulsatility for instructions on processing the data.
This dataset contains ERA5 initial release (ERA5t) surface level analysis parameter data ensemble means (see linked dataset for spreads). ERA5t is the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis project initial release, available up to 5 days behind the present date. CEDA will maintain a 6 month rolling archive of these data with overlap to the verified ERA5 data - see linked datasets on this record. The ensemble means and spreads are calculated from the ERA5t 10 member ensemble, run at a reduced resolution compared with the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables, and the data have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool linked to from this record. See linked datasets for ensemble member and spread data. Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and thus is calculated by dividing by 10 rather than 9 (N-1). The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim reanalysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data will subsequently be reviewed and, if required, amended before the full ERA5 release. CEDA holds a 6 month rolling copy of the latest ERA5t data. See related datasets linked to from this record.
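The spread definition above (standard deviation over the 10 members, dividing by 10 rather than 9) corresponds to a population standard deviation, i.e. ddof=0 in NumPy. A minimal sketch on a small synthetic ensemble:

```python
import numpy as np

# Synthetic 10-member ensemble with dimensions (member, lat, lon) for illustration.
members = np.random.default_rng(0).normal(280.0, 0.5, size=(10, 4, 5))

ens_mean = members.mean(axis=0)
ens_spread = members.std(axis=0, ddof=0)   # divide by N=10 (including the control), not N-1
```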