There are a number of ways to test for the absence/presence of a spatial signal in a completely observed fine-resolution image. One of these is a powerful nonparametric procedure called enhanced false discovery rate (EFDR). A drawback of EFDR is that it requires the data to be defined on regular pixels in a rectangular spatial domain. Here, we develop an EFDR procedure for possibly incomplete data defined on irregular small areas. Motivated by statistical learning, we use conditional simulation (CS) to condition on the available data and simulate the full rectangular image at its finest resolution many times (M, say). EFDR is then applied to each of these simulations, resulting in M estimates of the signal and M statistically dependent p-values. Averaging over these estimates yields a single, combined estimate of a possible signal, but inference is needed to determine whether there really is a signal present. We test the original null hypothesis of no signal by combining the M p-values into a single p-value using copulas and a composite likelihood. If the null hypothesis of no signal is rejected, we use the combined estimate. We call this new procedure EFDR-CS and, to demonstrate its effectiveness, we show results from a simulation study; an experiment where we introduce aggregation and incompleteness into temperature-change data in the Asia-Pacific; and an application to total-column carbon dioxide from satellite remote sensing data over a region of the Middle East, Afghanistan, and the western part of Pakistan. Supplementary materials for this article are available online.
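The workflow described above can be outlined as a simple loop. The sketch below is illustrative only: conditional_simulation and efdr are hypothetical placeholders for the conditional-simulation and EFDR steps, and the final p-value combination is a crude stand-in rather than the copula/composite-likelihood combination the authors actually use.

import numpy as np

def efdr_cs(obs, conditional_simulation, efdr, M=100):
    """Illustrative outline of the EFDR-CS workflow described above.

    conditional_simulation(obs) and efdr(image) are hypothetical placeholders:
    the first should return one complete fine-resolution image consistent with
    the observed data, the second a (signal_estimate, p_value) pair from EFDR.
    """
    estimates, pvals = [], []
    for _ in range(M):
        image = conditional_simulation(obs)   # simulate a full image given the data
        est, p = efdr(image)                  # run EFDR on the simulated image
        estimates.append(est)
        pvals.append(p)
    combined_estimate = np.mean(estimates, axis=0)  # average the M signal estimates
    # Crude placeholder for combining the M dependent p-values; the paper instead
    # combines them with copulas and a composite likelihood.
    combined_p = float(np.median(pvals))
    return combined_estimate, combined_p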
Using novel survey evidence on consumer inflation expectations disaggregated by personal consumption expenditure (PCE) categories, we document the paradox that consumers' aggregate inflation expectations usually exceed any individual category expectation. We explore procedures for aggregating category inflation expectations, and find that the inconsistency between aggregate and aggregated inflation expectations rises with subjective uncertainty and is systematically related to socioeconomic characteristics. Overall, our results are inconsistent with the notion that consumers' aggregate inflation expectations comprise an expenditure-weighted sum of category beliefs. Moreover, aggregated inflation expectations explain a greater share of planned consumer spending than aggregate inflation expectations.
The World Values Survey is a worldwide investigation of sociocultural and political change. It is conducted by a network of social scientists at leading universities all around the world.
Interviews have been carried out with nationally representative samples of the publics of more than 80 societies on all six inhabited continents. The first wave of the values survey was collected in 1981. This was mainly a European endeavor (see EVS). From the second wave the global representation rose dramatically, making it possible to carry out reliable global cross-cultural analyses and analyses of changes over time. The World Values Survey has produced evidence of gradual but pervasive changes in what people want out of life. Moreover, the survey shows that the basic direction of these changes is, to some extent, predictable.
Albania, Algeria, Andorra, Argentina, Armenia, Australia, Azerbaijan, Bangladesh, Belarus, Bosnia and Herzegovina, Brazil, Bulgaria, Burkina Faso, Canada, Chile, China, Colombia, Croatia, Cyprus, Czech Republic, Dominican Republic, Egypt, El Salvador, Ethiopia, Estonia, Finland, France, Georgia, Germany, Ghana, Great Britain, Guatemala, Hong Kong, Hungary, India, Indonesia, Iran, Iraq, Israel, Italy, Japan, Jordan, Kyrgyzstan, Latvia, Lithuania, Macedonia, Malaysia, Mali, Mexico, Moldova, Morocco, Netherlands, New Zealand, Nigeria, Norway, Pakistan, Peru, Philippines, Poland, Puerto Rico, Romania, Russian Federation, Rwanda, Saudi Arabia, Serbia, Serbia and Montenegro, Singapore, Slovakia, Slovenia, South Africa, South Korea, Spain, Sweden, Switzerland, Taiwan, China, Tanzania, Thailand, Trinidad and Tobago, Turkey, Uganda, Ukraine, United States, Uruguay, Venezuela, Vietnam, Zambia, Zimbabwe.
individuals
WVS surveys are required to cover all residents (not only citizens) between the ages of 18 and 85, inclusive. PIs can lower the minimum age limit as long as the minimum required sample size for the 18+ population is achieved.
Sample survey data [ssd]
Detailed sample guidelines for each round as well as each country can be obtained here:
http://www.wvsevsdb.com/wvs/WVSTechnical.jsp?Idioma=I
General Guidelines:
The preferred method of sampling for WVS surveys is the full probability sample. However, recognizing that the very high cost -in terms of finances, manpower and time- of full probability samples may prove to be prohibitive in some cases, WVS allows quota sampling provided that the following principles are strictly adhered to:
Selection of first stage clusters within PSUs must be probabilistic (and preferably PPS).
Quota sampling should be used only within reasonably small sized clusters that have been selected probabilistically.
Whether the sampling method is full probability or a combination of probability and quota, the minimum number of PSUs is 30. A design with less than 30 PSUs is not permissible.
B. SAMPLE SIZE
The minimum sample size (i.e. the number of completed interviews) is 1,000. However, given the fact that in most designs the "effective sample size" (sample size net of design effects) is lower than the actual sample size, larger sample sizes are strongly recommended if at all possible.
C. NON-RESPONSE
Non-response is an issue of increasing concern in sample surveys. Investigators are expected to make every reasonable effort to minimize non-response.
More specifically, 1. In countries using a full probability design, no replacements are allowed. PIs should plan on as many call-backs as the funding will allow. 2. In countries using some form of quota sampling, every effort should be made to interview the first contact. In any case, and as indicated below, a full report on non-responses is required.
Face-to-face [f2f]
Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
The Western and Central Pacific Fisheries Commission (WCPFC) has compiled a public domain version of aggregated catch and effort data using operational, aggregate and annual catch estimates data provided by Commission Members (CCMs) and Cooperating Non-members (CNMs). The data provided herein have been prepared for dissemination in accordance with the current “Rules and Procedures for the Protection, Access to, and Dissemination of Data Compiled by the Commission” (the “RAP”).
Paragraph 9 of the Rules and Procedures indicates that "Catch and Effort data in the public domain shall be made up of observations from a minimum of three vessels". However, the majority of aggregate data provided to WCPFC do not indicate how many vessels were active in each cell of data, which would allow data to be directly filtered according to this rule. Instead, the individual cells where "effort" is less than or equal to the maximum value estimated to represent the activities of two vessels have been removed from the public domain data (the cells are retained with their time/area information, but all catch and effort information in these have been set to zero). Statistics showing how much data have been removed according to this RAP requirement are provided in the documentation for the longline and purse seine public domain data.
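As a rough illustration of this masking rule, the sketch below zeroes the catch and effort values in cells whose effort could represent two or fewer vessels, while keeping the time/area rows. The column names, function name, and threshold argument are assumptions for illustration, not the WCPFC's actual schema or procedure.

import pandas as pd

def apply_three_vessel_rule(cells: pd.DataFrame, max_two_vessel_effort: float,
                            effort_col: str = "effort",
                            catch_cols: tuple = ("catch_mt",)) -> pd.DataFrame:
    """Zero out catch/effort in cells whose effort could represent <= 2 vessels.

    max_two_vessel_effort is the externally estimated maximum effort attributable
    to two vessels; all column names here are illustrative.
    """
    out = cells.copy()
    mask = out[effort_col] <= max_two_vessel_effort
    # Time/area rows are retained; only the confidential values are blanked.
    out.loc[mask, list(catch_cols) + [effort_col]] = 0
    return out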
All public domain data have been aggregated by year/month and 5°x5° grid. Annex 2 of the RAP indicates that public domain aggregated catch/effort data can be made available at a higher resolution (e.g. data with a breakdown by vessel nation, and aggregated by 1°x1° grids for surface fisheries); however, if the public domain data were provided at these higher levels of resolution implementation of the RAP "three-vessel rule" with the current aggregate data set would result in too many cells being removed.
However, please note that the data that have been removed from the public domain dataset, available on this webpage, are still potentially accessible via other provisions of the RAP (refer to section 4.6 and para 34).
Each public domain zip file contains two files: (1) a CSV file containing the data; (2) a PDF file containing the field names/formats and the coverage with respect to the data file.
These data files were last updated on the 27th July 2020.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We consider the problem of identifying skilled funds among a large number of candidates under the linear factor pricing models containing both observable and latent market factors. Motivated by the existence of non-strong potential factors and diversity of error distribution types of the linear factor pricing models, we develop a distribution-free multiple testing procedure to solve this problem. The proposed procedure is established based on the statistical tool of symmetrized data aggregation, which makes it robust to the strength of potential factors and distribution type of the error terms. We then establish the asymptotic validity of the proposed procedure in terms of both the false discovery rate and true discovery proportion under some mild regularity conditions. Furthermore, we demonstrate the advantages of the proposed procedure over some existing methods through extensive Monte Carlo experiments. In an empirical application, we illustrate the practical utility of the proposed procedure in the context of selecting skilled funds, which clearly has much more satisfactory performance than its main competitors.
The goal of this study was to understand the service delivery chain of youth health services in Lithuania and, more generally, the needs of and challenges faced by public servants. This survey aims to better identify the management, work environment, and attitudinal factors that influence service delivery, and to identify actionable reforms that could be undertaken in the next one to three years at relatively low cost. The findings of this study are used to design and implement measures to make the civil service and youth health policies in Lithuania better managed and more effective in achieving their goals. It also informs research on how civil services work around the world and how the challenges civil servants face can best be overcome.
956 public servants across ministries and agencies.
Public servants
Aggregate data [agg]
For the survey, three ministries were selected based on the research topics: ministries whose work is closely related to the provision of mental health services to young people (the focus of the project) and to educational assistance. Other ministries were selected at random from the remaining ministries in order to interview employees not directly involved in mental health.
The selection of employees was done using Stata. In each selected ministry and agency, we select 40 employees to sample. First, we randomly select 5 units. Then we pick a manager and up to 10 employees from each unit. If this overfills the sample, we drop the corresponding number of employees from the largest unit. In case of a tie, we drop from lower rank units (i.e. if unit 3 and unit 4 are tied, we drop from unit 4). If this does not fill the intended sample, we select random employees from other departments to fill the sample. We assign any left-over employees in picked units to the back-up sample. If this does not fill the back-up sample, we pick back-ups from other units, until we have at least 5 managerial level employees and 35 regular employees in the back up sample.
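A minimal sketch of the within-ministry selection rules described above is given below. The data structure, function name, and tie-breaking shortcuts are assumptions for illustration only, and the back-up-sample logic is omitted.

import random

def sample_ministry(units, n_units=5, per_unit=10, target=40, seed=1):
    """Simplified sketch of the within-ministry selection described above.

    `units` maps a unit name to {"manager": id, "employees": [ids, ...]}.
    The back-up sample and some tie-breaking rules are omitted.
    """
    rng = random.Random(seed)
    chosen = rng.sample(list(units), k=min(n_units, len(units)))
    sample = []
    for u in chosen:
        staff = units[u]["employees"]
        members = [units[u]["manager"]] + rng.sample(staff, k=min(per_unit, len(staff)))
        sample.extend((u, person) for person in members)
    # If the sample overfills, drop the excess employees from the largest selected unit.
    excess = len(sample) - target
    if excess > 0:
        largest = max(chosen, key=lambda u: len(units[u]["employees"]))
        for record in [r for r in sample if r[0] == largest][-excess:]:
            sample.remove(record)
    return sample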
We also selected 40 out of 60 municipalities from Lithuania for the survey. According to municipality size, we sample 12 to 22 employees. First, we pick one random education unit in each municipality and sample its manager and up to 7 employees. Then, we pick one random non-education unit in each municipality. We again sample its manager and up to 7 employees. If this overfills the sample, we drop the corresponding number of employees from the largest unit. In case of a tie, we drop from the non-education unit.
We also selected 40 out of 48 public health offices from all over Lithuania for the survey. Aside from one public health office that was selected for its importance in public health, the remaining 39 of the 47 public health offices were selected at random. In each public health office, we selected all employees to participate in the survey, except for specialists who directly provide public health services in schools.
Computer Assisted Personal Interview [capi]
The survey questionnaire comprises the following modules:
1. Organization/individual identifiers
2. Email information
3. Demographics
4. Stigma
5. Mental health budgeting
6. National mental health
7. Co-production
8. Selection
9. Performance management
10. Advancement
11. Rewards
12. Dismissals
13. Attitude and motivation
14. Incentives
15. Teamwork
16. Bottlenecks and capacity building
17. Adapting to the post-COVID-19 era
18. Management practices (ask respondents according to specification)
19. Targeting
20. Incentives/monitoring
21. Autonomy: roles, flexibility
22. Staff involvement/contribution
23. Incentives/monitoring: performance incentives
24. Staffing
25. Conclusion
Questionnaire in English is attached.
Response rate was 82%.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Modality-agnostic files were copied over and the CHANGES file was updated. Data was aggregated using:
python phenotype.py aggregate subject -i segregated_subject -o aggregated_subject
phenotype.py came from the GitHub repository: https://github.com/ericearl/bids-phenotype
A comprehensive clinical, MRI, and MEG collection characterizing healthy research volunteers collected at the National Institute of Mental Health (NIMH) Intramural Research Program (IRP) in Bethesda, Maryland using medical and mental health assessments, diagnostic and dimensional measures of mental health, cognitive and neuropsychological functioning, structural and functional magnetic resonance imaging (MRI), along with diffusion tensor imaging (DTI), and a comprehensive magnetoencephalography battery (MEG).
In addition, blood samples of healthy volunteers are currently banked for future genetic and other analyses. All data collected in this protocol are broadly shared in the OpenNeuro repository, in the Brain Imaging Data Structure (BIDS) format, and task paradigms and basic pre-processing scripts are shared on GitHub. This dataset is unique in its depth of characterization of a healthy population in terms of brain health and will contribute to a wide array of secondary investigations of non-clinical and clinical research questions.
This dataset is licensed under the Creative Commons Zero (CC0) v1.0 License.
Inclusion criteria for the study require that participants are adults at or over 18 years of age in good health with the ability to read, speak, understand, and provide consent in English. All participants provided electronic informed consent for online screening and written informed consent for all other procedures. Exclusion criteria include:
Study participants are recruited through direct mailings, bulletin boards and listservs, outreach exhibits, print advertisements, and electronic media.
All potential volunteers first visit the study website (https://nimhresearchvolunteer.ctss.nih.gov), check a box indicating consent, and complete preliminary self-report screening questionnaires. The study website is HIPAA compliant and therefore does not collect PII ; instead, participants are instructed to contact the study team to provide their identity and contact information. The questionnaires include demographics, clinical history including medications, disability status (WHODAS 2.0), mental health symptoms (modified DSM-5 Self-Rated Level 1 Cross-Cutting Symptom Measure), substance use survey (DSM-5 Level 2), alcohol use (AUDIT), handedness (Edinburgh Handedness Inventory), and perceived health ratings. At the conclusion of the questionnaires, participants are again prompted to send an email to the study team. Survey results, supplemented by NIH medical records review (if present), are reviewed by the study team, who determine if the participant is likely eligible for the protocol. These participants are then scheduled for an in-person assessment. Follow-up phone screenings were also used to determine if participants were eligible for in-person screening.
At this visit, participants undergo a comprehensive clinical evaluation to determine final eligibility to be included as a healthy research volunteer. The mental health evaluation consists of a psychiatric diagnostic interview (Structured Clinical Interview for DSM-5 Disorders, SCID-5), along with self-report surveys of mood (Beck Depression Inventory-II, BDI-II) and anxiety (Beck Anxiety Inventory, BAI) symptoms. An intelligence quotient (IQ) estimation is determined with the Kaufman Brief Intelligence Test, Second Edition (KBIT-2). The KBIT-2 is a brief (20-30 minute) assessment of intellectual functioning administered by a trained examiner. There are three subtests, including verbal knowledge, riddles, and matrices.
Medical evaluation includes medical history elicitation and systematic review of systems. Biological and physiological measures include vital signs (blood pressure, pulse), as well as weight, height, and BMI. Blood and urine samples are taken and a complete blood count, acute care panel, hepatic panel, thyroid stimulating hormone, viral markers (HCV, HBV, HIV), C-reactive protein, creatine kinase, urine drug screen and urine pregnancy tests are performed. In addition, blood samples that can be used for future genomic analysis, development of lymphoblastic cell lines or other biomarker measures are collected and banked with the NIMH Repository and Genomics Resource (Infinity BiologiX). The Family Interview for Genetic Studies (FIGS) was later added to the assessment in order to provide better pedigree information; the Adverse Childhood Events (ACEs) survey was also added to better characterize potential risk factors for psychopathology. The entirety of the in-person assessment not only collects information relevant for eligibility determination, but it also provides a comprehensive set of standardized clinical measures of volunteer health that can be used for secondary research.
Participants are given the option to consent for a magnetic resonance imaging (MRI) scan, which can serve as a baseline clinical scan to determine normative brain structure, and also as a research scan with the addition of functional sequences (resting state and diffusion tensor imaging). The MR protocol used was initially based on the ADNI-3 basic protocol, but was later modified to include portions of the ABCD protocol in the following manner:
At the time of the MRI scan, volunteers are administered a subset of tasks from the NIH Toolbox Cognition Battery. The four tasks include:
An optional MEG study was added to the protocol approximately one year after the study was initiated, thus there are relatively fewer MEG recordings in comparison to the MRI dataset. MEG studies are performed on a 275 channel CTF MEG system (CTF MEG, Coquitlam, BC, Canada). The position of the head was localized at the beginning and end of each recording using three fiducial coils. These coils were placed 1.5 cm above the nasion, and at each ear, 1.5 cm from the tragus on a line between the tragus and the outer canthus of the eye. For 48 participants (as of 2/1/2022), photographs were taken of the three coils and used to mark the points on the T1 weighted structural MRI scan for co-registration. For the remainder of the participants (n=16 as of 2/1/2022), a Brainsight neuronavigation system (Rogue Research, Montréal, Québec, Canada) was used to coregister the MRI and fiducial localizer coils in real time prior to MEG data acquisition.
Online and In-person behavioral and clinical measures, along with the corresponding phenotype file name, sorted first by measurement location and then by file name.
| Location | Measure | File Name |
|---|---|---|
| Online | Alcohol Use Disorders Identification Test (AUDIT) | audit |
| Online | Demographics | demographics |
| Online | DSM-5 Level 2 Substance Use - Adult | drug_use |
| Online | Edinburgh Handedness Inventory (EHI) | ehi |
| Online | Health History Form | health_history_questions |
| Online | Perceived Health Rating - self | health_rating |
Background: Multilevel analyses are ideally suited to assess the effects of ecological (higher level) and individual (lower level) exposure variables simultaneously. In applying such analyses to measures of ecologies in epidemiological studies, individual variables are usually aggregated into the higher level unit. Typically, the aggregated measure includes responses of every individual belonging to that group (i.e. it constitutes a self-included measure). More recently, researchers have developed an aggregate measure which excludes the response of the individual to whom the aggregate measure is linked (i.e. a self-excluded measure). In this study, we clarify the substantive and technical properties of these two measures when they are used as exposures in multilevel models.
Methods: Although the differences between the two aggregated measures are mathematically subtle, distinguishing between them is important in terms of the specific scientific questions to be addressed. We then show how these measures can be used in two distinct types of multilevel models—self-included model and self-excluded model—and interpret the parameters in each model by imposing hypothetical interventions. The concept is tested on empirical data of workplace social capital and employees' systolic blood pressure.
Results: Researchers assume group-level interventions when using a self-included model, and individual-level interventions when using a self-excluded model. Analytical re-parameterizations of these two models highlight their differences in parameter interpretation. Cluster-mean centered self-included models enable researchers to decompose the collective effect into its within- and between-group components. The benefit of the cluster-mean centering procedure is further discussed in terms of hypothetical interventions.
Conclusions: When investigating the potential roles of aggregated variables, researchers should carefully explore which type of model—self-included or self-excluded—is suitable for a given situation, particularly when group sizes are relatively small.
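To make the distinction concrete, the toy sketch below computes both aggregate exposures for a small grouped dataset: the self-included measure is the ordinary group mean, while the self-excluded measure is the leave-one-out group mean. Variable names and data are illustrative only.

import pandas as pd

# Toy grouped data: individual scores nested in workplaces.
df = pd.DataFrame({"group": ["A", "A", "A", "B", "B"],
                   "score": [3.0, 4.0, 5.0, 2.0, 4.0]})

# Self-included aggregate: the ordinary group mean, linked to every member.
df["self_included"] = df.groupby("group")["score"].transform("mean")

# Self-excluded aggregate: the leave-one-out group mean for each individual.
group_sum = df.groupby("group")["score"].transform("sum")
group_n = df.groupby("group")["score"].transform("count")
df["self_excluded"] = (group_sum - df["score"]) / (group_n - 1)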
This dataset is a spatially reaggregated version of the original national Africover multipurpose database. The original full resolution land cover has been produced from visual interpretation of digitally enhanced LANDSAT TM images (Bands 4,3,2) acquired mainly in the year 1999. The data was aggregated by eliminating polygons below a certain area threshold to give priority to the classes belonging to Agriculture. This threshold corresponds to approx. a 30% reduction in the polygon count. The dataset was then re-aggregated based on area threshold values. For more information on the area thresholds used to spatially aggregate the land cover data, please see the 'spatial-agg-procedure' document included in the zip file available here for download. The land cover classes have been developed using the FAO/UNEP international standard LCCS classification system. The data set is intended for free public access. The shape main attributes correspond to the following fields: -ID -HECTARES -USERLABEL -LCCCODE (unique LCCS code) -CODE1 -CODE2 -CODE3 -LC You can download a zip archive containing: -the dataset er-spatial-agg (.shp) -the Eritrea Classifiers Used (.pdf) -the Eritrea legend (.pdf and .xls) -the Eritrea Legend - LCCS Import file (.xls) -the spatial-agg-procedure (.pdf) -the Userlabel Definitions (.pdf) Note: the document Eritrea Classifiers Used.pdf is a list of all the LCCS classifiers used in the study area. They are grouped under the 8 major land cover types. In addition to the standard classifiers contained in LCCS the user may find “user defined” classifiers used by the map producer to add additional information to a specific class, not available in LCCS. The user-defined attributes are always coded with the letter “Z”.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
General Description
The monthly aggregated Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) dataset is derived from 250m 8d GLASS V6 FAPAR. The data set is derived from Moderate Resolution Imaging Spectroradiometer (MODIS) reflectance and LAI data using several other FAPAR products (MODIS Collection 6, GLASS FAPAR V5, and PROBA-V1 FAPAR) to generate a bidirectional long short-term memory (Bi-LSTM) model to estimate FAPAR. The dataset spans March 2000 to December 2021 and provides data that covers the entire globe. The dataset can be used in many applications like land degradation modeling, land productivity mapping, and land potential mapping. The dataset includes:
Long-term: Derived from monthly time-series. This dataset provides a linear trend model for the p95 variable: slope beta mean (p95.beta_m), p-value for beta (p95.beta_pv), intercept alpha mean (p95.alpha_m), p-value for alpha (p95.alpha_pv), and coefficient of determination R2 (p95.r2_m).
Monthly time-series: Monthly aggregation with three standard statistics: 5th percentile (p05), median (p50), and 95th percentile (p95). For each month, we aggregate all composites within that month plus one composite each before and after, ending up with 5 to 6 composites per month depending on the number of images within that month.
Data Details
Time period: March 2000 – December 2021
Type of data: Fraction of Absorbed Photosynthetically Active Radiation (FAPAR)
How the data was collected or derived: Derived from 250m 8d GLASS V6 FAPAR using Python running on a local HPC. Cloudy pixels were removed and only positive values of water vapor were considered when computing the statistics. The time-series gap-filling and time-series analysis were computed using the Scikit-map Python package.
Statistical methods used: for the long-term, trend analysis of the p95 monthly variable; for the monthly time-series, percentiles 05, 50, and 95.
Limitations or exclusions in the data: The dataset does not include data for Antarctica.
Coordinate reference system: EPSG:4326
Bounding box (Xmin, Ymin, Xmax, Ymax): (-180.00000, -62.0008094, 179.9999424, 87.37000)
Spatial resolution: 1/480 d.d. = 0.00208333 (250m)
Image size: 172,800 x 71,698
File format: Cloud Optimized GeoTIFF (COG) format.
Support
If you discover a bug, artifact, or inconsistency, or if you have a question, please use one of the following channels:
Technical issues and questions about the code: GitLab Issues
General questions and comments: LandGIS Forum
Name convention
To ensure consistency and ease of use across and within the projects, we follow the standard Open-Earth-Monitor file-naming convention. The convention works with 10 fields that describe important properties of the data. In this way users can search files, prepare data analysis, etc., without needing to open files. The fields are:
generic variable name: fapar = Fraction of Absorbed Photosynthetically Active Radiation
variable procedure combination: essd.lstm = Earth System Science Data with bidirectional long short-term memory (Bi-LSTM)
Position in the probability distribution / variable type: p05/p50/p95 = 5th/50th/95th percentile
Spatial support: 250m
Depth reference: s = surface
Time reference begin time: 20000301 = 2000-03-01
Time reference end time: 20211231 = 2021-12-31
Bounding box: go = global (without Antarctica)
EPSG code: epsg.4326 = EPSG:4326
Version code: v20230628 = 2023-06-28 (creation date)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a value-added product based on 'Up-to-date air quality station measurements', administered by the European Environmental Agency (EEA) and collected by its member states. The original hourly measurement data (NO2, SO2, O3, PM10, PM2.5 in µg/m³) was reshaped, gapfilled and aggregated to different temporal resolutions, making it ready to use in time series analysis or spatial interpolation tasks.
Reproducible code for accessing and processing this data and notebooks for demonstration can be found on Github.
Hourly data was retrieved through the API of the EEA Air Quality Download Service. Measurements (single files per station and pollutant) were joined to create a single time series per station with observations for multiple pollutants. As PM2.5 data is sparse but correlates well with PM10, gapfilling was performed according to methods described in Horálek et al., 2023¹. Validity and verification flags from the original data were passed on for quality filtering. Reproducible computational notebooks using the R programming language are available for the data access and the gapfilling procedure.
Data was aggregated to three coarser temporal resolutions: day, month, and year. Coverage (the ratio of non-missing values) was calculated for each pollutant and temporal increment, and a threshold of 75% was applied to generate reliable aggregates. All pollutants were aggregated by their arithmetic mean. Additionally, two pollutants were aggregated using a percentile method, which has been shown to be more appropriate for mapping applications. PM10 was summarized using the 90.41st percentile. Daily O3 was further summarized as the maximum of the 8-hour running mean. Based thereon, monthly and annual O3 was aggregated using the 93.15th percentile of the daily maxima. For more details refer to the reproducible computational notebook on temporal aggregation.
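A minimal sketch of the daily aggregation step (arithmetic mean with the 75% coverage rule) is shown below, assuming an hourly table with one row per station and hour and a datetime Start column. Column names follow the tables below; everything else is illustrative rather than the project's actual notebook code.

import pandas as pd

def daily_mean(hourly: pd.DataFrame, pollutant: str = "PM10",
               min_coverage: float = 0.75) -> pd.DataFrame:
    """Arithmetic daily mean per station, kept only if >= 75% of hours are present."""
    tmp = hourly.copy()
    tmp["date"] = tmp["Start"].dt.date
    g = tmp.groupby(["Air.Quality.Station.EoI.Code", "date"])[pollutant]
    daily = g.mean().to_frame(pollutant)
    daily["cov.day"] = g.count() / 24.0   # ratio of non-missing hourly values
    return daily[daily["cov.day"] >= min_coverage]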
| column | hourly | daily | monthly | annual | description |
|---|---|---|---|---|---|
| Air.Quality.Station.EoI.Code | x | x | x | x | Unique station ID |
| Countrycode | x | x | x | x | Two-letter ISO country code |
| Start | x | | | | Start time of (hourly) measurement period |
| (pollutant name) | x | x | x | x | One of NO2; SO2; O3; O3_max8h_93.15; PM10; PM10_90.41; PM2.5 in µg/m³ |
| Validity_ | x | | | | Validity flag of the respective pollutant |
| Verification_ | x | | | | Verification flag of the respective pollutant |
| filled_PM2.5 | x | | | | Flag indicating if PM2.5 value is measured or supplemented through gapfilling (boolean) |
| year | | x | x | x | Year (2015-2023) |
| cov.year_ | | | x | x | Data coverage throughout the year (0-1) |
| month | | x | x | | Month (1-12) |
| cov.month_ | | x | x | | Data coverage throughout the month (0-1) |
| doy | | x | | | Day of year (0-366) |
| cov.day_ | | x | | | Data coverage throughout the day (0-1) |
To avoid redundant information and optimize file size, some relevant metadata is not stored in the air quality data tables, but rather separately (in a file named "EEA_stations_meta_table.parquet"). This includes the type and area of the measurement stations, as well as their coordinates.
| column | description |
|---|---|
| Air.Quality.Station.EoI.Code | Unique station ID (required for join) |
| Countrycode | Two-letter ISO country code |
| Station.Type | One of "background", "industrial", or "traffic" |
| Station.Area | One of "urban", "suburban", "rural", "rural-nearcity", "rural-regional", "rural-remote" |
| Longitude & Latitude | Geographic coordinates of the station |
This dataset is shipped as Parquet files. Hourly and aggregated data are distributed in four individual datasets. Daily and hourly data are partitioned by `Countrycode` (one file per country) to enable reading smaller subsets. Monthly and annual data files are small (< 20 MB) and stored in a single file each. Parquet is a relatively new and very memory-efficient format that differs from traditional tabular file formats (e.g. CSV) in the sense that it is binary and cannot be opened and displayed by common tabular software (e.g. MS Excel, LibreOffice, etc.). Users instead have to use an Apache Arrow implementation, for example in Python, R, C++, or another scripting language. Reading the data is straightforward (see the code samples below).
R code:
# required libraries
library(arrow)
library(dplyr)

# read air quality and meta data
aq = read_parquet("airquality.no2.o3.so2.pm10.pm2p5_4.annual_pnt_20150101_20231231_eu_epsg.3035_v20240718.parquet")
meta = read_parquet("EEA_stations_meta_table.parquet")

# join the two for further analysis
aq_meta = inner_join(aq, meta, by = join_by(Air.Quality.Station.EoI.Code))

Python code:

# required libraries
import pandas as pd

# read air quality and meta data
aq = pd.read_parquet("airquality.no2.o3.so2.pm10.pm2p5_4.annual_pnt_20150101_20231231_eu_epsg.3035_v20240718.parquet")
meta = pd.read_parquet("EEA_stations_meta_table.parquet")

# join the two for further analysis
aq_meta = aq.merge(meta, on = ["Air.Quality.Station.EoI.Code", "Countrycode"])
The intention is to collect data for the calendar year 2009 (or the nearest year for which each business keeps its accounts). The survey is considered a one-off survey, although for accurate NAs, such a survey should be conducted at least every five years to enable regular updating of the ratios, etc., needed to adjust the ongoing indicator data (mainly VAGST) to NA concepts. The questionnaire will be drafted by FSD, largely following the previous BAS, updated to current accounting terminology where necessary. The questionnaire will be pilot tested, using some accountants who are likely to complete a number of the forms on behalf of their business clients, and a small sample of businesses. Consultations will also include Ministry of Finance, Ministry of Commerce, Industry and Labour, Central Bank of Samoa (CBS), Samoa Tourism Authority, Chamber of Commerce, and other business associations (hotels, retail, etc.).
The questionnaire will collect a number of items of information about the business ownership, locations at which it operates and each establishment for which detailed data can be provided (in the case of complex businesses), contact information, and other general information needed to clearly identify each unique business. The main body of the questionnaire will collect data on income and expenses, to enable value added to be derived accurately. The questionnaire will also collect data on capital formation, and will contain supplementary pages for relevant industries to collect volume of production data for selected commodities and to collect information to enable an estimate of value added generated by key tourism activities.
The principal user of the data will be FSD which will incorporate the survey data into benchmarks for the NA, mainly on the current published production measure of GDP. The information on capital formation and other relevant data will also be incorporated into the experimental estimates of expenditure on GDP. The supplementary data on volumes of production will be used by FSD to redevelop the industrial production index which has recently been transferred under the SBS from the CBS. The general information about the business ownership, etc., will be used to update the Business Register.
Outputs will be produced in a number of formats, including a printed report containing descriptive information of the survey design, data tables, and analysis of the results. The report will also be made available on the SBS website in “.pdf” format, and the tables will be available on the SBS website in excel tables. Data by region may also be produced, although at a higher level of aggregation than the national data. All data will be fully confidentialised, to protect the anonymity of all respondents. Consideration may also be made to provide, for selected analytical users, confidentialised unit record files (CURFs).
A high level of accuracy is needed because the principal purpose of the survey is to develop revised benchmarks for the NA. The initial plan was that the survey will be conducted as a stratified sample survey, with full enumeration of large establishments and a sample of the remainder.
National Coverage
The main statistical unit to be used for the survey is the establishment. For simple businesses that undertake a single activity at a single location there is a one-to-one relationship between the establishment and the enterprise. For large and complex enterprises, however, it is desirable to separate each activity of an enterprise into establishments to provide the most detailed information possible for industrial analysis. The business register will need to be developed in such a way that records the links between establishments and their parent enterprises. The business register will be created from administrative records and may not have enough information to recognize all establishments of complex enterprises. Large businesses will be contacted prior to the survey post-out to determine if they have separate establishments. If so, the extended structure of the enterprise will be recorded on the business register and a questionnaire will be sent to the enterprise to be completed for each establishment.
SBS has decided to follow the New Zealand simplified version of its statistical units model for the 2009 BAS. Future surveys may consider location units and enterprise groups if they are found to be useful for statistical collections.
It should be noted that while establishment data may enable the derivation of detailed benchmark accounts, it may be necessary to aggregate up to enterprise level data for the benchmarks if the ongoing data used to extrapolate the benchmark forward (mainly VAGST) are only available at the enterprise level.
The BAS covered all employing units, and excluded small non-employing units such as market sellers. The surveys also excluded central government agencies engaged in public administration (ministries, public education and health, etc.). It only covers businesses that pay VAGST (threshold SAT$75,000 and upwards).
Sample survey data [ssd]
- Total sample size was 1,240.
- Of the 1,240, 902 successfully completed the questionnaire.
- The remaining 338 either never responded or were omitted (some businesses were omitted from the sample as they did not meet the requirement to be surveyed).
- Selection was all employing units paying VAGST (threshold SAT$75,000 upwards).
Mail Questionnaire [mail]
Supplementary Pages
Additional pages have been prepared to collect data for a limited range of industries.
1. Production data. To rebase and redevelop the Industrial Production Index (IPI), it is intended to collect volume of production information from a selection of large manufacturing businesses. The selection of businesses and products is critical to the usefulness of the IPI. The products must be homogeneous, and be of enough importance to the economy to justify collecting the data. Significance criteria should be established for the selection of products to include in the IPI, and the 2009 BAS provides an opportunity to collect benchmark data for a range of products known to be significant (based on information in the existing IPI, CPI weights, export data, etc.) as well as open questions for respondents to provide information on other significant products.
2. Tourism. There is a strong demand for estimates of tourism value added. To estimate tourism value added using the international standard Tourism Satellite Account methodology requires the use of an input-output table, which is beyond the capacity of SBS at present. However, some indicative estimates of the main parts of the economy influenced by tourism can be derived if the necessary data are collected. Tourism is a demand concept, based on defining tourists (the international standard includes both international and domestic tourists), what products are characteristically purchased by tourists, and which industries supply those products. Some questions targeted at those industries that have significant involvement with tourists (hotels, restaurants, transport and tour operators, vehicle hire, etc.), on how much of their income is sourced from tourism, would provide valuable indicators of the size of the direct impact of tourism.
Partial imputation was done at the time of receipt of questionnaires, after follow-up procedures to obtain fully completed questionnaires have been followed. Imputation followed a process, i.e., apply ratios from responding units in the imputation cell to the partial data that was supplied. Procedures were established during the editing stage (a) to preserve the integrity of the questionnaires as supplied by respondents, and (b) to record all changes made to the questionnaires during editing. If SBS staff writes on the form, for example, this should only be done in red pen, to distinguish the alterations from the original information.
Additional edit checks were developed, including checking against external data at enterprise/establishment level. External data to be checked against include VAGST and SNPF for turnover and purchases, and salaries and wages and employment data respectively. Editing and imputation processes were undertaken by FSD using Excel.
Not applicable.
The analysis of time-dependent data is an important problem in many application domains, and interactive visualization of time-series data can help in understanding patterns in large time series data. Many effective approaches already exist for visual analysis of univariate time series, supporting tasks such as assessment of data quality, detection of outliers, or identification of periodically or frequently occurring patterns. However, far fewer approaches exist which support multivariate time series. The existence of multiple values per time stamp makes the analysis task per se harder, and existing visualization techniques often do not scale well. We introduce an approach for visual analysis of large multivariate time-dependent data, based on the idea of projecting multivariate measurements to a 2D display, visualizing the time dimension by trajectories. We use visual data aggregation metaphors based on grouping of similar data elements to scale with multivariate time series. Aggregation procedures can either be based on statistical properties of the data or on data clustering routines. Appropriately defined user controls allow users to navigate and explore the data and interactively steer the parameters of the data aggregation to enhance data analysis. We present an implementation of our approach and apply it on a comprehensive data set from the field of Earth observation, demonstrating the applicability and usefulness of our approach.
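The core idea (project each multivariate measurement to 2D and group similar measurements before drawing) can be sketched as below. This is a generic illustration using PCA and k-means, not the authors' implementation; the data, cluster count, and projection method are assumptions.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Toy multivariate time series: T time stamps, d variables per stamp.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8)).cumsum(axis=0)

# Project each multivariate measurement to 2D; consecutive points form the trajectory.
xy = PCA(n_components=2).fit_transform(X)

# Group similar measurements (here via k-means) so each cluster can be drawn as one
# aggregated glyph instead of many overlapping points.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(xy)
centroids = np.array([xy[labels == k].mean(axis=0) for k in range(10)])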
https://spdx.org/licenses/CC0-1.0.html
Phylogenetic regression is frequently utilized in macroevolutionary studies, and its statistical properties have been thoroughly investigated. By contrast, phylogenetic ANOVA has received relatively less attention, and the conditions leading to incorrect statistical and biological inferences when comparing multivariate phenotypes among groups remain under-explored. Here we propose a refined method of randomizing residuals in a permutation procedure (RRPP) for evaluating phenotypic differences among groups while conditioning the data on the phylogeny. We show that RRPP displays appropriate statistical properties for both phylogenetic ANOVA and regression models, and for univariate and multivariate datasets. For ANOVA, we find that RRPP exhibits higher statistical power than methods utilizing phylogenetic simulation. Additionally, we investigate how group dispersion across the phylogeny affects inferences, and reveal that highly aggregated groups generate strong and significant correlations with the phylogeny, which reduce statistical power and subsequently affect biological interpretations. We discuss the broader implications of this phylogenetic group aggregation, and its relation to challenges encountered with other comparative methods where one or a few transitions in discrete traits are observed on the phylogeny. Finally, we recommend that phylogenetic comparative studies of continuous trait data utilize RRPP for assessing the significance of indicator variables as sources of trait variation.
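As a rough, non-phylogenetic illustration of the RRPP idea (fit a reduced model, permute its residuals, and recompute the test statistic), here is a minimal univariate one-way sketch. It deliberately ignores the phylogenetic covariance and multivariate aspects handled by the actual method, and all names are illustrative.

import numpy as np

def rrpp_anova_p(y, groups, n_perm=999, seed=0):
    """One-way ANOVA p-value via randomized residuals in a permutation procedure."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    def ss_between(values):
        grand = values.mean()
        return sum((values[groups == g].mean() - grand) ** 2 * np.sum(groups == g)
                   for g in np.unique(groups))

    fitted = np.full_like(y, y.mean())          # reduced (intercept-only) model
    resid = y - fitted
    observed = ss_between(y)
    perm = [ss_between(fitted + rng.permutation(resid)) for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in perm)) / (n_perm + 1)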
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
List of Subdatasets:
Long-term data: 2000-2021
5th percentile (p05) monthly time-series: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021
50th percentile (p50) monthly time-series: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021
95th percentile (p95) monthly time-series: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021
General Description
The monthly aggregated Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) dataset is derived from 250m 8d GLASS V6 FAPAR. The data set is derived from Moderate Resolution Imaging Spectroradiometer (MODIS) reflectance and LAI data using several other FAPAR products (MODIS Collection 6, GLASS FAPAR V5, and PROBA-V1 FAPAR) to generate a bidirectional long-short-term memory (Bi-LSTM) model to estimate FAPAR. The dataset time spans from March 2000 to December 2021 and provides data that covers the entire globe. The dataset can be used in many applications like land degradation modeling, land productivity mapping, and land potential mapping. The dataset includes:
Long-term:
Derived from monthly time-series. This dataset provides linear trend model for the p95 variable: (1) slope beta mean (p95.beta_m), p-value for beta (p95.beta_pv), intercept alpha mean (p95.alpha_m), p-value for alpha (p95.alpha_pv), and coefficient of determination R2 (p95.r2_m).
Monthly time-series:
Monthly aggregation with three standard statistics: (1) 5th percentile (p05), median (p50), and 95th percentile (p95). For each month, we aggregate all composites within that month plus one composite each before and after, ending up with 5 to 6 composites for a single month depending on the number of images within that month.
Data Details
Time period: March 2000 – December 2021
Type of data: Fraction of Absorbed Photosynthetically Active Radiation (FAPAR)
How the data was collected or derived: Derived from 250m 8 d GLASS V6 FAPAR using Python running in a local HPC. The time-series analysis were computed using the Scikit-map Python package.
Statistical methods used: for the long-term, Ordinary Least Square (OLS) of p95 monthly variable; for the monthly time-series, percentiles 05, 50, and 95.
Limitations or exclusions in the data: The dataset does not include data for Antarctica.
Coordinate reference system: EPSG:4326
Bounding box (Xmin, Ymin, Xmax, Ymax): (-180.00000, -62.0008094, 179.9999424, 87.37000)
Spatial resolution: 1/480 d.d. = 0.00208333 (250m)
Image size: 172,800 x 71,698
File format: Cloud Optimized Geotiff (COG) format.
Support
If you discover a bug, artifact, or inconsistency, or if you have a question please raise a GitHub issue: https://github.com/Open-Earth-Monitor/Global_FAPAR_250m/issues
Reference
Hackländer, J., Parente, L., Ho, Y.-F., Hengl, T., Simoes, R., Consoli, D., Şahin, M., Tian, X., Herold, M., Jung, M., Duveiller, G., Weynants, M., Wheeler, I., (2023?) "Land potential assessment and trend-analysis using 2000–2021 FAPAR monthly time-series at 250 m spatial resolution", submitted to PeerJ, preprint available at: https://doi.org/10.21203/rs.3.rs-3415685/v1
Name convention
To ensure consistency and ease of use across and within the projects, we follow the standard Open-Earth-Monitor file-naming convention. The convention works with 10 fields that describe important properties of the data. In this way users can search files, prepare data analysis, etc., without needing to open files. The fields are:
generic variable name: fapar = Fraction of Absorbed Photosynthetically Active Radiation
variable procedure combination: essd.lstm = Earth System Science Data with bidirectional long short-term memory (Bi–LSTM)
Position in the probability distribution / variable type: p05/p50/p95 = 5th/50th/95th percentile
Spatial support: 250m
Depth reference: s = surface
Time reference begin time: 20000301 = 2000-03-01
Time reference end time: 20211231 = 2021-12-31
Bounding box: go = global (without Antarctica)
EPSG code: epsg.4326 = EPSG:4326
Version code: v20230628 = 2023-06-28 (creation date)
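For illustration, a plausible file name can be assembled from the 10 fields above; the underscore separator, field order, and .tif extension in this snippet are assumptions, not taken from the convention text itself.

# Hypothetical file name assembled from the 10 fields above; the underscore
# separator, field order, and ".tif" extension are assumptions for illustration.
fields = ["fapar", "essd.lstm", "p95", "250m", "s",
          "20000301", "20211231", "go", "epsg.4326", "v20230628"]
filename = "_".join(fields) + ".tif"
# -> fapar_essd.lstm_p95_250m_s_20000301_20211231_go_epsg.4326_v20230628.tif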
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mobile phone location data is a newly emerging data source of great potential to support human mobility research. However, recent studies have indicated that many users can be easily re-identified based on their unique activity patterns. Privacy protection procedures will usually change the original data and cause a loss of data utility for analysis purposes. Therefore, the need for detailed data for activity analysis while avoiding potential privacy risks presents a challenge. The aim of this study is to reveal the re-identification risks from a Chinese city's mobile users and to examine the quantitative relationship between re-identification risk and data utility for an aggregated mobility analysis. The first step is to apply two reported attack models, the top N locations and the spatio-temporal points, to evaluate the re-identification risks in Shenzhen City, a metropolis in China. A spatial generalization approach to protecting privacy is then proposed and implemented, and spatially aggregated analysis is used to assess the loss of data utility after privacy protection. The results demonstrate that the re-identification risks in Shenzhen City are clearly different from those in regions reported in Western countries, which proves the spatial heterogeneity of re-identification risks in mobile phone location data. A uniform mathematical relationship has also been found between re-identification risk (x) and data utility (y) for both attack models: y = -ax^b + c, with a, b, c > 0.
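As a toy illustration of the first attack model, the sketch below estimates what share of users are unique given their top-N most visited locations; the data structure and function are hypothetical and not the evaluation code used in the study.

from collections import Counter

def top_n_uniqueness(user_locations, n=3):
    """Share of users whose n most visited locations form a unique signature.

    `user_locations` maps a user id to a list of visited cell ids; this is a toy
    sketch of the 'top N locations' attack, not the study's evaluation code.
    """
    signatures = {}
    for user, visits in user_locations.items():
        top = tuple(sorted(loc for loc, _ in Counter(visits).most_common(n)))
        signatures[user] = top
    counts = Counter(signatures.values())
    unique = sum(1 for sig in signatures.values() if counts[sig] == 1)
    return unique / len(signatures)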
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de450992
Abstract (en): The AIDS Drug Assistance Program (ADAP) Data Report (ADR) includes two components: the Grantee Report and the Client Report. All ADAPs are required to submit both reports. The Grantee Report is a collection of basic information about the grantee characteristics and policies. It includes a Programmatic Summary section and an Annual Submission section. The Client Report (or client-level data) is a collection of one record for each client enrolled in the ADAP. Each record includes the client's encrypted unique identifier, basic demographic data, and enrollment and certification information. A client's record may also include data about the ADAP-funded insurance and medication received, including the costs of these services, as well as HIV clinical information. The HIV/AIDS Bureau (HAB) currently requires that all ADAPs report aggregate data quarterly using the ADAP Quarterly Report (AQR). However, aggregate data limits HAB's ability to respond to inquiries from Congress and other stakeholders regarding the ADAP program. To address this limitation, HAB has developed a new data reporting system, the ADAP Data Report (ADR). The ADR will enable HAB to evaluate the impact of the ADAP program on a national level. The ADR will allow HAB to characterize the individuals using the program, describe the ADAP-funded services being used, and delineate the costs associated with these services. ADAPs will begin collecting data for the ADR in October 2012. However, because the ADR is new, grantees will continue to submit the AQR until they become accustomed to the ADR and the quality of the information provided through the ADR accurately represents the program. At that time, the AQR will be retired. HAB's goal is to have a client-level data reporting system that provides data on the characteristics of the ADAPs and the clients served with program funds. The ADAP client-level data submitted will be used to: - Monitor the clinical outcomes of clients receiving care and treatment through ADAP; - Monitor the use of ADAP funds for appropriately addressing the HIV/AIDS epidemic in the United States; - Monitor the support provided by ADAP to the most vulnerable, especially minority communities; - Address the needs and concerns of Congress and the Department of Health and Human Services (HHS) concerning the HIV/AIDS epidemic and the RWHAP; and - Monitor the outcomes achieved in response to the National HIV/AIDS Strategy. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: Created variable labels and/or value labels.. An ADAP client is any individual who is enrolled in the ADAP, i.e., certified as eligible to receive ADAP services, regardless of whether the individual used ADAP services during the reporting period. During the reporting period, an ADAP client may have: - Received medications and/or insurance assistance; - Been placed on the waiting list; - Been dis-enrolled; or - Been eligible, but not received services for clinical or other reasons. Smallest Geographic Unit: ZIP Code 2013-10-03 This study includes information regarding ADAP grantees. The corresponding client level study is ICPSR 34723. The original study, ICPSR 34723, included client and grantee information. 
Funding institution(s): United States Department of Health and Human Services, Health Resources and Services Administration.
Afrint: intensification of food crop agriculture in sub-Saharan Africa (Swedish-African Research Network). The project studies agricultural development and its relation to food security and poverty alleviation, based on primary research in nine sub-Saharan African countries. Afrint was carried out in three phases, 2001-2016.
Afrint I - 2001-2005: The African Food Crisis - the Relevance of Asian Experiences
Afrint II - 2007-2010: The Millennium Development Goals and the African Food Crisis
Sub-Saharan Africa (Ethiopia, Ghana, Kenya, Malawi, Nigeria, Tanzania, Uganda, Zambia); regions within selected countries
Household
Farming Household
Aggregate data [agg]
Data collection for the first round of the Afrint project was carried out in 2002. The data collected as part of the second round are referred to as 2008 data, although in some cases they were collected in late 2007. From the outset the research team selected five case study countries: Ghana, Kenya, Malawi, Nigeria and Tanzania. Outside francophone Africa, these five countries were ideally suited, in the researchers' view, to charting progress in intensification, whether induced from below by farmers themselves or state induced, as in the Asian Green Revolution. At the insistence of Sida, four more countries were added to the original five: Ethiopia, Mozambique, Uganda and Zambia. Unlike the original five, the last three of these were deemed less constrained with respect to productive resources in agriculture. Ethiopia, on the other hand, is peculiar in an African context, with its long history of plough agriculture and feudal-like social formation. In this project, the heterogeneous sample of countries has proved less cumbersome to work with than one might have expected.
Formally, the Afrint sample was drawn in four stages, of which the country selection described above was the first one. The next stage was regions within countries, followed by selection of villages within regions, with selection of farm households as the last stage. All stages except the final one were based on purposive sampling. Data collection was sought at all four levels. The households sampled within these countries were selected with respect to the agricultural potential of the areas in which they reside. The intention was to capture the dynamism in areas that are 'above average' in terms of ecological and market (infrastructure) endowments, while excluding the most extreme cases in this regard. For logistical reasons we could not aim for a sample that is representative in a statistical sense. Instead we aimed at a sample that is illustrative of conditions in the maize-cassava belt, excluding both low-potential dry and remote areas and extreme outliers at the other end of the scale.
Thus we used a four-stage sample design, with purposive sampling at all stages except the last one, where households were sampled after household lists had been drawn up. When we compare point estimates from the sample with those from other sources, for example yields for the various crops with FAO statistics, no apparent sample bias has been detected.
In addition to household questionnaires we also used village questionnaires. Respondents to the village interviews were key persons, such as village leaders and extension agents. Investigators were also instructed to conduct focus group interviews with representatives of various segments of the village population, including women farmers. For the second round and the panel in 2008, we adopted a balanced panel design, i.e., constructing the 2008 sample so that it would in itself be representative of the village populations in 2008. This also involved sampling descendants when a household had been partitioned since 2002. In case of sizeable in-migration to a village, we also provided for sampling from the newly arrived households. The 2002-2008 panel is thus a subset of the two cross-sectional samples; in itself this subset is not statistically representative of the village population in either of the two years.
20.6 Percent
Face-to-face [f2f]
Scope of Survey Round I (2001-2005)
Household demographic and socio-economic characteristics
Farm and crop management
Maize
Cassava
Cassava, marketing conditions
Sorghum
Rice
Other food crops and vegetables
Non-food cash crops
Land resources
Livestock
Labour resources
Institutional conditions
Incomes and expenditures
Scope of Survey Round II (2007-2010)
Household Demographic and Socio-Economic Characteristics
Farm and Crop Management
Crops
Maize
Cassava
Sorghum
Rice
Rural - Urban and Rural - Rural Linkages (staple crops)
Other food crops and vegetables (for local markets)
Non-food cash crops (wholly or partly for export)
Agricultural Techniques
Land resources
Livestock & Fish
Livestock
No editing specification given
79.4 Percent
No sampling error estimates given
No other forms of appraisal given.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
List of Subdatasets:
Long-term data: 2000-2021
5th percentile (p05) monthly time-series: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021
50th percentile (p50) monthly time-series: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021
95th percentile (p95) monthly time-series: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021
General Description
The monthly aggregated Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) dataset is derived from the 250 m, 8-day GLASS V6 FAPAR product. GLASS V6 FAPAR is itself derived from Moderate Resolution Imaging Spectroradiometer (MODIS) reflectance and LAI data, using several other FAPAR products (MODIS Collection 6, GLASS FAPAR V5, and PROBA-V1 FAPAR) to train a bidirectional long short-term memory (Bi-LSTM) model that estimates FAPAR. The dataset spans March 2000 to December 2021 and covers the entire globe. It can be used in many applications, such as land degradation modeling, land productivity mapping, and land potential mapping. The dataset includes:
Long-term:
Derived from the monthly time-series. This dataset provides a linear trend model for the p95 variable: slope beta mean (p95.beta_m), p-value for beta (p95.beta_pv), intercept alpha mean (p95.alpha_m), p-value for alpha (p95.alpha_pv), and coefficient of determination R2 (p95.r2_m). Both the trend fit and the monthly aggregation are illustrated in the sketch after this list.
Monthly time-series:
Monthly aggregation with three standard statistics: 5th percentile (p05), median (p50), and 95th percentile (p95). For each month, we aggregate all composites within that month plus one composite each before and after, ending up with 5 to 6 composites for a single month depending on the number of images within that month.
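To make the two steps concrete, here is a minimal per-pixel sketch under stated assumptions: it uses numpy/scipy rather than the scikit-map pipeline actually used to produce the layers, and the synthetic composite values, the helper function, and the month boundaries are illustrative only, not taken from the product.

```python
# Per-pixel sketch of (a) monthly p05/p50/p95 aggregation of 8-day composites
# with a one-composite buffer on each side, and (b) an OLS linear trend on the
# monthly p95 series. Synthetic inputs; illustrative assumptions only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
composites = rng.uniform(0.2, 0.8, size=46)   # one pixel, 46 8-day composites in a year
doy = np.arange(1, 366, 8)                    # start day-of-year of each composite

def monthly_percentiles(values, doy, start_doy, end_doy):
    """Aggregate all composites starting within [start_doy, end_doy), plus
    one composite before and one after, into the p05/p50/p95 statistics."""
    inside = np.where((doy >= start_doy) & (doy < end_doy))[0]
    lo = max(inside.min() - 1, 0)             # extend the window by one composite
    hi = min(inside.max() + 1, len(values) - 1)
    return np.percentile(values[lo:hi + 1], [5, 50, 95])

p05, p50, p95 = monthly_percentiles(composites, doy, 91, 121)  # e.g. April (DOY 91-120)

# The long-term layers fit an OLS trend to the full monthly p95 series
# (here a placeholder series of 22 years x 12 months):
p95_series = rng.uniform(0.4, 0.9, size=22 * 12)
t = np.arange(p95_series.size)
fit = stats.linregress(t, p95_series)
beta_m, alpha_m = fit.slope, fit.intercept    # cf. p95.beta_m, p95.alpha_m
beta_pv = fit.pvalue                          # cf. p95.beta_pv (slope p-value)
r2_m = fit.rvalue ** 2                        # cf. p95.r2_m
# p-value for the intercept (cf. p95.alpha_pv), computed from its standard error:
alpha_pv = 2 * stats.t.sf(abs(fit.intercept / fit.intercept_stderr), df=t.size - 2)
```

The released layers apply these computations pixel-wise across the full global grid; the sketch only shows the logic for a single pixel's time series.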
Data Details
Time period: March 2000 – December 2021
Type of data: Fraction of Absorbed Photosynthetically Active Radiation (FAPAR)
How the data was collected or derived: derived from the 250 m, 8-day GLASS V6 FAPAR product using Python running on a local HPC. The time-series analyses were computed using the scikit-map Python package.
Statistical methods used: for the long-term dataset, Ordinary Least Squares (OLS) regression of the monthly p95 variable; for the monthly time-series, the 5th, 50th, and 95th percentiles.
Limitations or exclusions in the data: The dataset does not include data for Antarctica.
Coordinate reference system: EPSG:4326
Bounding box (Xmin, Ymin, Xmax, Ymax): (-180.00000, -62.0008094, 179.9999424, 87.37000)
Spatial resolution: 1/480 decimal degrees = 0.00208333° (approximately 250 m)
Image size: 172,800 x 71,698
File format: Cloud Optimized GeoTIFF (COG); see the access sketch below.
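Because the layers are distributed as Cloud Optimized GeoTIFFs on a regular EPSG:4326 grid, small windows can be read without downloading the full 172,800 x 71,698 image. The following is a minimal sketch using rasterio; the file path is a hypothetical placeholder, since no layer URLs are listed here.

```python
# Minimal sketch of windowed access to one of the COG layers. The path below
# is a hypothetical placeholder, not an actual product URL.
import rasterio
from rasterio.windows import from_bounds

cog_path = "fapar_p95_monthly_example.tif"  # placeholder for a monthly p95 layer

with rasterio.open(cog_path) as src:
    # Read a 1-degree tile around (10E, 50N) instead of the whole global image.
    window = from_bounds(9.5, 49.5, 10.5, 50.5, transform=src.transform)
    tile = src.read(1, window=window)

# At 1/480 degree (~250 m) resolution, a 1-degree tile is roughly 480 x 480 pixels.
print(tile.shape)
```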
Support
If you discover a bug, artifact, or inconsistency, or if you have a question, please raise a GitHub issue: https://github.com/Open-Earth-Monitor/Global_FAPAR_250m/issues
Reference
Hackländer, J., Parente, L., Ho, Y.-F., Hengl, T., Simoes, R., Consoli, D., Şahin, M., Tian, X., Herold, M., Jung, M., Duveiller, G., Weynants, M., Wheeler, I., (2023?) "Land potential assessment and trend-analysis using 2000–2021 FAPAR monthly time-series at 250 m spatial resolution", submitted to PeerJ, preprint available at: https://doi.org/10.21203/rs.3.rs-3415685/v1
Name convention
To ensure consistency and ease of use across and within projects, we follow the standard Open-Earth-Monitor file-naming convention. The convention uses 10 fields that describe important properties of the data, so users can search for files, prepare data analyses, etc., without needing to open the files; a filename can be unpacked field by field, as shown in the sketch after this list. The fields are:
generic variable name: fapar = Fraction of Absorbed Photosynthetically Active Radiation
variable procedure combination: essd.lstm = Earth System Science Data with bidirectional long short-term memory (Bi-LSTM)
Position in the probability distribution / variable type: p05/p50/p95 = 5th/50th/95th percentile
Spatial support: 250m
Depth reference: s = surface
Time reference begin time: 20000301 = 2000-03-01
Time reference end time: 20211231 = 2021-12-31
Bounding box: go = global (without Antarctica)
EPSG code: epsg.4326 = EPSG:4326
Version code: v20230628 = 2023-06-28 (creation date)
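As an illustration, a filename built from these 10 fields can be unpacked programmatically. The concrete filename and the underscore separator below are assumptions inferred from the field list, not a filename quoted from the product itself.

```python
# Sketch of unpacking the 10 naming fields; the example filename and the
# underscore separator are assumptions based on the field list above.
example = "fapar_essd.lstm_p95_250m_s_20000301_20211231_go_epsg.4326_v20230628.tif"

fields = [
    "variable", "procedure", "variable_type", "spatial_support",
    "depth_reference", "time_begin", "time_end", "bounding_box",
    "epsg_code", "version",
]
values = example.rsplit(".", 1)[0].split("_")   # drop the extension, split the fields
metadata = dict(zip(fields, values))
print(metadata["variable_type"], metadata["time_begin"], metadata["time_end"])
# -> p95 20000301 20211231
```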