Sea surface temperature (SST) plays an important role in a number of ecological processes and can vary over a wide range of time scales, from daily to decadal changes. SST influences primary production, species migration patterns, and coral health. If temperatures are anomalously warm for extended periods of time, drastic changes in the surrounding ecosystem can result, including harmful effects such as coral bleaching. This layer represents the standard deviation of SST (degrees Celsius) of the weekly time series from 2000-2013. Three SST datasets were combined to provide continuous coverage from 1985-2013. The concatenation applies bias adjustment derived from linear regression to the overlap periods of datasets, with the final representation matching the 0.05-degree (~5-km) near real-time SST product. First, a weekly composite, gap-filled SST dataset from the NOAA Pathfinder v5.2 SST 1/24-degree (~4-km), daily dataset (a NOAA Climate Data Record) for each location was produced following Heron et al. (2010) for January 1985 to December 2012. Next, weekly composite SST data from the NOAA/NESDIS/STAR Blended SST 0.1-degree (~11-km), daily dataset was produced for February 2009 to October 2013. Finally, a weekly composite SST dataset from the NOAA/NESDIS/STAR Blended SST 0.05-degree (~5-km), daily dataset was produced for March 2012 to December 2013. The standard deviation of the long-term mean SST was calculated by taking the standard deviation over all weekly data from 2000-2013 for each pixel.
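For orientation, a minimal sketch of the final step only (the per-pixel standard deviation over the 2000-2013 weekly stack), assuming the weekly composites are stored in a NetCDF file with an sst variable over (time, lat, lon); the file name is a placeholder and this is not the original processing code:

```python
# Minimal sketch (not the original processing code): per-pixel standard
# deviation of the 2000-2013 weekly SST composites, assuming they are stored
# in a NetCDF file with an "sst" variable over (time, lat, lon).
import xarray as xr

ds = xr.open_dataset("weekly_sst_composites.nc")           # hypothetical path
weekly = ds["sst"].sel(time=slice("2000-01-01", "2013-12-31"))

# Standard deviation over all weekly values, computed independently per pixel;
# skipna=True ignores weeks masked out (e.g., land or missing data).
sst_std = weekly.std(dim="time", skipna=True)
sst_std.to_netcdf("sst_weekly_std_2000_2013.nc")
```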
This part of the data release contains a grid of standard deviations of bathymetric soundings within each 0.5 m x 0.5 m grid cell. The bathymetry was collected on February 1, 2011, in the Sacramento River from the confluence of the Feather River to Knights Landing. The standard deviations represent one component of bathymetric uncertainty in the final digital elevation model (DEM), which is also available in this data release. The bathymetry data were collected by the USGS Pacific Coastal and Marine Science Center (PCMSC) team with collaboration and funding from the U.S. Army Corps of Engineers. This project used interferometric sidescan sonar to characterize the riverbed and channel banks along a 12-mile reach of the Sacramento River near the town of Knights Landing, California (River Mile 79 through River Mile 91), to aid in understanding fish response to the creation of safe habitat associated with levee restoration efforts in two 1.5-mile reaches of the Sacramento River between River Mile 80 and 86.
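As an illustration of how such a grid can be produced (not the USGS workflow), the per-cell standard deviation of soundings can be sketched with SciPy's 2-D binned statistics; the input file name and coordinate units below are assumptions:

```python
# Illustrative sketch only (not the USGS workflow): gridding the standard
# deviation of bathymetric soundings into 0.5 m x 0.5 m cells.
# x, y are assumed projected easting/northing (m); z is the sounding elevation.
import numpy as np
from scipy.stats import binned_statistic_2d

x, y, z = np.loadtxt("soundings.xyz", unpack=True)     # hypothetical input file

cell = 0.5  # grid-cell size in metres
x_edges = np.arange(x.min(), x.max() + cell, cell)
y_edges = np.arange(y.min(), y.max() + cell, cell)

# Standard deviation of all soundings falling in each 0.5 m cell.
std_grid, _, _, _ = binned_statistic_2d(x, y, z, statistic="std",
                                        bins=[x_edges, y_edges])
```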
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This data is part of the Monthly aggregated Water Vapor MODIS MCD19A2 (1 km) dataset. Check the related identifiers section on the Zenodo side panel to access other parts of the dataset.
General Description
The monthly aggregated water vapor dataset is derived from MCD19A2 v061. The water vapor data measure the column above ground retrieved from MODIS near-IR bands at 0.94 μm. The dataset spans 2000 to 2022 and covers the entire globe. It can be used in many applications, such as water cycle modeling, vegetation mapping, and soil mapping. This dataset includes:
Monthly time-series: Derived from MCD19A2 v061, this data provides a monthly aggregated mean and standard deviation of daily water vapor time-series data from 2000 to 2022. Only positive, non-cloudy pixels were considered valid observations when deriving the mean and standard deviation. The remaining no-data values were filled using the TMWM algorithm. This dataset also includes mean and standard deviation values smoothed with the Whittaker method. The quality assessment layers and the number of valid observations for each month indicate the reliability of the monthly mean and standard deviation values.
Yearly time-series: Derived from the monthly time-series, this data provides yearly aggregated statistics of the monthly data.
Long-term data (2000-2022): Derived from the monthly time-series, this data provides long-term aggregated statistics for the whole series of monthly observations.
Data Details
Time period: 2000–2022
Type of data: Water vapor column above the ground (0.001 cm)
How the data was collected or derived: Derived from MCD19A2 v061 using Google Earth Engine. Cloudy pixels were removed and only positive values of water vapor were considered to compute the statistics. The time-series gap-filling and smoothing were computed using the Scikit-map Python package.
Statistical methods used: Four statistics were derived: standard deviation and percentiles 25, 50, and 75.
Limitations or exclusions in the data: The dataset does not include data for Antarctica.
Coordinate reference system: EPSG:4326
Bounding box (Xmin, Ymin, Xmax, Ymax): (-180.00000, -62.00081, 179.99994, 87.37000)
Spatial resolution: 1/120 d.d. = 0.008333333 (1 km)
Image size: 43,200 x 17,924
File format: Cloud Optimized GeoTIFF (COG)
Support
If you discover a bug, artifact, or inconsistency, or if you have a question, please use one of the following channels:
Technical issues and questions about the code: GitLab Issues
General questions and comments: LandGIS Forum
Name convention
To ensure consistency and ease of use across and within the projects, we follow the standard Open-Earth-Monitor file-naming convention. The convention works with 10 fields that describe important properties of the data. In this way, users can search files, prepare data analyses, etc., without needing to open the files. The fields are:
generic variable name: wv = Water vapor
variable procedure combination: mcd19a2v061.seasconv = MCD19A2 v061 with gap-filling algorithm
Position in the probability distribution / variable type: m = mean | sd = standard deviation | n = number of observations | qa = quality assessment
Spatial support: 1km
Depth reference: s = surface
Time reference begin time: 20000101 = 2000-01-01
Time reference end time: 20221231 = 2022-12-31
Bounding box: go = global (without Antarctica)
EPSG code: epsg.4326 = EPSG:4326
Version code: v20230619 = 2023-06-19 (creation date)
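To illustrate how these fields combine into a file name (illustrative only; the published file names may differ in minor details such as the extension):

```python
# Illustrative only: composing a file name from the 10 convention fields
# listed above by joining them with underscores.
fields = [
    "wv",                    # generic variable name: water vapor
    "mcd19a2v061.seasconv",  # variable procedure combination
    "m",                     # variable type: mean (sd, n, qa are alternatives)
    "1km",                   # spatial support
    "s",                     # depth reference: surface
    "20000101",              # time reference, begin
    "20221231",              # time reference, end
    "go",                    # bounding box: global, without Antarctica
    "epsg.4326",             # EPSG code
    "v20230619",             # version code (creation date)
]
print("_".join(fields) + ".tif")
# -> wv_mcd19a2v061.seasconv_m_1km_s_20000101_20221231_go_epsg.4326_v20230619.tif
```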
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
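For orientation, a minimal numpy sketch of row-wise summary statistics of the kind named above (mean, standard deviation, 1-norm, range, sum of squares), applied to a toy spectra matrix; PRE and X4 are defined in the paper and are not reproduced here:

```python
# Sketch only: row-wise summary statistics of a data matrix X (one spectrum
# per row). PRE and X4 are defined in the paper and are not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 200))            # 50 spectra x 200 channels (toy data)

summaries = {
    "mean":   X.mean(axis=1),
    "std":    X.std(axis=1),
    "1-norm": np.abs(X).sum(axis=1),
    "range":  X.max(axis=1) - X.min(axis=1),
    "ssq":    (X ** 2).sum(axis=1),
}
# Each summary reduces every spectrum to a single number that can then be
# plotted, clustered, or compared against PCA/MCR scores.
```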
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in the United States, as reported by the U.S. Census Bureau. The dataset highlights how median household income varies with the size of the family unit, offering valuable insights into economic trends and disparities across household sizes and aiding data analysis and decision-making.
Key observations
[Chart: United States median household income, by household size (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for United States median household income. You can refer to it here.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Walmart Dataset (Retail)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rutuspatel/walmart-dataset-retail on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Dataset Description:
This is the historical data that covers sales from 2010-02-05 to 2012-11-01, in the file Walmart_Store_sales. Within this file you will find the following fields:
Store - the store number
Date - the week of sales
Weekly_Sales - sales for the given store
Holiday_Flag - whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week)
Temperature - Temperature on the day of sale
Fuel_Price - Cost of fuel in the region
CPI – Prevailing consumer price index
Unemployment - Prevailing unemployment rate
Holiday Events
Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
Analysis Tasks
Basic Statistics tasks
1) Which store has maximum sales
2) Which store has the maximum standard deviation, i.e., where the sales vary the most. Also, find the coefficient of variation (the ratio of the standard deviation to the mean)
3) Which store(s) had a good quarterly growth rate in Q3 2012
4) Some holidays have a negative impact on sales. Find the holidays that have higher sales than the mean sales in the non-holiday season, for all stores together
5) Provide a monthly and semester view of sales in units and give insights (a pandas sketch of tasks 1 and 2 appears after this list)
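The first two tasks reduce to groupby aggregations. A minimal pandas sketch, assuming the file is available as Walmart_Store_sales.csv with the columns listed above (the file name and day-first date format are assumptions):

```python
# Sketch of tasks 1-2 only (not a complete solution).
import pandas as pd

df = pd.read_csv("Walmart_Store_sales.csv", parse_dates=["Date"], dayfirst=True)

# 1) Store with the maximum total sales.
total_sales = df.groupby("Store")["Weekly_Sales"].sum()
print("Store with maximum total sales:", total_sales.idxmax())

# 2) Store with the maximum standard deviation of weekly sales, plus the
#    coefficient of variation (standard deviation divided by the mean).
stats = df.groupby("Store")["Weekly_Sales"].agg(["mean", "std"])
stats["cv"] = stats["std"] / stats["mean"]
print("Store with maximum std:", stats["std"].idxmax())
print(stats.loc[stats["std"].idxmax()])
```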
Statistical Model
For Store 1 – Build prediction models to forecast demand
Linear Regression – Utilize variables like date and restructure dates as 1 for 5 Feb 2010 (starting from the earliest date in order). Hypothesize if CPI, unemployment, and fuel price have any impact on sales.
Change dates into days by creating a new variable.
Select the model that gives the best accuracy.
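A hedged sketch of the Store 1 regression setup described above, using scikit-learn; the CSV name, date format, and feature choice are assumptions rather than the original analysis:

```python
# Sketch of the Store 1 linear regression task (illustrative, untuned).
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("Walmart_Store_sales.csv", parse_dates=["Date"], dayfirst=True)
s1 = df[df["Store"] == 1].sort_values("Date").reset_index(drop=True)

# Restructure dates as 1, 2, 3, ... starting from the earliest week
# (1 corresponds to 5 Feb 2010 in this dataset).
s1["time_index"] = s1.index + 1

X = s1[["time_index", "CPI", "Unemployment", "Fuel_Price"]]
y = s1["Weekly_Sales"]

model = LinearRegression().fit(X, y)
print(dict(zip(X.columns, model.coef_)))   # sign/size of each effect
print("In-sample R^2:", model.score(X, y))
```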
--- Original source retains full ownership of the source dataset ---
Chlorophyll-a is a widely used proxy for phytoplankton biomass and an indicator of changes in phytoplankton production. As an essential source of energy in the marine environment, the extent and availability of phytoplankton biomass can be highly influential for fisheries production and can dictate trophic structure in marine ecosystems. Changes in phytoplankton biomass are predominantly driven by changes in nutrient availability, through either natural (e.g., turbulent ocean mixing) or anthropogenic (e.g., agricultural runoff) processes. This layer represents the standard deviation of the 8-day time series of chlorophyll-a (mg/m3) from 1998-2018. Data products were generated by the Ocean Colour component of the European Space Agency (ESA) Climate Change Initiative (CCI) project. These files are 8-day, 4-km composites of merged sensor products: Global Area Coverage (GAC), Local Area Coverage (LAC), MEdium Resolution Imaging Spectrometer (MERIS), Moderate Resolution Imaging Spectroradiometer (MODIS) Aqua, Ocean and Land Colour Instrument (OLCI), Sea-viewing Wide Field-of-view Sensor (SeaWiFS), and Visible Infrared Imaging Radiometer Suite (VIIRS). The standard deviation was calculated over all 8-day chlorophyll-a data from 1998-2018 for each pixel. A quality control mask was applied to remove spurious data associated with shallow water, following Gove et al., 2013. Nearshore map pixels with no data were filled with values from the nearest neighboring valid offshore pixel by using a grid of points and the Near Analysis tool in ArcGIS and then converting the points to a raster. Data source: https://oceanwatch.pifsc.noaa.gov/erddap/griddap/esa-cci-chla-8d-v5-0.graph
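The nearest-neighbor fill described above was done in ArcGIS; purely as an illustration, the same effect can be sketched in Python with a Euclidean distance transform (toy array, not the actual workflow):

```python
# Illustrative sketch (not the ArcGIS workflow described above): fill no-data
# pixels with the value of the nearest valid pixel.
import numpy as np
from scipy import ndimage

# Toy grid of per-pixel chlorophyll-a standard deviations; NaN marks no-data
# nearshore pixels that should inherit the nearest valid offshore value.
chl_std = np.array([[0.10, 0.12, np.nan],
                    [0.11, np.nan, np.nan],
                    [0.09, 0.10, 0.13]])

mask = np.isnan(chl_std)
# Indices of the nearest valid (non-NaN) pixel for every cell.
idx = ndimage.distance_transform_edt(mask, return_distances=False,
                                     return_indices=True)
filled = chl_std[tuple(idx)]
```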
Automatically describing images using natural sentences is essential for the inclusion of visually impaired people on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions in other languages are scarce.
PraCegoVer arose on the Internet, stimulating social media users to publish images, tag them with #PraCegoVer, and add a short description of their content. Inspired by this movement, we have proposed #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.
Dataset Structure
The dataset comprises a directory containing the images and the file dataset.json, which contains a list of JSON objects with the attributes:
user: anonymized user that made the post;
filename: image file name;
raw_caption: raw caption;
caption: clean caption;
date: post date.
Each instance in dataset.json is associated with exactly one image in the images directory, whose file name is given by the attribute filename. Also, we provide a sample with five instances, so that users can download the sample to get an overview of the dataset before downloading it completely.
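As an illustration of this layout (a minimal sketch, assuming dataset.json and the images directory sit in the working directory):

```python
# Minimal sketch: pair each caption in dataset.json with its image file,
# based on the attributes listed above (filename, caption).
import json
import os

with open("dataset.json", encoding="utf-8") as f:
    instances = json.load(f)

for item in instances[:5]:
    image_path = os.path.join("images", item["filename"])
    print(image_path, "->", item["caption"])
```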
Download Instructions
If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to uncompress and join the files:
cat images.tar.gz.part* > images.tar.gz
tar -xzvf images.tar.gz
Alternatively, you can download the entire dataset from the terminal using the Python script download_dataset.py available in the PraCegoVer repository. In this case, you first have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:
python download_dataset.py --access_token=
1. Colour patterns are used by many species to make decisions that ultimately affect their Darwinian fitness. Colour patterns consist of a mosaic of patches that differ in geometry and visual properties. Although traditionally pattern geometry and colour patch visual properties are analysed separately, these components are likely to work together as a functional unit. Despite this, the combined effect of patch visual properties, patch geometry, and the effects of the patch boundaries on animal visual systems, behaviour and fitness are relatively unexplored.
2. Here we describe Boundary Strength Analysis (BSA), a novel way to combine the geometry of the edges (boundaries among the patch classes) with the receptor noise estimate (ΔS) of the intensity of the edges. The method is based upon known properties of vertebrate and invertebrate retinas. The mean and SD of ΔS (mΔS, sΔS) of a colour pattern can be obtained by weighting each edge class ΔS by its length, separately for chromatic and achromatic ΔS.
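The description above implies length-weighted moments of ΔS; a minimal numpy sketch under that assumption (the published BSA method may include refinements not shown here):

```python
# Sketch under the assumption that mΔS and sΔS are ordinary length-weighted
# moments of the edge-class ΔS values (toy numbers, illustrative only).
import numpy as np

delta_s = np.array([3.1, 1.4, 5.2, 2.0])    # ΔS of each edge class
length  = np.array([120., 340., 60., 200.]) # total boundary length per class

w = length / length.sum()
m_delta_s = np.sum(w * delta_s)                              # weighted mean
s_delta_s = np.sqrt(np.sum(w * (delta_s - m_delta_s) ** 2))  # weighted SD
```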
Upvote! The database contains +40,000 records on US Gross Rent & Geo Locations. The field description of the database is documented in the attached PDF file. To access all 325,272 records on a scale roughly equivalent to a neighborhood (census tract), see the link below and make sure to upvote. Upvote right now, please. Enjoy!
Get the full free database with coupon code: FreeDatabase, See directions at the bottom of the description... And make sure to upvote :) coupon ends at 2:00 pm 8-23-2017
The data set was originally developed for real estate and business investment research. Income is a vital element when determining both quality and socioeconomic features of a given geographic location. The following data was derived from over 36,000 files and covers 348,893 location records.
Only proper citing is required; please see the documentation for details. Have fun!!!
Golden Oak Research Group, LLC. “U.S. Income Database Kaggle”. Publication: 5, August 2017. Accessed, day, month year.
For any questions, you may reach us at research_development@goldenoakresearch.com. For immediate assistance, you may reach me at 585-626-2965.
Please note: this is my personal number, and email is preferred.
Check our data's accuracy: Census Fact Checker
Don't settle. Go big and win big. Optimize your potential. Access all gross rent records and more on a scale roughly equivalent to a neighborhood; see the link below:
A small startup with big dreams, giving the everyday, up-and-coming data scientist professional-grade data at affordable prices. It's what we do.
Link to the ScienceBase Item Summary page for the item described by this metadata record. Service Protocol: Link to the ScienceBase Item Summary page for the item described by this metadata record. Application Profile: Web Browser. Link Function: information
Overview: Actual Natural Vegetation (ANV): probability of occurrence for the Common hazel in its realized environment for the period 2000-2022.
Traceability (lineage): This is an original dataset produced with a machine learning framework which used a combination of point datasets and raster datasets as inputs. The point dataset is a harmonized collection of tree occurrence data, comprising observations from National Forest Inventories (EU-Forest), GBIF, and LUCAS. The complete dataset is available on Zenodo. The raster datasets used as input are: harmonized and gap-filled time series of seasonal aggregates of the Landsat GLAD ARD dataset (bands and spectral indices); monthly time series of air and surface temperature and precipitation from a reprocessed version of the Copernicus ERA5 dataset; long-term averages of bioclimatic variables from CHELSA; tree species distribution maps from the European Atlas of Forest Tree Species; elevation, slope, and other elevation-derived metrics; and long-term monthly averages of snow probability and cloud fraction from MODIS. For a more comprehensive list refer to Bonannella et al. (2022) (in review, preprint available at: https://doi.org/10.21203/rs.3.rs-1252972/v1).
Scientific methodology: Probability and uncertainty maps were the output of a spatiotemporal ensemble machine learning framework based on stacked regularization. Three base models (random forest, gradient-boosted trees, and generalized linear models) were first trained on the input dataset, and their predictions were used to train an additional model (logistic regression) which provided the final predictions. More details on the whole workflow are available in the listed publication.
Usability: Probability maps can be used to detect potential forest degradation and compositional change across the time period analyzed. Some possible applications for these topics are explained in the listed publication.
Uncertainty quantification: Uncertainty is quantified by taking the standard deviation of the probabilities predicted by the three components of the spatiotemporal ensemble model.
Data validation approaches: Distribution maps were validated using a spatial 5-fold cross-validation following the workflow detailed in the listed publication.
Completeness: The raster files completely cover the Geo-harmonizer region as defined by the landmask raster dataset available here.
Consistency: Areas outside of the calibration area of the point dataset (Iceland, Norway) usually have high uncertainty values. This is not only a problem of extrapolation but also of poor representation, in the feature space available to the model, of the conditions present in these countries.
Positional accuracy: The rasters have a spatial resolution of 30 m.
Temporal accuracy: The maps cover the period 2000-2020; each map covers a certain number of years according to the following scheme: (1) 2000-2002, (2) 2002-2006, (3) 2006-2010, (4) 2010-2014, (5) 2014-2018 and (6) 2018-2020.
Thematic accuracy: Both probability and uncertainty maps contain values from 0 to 100: in the case of probability maps, they indicate the probability of occurrence of a single individual of the target species, while uncertainty maps indicate the standard deviation of the ensemble model.
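A schematic scikit-learn sketch of the stacking scheme and uncertainty layer described above (toy data, default hyperparameters, and scikit-learn stand-ins for the three base learners; not the actual Geo-harmonizer pipeline):

```python
# Schematic sketch of the described ensemble: three base classifiers, a
# logistic-regression meta-learner, and uncertainty as the standard deviation
# of the base-model probabilities.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

base = [
    ("rf",  RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gbt", GradientBoostingClassifier(random_state=0)),
    ("glm", LogisticRegression(max_iter=1000)),
]
ensemble = StackingClassifier(estimators=base,
                              final_estimator=LogisticRegression(max_iter=1000))
ensemble.fit(X, y)

# Final occurrence probability from the stacked model.
prob = ensemble.predict_proba(X)[:, 1]

# Uncertainty: standard deviation of the probabilities predicted by the
# three fitted base models (mirroring the uncertainty layer described above).
base_probs = np.column_stack([est.predict_proba(X)[:, 1]
                              for est in ensemble.named_estimators_.values()])
uncertainty = base_probs.std(axis=1)
```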
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains supplementary data for my PhD thesis "Learning Reduced Models for Large-Scale Agent-Based Systems". Chapters 1-3, 7, and A do not have supplementary data.
Chapter 4
Large_deviation_example.zip contains the trajectory for Figure 4.8.
mean_exit_time* contains the raw data to compute the mean exit time and standard deviation for the ABM process (JP) and SDE process (CLE). It contains additionally a precomputed mean and standard deviation as well as the corresponding numbers of agents.
transition_matrix* contain the computed box discretizations as MATLAB and Numpy files as used for Figures 4.2-4.4, 4.6 and Tables 4.1 and 4.2.
Chapter 5
CVM_2021-07-09-15-53_training_data.npz contains the training data for Figure 5.7 a and b.
CVM_2021-09-29-07-13_distribution.npz contains the raw data for Figure 5.7 c.
The remaining data for Chapter 5 can be found in the related dataset doi.org/10.5281/zenodo.4522119.
Chapter 6
CVM_pareto_estimate contains trajectory data required for Figure 6.6 b to estimate points in the Pareto Front using the civil violence model.
CVM_training_data contains the training data to construct the surrogate model. Each data set consists of CVM_*_cops_train.npz as training set, CVM_*_cops_trajectory.npz as sample trajectory and CVM_*_cops.pkl to compute the training data.
CVM_covering_iterations_8.mat Pareto set covering after 8 iterations for the civil violence model. Required for Figure 6.6 a.
CVM_pareto_set+front.npz is required for Figure 6.6 b.
CVM_surrogate_model.mat contains the surrogate model for the civil violence model.
Expl_iterations_* contains Pareto set coverings after 8 and 12 iterations for Example 6.1.4 and Figure 6.1.
VM_covering_iterations_12.mat contains the Pareto set covering depicted in Figure 6.4 a.
VM_ODE_covering_iterations_12_subset_front.mat contains the Pareto set covering depicted in Figure 6.5 and 6.5 c.
VM_ODE_covering_iterations_12_subset.mat contains the Pareto set covering depicted in Figure 6.5 and 6.5 d.
VM_ODE_covering_iterations_12.mat contains the Pareto set covering depicted in Figure 6.4 b.
VM_surrogate_model.mat contains the surrogate model for the extended voter model.
VM_test_points_non_pareto.npz contains Non-Pareto points in Figure 6.5 and 6.5 d.
VM_test_points_pareto.npz contains Pareto points in Figure 6.5 and 6.5 c.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was generated from raw data obtained at
Data were processed with the R package EpiEstim (methodology in the associated preprint). Briefly, instantaneous R was estimated within a 5-day time window. Prior mean and standard deviation values for R were set at 3 and 1. The serial interval was estimated using a parametric distribution with uncertainty (offset gamma). We compared the results at two time points (day 7 and day 21 after the first case was registered in each region) from different Brazilian states in order to make inferences about the epidemic dynamics.
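For orientation, a minimal Python re-implementation of the Cori et al. estimator that EpiEstim is based on, using the gamma prior (mean 3, SD 1) and 5-day window mentioned above; the incidence series and serial-interval distribution are toy values, and EpiEstim's handling of serial-interval uncertainty is omitted:

```python
# Minimal, illustrative sketch of the Cori et al. instantaneous-R estimator
# (not the EpiEstim code itself; serial-interval uncertainty is omitted).
import numpy as np

incidence = np.array([1, 2, 2, 4, 6, 9, 12, 15, 20, 24, 30, 35])  # toy counts
w = np.array([0.1, 0.3, 0.3, 0.2, 0.1])   # toy discretized serial interval

prior_mean, prior_sd = 3.0, 1.0
a0 = (prior_mean / prior_sd) ** 2          # gamma prior shape
b0 = prior_sd ** 2 / prior_mean            # gamma prior scale

# Infection pressure Lambda_t = sum_k I_{t-k} w_k.
lam = np.array([np.sum(incidence[max(0, t - len(w)):t][::-1] * w[:min(t, len(w))])
                for t in range(len(incidence))])

window = 5
for t in range(window, len(incidence)):
    I_sum = incidence[t - window + 1:t + 1].sum()
    L_sum = lam[t - window + 1:t + 1].sum()
    shape, rate = a0 + I_sum, 1.0 / b0 + L_sum
    print(f"day {t}: posterior mean R = {shape / rate:.2f}")
```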
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data is part of the Global Ensemble Digital Terrain Model (GEDTM30) dataset. Check the related identifiers section below to access other parts of the dataset.
This is the first release of the Multiscale Land Surface Parameters (LSPs) of Global Ensemble Digital Terrain Model (GEDTM30). Use for testing purposes only. This work was funded by the European Union. However, the views and opinions expressed are solely those of the author(s) and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them. The data is provided "as is." The Open-Earth-Monitor project consortium, along with its suppliers and licensors, hereby disclaims all warranties of any kind, express or implied, including, without limitation, warranties of merchantability, fitness for a particular purpose, and non-infringement. Neither the Open-Earth-Monitor project consortium nor its suppliers and licensors make any warranty that the website will be error-free or that access to it will be continuous or uninterrupted. You understand that you download or otherwise obtain content or services from the website at your own discretion and risk.
LSPs are derivative products of the GEDTM30 that represent measures of local topographic position, curvature, hydrology, light, and shadow. A pyramid representation is implemented to generate multiscale resolutions of 30m, 60m, 120m, 240m, 480m, and 960m for each LSP. The parametrization is powered by Whitebox Workflows in Python. To see the documentation, please visit our GEDTM30 GitHub (https://github.com/openlandmap/GEDTM30).
This dataset includes:
Due to Zenodo's storage limitations, the high resolution LSP data are provided via external links:
Layer | Scale factor | Data Type | No Data |
---|---|---|---|
Difference from Mean Elevation | 100 | Int16 | 32,767 |
Geomorphons | 1 | Byte | 255 |
Hillshade | 1 | UInt16 | 65,535 |
LS Factor | 1,000 | UInt16 | 65,535 |
Maximal Curvature | 1,000 | Int16 | 32,767 |
Minimal Curvature | 1,000 | Int16 | 32,767 |
Negative Openness | 100 | UInt16 | 65,535 |
Positive Openness | 100 | UInt16 | 65,535 |
Profile Curvature | 1,000 | Int16 | 32,767 |
Ring Curvature | 10,000 | Int16 | 32,767 |
Shape Index | 1,000 | Int16 | 32,767 |
Slope in Degree | 100 | UInt16 | 65,535 |
Specific Catchment Area | 1,000 | UInt16 | 65,535 |
Spherical Standard Deviation of the Normals | 100 | Int16 | 32,767 |
Tangential Curvature | 1,000 | Int16 | 32,767 |
Topographic Wetness Index | 100 | Int16 | 32,767 |
If you discover a bug, artifact, or inconsistency, or if you have a question, please raise a GitHub issue here.
To ensure consistency and ease of use across and within the projects, we follow the standard Ai4SoilHealth and Open-Earth-Monitor file-naming convention. The convention works with 10 fields that describe important properties of the data. In this way, users can search files, prepare data analyses, etc., without needing to open the files.
For example, for twi_edtm_m_120m_s_20000101_20221231_go_epsg.4326_v20241230.tif, the fields are:
generic variable name: twi = topographic wetness index
variable procedure combination: edtm = derived from the ensemble digital terrain model (GEDTM30)
Position in the probability distribution / variable type: m = mean
Spatial support: 120m
Depth reference: s = surface
Time reference begin time: 20000101 = 2000-01-01
Time reference end time: 20221231 = 2022-12-31
Bounding box: go = global
EPSG code: epsg.4326 = EPSG:4326
Version code: v20241230 = 2024-12-30 (creation date)
The data sets contain the major results of the article “Improving information extraction from model data using sensitivity-weighted performance criteria“ written by Guse et al. (2020). In this article, it is analysed how a sensitivity-weighted performance criterion improves parameter identifiability and model performance. More details are given in the article. The files of this dataset are described as follows.
Parameter sampling: FAST parameter sampling.xlsx: To estimate the sensitivity, the Fourier Amplitude Sensitivity Test (FAST) was used (R routine FAST, Reusser, 2013). Each column shows the values of a model parameter of the SWAT model (Arnold et al., 1998). All parameters are explained in detail in Neitsch et al. (2011). The FAST parameter sampling defines the number of model runs; for twelve model parameters, as in this case, 579 model runs are required. The same parameter sets were used for all catchments.
Daily sensitivity time series: Sensitivity_2000_2005.xlsx: Daily time series of parameter sensitivity for the period 2000-2005 for three catchments in Germany (Treene, Saale, Kinzig). Each column shows the sensitivity of one parameter of the SWAT model. The methodological approach of the temporal dynamics of parameter sensitivity (TEDPAS) was developed by Reusser et al. (2011) and first applied to the SWAT model in Guse et al. (2014). As sensitivity index, the first-order partial variance is used, i.e., the ratio of the partial variance of one parameter to the total variance. The sensitivity is thus always between 0 and 1, and the sum in one row, i.e., the sensitivity of all model parameters on one day, cannot be higher than 1.
Parameter sampling: LH parameter sampling.xlsx: To calculate parameter identifiability, Latin Hypercube sampling was used to generate 2000 parameter sets (R package FME, Soetaert and Petzoldt, 2010). Each column shows the values of a model parameter of the SWAT model (Arnold et al., 1998). All parameters are explained in detail in Neitsch et al. (2011). The same parameter sets were used for all catchments.
Performance criteria with and without sensitivity weights: RSR_RSRw_cal.xlsx:
• Calculation of the RSR once and of the RSR_w separately for each model parameter.
• RSR: typical RSR (RMSE divided by the standard deviation).
• RSR_w: RSR with weights according to the daily sensitivity time series. The calculation was carried out in all three catchments.
• The column RSR shows the results of the RSR (RMSE divided by standard deviation) for the different model runs.
• The column RSR[_parameter name] shows the calculation of the RSR_w for the specific model parameter.
• RSR_w gives weights to each day based on the daily parameter sensitivity (as shown in Sensitivity_2000_2005.xlsx). This means that days with a higher parameter sensitivity are weighted more strongly.
In the methodological approach, the best 25% of the model runs (best 500 model runs) were selected and the model parameters were constrained to the most appropriate parameter values (see the methodological description in the article).
Performance criteria for the three catchments: GOFrun_[catchment name]_RSR.xlsx: These three tables are organised identically and are available for the three catchments in Germany (Treene, Saale, Kinzig). Using the different parameter ranges for the catchments as defined in the previous steps, 2000 model simulations were carried out. For this, a Latin Hypercube sampling was used (R package FME, Soetaert and Petzoldt, 2010).
The three tables show the results of 2000 model simulations for ten different performance criteria, for the two methodological approaches (RSR and swRSR) and two periods (calibration: 2000-2005; validation: 2006-2010).
Performance criteria for the three catchments: GOFrun_[catchment name]_MAE.xlsx: These three tables show the results of 2000 model simulations for ten different performance criteria, for the two methodological approaches (MAE and swMAE) and two periods (calibration: 2000-2005; validation: 2006-2010).
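The description defines the RSR as the RMSE divided by the standard deviation of the observations and states that the sensitivity-weighted variant weights each day by the daily parameter sensitivity. A sketch under the assumption of simple normalized daily weights (the exact weighting in Guse et al. (2020) may differ):

```python
# Sketch only: RSR and an assumed, simplified sensitivity-weighted variant.
import numpy as np

def rsr(obs, sim):
    # RMSE divided by the standard deviation of the observations.
    rmse = np.sqrt(np.mean((obs - sim) ** 2))
    return rmse / np.std(obs)

def rsr_weighted(obs, sim, sensitivity):
    # Assumed form: daily squared errors and the observation variance are
    # weighted by the normalized daily first-order sensitivity.
    w = sensitivity / sensitivity.sum()
    rmse_w = np.sqrt(np.sum(w * (obs - sim) ** 2))
    std_w = np.sqrt(np.sum(w * (obs - np.sum(w * obs)) ** 2))
    return rmse_w / std_w
```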
We constrain the densities of Earth- to Neptune-size planets around very cool (T_e = 3660-4660 K) Kepler stars by comparing 1202 Keck/HIRES radial velocity measurements of 150 nearby stars to a model based on Kepler candidate planet radii and a power-law mass-radius relation. Our analysis is based on the presumption that the planet populations around the two sets of stars are the same. The model can reproduce the observed distribution of radial velocity variation over a range of parameter values, but, for the expected level of Doppler systematic error, the highest Kolmogorov-Smirnov probabilities occur for a power-law index α ≈ 4, indicating that rocky-metal planets dominate the planet population in this size range. A single population of gas-rich, low-density planets with α = 2 is ruled out unless our Doppler errors are ≥5 m/s, i.e., much larger than expected based on observations and stellar chromospheric emission. If small planets are a mix of γ rocky planets (α = 3.85) and 1-γ gas-rich planets (α = 2), then γ > 0.5 unless Doppler errors are ≥4 m/s. Our comparison also suggests that Kepler's detection efficiency relative to ideal calculations is less than unity. One possible source of incompleteness is target stars that are misclassified subgiants or giants, for which the transits of small planets would be impossible to detect. Our results are robust to systematic effects, and plausible errors in the estimated radii of Kepler stars have only moderate impact.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Identification of errors or anomalous values, collectively considered outliers, assists in exploring data or through removing outliers improves statistical analysis. In biomechanics, outlier detection methods have explored the ‘shape’ of the entire cycles, although exploring fewer points using a ‘moving-window’ may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected through two stages for the strides (mean 38 cycles) from treadmill running. Cycles were removed in stage 1 for one-dimensional (spatial) outliers at each time point using the median absolute deviation, and in stage 2 for two-dimensional (spatial–temporal) outliers using a moving window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window size, requiring more stringent scaling at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and outliers assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time series data.
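A minimal Python sketch of the stage-1 idea only (MAD-based flagging at each time point across cycles); the full two-stage method, including the t-statistic scaling and the moving-window stage, is implemented in the authors' supplied Matlab code, and the threshold and cycle-rejection rule below are assumptions:

```python
# Sketch of stage 1 only: one-dimensional outliers per time point via the
# median absolute deviation (MAD), across strides of a treadmill-running trial.
import numpy as np

cycles = np.random.default_rng(1).normal(size=(38, 101))  # 38 cycles x 101 points
cycles[5] += 4.0                                          # plant an outlier cycle

med = np.median(cycles, axis=0)
mad = np.median(np.abs(cycles - med), axis=0)

k = 3.0                                                   # assumed threshold
outlier_points = np.abs(cycles - med) > k * mad           # per-point flags

# Assumed rule for flagging a whole cycle: more than 5% of its points deviant.
flagged_cycles = np.where(outlier_points.mean(axis=1) > 0.05)[0]
print("Cycles flagged for inspection/removal:", flagged_cycles)
```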
NetCDF file of the SREF standard deviation of wind speed and direction that was used to inject variability in the FDDA input. Variable U_NDG_OLD contains the standard deviation of wind speed (m/s); variable V_NDG_OLD contains the standard deviation of wind direction (deg). This dataset is not publicly accessible because it is a NetCDF file that is 3.9 GB. It can be accessed through the following means: on the HPC system sol (2016), in the asm archive here: /asm/grc/JGR_ENSEMBLE_ScienceHub/figure1.nc. Format: Figure 1 data. This is the variability of wind speed and direction of the four-dimensional data assimilation inputs. The variability includes the 14 members of the ensemble. This dataset is associated with the following publication: Gilliam, R., C. Hogrefe, J. Godowitch, S. Napelenok, R. Mathur, and S.T. Rao. Impact of inherent meteorology uncertainty on air quality model predictions. Journal of Geophysical Research: Atmospheres. American Geophysical Union, Washington, DC, USA, 120(23): 12,259-12,280 (2015).
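For users with access to the archived file, a minimal xarray sketch reading the two variables named above:

```python
# Sketch, assuming access to the archived file on the HPC system.
import xarray as xr

ds = xr.open_dataset("/asm/grc/JGR_ENSEMBLE_ScienceHub/figure1.nc")
wind_speed_std = ds["U_NDG_OLD"]   # standard deviation of wind speed (m/s)
wind_dir_std = ds["V_NDG_OLD"]     # standard deviation of wind direction (deg)
print(wind_speed_std.shape, wind_dir_std.shape)
```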