The Shifts Dataset is a dataset for evaluating uncertainty estimates and robustness to distributional shift. The dataset, collected from industrial sources and services, comprises three tasks, each corresponding to a particular data modality: tabular weather prediction, machine translation, and self-driving car (SDC) vehicle motion prediction. All of these data modalities and tasks are affected by real, 'in-the-wild' distributional shifts and pose interesting challenges with respect to uncertainty estimation.
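Uncertainty estimates on benchmarks like this are often summarized with an error-retention curve. Predictions are ranked from most to least confident, and the least confident ones are progressively deferred. The sketch below implements one common variant under that assumption: deferred predictions count as zero error, and the mean is taken over the whole set. The function names are hypothetical, and this is not the official Shifts evaluation code.

```python
from __future__ import annotations

from typing import Sequence


def error_retention_curve(errors: Sequence[float], uncertainties: Sequence[float]) -> list[float]:
    """Mean error over all n samples when the k least confident predictions are
    deferred to an oracle (error 0), for k = 0, 1, ..., n.

    Entry k of the result corresponds to a retention fraction of (n - k) / n.
    """
    if len(errors) != len(uncertainties):
        raise ValueError("errors and uncertainties must have the same length")
    n = len(errors)
    if n == 0:
        raise ValueError("need at least one prediction")
    # Order errors from most confident (lowest uncertainty) to least confident.
    order = sorted(range(n), key=lambda i: uncertainties[i])
    sorted_errors = [errors[i] for i in order]
    retained_sum = sum(sorted_errors)
    curve = [retained_sum / n]
    # Defer the least confident remaining prediction, one at a time.
    for k in range(n - 1, -1, -1):
        retained_sum -= sorted_errors[k]
        curve.append(retained_sum / n)
    return curve


def retention_auc(curve: Sequence[float]) -> float:
    """Trapezoid-rule area under the curve, with retention fractions evenly spaced on [0, 1]."""
    steps = len(curve) - 1
    return sum((curve[i] + curve[i + 1]) / 2 for i in range(steps)) / steps
```

A lower area means errors concentrate where the model reports high uncertainty. That is the behavior a useful uncertainty estimate should show under distributional shift.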
The Mechanical MNIST – Distribution Shift dataset contains the results of finite element simulations of a heterogeneous material subjected to large deformation due to equibiaxial extension at a fixed boundary displacement of d = 7.0. The result provided in this dataset is the change in strain energy after this equibiaxial extension. The Mechanical MNIST dataset is generated by converting the MNIST bitmap images (28x28 pixels), whose pixel values range from 0 to 255, into 2D heterogeneous blocks of material (28x28 unit square) with modulus varying in the range 1 to s. The original bitmap images are sourced from the MNIST Digits dataset (http://www.pymvpa.org/datadb/mnist.html), which corresponds to Mechanical MNIST – MNIST, and from the EMNIST Letters dataset (https://www.nist.gov/itl/products-and-services/emnist-dataset), which corresponds to Mechanical MNIST – EMNIST Letters. The Mechanical MNIST – Distribution Shift dataset is specifically designed to demonstrate three types of data distribution shift: (1) covariate shift, (2) mechanism shift, and (3) sampling bias. In all three, the training and testing environments are drawn from different distributions. For each type of distribution shift, we have one dataset generated from the Mechanical MNIST bitmaps and one from the Mechanical MNIST – EMNIST Letters bitmaps. For the covariate shift dataset, the training data is collected from two environments (2500 samples from s = 100, and 2500 samples from s = 90), and the test data is collected from two additional environments (2000 samples from s = 75, and 2000 samples from s = 50). For the mechanism shift dataset, the training data is identical to the training data in the covariate shift dataset (i.e., 2500 samples from s = 100, and 2500 samples from s = 90), and the test data is collected from two additional environments (2000 samples from s = 25, and 2000 samples from s = 10).
For the sampling bias dataset, each datapoint is selected from the broader MNIST and EMNIST input bitmaps with a probability controlled by a parameter r. The training data is collected from two environments (9800 samples from r = 15, and 200 samples from r = -2), and the test data is collected from three different environments (2000 samples from r = -5, 2000 from r = -10, and 2000 from r = 1). Thus, in the end we have 6 benchmark datasets, each with multiple training and testing environments. The enclosed document “folder_description.pdf” shows the organization of each zipped folder provided on this page. The code to reproduce these simulations is available on GitHub (https://github.com/elejeune11/Mechanical-MNIST/blob/master/generate_dataset/Equibiaxial_Extension_FEA_test_FEniCS.py).
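The sketch below restates the environment sizes listed above as data. It also shows how a 0-255 bitmap can be turned into a modulus field in the range 1 to s. The linear pixel-to-modulus mapping is an assumption; the linked generation script is the authority on the exact rule. The sampling-bias selection probability as a function of r is not specified here, so it is not modeled. Function and constant names are hypothetical.

```python
from __future__ import annotations

# Environment sizes from the description (parameter value -> sample count).
# Keys are the modulus ratio s for the covariate- and mechanism-shift sets,
# and the sampling parameter r for the sampling-bias set.
ENVIRONMENTS = {
    "covariate_shift": {"train": {100: 2500, 90: 2500}, "test": {75: 2000, 50: 2000}},
    "mechanism_shift": {"train": {100: 2500, 90: 2500}, "test": {25: 2000, 10: 2000}},
    "sampling_bias": {"train": {15: 9800, -2: 200}, "test": {-5: 2000, -10: 2000, 1: 2000}},
}


def pixel_to_modulus(pixel: int, s: float) -> float:
    """Map an 8-bit pixel value (0-255) to a modulus in [1, s].

    ASSUMPTION: linear mapping; check the generation script for the exact rule.
    """
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be in the range 0..255")
    return 1.0 + (s - 1.0) * pixel / 255.0


def bitmap_to_modulus_field(bitmap: list[list[int]], s: float) -> list[list[float]]:
    """Convert a 28x28 bitmap into a grid of moduli, one value per unit cell."""
    return [[pixel_to_modulus(p, s) for p in row] for row in bitmap]
```

Under this mapping, lowering s (for example from 100 in training to 25 in the mechanism-shift test set) compresses the stiffness contrast while the input bitmaps keep the same kind of content.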
This dataset provides calculated remote sensing reflectance (Rrs) from measurements collected with a Ramses TriOS radiometer deployed on the Santa Barbara Museum of Natural History Sea Center at Stearns Wharf, Santa Barbara, California, U.S. All measurements were taken over a fixed position at (34.41037665, -119.68557147). Three sensors are used to collect solar downwelling irradiance (Ed), sky radiance (Ls) and water-leaving radiance (Lw). These data have been processed to Rrs at 10 second intervals and are either concurrent or taken within 2.5 hours of SHIFT campaign flights. The data collected by the three Ramses TriOS sensors for eight days during the period 2022-04-05 to 2022-05-29 are also included. The data were translated from the proprietary format output by the Ramses TriOS instrument and saved in comma-separated values (CSV) format.
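The description does not say how Rrs was computed from the three sensors. A widely used above-water approach treats the radiance from the water-viewing sensor as total radiance from the surface. It subtracts the skylight reflected at the surface as rho * Ls and normalizes by Ed. The sketch below assumes that method and a reflectance factor of rho = 0.028, a commonly cited value. Both are assumptions about, not statements of, the processing used for this dataset.

```python
from __future__ import annotations

# ASSUMPTION: commonly used sea-surface skylight reflectance factor; the value
# used to process this dataset is not stated in the description.
DEFAULT_RHO = 0.028


def remote_sensing_reflectance(lw: float, ls: float, ed: float, rho: float = DEFAULT_RHO) -> float:
    """Rrs (sr^-1) at one wavelength from the water-viewing radiance, sky radiance Ls,
    and downwelling irradiance Ed.

    lw and ls must share radiance units (e.g. W m^-2 sr^-1 nm^-1), and ed must be
    in the matching irradiance units (e.g. W m^-2 nm^-1).
    """
    if ed <= 0:
        raise ValueError("Ed must be positive")
    # Remove reflected skylight, then normalize by the incoming irradiance.
    return (lw - rho * ls) / ed


def rrs_spectrum(lw: list[float], ls: list[float], ed: list[float], rho: float = DEFAULT_RHO) -> list[float]:
    """Apply the formula band by band for one time-aligned (e.g. 10 s) record."""
    if not len(lw) == len(ls) == len(ed):
        raise ValueError("spectra must have the same number of bands")
    return [remote_sensing_reflectance(w, s, e, rho) for w, s, e in zip(lw, ls, ed)]
```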
This dataset holds Level 1 (L1) brightness temperature data collected by the Hyperspectral Thermal Emission Spectrometer (HyTES) instrument. This imagery was acquired as part of the Surface Biology and Geology High-Frequency Time Series (SHIFT) campaign on March 23, 2022. The SHIFT campaign generated precise, high-frequency data on plant communities for nearly 1,656 square kilometers across Santa Barbara County, California, US, and the nearby ocean. HyTES is a compact imaging spectrometer that acquires data in 256 spectral bands between 7.5 and 12 micrometers; it was deployed on a Twin Otter aircraft. The SHIFT campaign sought to demonstrate the joint use of visible to shortwave infrared (VSWIR) and thermal infrared (TIR) data. TIR data are used to measure land surface temperature (LST), which informs models of water flux from the land surface through processes such as evapotranspiration. LST is sensitive to solar heat gains and to local evaporative cooling. The HyTES instrument measures TIR radiances that can be used to derive LST, emissivity, and Level 3 products such as latent heat flux and detection of air pollution sources. The HyTES data are provided in HDF5 format and include 91 flight scenes. The data are not projected, but georeferencing information for each pixel is provided in the HDF5 file and in a separate ENVI file for each flight scene. In addition, the flight scene boundaries and an overlay image are provided in Keyhole Markup Language (KML), along with a quicklook image and spectral response data.
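Brightness temperature is the temperature a blackbody would need in order to emit the measured radiance at a given wavelength. Radiance and brightness temperature are therefore linked by the Planck function and its inverse. The sketch below assumes radiance in W m^-2 sr^-1 µm^-1 and wavelength in micrometers. It is not HyTES processing code, and the units of the HDF5 variables should be checked against the product documentation.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K
C1 = 2 * H * C ** 2  # first radiation constant (radiance form), W m^2 sr^-1
C2 = H * C / K_B     # second radiation constant, m K


def planck_radiance(wavelength_um: float, temperature_k: float) -> float:
    """Blackbody spectral radiance in W m^-2 sr^-1 um^-1."""
    lam = wavelength_um * 1e-6
    per_metre = C1 / (lam ** 5 * math.expm1(C2 / (lam * temperature_k)))
    return per_metre * 1e-6  # convert per-metre to per-micrometre


def brightness_temperature(wavelength_um: float, radiance: float) -> float:
    """Invert the Planck function: radiance (W m^-2 sr^-1 um^-1) -> temperature (K)."""
    lam = wavelength_um * 1e-6
    per_metre = radiance * 1e6
    return C2 / (lam * math.log1p(C1 / (lam ** 5 * per_metre)))
```

For example, a 300 K blackbody at 10 µm emits roughly 9.9 W m^-2 sr^-1 µm^-1. Inverting that radiance recovers 300 K. For real land surfaces, brightness temperature sits below the physical LST because emissivity is less than 1, which is why LST retrieval also requires emissivity.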
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Benchmarking robustness to distribution shifts traditionally relies on dataset collection, which is typically laborious and expensive, particularly for datasets with a large number of classes such as ImageNet. An exception to this procedure is ImageNet-C (Hendrycks & Dietterich, 2019), a dataset created by applying common real-world corruptions at different levels of intensity to the (clean) ImageNet images. Inspired by this work, we introduce ImageNet-Cartoon and ImageNet-Drawing, two datasets constructed by converting ImageNet images into cartoons and colored pencil drawings using a GAN framework (Wang & Yu, 2020) and simple image processing (Lu et al., 2012), respectively.
This repository contains ImageNet-Cartoon and ImageNet-Drawing. Check out the official GitHub repository for the code to reproduce the datasets.
If you find this useful in your research, please consider citing:
@inproceedings{imagenetshift,
title={ImageNet-Cartoon and ImageNet-Drawing: two domain shift datasets for ImageNet},
author={Tiago Salvador and Adam M. Oberman},
booktitle={ICML Workshop on Shift happens: Crowdsourcing metrics and test datasets beyond ImageNet.},
year={2022}
}
This dataset contains vegetation plot locations, descriptions, fractional cover, and sample identifier information from surveys conducted as part of the 2022 NASA Surface Biology Geology (SBG) High Frequency Time series (SHIFT) campaign. Surveys took place from 2022-02-23 to 2022-09-27 at the Jack and Laura Dangermond Preserve, Sedgwick Reserve, and Carpinteria Salt Marsh Reserve, which are located in Santa Barbara County, California, USA. This project collected field data contemporaneously with weekly flights of the NASA Airborne Visible-Infrared Imaging Spectrometer-Next Generation (AVIRIS-NG) facility instrument over the study areas. Plot information includes: plot tree subform, species lists, plot description, plot samples characterization, and plot location and contextual information. Related data packages contain additional biogeochemical, reflectance, and foliar data. Survey data and metadata are presented in comma-separated values (.csv) format along with survey plot polygons in GeoJSON (.geojson) format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
We consider forecasting the term structure of interest rates under the assumption that the factors driving the yield curve are stationary around a slowly time-varying mean or shifting endpoint. The shifting endpoints are captured using (i) time series methods (exponential smoothing), (ii) long-range survey forecasts of either interest rates or inflation and output growth, or (iii) exponentially smoothed realizations of these macro variables. Allowing for shifting endpoints in yield curve factors provides substantial and significant gains in out-of-sample predictive accuracy relative to stationary and random walk benchmarks. Forecast improvements are largest for long-maturity interest rates and for long-horizon forecasts.
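To make option (i) concrete, the sketch below tracks a single yield-curve factor (for example, the level) with an exponentially smoothed endpoint mu_t. It then forecasts the factor as reverting from its latest value toward that endpoint at an AR(1) rate phi: y_{T+h} = mu_T + phi^h (y_T - mu_T). This is a simplified illustration under those assumptions, not the paper's exact specification. The smoothing weight alpha and the function names are hypothetical.

```python
from __future__ import annotations


def exponential_smoothing(series: list[float], alpha: float) -> list[float]:
    """Shifting endpoint mu_t = mu_{t-1} + alpha * (y_t - mu_{t-1}), started at y_0."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must be non-empty")
    levels = [series[0]]
    for y in series[1:]:
        levels.append(levels[-1] + alpha * (y - levels[-1]))
    return levels


def ar1_coefficient(deviations: list[float]) -> float:
    """Least-squares AR(1) coefficient (no intercept) for deviations from the endpoint."""
    num = sum(cur * prev for cur, prev in zip(deviations[1:], deviations[:-1]))
    den = sum(prev * prev for prev in deviations[:-1])
    return num / den


def shifting_endpoint_forecast(series: list[float], alpha: float, phi: float, horizon: int) -> float:
    """h-step forecast: the factor reverts toward the latest endpoint at rate phi."""
    endpoint = exponential_smoothing(series, alpha)[-1]
    gap = series[-1] - endpoint
    return endpoint + phi ** horizon * gap
```

As the horizon grows, phi ** horizon shrinks and the forecast converges to the current endpoint rather than to a fixed sample mean. This is consistent with the gains being largest at long horizons. In practice, phi could be estimated with `ar1_coefficient` on the deviations y_t - mu_t.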
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Spreadsheet of chemical shift perturbation values for SRSF1 methyl groups in the presence of U1 snRNP, ssRNA, or U1 snRNA SL3.
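Methyl chemical shift perturbations are often reported as a single combined value from the 1H and 13C shift changes, with the 13C change down-weighted. The sketch below uses the form sqrt(dH^2 + (alpha * dC)^2) with alpha = 0.25, roughly the ratio of the 13C and 1H gyromagnetic ratios. Both the formula and the weight are assumptions; the spreadsheet's own definition is not stated. Function names are hypothetical.

```python
from __future__ import annotations

import math

# ASSUMPTION: weight applied to the 13C shift change (about gamma_C / gamma_H).
CARBON_WEIGHT = 0.25


def methyl_csp(delta_h_ppm: float, delta_c_ppm: float, alpha: float = CARBON_WEIGHT) -> float:
    """Combined 1H/13C chemical shift perturbation (ppm) for one methyl peak."""
    return math.sqrt(delta_h_ppm ** 2 + (alpha * delta_c_ppm) ** 2)


def csp_from_shifts(free: tuple[float, float], bound: tuple[float, float],
                    alpha: float = CARBON_WEIGHT) -> float:
    """CSP from (1H, 13C) peak positions in the free and bound states."""
    (h_free, c_free), (h_bound, c_bound) = free, bound
    return methyl_csp(h_bound - h_free, c_bound - c_free, alpha)
```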
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The data and programs replicate the tables and figures from "Shift-Share Designs: Theory and Inference" by Adão, Kolesár, and Morales. Please see the Roadmap files for additional details.
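For readers unfamiliar with the design, a shift-share (Bartik-style) regressor combines region-level exposure shares with sector-level shocks: z_i = sum_k s_ik g_k. The sketch below shows only that standard construction. It does not implement the inference procedure the paper develops, and the function name and inputs are hypothetical.

```python
from __future__ import annotations


def shift_share_instrument(shares: list[list[float]], shocks: list[float]) -> list[float]:
    """z_i = sum_k shares[i][k] * shocks[k] for every region i.

    shares[i][k] is region i's exposure share to sector k (e.g. its employment
    share); shocks[k] is the sector-level shift (e.g. national sector growth).
    """
    out = []
    for row in shares:
        if len(row) != len(shocks):
            raise ValueError("each share row needs one entry per sector")
        out.append(sum(s * g for s, g in zip(row, shocks)))
    return out
```

Because regions with similar sector shares receive correlated values of z_i, their regression residuals can also be correlated. Conventional region-level standard errors can miss this, and it is the issue the paper's inference results address.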
This dataset holds full-resolution 3-band (true color) imagery acquired by NASA's Airborne Visible / Infrared Imaging Spectrometer-Next Generation (AVIRIS-NG) instrument. This imagery was collected as part of the Surface Biology and Geology High-Frequency Time Series (SHIFT) campaign, which took place from February to May 2022, with a one-week follow-up activity in September. The SHIFT campaign leveraged NASA's AVIRIS-NG facility instrument to collect VSWIR data at approximately a weekly cadence across a broad study area, enabling traceability analyses related to the science value of VSWIR revisits. AVIRIS-NG is a pushbroom spectral mapping system with high signal-to-noise ratio (SNR), designed and toleranced for high performance spectroscopy. AVIRIS-NG measures radiance at approximately 5-nm intervals in the Visible to Shortwave Infrared (VSWIR) spectral range from 380-2510 nm. The images in this dataset are true color (RGB) images from the wavelengths centered at approximately 808, 658, and 563 nm, subset from the full spectrum collected by AVIRIS-NG. The spatial resolution matches the native observed resolution (which varies by flight line, generally finer than 5 m and down to 2 m). There are two files for each flight line: a PNG and a georeferenced cloud-optimized GeoTIFF. The GeoTIFF contains floating-point radiance values, while the PNG has been scaled and converted to integers.
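The two steps described above, subsetting three bands from the full spectrum and scaling float radiance to integers for the PNG, can be sketched as follows. The uniform 5 nm band grid and the linear clip-and-stretch to 0-255 are both assumptions: real band centers come from the AVIRIS-NG metadata, and the actual PNG scaling method is not described. All names are hypothetical.

```python
from __future__ import annotations

# HYPOTHETICAL band centers: a uniform 5 nm grid from 380 to 2510 nm.
HYPOTHETICAL_WAVELENGTHS_NM = [380 + 5 * i for i in range(427)]


def nearest_band(wavelengths_nm: list[float], target_nm: float) -> int:
    """Index of the band whose center is closest to target_nm."""
    return min(range(len(wavelengths_nm)), key=lambda i: abs(wavelengths_nm[i] - target_nm))


def scale_to_uint8(values: list[float], low: float, high: float) -> list[int]:
    """Linearly map [low, high] to 0..255, clipping values outside the range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    span = high - low
    out = []
    for v in values:
        t = min(max((v - low) / span, 0.0), 1.0)  # clip to [0, 1]
        out.append(round(t * 255))
    return out
```

For example, `[nearest_band(HYPOTHETICAL_WAVELENGTHS_NM, w) for w in (808, 658, 563)]` picks the bands centered at 810, 660, and 565 nm on this assumed grid. Per-band `low`/`high` bounds, such as radiance percentiles, would be one reasonable choice for the stretch.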
This archive contains part 1 of the Shifts Benchmark data for multiple sclerosis (MS) lesion segmentation. This dataset is provided by the Shifts Project to enable assessment of the robustness of models to distributional shift and of the quality of their uncertainty estimates. This part is the MSSEG data collected in the digital repository of the OFSEP Cohort and provided in the context of the MICCAI 2016 and 2021 challenges. A full description of the benchmark is available at https://arxiv.org/pdf/2206.15407. Part 2 of the data is available here. To find out more about the Shifts Project, please visit https://shifts.ai.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Understanding the performance of machine learning models across diverse data distributions is critically important for reliable applications. Motivated by this, there is a growing focus on curating benchmark datasets that capture distribution shifts. In this work, we present MetaShift, a collection of 12,868 sets of natural images across 410 classes, to address this challenge. We leverage the natural heterogeneity of Visual Genome and its annotations to construct MetaShift. The key construction idea is to cluster images using their metadata, which provides context for each image (e.g., cats with cars or cats in a bathroom); each cluster represents a distinct data distribution. MetaShift has two important benefits. First, it contains orders of magnitude more natural data shifts than were previously available. Second, it provides explicit explanations of what is unique about each of its data sets and a distance score that measures the amount of distribution shift between any two of its data sets. Importantly, to support evaluating ImageNet-trained models on MetaShift, we match MetaShift with the ImageNet hierarchy. The matched version covers 867 out of 1,000 classes in ImageNet-1k. Each class in the ImageNet-matched MetaShift contains 2301.6 images on average and 19.3 subsets capturing images in different contexts. We also propose a method to construct tasks on the matched version, giving an example that constructs 19,024 binary classification tasks.
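The core idea of grouping images of a class by context can be sketched with a simplified model of the annotations. In this model, each image ID maps to the set of object names annotated in it, and every co-occurring object defines a subset such as "cat(car)". This is a hypothetical simplification: the function name is illustrative, and the published MetaShift pipeline may use richer metadata and additional filtering.

```python
from __future__ import annotations

from collections import defaultdict


def build_context_subsets(image_objects: dict[str, set[str]], target_classes: set[str],
                          min_size: int = 1) -> dict[str, set[str]]:
    """Group images of each target class by co-occurring objects.

    Returns a mapping like {"cat(car)": {image ids}}; subsets with fewer than
    min_size images are dropped.
    """
    subsets: dict[str, set[str]] = defaultdict(set)
    for image_id, objects in image_objects.items():
        for cls in objects & target_classes:
            # Every other object in the image is treated as a context for cls.
            for context in objects - {cls}:
                subsets[f"{cls}({context})"].add(image_id)
    return {name: ids for name, ids in subsets.items() if len(ids) >= min_size}
```

An image can belong to several subsets, for example both "cat(car)" and "cat(dog)". Overlapping subsets of this kind are what make a distance between subsets meaningful.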
This dataset holds laboratory foliar chemical analysis results for field samples collected during the 2022 NASA Surface Biology Geology (SBG) High Frequency Time series (SHIFT) campaign in Santa Barbara County, California, USA. Leaf samples were collected from plots within the Dangermond Preserve, Sedgwick Reserve, and Carpinteria Salt Marsh Reserve during the period of 2022-02-23 to 2022-09-27 and dried for later analysis. This project collected field data contemporaneously with weekly flights of NASA's Airborne Visible-Infrared Imaging Spectrometer-Next Generation (AVIRIS-NG) facility instrument over the study areas. Sixteen chemical traits from two different lab analyses are provided. (a) Elemental analysis: foliar nitrogen (%), phosphorus (%), magnesium (%), potassium (%), calcium (%), sulfur (%), boron (ppm), iron (ppm), manganese (ppm), copper (ppm), zinc (ppm), aluminum (ppm), and sodium (ppm). (b) AnkomFiber analysis: foliar hemicellulose and bound protein (%), cellulose (%), and lignin (%). Related data packages contain additional plot-level characterization, biogeochemical, reflectance, and foliar data. These data are provided in comma-separated values (CSV) format.
The CoRE (Contractions or Range Expansions) database contains a library of published literature and data on species range shifts in response to climate change. Through a systematic review of publications returned from searches on Google Scholar, Web of Science, and Scopus, we selected primary research articles that documented or attempted to document species-level distribution shifts in animal or plant species in response to recent anthropogenic climate change. We extracted data in four broad categories: (i) basic study information (study duration, location, data quality and methodological factors); (ii) basic species information (scientific names and taxonomic groups); (iii) information on the observed range shifts (range dimension, occupancy or abundance shift, and range edge); and (iv) the description of the shift (range shift direction, magnitude of the shift, and whether it supported our hypotheses). We also took note of climate drivers mentioned and details on species vulnerability and adaptive capacity.
Accurately predicting species’ range shifts in response to environmental change is paramount for understanding ecological processes and global change. In synthetic analyses, traits emerge as significant but weak predictors of species’ range shifts across recent climate change. These studies assume linear responses to traits, whereas detailed empirical work often reveals trait responses that are unimodal or contain thresholds or other nonlinearities. We hypothesize that linear modeling approaches fail to capture these nonlinearities and may therefore understate the power of traits to predict range shifts. We evaluate the predictive performance of linear and nonlinear approaches (ridge-regularized linear regression, support vector regression with linear and nonlinear kernels, and random forests). We apply our models to six multi-decadal range shift datasets for plants, moths, marine fish, birds, and small mammals. We show that nonlinear approaches can perform b...
Accounting for nonlinear responses to traits improves range shift predictions
https://doi.org/10.5061/dryad.wstqjq2v8
We assess the performance of nonlinear models to predict climate-induced range shifts using six datasets encompassing a broad taxonomic range. The number of species per dataset ranges from 28 to 239 (mean=118, median=94), and range shifts were observed over periods ranging from 20 to 100+ years. Each dataset was derived from previous evaluations of traits as range shift predictors and consists of a list of focal species, associated species-level traits, and a range shift metric.
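The hypothesis above, that linear models miss unimodal trait responses, can be illustrated with a toy example. A synthetic unimodal response gets essentially zero R² from a straight-line fit but a high R² from a simple nonlinear predictor. Binned means are used here as a deliberately simple stand-in for the nonlinear models in the study (support vector regression, random forests), not as the study's method. The data and function names are hypothetical.

```python
from __future__ import annotations


def r_squared(y: list[float], pred: list[float]) -> float:
    """Coefficient of determination of predictions pred for observations y."""
    mean = sum(y) / len(y)
    sst = sum((v - mean) ** 2 for v in y)
    sse = sum((v - p) ** 2 for v, p in zip(y, pred))
    return 1.0 - sse / sst


def linear_fit_r2(x: list[float], y: list[float]) -> float:
    """In-sample R^2 of an ordinary least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return r_squared(y, [intercept + slope * a for a in x])


def binned_fit_r2(x: list[float], y: list[float], n_bins: int) -> float:
    """In-sample R^2 of a piecewise-constant fit (mean of y within equal-width bins of x)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("x must vary")
    width = (hi - lo) / n_bins
    bins = [min(int((a - lo) / width), n_bins - 1) for a in x]
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for b, v in zip(bins, y):
        sums[b] += v
        counts[b] += 1
    return r_squared(y, [sums[b] / counts[b] for b in bins])
```

With a trait x spread evenly on [0, 1] and a response peaking at x = 0.5, the linear fit explains almost none of the variance, while a 5-bin fit explains most of it. The study's models are evaluated out of sample, which this in-sample toy does not attempt.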
See the DataDescriptions_CannistraBuckley.pdf file for information on the data and structure. Refer to the references below for additional information on the datasets and please cite those papers if you use this data.
Data was derived from the following sources:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The data represent a shift in a process variable, displayed on a control chart, caused by a sudden change in the machine setting.
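A sudden shift like this is what a Shewhart-style chart is meant to flag: limits are set from an in-control baseline, and the first point outside them signals the shift. The sketch below estimates sigma with the sample standard deviation of the baseline. This is a simplification; individuals charts often estimate sigma from the average moving range instead. The function names and the 3-sigma default are illustrative assumptions.

```python
from __future__ import annotations

import statistics


def control_limits(baseline: list[float], k: float = 3.0) -> tuple[float, float]:
    """Lower and upper control limits: baseline mean -/+ k sample standard deviations."""
    mean = statistics.fmean(baseline)
    sd = statistics.stdev(baseline)
    return mean - k * sd, mean + k * sd


def first_out_of_control(series: list[float], lower: float, upper: float) -> int | None:
    """Index of the first observation outside [lower, upper], or None if all are inside."""
    for i, x in enumerate(series):
        if x < lower or x > upper:
            return i
    return None
```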
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is useful for market analysis.
This Dryad dataset includes the input data analyzed in the main text and the Supplemental Information described in the main text. The main text describes how the data were collected in detail. These data were derived from the Dryad dataset: https://doi.org/10.5061/dryad.hx3ffbgb1. The zip file containing the data also contains a README.md file that overviews its contents.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Data set accompanying the paper "Effect of alcohol on the speed of shifting endogenous and exogenous attention". MATLAB files and a .mat data set reproduce the figures from the data. Raw data files contain records of attention set shifting times as ASCII text files. The associated MATLAB files read the raw data to calculate attention shift times for each subject and condition.
https://spdx.org/licenses/CC0-1.0.html
Globally, many species’ distributions are shifting in response to contemporary climate change. However, the direction and rate of shifts remain difficult to predict, impeding managers' ability to allocate resources most effectively. Here, we explore a new approach for forecasting species' range-limit shifts that requires only abundance data along environmental (e.g., elevational) gradients. We hypothesized that species’ abundance distributions could provide information on the likelihood of future range-limit shifts. We tested this prediction using data from several transect studies that compared historical and contemporary distributions. Consistent with our prediction, we found that strong asymmetry in abundance distributions (i.e., “leaning” distributions) preceded species’ lower-limit range shifts (Fisher's exact test P < 0.001, R² = 0.28). Accordingly, surveying abundances along environmental gradients may be a promising, cost-effective method for forecasting local shifts. Ideally, practitioners will be able to incorporate this approach into species-specific management planning and use it to inform on-the-ground conservation efforts.
Methods: Data were extracted from eight cited peer-reviewed studies in total. These studies cover a range of species, biomes, and regions across the globe. The accompanying data have been centralized and processed to include the key metrics discussed in our manuscript, such as the historical and current range limits, species' midpoints, optimum elevations, leans in meters, leans as percentages, and lower range-limit shift rates per year and per decade.
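The association above is tested with Fisher's exact test. As a self-contained sketch, a two-sided Fisher's exact test for a 2x2 table can be computed from the hypergeometric distribution. The table layout (for example, leaning vs. not leaning by shifted vs. not shifted) is an assumption about how such counts would be arranged. `scipy.stats.fisher_exact` is the usual library alternative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Probability that the top-left cell equals x given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = table_prob(a)
    probs = [table_prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so tables tied with the observed one are counted.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

For the table [[3, 1], [1, 3]], this gives 34/70 ≈ 0.486.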