Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Parameter Settings of Synthetic Data Generation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Artifacts for the paper "Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?".
This artifact repository contains 9 compressed folders, as follows:
| ID | File Name | Description |
|---|---|---|
| 1 | syn_circa.zip | CIRCA10 and CIRCA50 datasets for Causal Discovery |
| 2 | syn_rcd.zip | RCD10 and RCD50 datasets for Causal Discovery |
| 3 | syn_causil.zip | CausIL10 and CausIL50 datasets for Causal Discovery |
| 4 | rca_circa.zip | CIRCA10 and CIRCA50 datasets for RCA |
| 5 | rca_rcd.zip | RCD10 and RCD50 datasets for RCA |
| 6 | online-boutique.zip | Online Boutique dataset for RCA |
| 7 | sock-shop-1.zip | Sock Shop 1 dataset for RCA |
| 8 | sock-shop-2.zip | Sock Shop 2 dataset for RCA |
| 9 | train-ticket.zip | Train Ticket dataset for RCA |
Each zip file contains the generated/collected data from the corresponding data generator or microservice benchmark systems (e.g., online-boutique.zip contains metrics data collected from the Online Boutique system).
Details about the generation of our datasets
We use three synthetic data generators from three previous RCA studies [15, 25, 28] to create the synthetic datasets: the CIRCA, RCD, and CausIL data generators. Their mechanisms are as follows:

1. The CIRCA data generator [28] generates a random causal directed acyclic graph (DAG) based on a given number of nodes and edges. From this DAG, time series data for each node is generated using a vector auto-regression (VAR) model. A fault is injected into a node by altering the noise term in the VAR model for two timestamps.
2. The RCD data generator [25] uses the pyAgrum package [3] to generate a random DAG based on a given number of nodes, subsequently generating discrete time series data for each node, with values ranging from 0 to 5. A fault is introduced into a node by changing its conditional probability distribution.
3. The CausIL data generator [15] generates causal graphs and time series data that simulate the behavior of microservice systems. It first constructs a DAG of services and metrics based on domain knowledge, then generates metric data for each node of the DAG using regressors trained on real metrics data. Unlike the CIRCA and RCD data generators, the CausIL data generator cannot inject faults.

To create our synthetic datasets, we first generate 10 DAGs whose nodes range from 10 to 50 for each synthetic data generator. Next, we generate fault-free datasets from these DAGs with different random seeds, resulting in 100 cases for the CIRCA and RCD generators and 10 cases for the CausIL generator. We then create faulty datasets by introducing ten faults into each DAG and generating the corresponding faulty data, yielding 100 cases for the CIRCA and RCD data generators. The fault-free datasets (e.g., syn_rcd, syn_circa) are used to evaluate causal discovery methods, while the faulty datasets (e.g., rca_rcd, rca_circa) are used to assess RCA methods.
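To make the CIRCA-style mechanism concrete, the following is a minimal, illustrative Python sketch, not the authors' generator: it samples a random DAG, simulates VAR-style data in which each node depends on its parents' previous values, and injects a fault by inflating one node's noise term for two timestamps. All function names, parameters, and constants below are hypothetical choices for illustration.

```python
# Illustrative sketch only; not the CIRCA authors' implementation.
import numpy as np
import networkx as nx

def random_dag(n_nodes, n_edges, rng):
    """Sample a random DAG by always orienting edges from lower to higher node index."""
    g = nx.DiGraph()
    g.add_nodes_from(range(n_nodes))
    while g.number_of_edges() < n_edges:
        u, v = rng.choice(n_nodes, size=2, replace=False)
        g.add_edge(min(u, v), max(u, v))  # edges point "forward", so the graph stays acyclic
    return g

def simulate_var(dag, n_steps, fault_node, fault_times, rng, fault_scale=10.0):
    """Each node is a weighted sum of its parents at t-1 plus Gaussian noise.
    The fault inflates the noise of `fault_node` at the given timestamps."""
    n = dag.number_of_nodes()
    weights = {e: rng.uniform(0.5, 2.0) for e in dag.edges}
    x = np.zeros((n_steps, n))
    for t in range(1, n_steps):
        for v in dag.nodes:
            noise_sd = fault_scale if (v == fault_node and t in fault_times) else 1.0
            parents = list(dag.predecessors(v))
            x[t, v] = sum(weights[(p, v)] * x[t - 1, p] for p in parents) \
                      + rng.normal(0.0, noise_sd)
    return x

rng = np.random.default_rng(0)
dag = random_dag(n_nodes=10, n_edges=20, rng=rng)
data = simulate_var(dag, n_steps=1000, fault_node=3, fault_times={500, 501}, rng=rng)
```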
We deploy three popular benchmark microservice systems: Sock Shop [6], Online Boutique [4], and Train Ticket [8], on a four-node Kubernetes cluster hosted by AWS. Next, we use the Istio service mesh [2] with Prometheus [5] and cAdvisor [1] to monitor and collect resource-level and service-level metrics of all services, as in previous works [25, 39, 59]. To generate traffic, we use the load generators provided by these systems and customise them to explore all services with 100 to 200 users concurrently. We then introduce five common faults (CPU hog, memory leak, disk IO stress, network delay, and packet loss) into five different services within each system. Finally, we collect metrics data before and after the fault injection operation. An overview of our setup is presented in the Figure below.
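As an illustration of the metric-collection step, the sketch below pulls a per-service metric over a time window from Prometheus' standard /api/v1/query_range HTTP endpoint. The Prometheus URL, namespace label, and PromQL expression are assumptions for illustration, not the exact queries used for these datasets.

```python
# Illustrative sketch of pulling service-level metrics from Prometheus;
# the URL and PromQL query are placeholders, not this artifact's exact setup.
import requests

PROMETHEUS_URL = "http://prometheus.example:9090"  # hypothetical endpoint

def query_range(promql, start, end, step="15s"):
    """Query Prometheus' range API and return the list of time series."""
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query_range",
        params={"query": promql, "start": start, "end": end, "step": step},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]

# Example: per-container CPU usage exported by cAdvisor (metric name is standard;
# the label filters depend on the cluster configuration).
series = query_range(
    'rate(container_cpu_usage_seconds_total{namespace="sock-shop"}[1m])',
    start=1700000000, end=1700003600,
)
```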
Code
The code to reproduce the experimental results in the paper is available at https://github.com/phamquiluan/RCAEval.
References
As in our paper.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Developing data-driven solutions that address real-world problems requires understanding of these problems’ causes and how their interaction affects the outcome–often with only observational data. Causal Bayesian Networks (BN) have been proposed as a powerful method for discovering and representing the causal relationships from observational data as a Directed Acyclic Graph (DAG). BNs could be especially useful for research in global health in Lower and Middle Income Countries, where there is an increasing abundance of observational data that could be harnessed for policy making, program evaluation, and intervention design. However, BNs have not been widely adopted by global health professionals, and in real-world applications, confidence in the results of BNs generally remains inadequate. This is partially due to the inability to validate against some ground truth, as the true DAG is not available. This is especially problematic if a learned DAG conflicts with pre-existing domain doctrine. Here we conceptualize and demonstrate an idea of a “Causal Datasheet” that could approximate and document BN performance expectations for a given dataset, aiming to provide confidence and sample size requirements to practitioners. To generate results for such a Causal Datasheet, a tool was developed which can generate synthetic Bayesian networks and their associated synthetic datasets to mimic real-world datasets. The results given by well-known structure learning algorithms and a novel implementation of the OrderMCMC method using the Quotient Normalized Maximum Likelihood score were recorded. These results were used to populate the Causal Datasheet, and recommendations could be made dependent on whether expected performance met user-defined thresholds. We present our experience in the creation of Causal Datasheets to aid analysis decisions at different stages of the research process. First, one was deployed to help determine the appropriate sample size of a planned study of sexual and reproductive health in Madhya Pradesh, India. Second, a datasheet was created to estimate the performance of an existing maternal health survey we conducted in Uttar Pradesh, India. Third, we validated generated performance estimates and investigated current limitations on the well-known ALARM dataset. Our experience demonstrates the utility of the Causal Datasheet, which can help global health practitioners gain more confidence when applying BNs.
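The Causal Datasheet tool itself is not included here. As a rough illustration of the underlying workflow (simulate data from a known Bayesian network at several sample sizes, re-learn the structure, and score how well the true edges are recovered), the following sketch uses the pgmpy library with a toy three-node network. The network, sample sizes, and the use of hill-climbing with a BIC score are assumptions standing in for the OrderMCMC/qNML procedure described in the abstract.

```python
# Rough illustration with pgmpy; the toy network and sample sizes are assumptions.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.sampling import BayesianModelSampling
from pgmpy.estimators import HillClimbSearch, BicScore

# Ground-truth DAG: A -> B -> C
truth = BayesianNetwork([("A", "B"), ("B", "C")])
truth.add_cpds(
    TabularCPD("A", 2, [[0.6], [0.4]]),
    TabularCPD("B", 2, [[0.7, 0.2], [0.3, 0.8]], evidence=["A"], evidence_card=[2]),
    TabularCPD("C", 2, [[0.9, 0.1], [0.1, 0.9]], evidence=["B"], evidence_card=[2]),
)

for n in (100, 500, 2000):  # candidate sample sizes for a "datasheet"-style table
    data = BayesianModelSampling(truth).forward_sample(size=n)
    learned = HillClimbSearch(data).estimate(scoring_method=BicScore(data))
    recovered = set(learned.edges()) & set(truth.edges())
    print(n, f"recovered {len(recovered)}/{len(truth.edges())} true edges")
```

A table of such recovery rates across sample sizes is, in spirit, what a Causal Datasheet would report for a planned study.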
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
CausalDynamics: A large-scale benchmark for structural discovery of dynamical causal models
A comprehensive benchmark framework designed to rigorously evaluate state-of-the-art causal discovery algorithms for dynamical systems.
Key Features
1️⃣ Large-Scale Benchmark. Systematically evaluate state-of-the-art causal discovery algorithms on thousands of graph challenges with increasing difficulty. 2️⃣ Customizable Data Generation. Scalable, user-friendly… See the full description on the dataset page: https://huggingface.co/datasets/kausable/CausalDynamics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This upload contains the Multimodal3DIdent dataset introduced in the paper Identifiability Results for Multimodal Contrastive Learning presented at ICLR 2023. The dataset provides an identifiability benchmark with image/text pairs generated from controllable ground truth factors, some of which are shared between image and text modalities. The training, validation, and test sets contain 125000, 10000, and 10000 image/text pairs and ground truth factors, respectively. The code for the data generation is publicly available: https://github.com/imantdaunhawer/Multimodal3DIdent.
Description
------------------
The generated dataset contains image and text data as well as the ground truth factors of variation for each modality. Each split (train/val/test) of the dataset is structured as follows:
.
├── images
│ ├── 000000.png
│ ├── 000001.png
│ └── etc.
├── text
│ └── text_raw.txt
├── latents_image.csv
└── latents_text.csv
The directories images and text contain the generated image and text data, whereas the CSV files latents_image.csv and latents_text.csv contain the values of the respective latent factors. There is an index-wise correspondence between images, sentences, and latent factors. For example, the first line in the file text_raw.txt is the sentence that corresponds to the first image in the images directory.
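A minimal loading sketch for one split, assuming the directory layout above; the split directory name ("train") is an assumption, and the index-wise pairing follows the description of the files:

```python
# Minimal loading sketch for one split; paths follow the layout described above.
import os
import pandas as pd

split_dir = "train"  # assumed split directory name
latents_image = pd.read_csv(os.path.join(split_dir, "latents_image.csv"))
latents_text = pd.read_csv(os.path.join(split_dir, "latents_text.csv"))
with open(os.path.join(split_dir, "text", "text_raw.txt")) as f:
    sentences = [line.rstrip("\n") for line in f]

# Index-wise correspondence: row i of the latents, sentence i, and image i belong together.
i = 0
image_path = os.path.join(split_dir, "images", f"{i:06d}.png")
print(image_path, sentences[i])
print(latents_image.iloc[i])
```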
Latent factors: We use the following ground truth latent factors to generate image and text data. Each factor is sampled from a uniform distribution defined on the specified set of values for the respective factor.
| Modality | Latent Factor | Values | Details |
|---|---|---|---|
| Image | Object shape | {0, 1, ..., 6} | Mapped to Blender shapes like "Teapot", "Hare", etc. |
| Image | Object x-position | {0, 1, 2} | Mapped to {-3, 0, 3} for Blender |
| Image | Object y-position | {0, 1, 2} | Mapped to {-3, 0, 3} for Blender |
| Image | Object z-position | {0} | Constant |
| Image | Object alpha-rotation | [0, 1]-interval | Linearly transformed to [-pi/2, pi/2] for Blender |
| Image | Object beta-rotation | [0, 1]-interval | Linearly transformed to [-pi/2, pi/2] for Blender |
| Image | Object gamma-rotation | [0, 1]-interval | Linearly transformed to [-pi/2, pi/2] for Blender |
| Image | Object color | [0, 1]-interval | Hue value in HSV transformed to RGB for Blender |
| Image | Spotlight position | [0, 1]-interval | Transformed to a unique position on a semicircle |
| Image | Spotlight color | [0, 1]-interval | Hue value in HSV transformed to RGB for Blender |
| Image | Background color | [0, 1]-interval | Hue value in HSV transformed to RGB for Blender |
| Text | Object shape | {0, 1, ..., 6} | Mapped to strings like "teapot", "hare", etc. |
| Text | Object x-position | {0, 1, 2} | Mapped to strings "left", "center", "right" |
| Text | Object y-position | {0, 1, 2} | Mapped to strings "top", "mid", "bottom" |
| Text | Object color | string values | Color names from 3 different color palettes |
| Text | Text phrasing | {0, 1, ..., 4} | Mapped to 5 different English sentences |
Image rendering: We use the Blender rendering engine to create visually complex images depicting a 3D scene. Each image in the dataset shows a colored 3D object of a certain shape or class (i.e., teapot, hare, cow, armadillo, dragon, horse, or head) in front of a colored background and illuminated by a colored spotlight that is focused on the object and located on a semicircle above the scene. The resulting RGB images are of size 224 x 224 x 3.
Text generation: We generate a short sentence describing the respective scene. Each sentence describes the object's shape or class (e.g., teapot), position (e.g., bottom-left), and color. The color is represented in a human-readable form (e.g., "lawngreen", "xkcd:bright aqua", etc.) as the name of the color (from a randomly sampled palette) that is closest to the sampled color value in RGB space. The sentence is constructed from one of five pre-configured phrases with placeholders for the respective ground truth factors.
Relation between modalities: Three latent factors (object shape, x-position, y-position) are shared between image/text pairs. The object color also exhibits a dependence between modalities; however, it is not a 1-to-1 correspondence because the color palette is sampled randomly from a set of multiple palettes. Additionally, there is a causal dependence of object color on object x-position since the range of hue values [0, 1] is split into three equally sized intervals, each of which is associated with a fixed x-position of the object. For instance, if x-position is “left”, we sample the hue value from the interval [0, 1/3]. Consequently, the color of the object can be predicted to some degree from the object's position.
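As a compact illustration of the dependence structure described above, the sketch below samples the shared factors and draws the object hue from the x-position-specific third of [0, 1]. It is a simplified re-implementation of the sampling scheme, not the released generation code, and factor names are chosen for readability.

```python
# Simplified sketch of the latent sampling scheme; not the released generation code.
import numpy as np

rng = np.random.default_rng(0)

def sample_latents():
    shape = rng.integers(0, 7)      # object shape/class, shared between modalities
    x_pos = rng.integers(0, 3)      # 0="left", 1="center", 2="right"; shared
    y_pos = rng.integers(0, 3)      # shared
    # Object hue depends causally on x-position: hue is drawn from [x/3, (x+1)/3)
    hue = rng.uniform(x_pos / 3, (x_pos + 1) / 3)
    rotations = rng.uniform(0, 1, size=3)            # alpha/beta/gamma, image-only
    spot_pos, spot_hue, bg_hue = rng.uniform(0, 1, size=3)
    return dict(shape=shape, x_pos=x_pos, y_pos=y_pos, hue=hue,
                rotations=rotations, spotlight=(spot_pos, spot_hue),
                background_hue=bg_hue)

print(sample_latents())
```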
Acknowledgements
-------------------------------
The Multimodal3DIdent dataset builds on the following resources:
- 3DIdent dataset
- Causal3DIdent dataset
- CLEVR dataset
- Blender open-source 3D creation suite
https://www.verifiedmarketresearch.com/privacy-policy/
Causal AI Market size was valued at USD 11.77 Million in 2024 and is projected to reach USD 256.73 Million by 2031, growing at a CAGR of 47.1% during the forecast period 2024-2031.
Causal AI, also known as causal artificial intelligence, is a significant innovation in the fields of artificial intelligence and machine learning that focuses on identifying and harnessing cause-and-effect relationships in data. Traditional AI models generally use correlation-based methods to detect patterns and generate predictions. While these methods can be quite useful in specific applications, they frequently fall short in situations where understanding the underlying causal mechanisms is critical. Causal AI addresses this limitation by incorporating principles from causal inference, a branch of statistics and philosophy that investigates how to infer causal relationships from data.
Causal AI is a major advance in the field of artificial intelligence, allowing us to go beyond correlation to discover the true drivers of observed outcomes. Its applications are broad and diverse, including healthcare, finance, marketing, policymaking, operations, education, the environment, and social sciences. Causal AI improves decision-making and enables the development of focused solutions to complex problems by offering a richer grasp of causality.
Causal AI has the potential to transform a wide range of domains by providing more precise and actionable insights than typical machine learning models. Causal AI differs from traditional AI in that it focuses on understanding the cause-and-effect relationships underlying data rather than correlations and patterns alone. This shift from correlation to causation is a major step forward, with the potential to improve decision-making processes, produce better forecasts, and optimize outcomes across a variety of industries, including healthcare, finance, and marketing.
We present a synthetic Medicare claims dataset linked to environmental exposures and potential confounders. In most environmental health studies relying on claims data, data restrictions exist and the data cannot be shared publicly. The Centers for Medicare and Medicaid Services (CMS) has generated synthetic, publicly available Medicare claims data for 2008-2010. In this dataset, we link the 2010 synthetic Medicare claims data to environmental exposures and potential confounders. We aggregated the 2010 synthetic Medicare claims data to the county level. Data is compiled for the contiguous United States, which in 2010 included 3,109 counties. We merged the synthetic Medicare claims data with air pollution exposure data, specifically with estimates of PM2.5 exposure obtained from Di et al., 2019, 2021, which provide daily and annual estimates of PM2.5 exposure at 1 km x 1 km grid cells in the contiguous United States. We use the Census Bureau (United States Census Bureau, 2021), the Centers for Disease Control and Prevention (CDC, 2021), and GridMET (Abatzoglou, 2013) to obtain data on potential confounders. The mortality rate, as the outcome, was computed using the synthetic Medicare data (CMS, 2021). We use the average of surrounding counties to impute missing observations, except in the case of the CDC confounders, where we imputed missing values by generating a normal distribution for each state and randomly drawing from this distribution. The steps for generating the merged dataset are provided in the NSAPH Synthetic Data GitHub repository (https://github.com/NSAPH/synthetic_data). Analytic inferences based on this synthetic dataset should not be made. The aggregated dataset is composed of 46 columns and 3,109 rows.
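A schematic of the county-level merge and the neighbor-average imputation described above. All file paths, column names, and the adjacency structure are hypothetical; the generation steps themselves are documented in the linked repository.

```python
# Schematic of the county-level merge and imputation; paths and column names are hypothetical.
import pandas as pd

claims = pd.read_csv("medicare_synthetic_2010_county.csv")   # one row per county (FIPS code)
pm25 = pd.read_csv("pm25_2010_county.csv")                    # annual PM2.5 per county
confounders = pd.read_csv("census_cdc_gridmet_county.csv")

merged = (claims.merge(pm25, on="fips", how="left")
                .merge(confounders, on="fips", how="left"))

# Neighbor-average imputation: fill a county's missing value with the mean of
# its surrounding counties. The adjacency mapping is assumed to be available.
adjacency = {}  # {fips: [neighbor_fips, ...]}, not constructed here

def impute_neighbor_mean(df, col, adjacency):
    for idx, row in df[df[col].isna()].iterrows():
        neighbors = df[df["fips"].isin(adjacency.get(row["fips"], []))][col]
        df.loc[idx, col] = neighbors.mean()
    return df
```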
Pedigree of all data and processing included in the manuscript. Open the zip file, then access the pedigree folder for the file describing all other folders, links, and the data dictionary. Items:
- NOTES: Description of work and other worksheets.
- Pedigree: Summary of source files used to create figures and tables.
- DataFiles: Data files used in the R code for creating the figures and tables.
- DataDictionary: Data file titles in all data files.
- Data: Data file uploaded to Science Hub.
- Output: Files generated from R scripts.
- Plot: Plots generated from R scripts and other software.
- R_Scripts: Clean R scripts used to analyze the data and generate figures and tables.
- Result: Tables generated from R scripts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study investigated the effects of AI-powered online shopping service attributes and Generation Z consumer characteristics on the attitudes and behaviors of Generation Z consumers. Using focus groups and inductive analysis, six key attributes of AI-powered digital assistance were identified from the perspective of Generation Z consumers. Rough Set Analysis was applied to establish causal relationships among the conditional attributes of AI-powered digital assistance, consumer characteristics, and Generation Z consumers’ attitudes and behaviors, resulting in ten decision-making rules. The findings extend academic research on the response mechanisms of Generation Z consumers to novel technology services and provide managers strategic insights for improving online services.
Complex interactions among multiple abiotic and biotic drivers result in rapid changes in ecosystems worldwide. Predicting how specific interactions can cause ripple effects, potentially resulting in abrupt shifts in ecosystems, is of high relevance to policymakers but difficult to quantify using data from singular cases. We present causalizeR (https://github.com/fjmurguzur/causalizeR), a text-processing algorithm that extracts causal relations from literature based on simple grammatical rules, allowing evidence in unstructured texts to be synthesized in a structured manner. The algorithm extracts causal links using the position of nouns relative to the keyword of choice to identify the cause and effect of interest. The resulting database can be combined with network analysis tools to estimate the direct and indirect effects of multiple drivers at the network level, which is useful for synthesizing available knowledge and for hypothesis creation and testing. We illustrate the use of the algorithm by detecting causal relationships in the scientific literature on the tundra ecosystem.
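causalizeR itself is an R package; as a rough, language-independent illustration of the positional rule described above, the Python sketch below takes the nearest noun before a keyword as the cause and the nearest noun after it as the effect, using spaCy for part-of-speech tags. The specific nearest-noun heuristic is an assumption approximating the idea, not a port of the package.

```python
# Rough illustration of a positional cause-effect extraction rule; not a port of causalizeR.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires the small English model to be installed

def extract_causal_link(sentence, keyword="increase"):
    """Return (cause, effect) as the nearest nouns before/after the keyword, if any."""
    doc = nlp(sentence)
    key_idx = next((t.i for t in doc if t.lemma_ == keyword or t.text == keyword), None)
    if key_idx is None:
        return None
    nouns_before = [t for t in doc[:key_idx] if t.pos_ in ("NOUN", "PROPN")]
    nouns_after = [t for t in doc[key_idx + 1:] if t.pos_ in ("NOUN", "PROPN")]
    if not nouns_before or not nouns_after:
        return None
    return nouns_before[-1].text, nouns_after[0].text

print(extract_causal_link("Warming increases shrub cover in the tundra."))
```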
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If a medium has a monopoly in covering political news and daily distorts the news in favor of the ruling autocrat, how large will the persuasion effect be? Through which channels will such persuasion operate most? Working with a representative sample of the Russian population, I use a causal mediation analysis to figure out whether (1) frequency of exposure and/or (2) reliance on biased reporting mediate the link between how people voted for incumbent elites and how they evaluate these elites in the present. Perceiving explicitly biased information as credible transmits a large and robust effect from voting to evaluation, while frequent exposure to this information produces an insignificant mediating effect. Another important finding is that the effect of perceived news credibility overrides the effect of electoral support: accepting state propaganda as credible information converts people into regime supporters regardless of their previous voting.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Propensity score methods are a widely recommended approach to adjust for confounding and to recover treatment effects with non-experimental, single-level data. This article reviews propensity score weighting estimators for multilevel data in which individuals (level 1) are nested in clusters (level 2) and nonrandomly assigned to either a treatment or control condition at level 1. We address the choice of a weighting strategy (inverse probability weights, trimming, overlap weights, calibration weights) and discuss key issues related to the specification of the propensity score model (fixed-effects model, multilevel random-effects model) in the context of multilevel data. In three simulation studies, we show that estimates based on calibration weights, which prioritize balancing the sample distribution of level-1 and (unmeasured) level-2 covariates, should be preferred under many scenarios (i.e., treatment effect heterogeneity, presence of strong level-2 confounding) and can accommodate covariate-by-cluster interactions. However, when level-1 covariate effects vary strongly across clusters (i.e., under random slopes), and this variation is present in both the treatment and outcome data-generating mechanisms, large cluster sizes are needed to obtain accurate estimates of the treatment effect. We also discuss the implementation of survey weights and present a real-data example that illustrates the different methods.
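To make the weighting strategies concrete, the sketch below computes inverse probability weights, a simple trimming rule, and overlap weights from estimated propensity scores for a level-1 treatment. The single-level logistic regression and the toy data are stand-ins for the fixed-effects or multilevel propensity models discussed in the article, and the calibration-weighting approach it recommends is not shown here.

```python
# Illustrative computation of IPW, trimming, and overlap weights from propensity scores.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))                        # level-1 covariates (toy data)
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))    # treatment assignment

ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]  # estimated propensity scores

# Inverse probability weights (ATE): 1/e(x) for treated units, 1/(1-e(x)) for controls
ipw = np.where(T == 1, 1 / ps, 1 / (1 - ps))

# Trimming: drop units with extreme propensity scores before weighting
keep = (ps > 0.05) & (ps < 0.95)

# Overlap weights: 1-e(x) for treated units, e(x) for controls; bounded by construction
overlap = np.where(T == 1, 1 - ps, ps)
```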
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In this paper, we introduce the causal forests method (Athey et al., 2019) and illustrate how to apply it in the social sciences to address treatment effect heterogeneity. Compared with existing parametric methods such as the multiplicative interaction model and traditional semi-/non-parametric estimation, causal forests are more flexible for complex data generating processes. Specifically, causal forests allow for nonparametric estimation and inference on heterogeneous treatment effects in the presence of many moderators. To demonstrate their usefulness, we revisit existing studies in political science and economics. We uncover new information hidden by the original estimation strategies while producing findings that are consistent with conventional methods. Through these replication efforts, we provide a step-by-step practical guide for applying causal forests to evaluate treatment effect heterogeneity.
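As a brief illustration of the kind of estimator discussed (not the authors' replication code), the sketch below fits a causal forest on simulated data with the econml package and recovers effects that vary with a moderator; the data-generating process and settings are assumptions for illustration.

```python
# Illustrative causal forest fit with econml on toy data; not the paper's replication code.
import numpy as np
from econml.dml import CausalForestDML

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))              # moderators
W = rng.normal(size=(n, 3))              # additional controls
T = rng.binomial(1, 0.5, size=n)         # randomized binary treatment
tau = 1.0 + X[:, 0]                      # true heterogeneous effect depends on X[:, 0]
Y = tau * T + W[:, 0] + rng.normal(size=n)

est = CausalForestDML(discrete_treatment=True, random_state=0)
est.fit(Y, T, X=X, W=W)

cate = est.effect(X)                     # conditional average treatment effect estimates
print(cate[:5], tau[:5])                 # estimates should track the true effects
```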
Simulation Files
Sims900inds_L3Thresh250i.csv is the CDPOP input file. C2C2_900Inds64PixelsIDXY.csv is a file of individual locations used in CDPOP. CD3_900Inds64Pixels_R20.csv is an input file of cost distances with b1=1 and b2=20. GDmatrix.csv is an output file from the 100th generation simulated by CDPOP.
Archive.zip
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
With an unrepresentative sample, the estimate of a causal effect may fail to characterize how effects operate in the population of interest. What is less well understood is that conventional estimation practices for observational studies may produce the same problem even with a representative sample. Causal effects estimated via multiple regression differentially weight each unit's contribution. The "effective sample" that regression uses to generate the estimate may bear little resemblance to the population of interest, and the results may be nonrepresentative in a manner similar to what quasi-experimental methods or experiments with convenience samples produce. There is no general external validity basis for preferring multiple regression on representative samples over quasi-experimental or experimental methods. We show how to estimate the "multiple regression weights" that allow one to study the effective sample. We discuss alternative approaches that, under certain conditions, recover representative average causal effects. The requisite conditions cannot always be met.
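The abstract does not spell out the estimator, but the usual construction of such weights in this literature takes each unit's weight to be the squared residual from regressing the treatment on the other covariates. A minimal sketch under that assumption:

```python
# Minimal sketch of effective-sample (regression) weights under the usual construction:
# w_i is the squared residual from regressing the treatment on the other covariates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))                 # covariates
D = 0.8 * X[:, 0] + rng.normal(size=n)      # treatment, partly explained by X

resid = sm.OLS(D, sm.add_constant(X)).fit().resid
w = resid ** 2
w_norm = w / w.sum()                        # each unit's share of the effective sample

# Compare the nominal sample with the effective sample on a covariate of interest
print("nominal mean of X1:", X[:, 0].mean())
print("effective-sample mean of X1:", np.average(X[:, 0], weights=w_norm))
```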
Data and Matlab code used to produce figures in "A causal role for right frontopolar cortex in directed, but not random, exploration". Raw data is in TMS_horizonTask.csv. Each row corresponds to a single game; each column corresponds to a separate variable:
* expt_name - stimulation condition, "vertex" or "RFPC"
* replicationFlag - 0 for the first set of subjects, 1 for the second set
* subjectID - subject number
* order - stimulation order
* age - participant age in years
* iswoman - participant gender, 1 for female, 0 for male
* sessionNumber - 1 or 2
* game - game number in the experiment
* gameLength - number of trials in this game, including four forced trials
* uc - uncertainty condition, number of times option 2 is played in the forced trials
* m1 - true mean of option 1
* m2 - true mean of option 2
* r1, r2, etc. - reward outcome on each trial; nan if no outcome (e.g. on trial 6 in horizon 1 games)
* c1, c2, etc. - choice on trial t, 1 for left, 2 for right
* rt1, rt2, etc. - reaction time on trial t in seconds
To generate the figures from the paper, run main_TMSanalysis_v3.m
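The analysis code is in Matlab; for readers working in Python, a minimal loading sketch for the per-game table described above (only columns named in the data dictionary are used):

```python
# Minimal loading sketch for the per-game table; column names follow the data dictionary above.
import pandas as pd

df = pd.read_csv("TMS_horizonTask.csv")
print(df.groupby(["expt_name", "gameLength"]).size())   # games per stimulation condition and horizon
print(df[["subjectID", "uc", "m1", "m2"]].head())       # uncertainty condition and true option means
```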
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this paper we examine how children's time allocation affects their accumulation of cognitive skill. Children's time allocation is endogenous in a model of skill production since it is chosen by parents and children. We apply a recently developed test of exogeneity to search for specifications that yield causal estimates of the impact time inputs have on child skills. The test exploits bunching in time inputs induced by a nonnegativity time constraint and it has power to detect a variety of sources of endogeneity. We find that with a sufficiently rich set of controls we are unable to reject exogeneity in our most detailed production function specifications. The estimates from these specifications indicate that active time with adult family members, such as parents and grandparents, are the most productive in generating cognitive skill.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Direct, indirect, total and marginal effects for Fig. 4. (XLSX 94 kb)
Traditionally, a pedigree-based individual-tree mixed model (ABLUP) has been used in forest genetic evaluations to identify individuals with the highest breeding values (BVs). ABLUP is a Markovian causal model, as any individual BV can be expressed as a linear regression on its parental BVs. The regression coefficients are based on the genealogical parent-offspring relationship and are equal to one-half. This study aimed to develop and apply two new causal models that replace these fixed coefficients with ones calculated using genomic information, specifically derived from the genomic-based relationship matrix. We compared the performance of these genomic-based causal models with ABLUP and non-causal GBLUP models. To do so, we evaluated a four-generation population of Eucalyptus grandis, consisting of 3,082 genotyped trees with 14,033 single nucleotide polymorphism markers. Six traits were assessed in 1,219 trees across the first three breeding cycles. The heritability and genetic means...

# Forest tree breeding using genomic Markov causal models: A new approach to genomic tree breeding improvement
https://doi.org/10.5061/dryad.pzgmsbczh
GENERAL INFORMATION
1. Title of Dataset: Forest tree breeding using genomic Markov causal models: A new approach to genomic tree breeding improvement
2. Author Information
A. Principal Investigator Contact Information
Name: Esteban Javier Jurcic
Institution: Instituto Nacional de Tecnología Agropecuaria (INTA)
Address: De Los Reseros y Dr. Nicolás Repetto s/n, 1686, Hurlingham, Buenos Aires, Argentina.
Email: jurcic.esteban@inta.gob.ar
B. Associate or Co-investigator Contact Information
Name: Eduardo Pablo Cappa
Institution: Instituto Nacional de Tecnología Agropecuaria (INTA) - CONICET
Address: De Los Reseros y Dr. Nicolás Repetto s/n, 1686, Hurlingham, Buenos Aires, Argentina.
Email: [cappa.eduar...,
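Referring back to the abstract above: the causal models described there replace the fixed pedigree-based parent-offspring coefficients (one-half) with coefficients derived from a genomic relationship matrix. As a minimal sketch of the kind of matrix involved, the following computes a VanRaden-type G matrix from 0/1/2 genotype calls on toy data; it is an illustration of the standard construction, not the study's pipeline.

```python
# Minimal sketch of a VanRaden-type genomic relationship matrix (G) from 0/1/2 genotypes.
# Toy data only; not the study's pipeline.
import numpy as np

rng = np.random.default_rng(0)
n_trees, n_snps = 100, 500
M = rng.integers(0, 3, size=(n_trees, n_snps)).astype(float)  # genotype calls coded 0/1/2

p = M.mean(axis=0) / 2.0                      # allele frequency per SNP
Z = M - 2.0 * p                               # center by twice the allele frequency
G = Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))   # VanRaden (2008) genomic relationship matrix

print(G.shape, G[0, :5])
```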
https://spdx.org/licenses/CC0-1.0.html
Aim: Identifying the mechanisms influencing species' distributions is critical for accurate climate change forecasts. However, current approaches are limited by correlative models that cannot distinguish between direct and indirect effects.
Location: New Hampshire and Vermont, USA.
Methods: Using causal and correlational models and new theory on range limits, we compared current (2014–2019) and future (2080s) distributions of ecologically important mammalian carnivores and competitors along range limits in the northeastern US under two global climate models (GCMs) and a high-emissions scenario (RCP8.5) of projected snow and forest biomass change.
Results: Our hypothesis that causal models of climate-mediated competition would result in different distribution predictions than correlational models, both in the current and future periods, was well-supported by our results; however, these patterns were prominent only for species pairs that exhibited strong interactions. The causal model predicted the current distribution of Canada lynx (Lynx canadensis) more accurately, likely because it incorporated the influence of competitive interactions mediated by snow with the closely related bobcat (Lynx rufus). Both modeling frameworks predicted an overall decline in lynx occurrence in the central high elevation regions and increased occurrence in the northeastern region in the 2080s due to changes in land use that provided optimal habitat. However, these losses and gains were less substantial in the causal model due to the inclusion of an indirect buffering effect of snow on lynx.
Main conclusions: Our comparative analysis indicates that a causal framework, steeped in ecological theory, can be used to generate spatially-explicit predictions of species distributions. This approach can be used to disentangle correlated predictors that have previously hampered understanding of range limits and species' response to climate change.
Methods
We used data from 257 camera-trap sites spaced in non-overlapping grids based on the home range size of the smallest carnivore species (Martes americana = 2x2 km). Each site included a remote camera positioned facing north on a tree, 1–2 m above the snow surface, and pointed at a slight downward angle towards a stake positioned 3–5 m from the camera. Commercial skunk lure and turkey feathers were used as attractants and placed directly on the snow stakes. Cameras were set to take 1–3 consecutive pictures every 1–10 sec when triggered, depending on the brand and model, and checked on average 3 (range = 1–9) times each season to download data, refresh attractants, and to ensure cameras were working properly.
We used camera data from autumn to spring (16 October–15 May) for each year (2014–2019). This seasonal range was chosen as it approximates demographic (i.e., births and deaths) and geographic closure (i.e., dispersal) and is based on species’ ecological responses to snowpack and leaf phenology of the region. We identified species in photographs by their unique morphology and field marks and used consensus from multiple observers when identification was uncertain. We organized camera data into weekly occasions using CPW Photo Warehouse and recorded whether or not each species was detected during the occasion.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Parameter Settings of Synthetic Data Generation.