Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, Federated Learning (FL) has gained traction as a privacy-centric approach in medical imaging. This study explores the challenges that data heterogeneity poses for FL algorithms, using the COVIDx CXR-3 dataset as a case study. We contrast the performance of the Federated Averaging (FedAvg) algorithm on non-independent and identically distributed (non-IID) data with its performance on independent and identically distributed (IID) data. Our findings reveal a notable performance decline as data heterogeneity increases, emphasizing the need for new strategies to make FL work in diverse environments. Going beyond theoretical concepts, this research contributes to the practical implementation of FL and addresses the nuances of medical imaging applications. By uncovering the challenges that data diversity creates for FL, it sets the stage for future FL strategies that effectively manage data heterogeneity, especially in sensitive fields like healthcare.
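FedAvg's server step replaces the global model with a sample-size-weighted average of the client models, and the IID versus non-IID contrast comes down to how training examples are dealt out to clients. The plain-Python sketch below shows both pieces under stated assumptions: model parameters are flat lists of floats, partition_iid shuffles indices and deals them round-robin, and partition_label_skew is a hypothetical, extreme non-IID split (sort by label, then cut into contiguous shards). The abstract does not say how the COVIDx CXR-3 data were partitioned, so these helpers are illustrative only.

```python
import random
from typing import List, Sequence


def fedavg_aggregate(client_weights: Sequence[Sequence[float]],
                     client_sizes: Sequence[int]) -> List[float]:
    """FedAvg server step: average client parameters, weighting client k by n_k / n."""
    total = sum(client_sizes)
    global_w = [0.0] * len(client_weights[0])
    for w, n_k in zip(client_weights, client_sizes):
        for i, value in enumerate(w):
            global_w[i] += (n_k / total) * value
    return global_w


def partition_iid(labels: Sequence[int], num_clients: int, seed: int = 0) -> List[List[int]]:
    """IID split: shuffle all sample indices, then deal them out round-robin."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    return [idx[c::num_clients] for c in range(num_clients)]


def partition_label_skew(labels: Sequence[int], num_clients: int) -> List[List[int]]:
    """Hypothetical non-IID split: sort indices by label, cut into contiguous shards."""
    idx = sorted(range(len(labels)), key=lambda i: labels[i])
    n = len(idx)
    return [idx[c * n // num_clients:(c + 1) * n // num_clients]
            for c in range(num_clients)]


if __name__ == "__main__":
    labels = [0] * 6 + [1] * 6  # toy binary labels
    print("IID shards:", partition_iid(labels, 2))
    print("Label-skewed shards:", partition_label_skew(labels, 2))
    print("Aggregated:", fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

With the label-skewed split each toy client sees a single class, so local updates can pull in conflicting directions before they are averaged; this is one commonly cited intuition for why FedAvg degrades on non-IID data.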
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Overview of Federated Learning (FL) and data heterogeneity.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Description of dataset (from Datastream, Bloomberg and BIS) corresponding to the paper "Heterogeneity and Dynamics in Network Models" by Enzo D'Innocenzo, Andre Lucas, Anne Opschoor, Xingmin Zhang (corresponding author)
Folder 1 - linescans: Folder containing linescans in TIF format of the four atrial cells referred to in Figure 1 of the manuscript.
Folder 2 - di-8-ANEPPS: A folder containing two further folders ('atrial' and 'ventricular'), which contain, respectively, TIF images of the 10 atrial and 4 ventricular di-8-ANEPPS-stained cells referred to in Figure 2.
Sham v control TTD: A spreadsheet containing t-tubule densities (TTD) of atrial and ventricular cells from Sham and Control animals, used to establish that there was no difference in the t-tubule network of either atrial or ventricular cells between these two groups of animals.
Control atrial 1: Original images of sections of atrial tissue from control animals used for the analysis presented in Figures 4-6. Folder 1 of 5.
Control atrial 2: Original images of sections of atrial tissue from control animals used for the analysis presented in Figures 4-6. Folder 2 of 5.
Sham atrial: Original images of atrial sections from Sham animals used for the analysis presented in Figures 4-6. Folder 3 of 5.
Control ventricular: Original images of ventricular sections from control animals used for the analysis presented in Figures 4-6. Folder 4 of 5.
Sham ventricular: Original images of ventricular sections from Sham animals used for the analysis presented in Figures 4-6. Folder 5 of 5.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
We study the local dynamical fluctuations in glass-forming models of particles embedded in d-dimensional space, in the mean-field limit of d→∞. Our analytical calculation reveals that single-particle observables, such as squared particle displacements, display divergent fluctuations around the dynamical (or mode-coupling) transition, due to the emergence of nontrivial correlations between displacements along different directions. This effect notably gives rise to a divergent non-Gaussian parameter, α_2. The d→∞ local dynamics therefore becomes quite rich upon approaching the glass transition. The finite-d remnant of this phenomenon further provides a long-sought, first-principles explanation for the growth of α_2 around the glass transition that is not based on multi-particle correlations. ...
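For reference, the non-Gaussian parameter is conventionally defined in d dimensions as α_2 = d⟨Δr^4⟩ / ((d+2)⟨Δr^2⟩^2) - 1, which vanishes for Gaussian displacements. The Python sketch below is a generic estimator of α_2 from a sample of displacement vectors, not the paper's analytical d→∞ calculation; the simulated Gaussian input is only a sanity check.

```python
import random
from typing import Sequence


def non_gaussian_parameter(displacements: Sequence[Sequence[float]]) -> float:
    """alpha_2 = d <r^4> / ((d + 2) <r^2>^2) - 1 for d-dimensional displacements."""
    d = len(displacements[0])
    r2 = [sum(x * x for x in v) for v in displacements]  # squared displacements
    m2 = sum(r2) / len(r2)                                # <r^2>
    m4 = sum(s * s for s in r2) / len(r2)                 # <r^4>
    return d * m4 / ((d + 2) * m2 * m2) - 1.0


if __name__ == "__main__":
    rng = random.Random(1)
    gaussian = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50_000)]
    print(non_gaussian_parameter(gaussian))  # close to 0 for Gaussian displacements
```

Positive values indicate a heavier-than-Gaussian tail of displacements, which is the growth of α_2 the abstract refers to near the glass transition.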
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
All the data files are publicly available in the Dryad Digital Repository, at https://doi.org/10.5061/dryad.dbrv15f7j (Priyadarshana et al. 2024). The source codes for the statistics are publicly available in the Zenodo Digital Repository, at https://doi.org/10.5281/zenodo.10799017. These data files and source codes are also accessible via the GitHub Digital Repository, at https://github.com/Tharaka18/spatial.heterogeneity.meta.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
This archive contains results associated with the publication "Tissue heterogeneity is prevalent in gene expression studies" by Gregor Sturm, Markus List and Jitao David Zhang.
expr.tissuemark.affy.roche.symbols.gmt: The tissue signatures from the BioQC publication used in this study
gtex_v6_gini_solid.gmt: The cross-platform cross-species validated tissue signatures produced in this study
heterogeneity_results.tsv.gz: Signature scores and heterogeneity calls for each tested signature
heterogeneity_fractions.tsv: Fraction of heterogeneous and severely heterogeneous samples per tissue
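The two .gmt files follow the Gene Matrix Transposed convention: one gene set per line, tab-separated, with the set name, a description, then the member genes. A minimal Python reader is sketched below; the toy set names and gene symbols are placeholders, not contents of the archive.

```python
from typing import Dict, Iterable, List, Tuple


def parse_gmt(lines: Iterable[str]) -> Dict[str, Tuple[str, List[str]]]:
    """Map each gene-set name to (description, list of genes)."""
    gene_sets: Dict[str, Tuple[str, List[str]]] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        name, description = fields[0], fields[1]
        genes = [g for g in fields[2:] if g]  # drop empty trailing fields
        gene_sets[name] = (description, genes)
    return gene_sets


if __name__ == "__main__":
    toy = ["SET_A\tdemo signature\tGENE1\tGENE2\n", "SET_B\tNA\tGENE3\n"]
    print(parse_gmt(toy))
    # For a real file, pass an open handle, e.g.:
    # with open("gtex_v6_gini_solid.gmt") as fh:
    #     signatures = parse_gmt(fh)
```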
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
We develop a behavioral asset pricing model in which agents trade in a market with information friction. Profit-maximizing agents switch between trading strategies in response to dynamic market conditions. Owing to noisy private information about the fundamental value, the agents form different evaluations of heterogeneous strategies. We exploit a thin set (a small subpopulation) to point-identify this nonlinear model, and estimate the structural parameters using the extended method of moments. Based on the estimated parameters, the model produces return time series that emulate the moments of the real data. These results are robust across different sample periods and estimation methods.
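Strategy switching in heterogeneous-agent asset pricing models is often formalized with a multinomial-logit rule: the share of agents using strategy h is proportional to exp(β·U_h), where U_h is a performance measure and β an intensity of choice. The abstract does not state the paper's switching rule, so the Python sketch below illustrates only that generic rule; U_h and β are assumed inputs.

```python
import math
from typing import List, Sequence


def logit_shares(utilities: Sequence[float], beta: float) -> List[float]:
    """Population shares n_h = exp(beta * U_h) / sum_j exp(beta * U_j)."""
    top = max(utilities)  # subtracting the max keeps exp() from overflowing when beta >= 0
    weights = [math.exp(beta * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two hypothetical strategies with recent profits 0.02 and 0.05.
    print(logit_shares([0.02, 0.05], beta=0.0))   # beta = 0: agents split evenly
    print(logit_shares([0.02, 0.05], beta=50.0))  # larger beta: more agents pick the better strategy
```

The intensity of choice β interpolates between random assignment (β = 0) and everyone adopting the best-performing strategy (β → ∞).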
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Data and code for reproducing figures and results for the manuscript entitled "Disentangling dispersion from mean reveals true heterogeneity-diversity relationships". Publication forthcoming. See references for data sources.
Code tested with Julia version 1.11.1.
How to cite this repository
If using code or data from this repository, please cite the original publication (forthcoming) and respective data source (see references and README.txt in respective data folder).
Update 2024-07-09
Minor changes to figure sizes and use of paired-sample t-tests when assessing empirical observations of heterogeneity measures.
Update 2024-08-04
Step-by-step instructions included in the README.
Manifest.toml file included with Julia and package version requirements.
Update 2024-11-18
Update following peer review feedback:
Analysis of an additional dataset from MacArthurs' seminal paper on foliage height diversity.
Hypothesis test of negligible trend for delta
Modified extended data figures
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Heterogeneity in risk attitudes, if not properly accounted for, may bias the income coefficient in standard consumption insurance regressions. Extending the theoretical analysis and empirical findings of Schulhofer-Wohl (Journal of Political Economy, 2011, 119, 925-958), we show that the sign of the bias is ambiguous and depends on cycle-related variables and on the covariances of both aggregate and idiosyncratic risk with individual risk aversion.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
This paper demonstrates that the unobserved heterogeneity commonly assumed to be the source of overdispersion in count data models has predictable implications for the probability structure of such mixture models. In particular, the common observation of excess zeros is a strict implication of unobserved heterogeneity. This result has important implications for using count model estimates for predicting certain interesting parameters. Test statistics to detect such heterogeneity-related departures from the null model are proposed and applied in a health-care utilization example, suggesting that a null Poisson model should be rejected in favour of a mixed alternative.
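The excess-zeros implication follows from Jensen's inequality: if the Poisson rate λ varies across individuals with mean μ, then P(Y = 0) = E[exp(-λ)] ≥ exp(-μ), the zero probability of a homogeneous Poisson model with the same mean. The Python check below uses an illustrative two-point mixture and the gamma mixture that yields the negative binomial; neither is taken from the paper's health-care data.

```python
import math
from typing import Sequence


def poisson_p0(mu: float) -> float:
    """P(Y = 0) under a homogeneous Poisson model with mean mu."""
    return math.exp(-mu)


def mixed_poisson_p0(rates: Sequence[float], probs: Sequence[float]) -> float:
    """P(Y = 0) when the Poisson rate equals rates[k] with probability probs[k]."""
    return sum(p * math.exp(-lam) for lam, p in zip(rates, probs))


def negbin_p0(mu: float, r: float) -> float:
    """P(Y = 0) for a Poisson-gamma (negative binomial) mixture with mean mu and shape r."""
    return (r / (r + mu)) ** r


if __name__ == "__main__":
    mu = 2.0
    print("Poisson:", poisson_p0(mu))                                      # about 0.135
    print("Two-point mixture:", mixed_poisson_p0([1.0, 3.0], [0.5, 0.5]))  # about 0.209
    print("Negative binomial, r=1:", negbin_p0(mu, 1.0))                   # 1/3
```

All three models share the mean μ = 2, yet both mixtures put more mass at zero; as the shape r grows, the negative binomial zero probability approaches the Poisson one.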
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
This data repository contains simulated data used for benchmarking HUNTRESS against the existing alternative tools. Results of the benchmarking are shown in Extended Data Figures 1-10 of the paper "Fast Intratumor Heterogeneity Inference from Single-Cell Sequencing Data" (to appear in Nature Computational Science).
To estimate treatment heterogeneity in two randomized controlled trials of a youth summer jobs program, we implement Wager and Athey's (2015) causal forest algorithm. We provide applied researchers with a step-by-step explanation of how the algorithm predicts treatment effects based on observables. We then explore how useful the predicted heterogeneity is in practice by testing whether youth with larger predicted treatment effects actually respond more in a hold-out sample. Our application highlights some limitations of the causal forest, but it also suggests that the method can identify treatment heterogeneity for some outcomes that more standard interaction approaches would have missed.
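The hold-out check can be sketched independently of the causal forest itself: take predicted effects for hold-out youth (from whatever model produced them), split at the median prediction, and compare treatment-control differences in means within each half. The function names below are hypothetical, and random assignment is assumed so that a simple difference in means estimates each group's effect.

```python
from statistics import mean, median
from typing import Sequence, Tuple


def diff_in_means(outcomes: Sequence[float], treated: Sequence[bool]) -> float:
    """Treated mean minus control mean (unbiased under random assignment)."""
    y1 = [y for y, d in zip(outcomes, treated) if d]
    y0 = [y for y, d in zip(outcomes, treated) if not d]
    return mean(y1) - mean(y0)


def effect_by_predicted_group(pred: Sequence[float], treated: Sequence[bool],
                              outcomes: Sequence[float]) -> Tuple[float, float]:
    """Estimated effect for above-median vs at-or-below-median predicted effects."""
    cut = median(pred)
    hi = [i for i, p in enumerate(pred) if p > cut]
    lo = [i for i, p in enumerate(pred) if p <= cut]

    def subset_effect(idx):
        return diff_in_means([outcomes[i] for i in idx], [treated[i] for i in idx])

    return subset_effect(hi), subset_effect(lo)


if __name__ == "__main__":
    pred = [0.1, 0.2, 0.8, 0.9, 0.1, 0.2, 0.8, 0.9]
    treated = [True, True, True, True, False, False, False, False]
    outcomes = [1.0, 1.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0]
    print(effect_by_predicted_group(pred, treated, outcomes))  # (3.0, 1.0)
```

A hold-out effect that is clearly larger in the high-prediction group suggests the predicted heterogeneity is real rather than an artifact of overfitting; a full analysis would also attach standard errors to both estimates.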
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
We examine demand behaviour for intertemporal dependencies, using Spanish panel data. We present evidence that there is both state dependence and correlated heterogeneity in demand behaviour. Our specific findings are that food outside the home, alcohol and tobacco are habit forming, whereas clothing and small durables exhibit durability. We conclude that demand analyses using cross-section data that ignore these effects may be seriously biased. On the other hand, the degree of intertemporal dependence is not sufficiently strong to make composite consumption significantly habit forming, as has been suggested in some recent analyses.
Information is provided in the readme file.
We also have the data and scripts available on github (https://github.com/idworkin/Wilson2021_Evolution_Data)
The few columns with missing data are either empty or contain NA.
By using graphical representations of simple portfolio choice problems, we generate a very rich dataset to study behavior under uncertainty at the level of the individual subject. We test the data for consistency with the maximization hypothesis, and we estimate preferences using a two-parameter utility function based on Faruk Gul (1991). This specification provides a good interpretation of the data at the individual level and can account for the highly heterogeneous behaviors observed in the laboratory. The parameter estimates jointly describe attitudes toward risk and allow us to characterize the distribution of risk preferences in the population. (JEL D11, D14, D81, G11)
Dataset Overview
This dataset contains individual-level data from a randomized controlled trial (RCT) conducted in northern Uganda, along with associated satellite imagery. It is designed to investigate how treatment effects may vary across different geographical and contextual settings by leveraging both tabular and image-based variables.
Motivation and Content
Researchers often wish to explore treatment effect heterogeneity, especially in studies focused on global poverty. Traditional variables, such as age and ethnicity, are typically collected near the time of data gathering and may overlook broader environmental, historical, or neighborhood-specific factors. Incorporating satellite images into causal inference analyses provides a valuable window into such contextual factors. This dataset exemplifies how researchers can combine tabular data (e.g., demographic variables, outcomes, treatment indicators) with geospatially keyed satellite imagery to model and interpret how treatment effects change across different locations.
Potential Use Cases
Causal Inference Research: Apply image-based methods to detect and explain geographic or contextual heterogeneity in RCT outcomes.
Policy Evaluation: Aid policymakers in identifying areas or populations most likely to benefit from poverty-alleviation interventions.
Methodological Innovations: Serve as a testbed for new models that integrate high-dimensional or unstructured data (images) with standard tabular data in the causal inference setting.
Source
Connor T. Jerzak, Fredrik Johansson, Adel Daoud. Image-based Treatment Effect Heterogeneity. Proceedings of the Second Conference on Causal Learning and Reasoning (CLeaR), Proceedings of Machine Learning Research (PMLR), 213: 531-552, 2023.
Ecological meta-analyses usually exhibit high relative heterogeneity of effect size: most among-study variation in effect size represents true variation in mean effect size, rather than sampling error. This heterogeneity arises from both methodological and ecological sources. Methodological heterogeneity is a nuisance that complicates the interpretation of data syntheses. One way to reduce methodological heterogeneity is via coordinated distributed experiments, in which investigators conduct the same experiment at different sites, using the same methods. We tested whether coordinated distributed experiments in ecology exhibit a) low heterogeneity in effect size, and b) lower heterogeneity than meta-analyses, using data on 17 effects from eight coordinated distributed experiments, and 406 meta-analyses. Consistent with our expectations, among-site heterogeneity typically comprised <50% of the variance in effect size in distributed experiments. In contrast, heterogeneity within and amo...
Coordinated distributed experiments in ecology do not consistently reduce heterogeneity in effect size
Included here are a data file for a distributed experiment and code that analyses the heterogeneity of many coordinated distributed experiments and meta-analyses. The R code file that reproduces the results of this study is called "meta-analyses vs distd expts - R code for sharing v 2.R".
Data File:
rousk et al 2013 table 3 data - INCREASE.csv: data from the INCREASE distributed experiment by Rousk et al. (2013)
All other data used in code is automatically sourced from URLs, but relevant variables are still described below.
Other variables in datasets were not used in our analysis, and so are not explained in this README file. Cells with missing data have "NA" values.
Variables used in code:
Costello & Fox variables:
meta.analysis.id: Unique ID number for each meta-analysis
eff.size: Effect size
var. eff.size: Variance in e...
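Relative heterogeneity of the kind compared in this study is commonly summarized by Cochran's Q and I^2 = max(0, (Q - df)/Q), the share of variation in effect size not attributable to sampling error, together with the DerSimonian-Laird estimate of the between-study variance τ^2. The Python sketch below computes these from per-study effect sizes and sampling variances (the roles played by eff.size and var. eff.size above); whether the study's R code uses this exact estimator is an assumption.

```python
from typing import Sequence, Tuple


def heterogeneity(effects: Sequence[float],
                  variances: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Q, I^2, tau^2) using inverse-variance weights and DerSimonian-Laird tau^2."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sw  # fixed-effect mean
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2


if __name__ == "__main__":
    # Two hypothetical studies with effects 0 and 2, each with sampling variance 1.
    print(heterogeneity([0.0, 2.0], [1.0, 1.0]))  # (2.0, 0.5, 1.0)
```

An I^2 below 0.5 corresponds to the "<50% of the variance" threshold quoted in the abstract.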
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
Replication data for Study 4a-c: all estimates by study participants. The dataset shows participantID (unique to this study), taskID (see the article supplement for the list of tasks), base rate (b), hit rate (h), false alarm rate (f), the correct solution, and the participant's estimate.
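Tasks stated in terms of a base rate, hit rate, and false alarm rate are typically solved with Bayes' rule, P(condition | positive) = b·h / (b·h + (1 - b)·f). Assuming that this is the "correct solution" recorded in the dataset (the article supplement is the authority on this), a minimal Python version is sketched below; the input values are illustrative, not taken from the data.

```python
def bayes_posterior(b: float, h: float, f: float) -> float:
    """Posterior probability of the condition given a positive signal.

    b: base rate P(condition), h: hit rate P(positive | condition),
    f: false alarm rate P(positive | no condition).
    """
    true_pos = b * h
    false_pos = (1.0 - b) * f
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    print(round(bayes_posterior(0.01, 0.8, 0.096), 4))  # 0.0776
```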
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
Although the literature on U.S. Supreme Court agenda-setting is sizable, justice-vote-level multivariate analyses of certiorari are almost exclusively limited to samples of discussed cases from 1986–1993. Moreover, these studies have done very little to explore justice-level heterogeneity on certiorari. Here, we address these lacunae by analyzing the predictors of individual justices' cert votes on all paid cases from the 1939, 1968, and 1982 terms. We find substantial justice-level heterogeneity in the weight that justices place on the standard set of forces shaping the cert vote. We also show that some of this heterogeneity is associated with justices' experience and ideological extremism, largely in theoretically predicted ways. In closing, we sound a note of caution on drawing conclusions about effects of justice attributes, when the number of justices is relatively small.