Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used to generate PCA plots of 1) DNA in water (summer) and sediment based on normalized taxonomic read counts; and 2) DNA and RNA present in summer water and sediment based on normalized counts of all functionally annotated genes from the metagenomic assembly.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Population genetic studies in non-model systems increasingly use next-generation sequencing to obtain more loci, but such methods also generate more missing data that may affect downstream analyses. Here we focus on the Principal Component Analysis (PCA) which has been widely used to explore and visualize population structure with mean-imputed missing data. We simulated data of different population models with various total missingness (1%, 10%, 20%) introduced either randomly or biased among individuals or populations. We found that individuals biased with missing data would be dragged away from their real population clusters to the origin of PCA plots, making them indistinguishable from true admixed individuals and potentially leading to misinterpreted population structure. We also generated empirical data of the big brown bat (Eptesicus fuscus) using restriction site-associated DNA sequencing (RADseq). We filtered three data sets with 19.12%, 9.87%, and 1.35% total missingness, all showing nonrandom missing data with biased individuals dragged towards the PCA origin, consistent with results from simulations. We highlight the importance of considering missing data effects on PCA in non-model systems where nonrandom missing data are common due to varying sample quality. To help detect missing data effects, we suggest to 1) plot PCA with a color gradient showing per sample missingness, 2) interpret samples close to the PCA origin with extra caution, 3) explore filtering parameters with and without the missingness-biased samples, and 4) use complementary analyses (e.g., model-based methods) to cross-validate PCA results and help interpret population structure.
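The pull toward the origin described above can be reproduced with a small simulation. The following is a minimal sketch, not the authors' pipeline: the two populations, allele frequencies, sample sizes, and the 80% missingness of one sample are all hypothetical values chosen for illustration.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Center each column and return the leading PC scores via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

rng = np.random.default_rng(0)
n_per_pop, n_loci = 20, 500

# Two hypothetical populations with different allele frequencies.
freq_a = rng.uniform(0.1, 0.9, n_loci)
freq_b = np.clip(freq_a + rng.normal(0, 0.3, n_loci), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, (n_per_pop, n_loci)),
    rng.binomial(2, freq_b, (n_per_pop, n_loci)),
]).astype(float)

# PCA on the complete data.
full_scores = pca_scores(geno)

# Knock out 80% of the loci for sample 0, then mean-impute per locus.
masked = geno.copy()
missing_loci = rng.choice(n_loci, size=int(0.8 * n_loci), replace=False)
masked[0, missing_loci] = np.nan
col_means = np.nanmean(masked, axis=0)
imputed = np.where(np.isnan(masked), col_means, masked)
imputed_scores = pca_scores(imputed)

# Distance of sample 0 from the PCA origin, before and after imputation.
dist_full = np.linalg.norm(full_scores[0])
dist_imputed = np.linalg.norm(imputed_scores[0])
print(f"complete data: {dist_full:.2f}, mean-imputed: {dist_imputed:.2f}")
```

Mean-imputed entries become exactly zero after centering, so a sample with many imputed loci contributes little to any component and lands near the origin, where an admixed individual would also be expected to fall.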
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dear TMU PhD candidate,
Please run principal component analysis (PCA) on the dataset below:
a) Calculate Eigenvalue for PC1 and PC2.
b) Draw the scatter plot.
c) Which loading plots are in the opposite direction to the others?
Thank You
https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-2321
This dataset contains the source code for uncertainty-aware principal component analysis (UA-PCA) and a series of images that show dimensionality reduction plots created with UA-PCA. The software is a JavaScript library for performing principal component analysis and dimensionality reduction on datasets consisting of multivariate probability distributions. Each plot of the image series used UA-PCA to project a dataset consisting of multivariate normal distributions. The covariance matrices of the dataset instances were scaled with different factors resulting in different UA-PCA projections. The projected probability distributions are displayed using isolines of their probability density functions. As the scaling value increases, the projection changes, showing the sensitivity of UA-PCA to changes in variance.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Balance Sheet: Tier 1 Leverage Capital (PCA Definition) was 2221887.22900 Mil. of U.S. $ in January of 2025, according to the United States Federal Reserve. Historically, United States - Balance Sheet: Tier 1 Leverage Capital (PCA Definition) reached a record high of 2221887.22900 in October of 2024 and a record low of 147414.03800 in January of 1984. Trading Economics provides the current actual value, an historical data chart and related indicators for United States - Balance Sheet: Tier 1 Leverage Capital (PCA Definition) - last updated from the United States Federal Reserve on July of 2025.
In this lesson, students interpret a scatter plot showing the results of a principal components analysis (PCA). They view an interview with Dr. Stephanie Smith, who explains how PCA calculations work, and why she chose to use this analysis to visualize her data. Dr. Smith also discusses her journey becoming a scientist and describes a typical day at work.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Tables and Figures for Li and Ralph, "Local PCA Shows How the Effect of Population Structure Differs Along the Genome"
Table S1. Correlations between MDS coordinates of genomic regions between runs with different parameter values. To produce these, we first ran the algorithm with the specified window size and number of PCs (k) on the full Medicago truncatula dataset. Then to obtain the correlation between results obtained from parameters A in the row of the matrix above and parameters B in the column of the matrix above, we mapped the windows of B to those of A by averaging MDS coordinates of any windows of B whose midpoints lay in the corresponding window of A; we then computed the correlation between the MDS coordinates of A and the averaged MDS coordinates of B. This is not a symmetric operation, so these matrices are not symmetric. As expected, parameter values with smaller windows produce noisier estimates, but plots of MDS values along the genome are visually very similar.
Figure S1. PCA plots for chromosome arms 2L, 2R, 3L, 3R and X of the Drosophila melanogaster dataset.
Figure S2. PCA plots for all 22 human autosomes from the POPRES data.
Figure S3. PCA plots for all 8 chromosomes in the Medicago truncatula dataset.
Figure S4. MDS visualizations of the Gaussian genotypes described in the Appendix, for 50 individuals from each of three populations. (top) The first quarter, middle half, and final quarter of the chromosome each have different population structure, as expected, despite the possibility for PC switching within each. (bottom) The same picture results even after marking a random 50% of the genotypes in the first half of the chromosome as missing.
Figure S5. MDS visualizations of the results of individual-based simulations using SLiM (see Appendix for details). All simulations are neutral, and recombination is: (top) constant; (top middle) varies stepwise by factors of two in seven equal-length segments, with highest rates on the ends, so the middle segment has a recombination rate 64 times lower than the ends; (bottom middle) according to the HapMap human female chromosome 7 map. The bottom figure shows PCA maps corresponding to the three colored windows of the last (HapMap) situation; the outlying regions are long regions of low recombination rate, so that region can be dominated by a few correlated trees, similar to an inversion. The (inset) provides a key to the locations of the individuals on the spatial landscape.
Figure S6. MDS visualizations of the results of individual-based simulations using SLiM (see Appendix for details). All simulations incorporate linked selection by allowing selected mutations to appear in the same two regions of the genome: the one-sixth of the genome immediately before the halfway point, and the last one-sixth of the genome. (top) Constant recombination rate. (top middle) Stepwise varying recombination rate. (bottom middle) Constant recombination rate with spatially varying effects of selection. (bottom) PCA plots corresponding to the highlighted corners of the last MDS visualization, showing how spatially varying linked selection has affected patterns of relatedness. The (inset) provides a key to the locations of the individuals on the spatial landscape.
Figure S7. MDS visualizations for each chromosome arm of Drosophila melanogaster, as in Figure 2, except that the method was run using five PCs (k=5) instead of two.
Figure S8. The proportion of data in each window that are missing, compared to the value of the first MDS coordinate for the Drosophila melanogaster data from Figure 2.
Figure S9. PCA plots for the three sets of genomic windows colored in Figure 2, on each chromosome arm of Drosophila melanogaster. In all plots, each point represents a sample. The first column shows the combined PCA plot for windows whose points are colored green in Figure 2; the second is for orange windows; and the third is for purple windows.
Figure S10. Variation in structure for windows of 1,000 SNPs across Drosophila melanogaster chromosome arms: without inversions. As in Figure 2, but after omitting for each chromosome arm individuals carrying the less frequent orientation of any inversions on that chromosome arm. The values differ from those in Figure 4 in the window size used and that some MDS values were inverted (but relative orientation is meaningless as chromosome arms were run separately, unlike for Medicago). In all plots, each point represents one window along the genome. The first column shows the MDS visualization of relationships between windows, and the second and third columns show the midpoint of each window against the two MDS coordinates; rows correspond to chromosome arms. Vertical lines show the breakpoints of known polymorphic inversions.
Figure S11. Recombination rate, and the effects of population structure for Drosophila melanogaster: this shows the first MDS coordinate and recombination rate (in cM/Mbp), as in Figure 4, against each other. Since the windows underlying estimates of Figure 4 do not coincide, to obtain correlations we divided the genome into 100Kbp bins, and for each variable (recombination rate and MDS coordinate 1) averaged the values of each overlapping bin with weight proportional to the proportion of overlap. The correlation coefficient and p-values for each linear regression are as follows: 2L: correlation=0.52, r^2=0.27; 2R: correlation=0.43, r^2=0.18; 3L: correlation=0.47, r^2=0.21; 3R: correlation=0.46, r^2=0.21; X: correlation=0.50, r^2=0.24.
Figure S12. MDS plots for human chromosomes 1-8. The first column shows the MDS visualization of relationships between windows, and the second and third columns show the midpoint of each window against the two MDS coordinates; rows correspond to chromosomes. Colorful vertical lines show the breakpoints of known valid inversions, while grey vertical lines show the breakpoints of predicted inversions.
Figure S13. MDS plots for human chromosomes 9-16, as in Supplementary Figure S12.
Figure S14. MDS plots for human chromosomes 17-22, as in Supplementary Figure S12.
Figure S15. Comparison of PCA figures within outlying windows (center column) and flanking non-outlying windows (left and right columns) for the two windows having outlying MDS scores on chromosome 8.
Figure S16. MDS visualization of variation in the effects of population structure amongst windows across all human autosomes simultaneously. The small group of windows with positive outlying MDS values lie around the inversion at 8p23.
Figure S17. First MDS coordinate against gene density for all 8 chromosomes of M. truncatula. The first MDS coordinate is significantly correlated with gene count (r=0.149, p=2.2e-16).
Figure S18. MDS visualizations of the effects of population structure for all 8 chromosomes of the Medicago truncatula data, using windows of 10000 SNPs.
Figure S19. PCA plots for regions colored in Figure S18 on all 8 chromosomes of Medicago truncatula: (A) green, (B) orange, and (C) purple.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Balance Sheet: Total Risk Based Capital (PCA Definition) was 2319171.53100 Mil. of U.S. $ in January of 2025, according to the United States Federal Reserve. Historically, United States - Balance Sheet: Total Risk Based Capital (PCA Definition) reached a record high of 2319171.53100 in October of 2024 and a record low of 322350.26900 in January of 1990. Trading Economics provides the current actual value, an historical data chart and related indicators for United States - Balance Sheet: Total Risk Based Capital (PCA Definition) - last updated from the United States Federal Reserve on July of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supporting Dataset for Figure 1a: PCA of Metabolomic Profiles in Sweet Potato under Drought Stress.
This dataset includes the raw metabolomic matrix, sample metadata, R script, and output PCA plot used in Figure 1a of the article:
**"Unveiling Stage-Specific Flavonoid Dynamics Underlying Drought Tolerance in Sweet Potato (*Ipomoea batatas* L.) via Integrative Transcriptomic and Metabolomic Analyses"**
The PCA plot was generated based on metabolite abundance data from sweet potato leaves collected under different drought stress stages (CK, DS1, DS2). The plot visualizes the clustering patterns of samples and highlights treatment-driven variation in metabolite profiles.
### Contents
- `PCA_metabolomics_plot.R`: R script to perform PCA and generate the plot.
- `PCA_metabolomics_data.xlsx`: Input data (Sheet 1: metabolite matrix; Sheet 2: sample group info).
- `PCA_metabolomics_plot.tiff`: Output figure (Figure 1a in the manuscript).
- `README.md`: Detailed instructions and metadata.
- `LICENSE`: CC-BY-4.0 license for reuse and attribution.
This dataset is intended for reproducibility, peer review, and public reuse under an open license.
We use version 12 (2022) of the V-Dem data (https://www.V-Dem.net) and apply standard principal component analysis (PCA). Following standard procedure, we normalized each V-Dem variable (i.e. centered it to a mean of zero and rescaled it to a variance of one) prior to performing PCA. For better readability of the plots, we rescaled all principal components uniformly such that the first component has a maximum absolute value of one (i.e. its values are bounded by [-1,1]) while preserving the mean of zero for all components. We further re-oriented each component such that its strongest loading is positive.
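The preprocessing and rescaling steps described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the input matrix is random stand-in data rather than V-Dem variables, the function name `standardized_pca` is hypothetical, and the population standard deviation (ddof=0) is assumed for the normalization.

```python
import numpy as np

def standardized_pca(X):
    """PCA on z-scored columns, with re-oriented and uniformly rescaled components."""
    # Normalize each variable to mean zero and variance one.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * S          # one column per principal component
    loadings = Vt.T         # one column per principal component

    # Re-orient each component so that its strongest loading is positive.
    strongest = np.argmax(np.abs(loadings), axis=0)
    signs = np.sign(loadings[strongest, np.arange(loadings.shape[1])])
    loadings = loadings * signs
    scores = scores * signs

    # Rescale all components by the same factor so that PC1 lies in [-1, 1].
    scores = scores / np.max(np.abs(scores[:, 0]))
    return scores, loadings

# Stand-in data: 100 observations of 5 correlated variables (hypothetical).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
scores, loadings = standardized_pca(X)
print(np.abs(scores[:, 0]).max())
```

Dividing every component by the same constant keeps their relative spread comparable across plots, and the component means stay at zero because the scaling is multiplicative.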
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TG species are candidates for discriminating WT-FED from WT-FAS sample groups. Shown are PCA scatter plots of TG species with positive correlation, negative correlation, and the union of both, respectively.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Balance Sheet: Total Risk Based Capital (PCA Definition) (QBPBSTRSKK) from Q1 1990 to Q1 2025 about capital and USA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Balance Sheet: Tier 1 Risk Based Capital (PCA Definition) was 2221887.22900 Mil. of U.S. $ in January of 2025, according to the United States Federal Reserve. Historically, United States - Balance Sheet: Tier 1 Risk Based Capital (PCA Definition) reached a record high of 2221887.22900 in October of 2024 and a record low of 147414.03800 in January of 1984. Trading Economics provides the current actual value, an historical data chart and related indicators for United States - Balance Sheet: Tier 1 Risk Based Capital (PCA Definition) - last updated from the United States Federal Reserve on July of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PCA is applied to SNP main effects and S×I interaction effects combined (S&S×I), and the portion of each is shown in the last two columns.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the row for each variable, numbers indicate the strength of correlation of that variable with the eigenvector of each PC. Correlation coefficients with an absolute value ≥ 0.3 were considered important (bold font) in defining the PC. Variables loaded on PCs 1–3 below appear in the PCA plot (Fig 4A) with an asterisk (*) to indicate that they also project upward (since PC3 is perpendicular to the axes for PCs 1 and 2).
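A table of this kind can be computed as sketched below. This is a hedged illustration only: the data are random stand-ins, the variables are assumed to be standardized before PCA, and the only value taken from the description is the 0.3 cutoff.

```python
import numpy as np

# Stand-in data: 200 observations of 4 variables, two of them correlated.
rng = np.random.default_rng(4)
n_obs = 200
X = rng.normal(size=(n_obs, 4))
X[:, 1] += 0.8 * X[:, 0]

# PCA on standardized variables (an assumption for this sketch).
Z = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * S
n_pcs = 3

# Correlation of every variable (rows) with every PC score (columns).
corr = np.array([
    [np.corrcoef(Z[:, j], scores[:, k])[0, 1] for k in range(n_pcs)]
    for j in range(Z.shape[1])
])

# Flag the coefficients treated as important in defining each PC.
important = np.abs(corr) >= 0.3
print(np.round(corr, 2))
print(important)
```

For standardized variables, each correlation equals the eigenvector entry scaled by the component's singular value divided by the square root of the number of observations, so the table carries the same sign pattern as the eigenvectors.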
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supporting dataset for Figure 1a: Principal Component Analysis (PCA) of transcriptomic profiles in sweet potato leaves under drought stress.
This dataset accompanies the article:
**“Unveiling Stage-Specific Flavonoid Dynamics Underlying Drought Tolerance in Sweet Potato (*Ipomoea batatas* L.) via Integrative Transcriptomic and Metabolomic Analyses”**
Figure 1a illustrates PCA results based on global gene expression patterns from sweet potato leaves sampled under control and drought stress at two developmental stages. The PCA highlights clear clustering between treatment groups, indicating distinct transcriptomic responses.
### This dataset includes:
- Gene expression matrix (Sheet 1 of `PCA_transcriptomics_data.xlsx`)
- Sample group metadata (Sheet 2 of the same file)
- R script used for PCA analysis and figure generation (`PCA_transcriptomics_plot.R`)
- Final PCA figure (`PCA_transcriptomics_plot.pdf`)
- Metadata (`README.md`) and license file (`LICENSE`)
This resource enables full reproducibility of Figure 1a and facilitates open reuse in plant drought transcriptomics research.
Simulated 2D residual velocity fields in the inner German Bight were subjected to Principal Component Analysis (PCA). Residual currents were obtained from coastDat2 barotropic 2D simulations with the hydrodynamic model TRIM-NP V2.1.22 in barotropic 2D mode on a Cartesian grid (1.6km spatial resolution) stored on an hourly basis for the years 1948 - 2012 (doi:10.1594/WDCC/coastDat-2_TRIM-NP-2d) and later extended until August 2015. The present analysis refers to the period Jan 1958 - Aug 2015. The spatial domain considered is the region to the east of 6 degrees east and to the south of 55.6 degrees north. All grid nodes with a bathymetry of less than 10m were excluded.
Residual velocities were calculated in two different ways: 1.) as 25h means, 2.) as monthly means.
Both types of residual current data are available from
The directory contains sub-directories for years and months. Daily residual currents for the 13th of September 1974, for instance, are stored in
while monthly mean residual currents for September 1974 are stored in:
All current fields provided were interpolated from the original Cartesian model grid to a more convenient regular geographical grid (116x76 nodes).
Mean residual currents are stored in:
This data set contains residual velocities both on original Cartesian grid nodes and interpolated to the geographical grid. An example plot is provided:
For PCA, two residual velocity components from each of 12133 Cartesian grid nodes were combined into one data vector (length 2x12133), referring to 21061 daily or 692 monthly time levels. Results of two independent PCAs for either daily or monthly mean fields are stored in:
Files contain the three leading Principal Components (PCs) and the corresponding Empirical Orthogonal Functions (EOFs). The EOFs were also interpolated to a regular geographical grid.
PC time series are also stored in plain ASCII format:
For monthly fields the number N of variables (N=2x12133) is much larger than the number T of time levels (T=692). Therefore, to reduce computational demands, the roles of time and space were formally interchanged. Having conducted the PCA the EOFs were then transformed back to the original spatial coordinates (cf. Section 12.2.6 in von Storch and Zwiers (1999), Statistical Analysis in Climate Research, Cambridge University Press).
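The interchange of time and space can be illustrated with a small example. This is a minimal sketch on random stand-in data with much smaller dimensions than the real fields; it assumes the time mean is removed at every node before the analysis, which the description does not state explicitly.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 30, 2000                 # few time levels, many spatial variables
X = rng.normal(size=(T, N))     # rows: time levels, columns: grid-node variables
X = X - X.mean(axis=0)          # anomalies (assumed preprocessing)

# Solve the small T x T eigenproblem instead of the N x N covariance matrix.
G = X @ X.T / (T - 1)
eigvals, A = np.linalg.eigh(G)
order = np.argsort(eigvals)[::-1][:3]   # keep the three leading modes
eigvals, A = eigvals[order], A[:, order]

# Transform the temporal eigenvectors back to spatial patterns (EOFs)
# and normalize each EOF to unit length.
eofs = X.T @ A
eofs /= np.linalg.norm(eofs, axis=0)

# PC time series: projection of the anomalies onto the EOFs.
pcs = X @ eofs
print(eofs.shape, pcs.shape)
```

Both formulations share the same non-zero eigenvalues, so the patterns recovered this way match those of the full spatial covariance matrix up to sign, while the eigenproblem shrinks from N x N to T x T.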
A much larger number of time levels made even this approach prohibitive for the full set of daily data. Therefore, PCAs were performed for six sub-periods (1958-1965, 1966-1975, 1976-1985, 1986-1995, 1996-2005, 2006-2015(Aug)) independently. EOFs obtained from these six sub-periods were then averaged to obtain EOFs representative for the whole period. Corresponding PCs were calculated by projecting daily fields onto these average EOFs.
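The sub-period approach can be sketched as follows, using synthetic data. Several details here are assumptions rather than documented steps: the EOF signs are aligned to the first sub-period before averaging (the sign of an EOF is arbitrary), the averaged EOFs are renormalized to unit length, and the period lengths, patterns, and amplitudes are invented for illustration.

```python
import numpy as np

def leading_eofs(X, k):
    """Return the k leading EOFs (columns) of a time-by-space data matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

rng = np.random.default_rng(3)
N, k = 200, 2
# Two hypothetical orthonormal spatial patterns shared by all sub-periods.
patterns = np.linalg.qr(rng.normal(size=(N, k)))[0]

def make_period(T):
    """Synthetic daily fields: two patterns with random amplitudes plus noise."""
    amps = rng.normal(size=(T, k)) * np.array([5.0, 2.0])
    return amps @ patterns.T + 0.1 * rng.normal(size=(T, N))

periods = [make_period(T) for T in (300, 400, 350)]

# Independent PCA per sub-period, with signs aligned to the first one.
ref = leading_eofs(periods[0], k)
aligned = []
for X in periods:
    E = leading_eofs(X, k)
    signs = np.sign(np.sum(E * ref, axis=0))
    aligned.append(E * signs)

# Average the EOFs over sub-periods and renormalize to unit length.
mean_eofs = np.mean(aligned, axis=0)
mean_eofs /= np.linalg.norm(mean_eofs, axis=0)

# Project all daily fields onto the averaged EOFs to obtain the PCs.
all_days = np.vstack(periods)
pcs = (all_days - all_days.mean(axis=0)) @ mean_eofs
print(pcs.shape)
```

Averaged EOFs are in general neither exactly orthogonal nor exact eigenvectors of the full-period covariance matrix, which is one way to see why results obtained this way are approximate.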
IMPORTANT: In contrast with PCA of monthly data, the PCA of daily data INVOLVES SOME APPROXIMATIONS!
EOFs on the original nodes were normalized to have unit lengths. The following figures,
show the first three EOFs obtained from daily data, assuming that corresponding PCs have the value of one standard deviation. The following two plots,
show the leading EOFs for monthly mean data. EOF3 is omitted as it represents just a very small percentage of overall variance (1.7%).
We used paleomagnetic results from Sites 998, 999, 1000, and 1001 to estimate the paleolatitude of the Caribbean region over the past 80 m.y. The data include remanence measurements of split-core sections (typically 1.5 m long) and discrete samples (6-12 cm**3 in volume) from volcanic and sedimentary rocks. From these, we computed 15 new paleolatitude estimates for Sites 999 and 1001 on the Caribbean plate and three new paleolatitude estimates for Site 998 on the Cayman Rise, currently on the southern North American plate. One estimate from Site 1001 is based on 230 measurements made along split-core sections of basalt after demagnetization of 20-25 mT. The other 17 estimates are based on principal component analysis of demagnetization data from 438 discrete paleomagnetic samples from sedimentary units. Where necessary, the 18 new paleolatitude estimates are corrected for a polarity ambiguity bias that occurs when averaging paleomagnetic data from drill cores that have shallow inclinations and are not azimuthally oriented. We also investigated the contribution of additional biases that may arise from a compaction-related inclination error, which could affect the sedimentary units, though not the basalt units. Several lines of evidence, including the lack of a correlation between porosity (or water content) and inclination, indicate that the inclination error is small, if present at all. The results from Sites 999 and 1001 indicate that the Caribbean plate was 5°-15° south of its current position at ~80 Ma, possibly placing it directly over the equator in the Late Cretaceous. Although the data do not preclude changes in the rate of northward motion over the past 80 m.y., they are consistent with a constant northward progression at a rate of 18 km/m.y. Given the uncertainties in the data, rates of northward motion could be as low as 8 km/m.y. or as high as 22 km/m.y. 
These results are compatible with several existing models for the evolution of the Caribbean plate, including those that have the Caribbean plate originating in the Pacific Ocean west of subduction zones active in the Central American region during the Cretaceous, and those that have the Caribbean plate originating within the Central American region, though more than 1000 km west of its current position relative to North and South America.
No description is available. Visit https://dataone.org/datasets/53e178628c83fe74286eb0a9f8df90d7 for complete metadata about this dataset.
The OMPS_NPP_NMSO2_PCA_L2 product is part of the MEaSUREs (Making Earth Science Data Records for Use in Research Environments) suite of products. It is retrieved from the NASA/NOAA Suomi National Polar-orbiting Partnership (SNPP) Ozone Mapping and Profiler Suite (OMPS) Nadir Mapper (NM) spectrometer and provides contiguous daily global monitoring of anthropogenic and volcanic sulfur dioxide (SO2), an important pollutant and aerosol precursor that affects both air quality and the climate. The product is based on the NASA Goddard Space Flight Center principal component analysis (PCA) spectral fitting algorithm (Li et al., 2013, 2017), and continues (Zhang et al., 2017) NASA's Earth Observing System (EOS) standard Aura/Ozone Monitoring Instrument SO2 product (OMSO2). The latest OMPS_NPP_NMSO2_PCA_L2 V2 product uses new Jacobian lookup tables and more realistic model-based a priori profiles in anthropogenic SO2 retrievals. This helps to more accurately account for the pixel-to-pixel variation in SO2 sensitivity due to different factors such as the vertical distribution of SO2, solar and viewing angles, surface reflectivity, and cloudiness. As compared with the previous OMPS_NPP_NMSO2_PCA_L2 V1.2 product that assumes the same SO2 sensitivity for all OMPS pixels, the new V2 anthropogenic SO2 retrievals have reduced retrieval biases especially over background regions (see Figure 1 for an example). The same updated PCA SO2 retrieval algorithm (Li et al., 2020) is also used to produce the recently released OMSO2 V2 product (doi:10.5067/Aura/OMI/DATA2022). The new OMPS_NPP_NMSO2_PCA_L2 V2 product thus offers enhanced consistency between the NASA EOS standard (OMI) and continuity (OMPS) SO2 data records. Sulfur dioxide (SO2) is a short-lived gas primarily produced by volcanoes, power plants, refineries, metal smelting and burning of fossil fuels. Where SO2 remains near the Earth's surface, it is toxic, causes acid rain, and degrades air quality.
Where SO2 is lofted into the free troposphere, it forms aerosols that can alter cloud reflectivity and precipitation. In the stratosphere, volcanic SO2 forms sulfate aerosols that can result in climate change.