10 datasets found
  1. Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data...

    • frontiersin.figshare.com
    application/cdfv2
    Updated Jun 1, 2023
    + more versions
    Cite
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.doc [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s001
    Explore at:
    Available download formats: application/cdfv2
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or the same metric applied to different datasets, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst cases, a method evaluated as the best by one metric is evaluated as the poorest by another, or a method evaluated as the best on one dataset is evaluated as the poorest on another. This raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and that a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq or microarray data (the consistency of datasets). We then designed a new metric, Area Under the normalized CV threshold Curve (AUCVC), and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods on both scRNA-seq and bulk RNA-seq data, satisfying both the consistency of metrics and the consistency of datasets. Our findings pave the way for future studies on the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study are included in an R package named NormExpression. NormExpression provides a framework and a fast, simple way for researchers to select the best method for normalizing their gene expression data, based on the evaluation of different methods (particularly data-driven methods or their own methods) under the principles of the consistency of metrics and the consistency of datasets.
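    The exact definition of AUCVC is given in the paper; one plausible reading is the area under the curve of "fraction of genes whose coefficient of variation (CV) falls below a threshold" as the normalized threshold sweeps upward. A small Python sketch of that reading (function names and the trapezoidal integration are our illustration, not the NormExpression implementation):

    ```python
    import statistics

    def gene_cv(values):
        """Coefficient of variation (sd / mean) for one gene across samples."""
        m = statistics.mean(values)
        return statistics.stdev(values) / m if m > 0 else float("inf")

    def aucvc(expression, thresholds):
        """Area under the curve of 'fraction of genes with CV <= t' over
        increasing normalized CV thresholds t, via the trapezoidal rule.
        `expression` is a list of per-gene value lists."""
        cvs = [gene_cv(gene) for gene in expression]
        frac = [sum(cv <= t for cv in cvs) / len(cvs) for t in thresholds]
        area = 0.0
        for i in range(1, len(thresholds)):
            area += (thresholds[i] - thresholds[i - 1]) * (frac[i] + frac[i - 1]) / 2
        return area
    ```

    Under this reading, a better normalization makes more genes look uniform (low CV) at low thresholds, pushing the curve up and the area toward the width of the threshold range.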

  2. Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data...

    • frontiersin.figshare.com
    zip
    Updated Jun 1, 2023
    + more versions
    Cite
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s002
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or the same metric applied to different datasets, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst cases, a method evaluated as the best by one metric is evaluated as the poorest by another, or a method evaluated as the best on one dataset is evaluated as the poorest on another. This raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and that a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq or microarray data (the consistency of datasets). We then designed a new metric, Area Under the normalized CV threshold Curve (AUCVC), and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods on both scRNA-seq and bulk RNA-seq data, satisfying both the consistency of metrics and the consistency of datasets. Our findings pave the way for future studies on the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study are included in an R package named NormExpression. NormExpression provides a framework and a fast, simple way for researchers to select the best method for normalizing their gene expression data, based on the evaluation of different methods (particularly data-driven methods or their own methods) under the principles of the consistency of metrics and the consistency of datasets.

  3. Naturalistic Neuroimaging Database

    • openneuro.org
    Updated Apr 20, 2021
    + more versions
    Cite
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper (2021). Naturalistic Neuroimaging Database [Dataset]. http://doi.org/10.18112/openneuro.ds002837.v1.1.3
    Explore at:
    Dataset updated
    Apr 20, 2021
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Overview

    • The Naturalistic Neuroimaging Database (NNDb v2.0) contains datasets from 86 human participants completing the NIH Toolbox and then watching one of 10 full-length movies during functional magnetic resonance imaging (fMRI). Participants were all right-handed, native English speakers with no history of neurological or psychiatric illness, no hearing impairments, unimpaired or corrected vision, and taking no medication. Each movie was stopped at 40-50 minute intervals or when participants asked for a break, resulting in 2-6 runs of BOLD fMRI. A 10-minute, high-resolution, defaced T1-weighted anatomical MRI scan (MPRAGE) is also provided.
    • The NNDb V2.0 is now on Neuroscout, a platform for fast and flexible re-analysis of (naturalistic) fMRI studies. See: https://neuroscout.org/

    v2.0 Changes

    • Overview
      • We have replaced our own preprocessing pipeline with that implemented in AFNI’s afni_proc.py, thus changing only the derivative files. This introduces a fix for an issue with our normalization (i.e., scaling) step and modernizes and standardizes the preprocessing applied to the NNDb derivative files. We have done a bit of testing and have found that results in both pipelines are quite similar in terms of the resulting spatial patterns of activity but with the benefit that the afni_proc.py results are 'cleaner' and statistically more robust.
    • Normalization

      • Emily Finn and Clare Grall at Dartmouth, and Rick Reynolds and Paul Taylor at AFNI, discovered and showed us that the normalization procedure we used for the derivative files was less than ideal for timeseries runs of varying lengths. Specifically, the 3dDetrend flag -normalize makes 'the sum-of-squares equal to 1'. We had not realized that an implication of this is that the resulting normalized timeseries amplitudes are affected by run length, increasing as run length decreases (and maybe this should go in 3dDetrend's help text). To demonstrate this, I wrote an R version of 3dDetrend's -normalize so you can see for yourselves by running the following code:
      # Generate a resting state (rs) timeseries (ts)
      # Install / load package to make fake fMRI ts
      # install.packages("neuRosim")
      library(neuRosim)
      # Generate a ts
      ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1)
      # 3dDetrend -normalize
      # R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1"
      # Do for the full timeseries
      ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2));
      # Do this again for a shorter version of the same timeseries
      ts.shorter.length <- length(ts.normalised.long)/4
      ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2));
      # The summaries show that the amplitudes of the normalized values are larger for the shorter timeseries
      summary(ts.normalised.long)
      summary(ts.normalised.short)
      # Plot results for the long and short ts
      # Truncate the longer ts for plotting only
      ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length]
      # Give the plot a title
      title <- "3dDetrend -normalize for long (blue) and short (red) timeseries";
      plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short)));
      # Add zero line
      lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey');
      # 3dDetrend -normalize -polort 0 for long timeseries
      lines(ts.normalised.long.made.shorter, col='blue');
      # 3dDetrend -normalize -polort 0 for short timeseries
      lines(ts.normalised.short, col='red');
      
    • Standardization/modernization

      • The above individuals also encouraged us to implement the afni_proc.py script over our own pipeline. It introduces at least three additional improvements: First, we now use Bob's @SSwarper to align our anatomical files with an MNI template (now MNI152_2009_template_SSW.nii.gz) and this, in turn, integrates nicely into the afni_proc.py pipeline. This seems to result in a generally better or more consistent alignment, though this is only a qualitative observation. Second, all the transformations / interpolations and detrending are now done in fewer steps compared to our pipeline. This is preferable because, e.g., there is less chance of inadvertently reintroducing noise back into the timeseries (see Lindquist, Geuter, Wager, & Caffo 2019). Finally, many groups are advocating using tools like fMRIPrep or afni_proc.py to increase standardization of analysis practices in our neuroimaging community. This presumably results in less error, less heterogeneity and more interpretability of results across studies. Along these lines, the quality control ('QC') html pages generated by afni_proc.py are a real help in assessing data quality and almost a joy to use.
    • New afni_proc.py command line

      • The following is the afni_proc.py command line that we used to generate blurred and censored timeseries files. The afni_proc.py tool comes with extensive help and examples; you can quickly understand our preprocessing decisions by scrutinising the options below. The command is most similar to Example 11 for 'Resting state analysis' in the help file (see https://afni.nimh.nih.gov/pub/dist/doc/program_help/afni_proc.py.html):

        afni_proc.py \
            -subj_id "$sub_id_name_1" \
            -blocks despike tshift align tlrc volreg mask blur scale regress \
            -radial_correlate_blocks tcat volreg \
            -copy_anat anatomical_warped/anatSS.1.nii.gz \
            -anat_has_skull no \
            -anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \
            -anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
            -anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
            -anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \
            -anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \
            -anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \
            -anat_follower_erode fsvent fswm \
            -dsets media_?.nii.gz \
            -tcat_remove_first_trs 8 \
            -tshift_opts_ts -tpattern alt+z2 \
            -align_opts_aea -cost lpc+ZZ -giant_move -check_flip \
            -tlrc_base "$basedset" \
            -tlrc_NL_warp \
            -tlrc_NL_warped_dsets \
                anatomical_warped/anatQQ.1.nii.gz \
                anatomical_warped/anatQQ.1.aff12.1D \
                anatomical_warped/anatQQ.1_WARP.nii.gz \
            -volreg_align_to MIN_OUTLIER \
            -volreg_post_vr_allin yes \
            -volreg_pvra_base_index MIN_OUTLIER \
            -volreg_align_e2a \
            -volreg_tlrc_warp \
            -mask_opts_automask -clfrac 0.10 \
            -mask_epi_anat yes \
            -blur_to_fwhm -blur_size $blur \
            -regress_motion_per_run \
            -regress_ROI_PC fsvent 3 \
            -regress_ROI_PC_per_run fsvent \
            -regress_make_corr_vols aeseg fsvent \
            -regress_anaticor_fast \
            -regress_anaticor_label fswm \
            -regress_censor_motion 0.3 \
            -regress_censor_outliers 0.1 \
            -regress_apply_mot_types demean deriv \
            -regress_est_blur_epits \
            -regress_est_blur_errts \
            -regress_run_clustsim no \
            -regress_polort 2 \
            -regress_bandpass 0.01 1 \
            -html_review_style pythonic

      We used similar command lines to generate the 'blurred and not censored' and 'not blurred and not censored' timeseries files (described more fully below). We will make the code used to generate all derivative files available on our github site (https://github.com/lab-lab/nndb).

      We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, averaging ~40 minutes but variable in length (hence the above issue with 3dDetrend's -normalize). A discussion on the AFNI message board with one of our team (starting here: https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256) led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.
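      The legacy run-length-dependent polort rule can be sketched in a few lines (Python for illustration; the function name and `run_seconds` parameter are ours):

      ```python
      def legacy_polort(run_seconds):
          """AFNI's suggested default polynomial detrending order:
          1 + int(D / 150), where D is the run duration in seconds."""
          return 1 + int(run_seconds / 150)

      # For a ~40 minute (2400 s) run, the legacy rule gives order 17,
      # whereas the new pipeline fixes polort at 2 and relies on
      # -regress_bandpass 0.01 1 to remove slow drifts.
      print(legacy_polort(2400))
      ```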

      Which timeseries file you use is up to you, but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul's own words:
      • Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice).
      • Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere).
      • For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data).
      • For censored data:
        • Performing ISC requires the users to unionize the censoring patterns during the correlation calculation.
        • If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might still do for naturalistic tasks), then standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and, say, only up to about 10-15% of the data.
      In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.
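      "Unionizing" the censoring patterns means a timepoint enters the ISC correlation only if it survived censoring in both runs being correlated. A minimal sketch (hypothetical boolean keep-masks, Python for illustration, not AFNI code):

      ```python
      def joint_keep_mask(keep_a, keep_b):
          """Keep a timepoint only if it was retained (not censored) in both runs."""
          return [a and b for a, b in zip(keep_a, keep_b)]

      def apply_mask(ts, keep):
          """Drop censored timepoints before computing the correlation."""
          return [x for x, k in zip(ts, keep) if k]
      ```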

    • Effect on results

      • From numerous tests on our own analyses, we have qualitatively found that results using our old vs the new afni_proc.py preprocessing pipeline do not change all that much in terms of general spatial patterns. There is, however, an
  4. Supplemental materials to the conference paper "validating 111.1 million...

    • service.tib.eu
    Updated Nov 14, 2025
    Cite
    (2025). Supplemental materials to the conference paper "validating 111.1 million marc records" [Dataset]. https://service.tib.eu/ldmservice/dataset/goe-doi-10-25625-amf8jc
    Explore at:
    Dataset updated
    Nov 14, 2025
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    User guide

    To generate the reports:
    • Prerequisite: Java 8 runtime environment
    • Download the metadata-qa-marc project as described at https://github.com/pkiraly/metadata-qa-marc (e.g. into the ~/git/metadata-qa-marc directory)
    • Download the .sh and .R files from this project to a subdirectory (e.g. 'scripts')
    • Adjust the DIR variable in the [library-name].sh files according to your directory structure
    • run-all.sh creates the -details.csv and -summary.csv files in the $DIR/_reports directory
    If you do not want to generate the reports but would like to use the data files provided, download the *.csv.gz files to a '_reports' directory.

    To generate Tables 2 and 3 of the paper:
    • Prerequisite: R
    • Move normalize-summary.sh, distill-ids.sh, and normalize-ids.sh into the $DIR/_reports directory
    • cd $DIR/_reports
    • ./normalize-summary.sh
    • ./distill-ids.sh
    • ./normalize-ids.sh
    • Rscript evaluate-details.R
    • Rscript evaluate-summary.R

  5. (high-temp) No 8. Metadata Analysis (16S rRNA/ITS) Output

    • search.dataone.org
    • smithsonian.figshare.com
    Updated Aug 15, 2024
    Cite
    Jarrod Scott (2024). (high-temp) No 8. Metadata Analysis (16S rRNA/ITS) Output [Dataset]. https://search.dataone.org/view/urn%3Auuid%3A718e0794-b5ff-4919-95ef-4a90a7890a5b
    Explore at:
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    Smithsonian Research Data Repository
    Authors
    Jarrod Scott
    Description

    Output files from the 8. Metadata Analysis Workflow page of the SWELTR high-temp study. In this workflow, we compared environmental metadata with microbial communities. The workflow is split into two parts.

    metadata_ssu18_wf.rdata : Part 1 contains all variables and objects for the 16S rRNA analysis. To see the objects, in R run load("metadata_ssu18_wf.rdata", verbose=TRUE)

    metadata_its18_wf.rdata : Part 2 contains all variables and objects for the ITS analysis. To see the objects, in R run load("metadata_its18_wf.rdata", verbose=TRUE)
    Additional files:

    In both workflows, we run the following steps:

    1) Metadata Normality Tests: Shapiro-Wilk Normality Test to check whether each metadata parameter is normally distributed.
    2) Normalize Parameters: R package bestNormalize to find and execute the best normalizing transformation.
    3) Split Metadata parameters into groups: a) Environmental and edaphic properties, b) Microbial functional responses, and c) Temperature adaptation properties.
    4) Autocorrelation Tests: Test all possible pair-wise comparisons, on both normalized and non-normalized data sets, for each group.
    5) Remove autocorrelated parameters from each group.
    6) Dissimilarity Correlation Tests: Use Mantel Tests to see if any of the metadata groups are significantly correlated with the community data.
    7) Best Subset of Variables: Determine which of the metadata parameters from each group are the most strongly correlated with the community data. For this we use the bioenv function from the vegan package.
    8) Distance-based Redundancy Analysis: Ordination analysis of samples and metadata vector overlays using capscale, also from the vegan package.
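    Step 6 relies on the Mantel test (the workflow uses the vegan package in R); the permutation logic can be sketched in pure Python for illustration (function names are ours, not the workflow's):

    ```python
    import random

    def pearson(x, y):
        """Pearson correlation of two equal-length value lists."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x) ** 0.5
        vy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (vx * vy)

    def mantel(d_env, d_comm, n_perm=999, seed=1):
        """Mantel test: correlate the upper triangles of two distance matrices;
        the p-value comes from permuting rows/columns of one matrix."""
        n = len(d_env)
        idx = [(i, j) for i in range(n) for j in range(i + 1, n)]
        x = [d_env[i][j] for i, j in idx]
        y = [d_comm[i][j] for i, j in idx]
        r_obs = pearson(x, y)
        rng = random.Random(seed)
        hits = 0
        for _ in range(n_perm):
            perm = list(range(n))
            rng.shuffle(perm)
            y_perm = [d_comm[perm[i]][perm[j]] for i, j in idx]
            if pearson(x, y_perm) >= r_obs:
                hits += 1
        return r_obs, (hits + 1) / (n_perm + 1)
    ```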

    Source code for the workflow can be found here:
    https://github.com/sweltr/high-temp/blob/master/metadata.Rmd

  6. Data for A Systemic Framework for Assessing the Risk of Decarbonization to...

    • zenodo.org
    txt
    Updated Sep 18, 2025
    Cite
    Soheil Shayegh; Soheil Shayegh; Giorgia Coppola; Giorgia Coppola (2025). Data for A Systemic Framework for Assessing the Risk of Decarbonization to Regional Manufacturing Activities in the European Union [Dataset]. http://doi.org/10.5281/zenodo.17152310
    Explore at:
    Available download formats: txt
    Dataset updated
    Sep 18, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Soheil Shayegh; Soheil Shayegh; Giorgia Coppola; Giorgia Coppola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 18, 2025
    Area covered
    European Union
    Description

    README — Code and data
    Project: LOCALISED

    Work Package 7, Task 7.1

    Paper: A Systemic Framework for Assessing the Risk of Decarbonization to Regional Manufacturing Activities in the European Union

    What this repo does
    -------------------
    Builds the Transition‑Risk Index (TRI) for EU manufacturing at NUTS‑2 × NACE Rev.2, and reproduces the article’s Figures 3–6:
    • Exposure (emissions by region/sector)
    • Vulnerability (composite index)
    • Risk = Exposure ⊗ Vulnerability
    Outputs include intermediate tables, the final analysis dataset, and publication figures.

    Folder of interest
    ------------------
    Code and data/
    ├─ Code/ # R scripts (run in order 1A → 5)
    │ └─ Create Initial Data/ # scripts to (re)build Initial data/ from Eurostat API with imputation
    ├─ Initial data/ # Eurostat inputs imputed for missing values
    ├─ Derived data/ # intermediates
    ├─ Final data/ # final analysis-ready tables
    └─ Figures/ # exported figures

    Quick start
    -----------
    1) Open R (or RStudio) and set the working directory to “Code and data/Code”.
    Example: setwd(".../Code and data/Code")
    2) Initial data/ contains the required Eurostat inputs referenced by the scripts.
    To reproduce the inputs in Initial data/, run the scripts in Code/Create Initial Data/.
    These scripts download the required datasets from the respective API and impute missing values; outputs are written to ../Initial data/.
    3) Run scripts sequentially (they use relative paths to ../Raw data, ../Derived data, etc.):
    1A-non-sector-data.R → 1B-sector-data.R → 1C-all-data.R → 2-reshape-data.R → 3-normalize-data-by-n-enterpr.R → 4-risk-aggregation.R → 5A-results-maps.R, 5B-results-radar.R

    What each script does
    ---------------------
    Create Initial Data — Recreate inputs
    • Download source tables from the Eurostat API or the Localised DSP, apply light cleaning, and impute missing values.
    • Write the resulting inputs to Initial data/ for the analysis pipeline.

    1A / 1B / 1C — Build the unified base
    • Read individual Eurostat datasets (some sectoral, some only regional).
    • Harmonize, aggregate, and align them into a single analysis-ready schema.
    • Write aggregated outputs to Derived data/ (and/or Final data/ as needed).

    2 — Reshape and enrich
    • Reshapes the combined data and adds metadata.
    • Output: Derived data/2_All_data_long_READY.xlsx (all raw indicators in tidy long format, with indicator names and values).

    3 — Normalize (enterprises & min–max)
    • Divide selected indicators by number of enterprises.
    • Apply min–max normalization to [0.01, 0.99].
    • Exposure keeps real zeros (zeros remain zero).
    • Write normalized tables to Derived data/ or Final data/.
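    The min–max step in script 3 can be sketched as follows (a Python illustration of scaling to [0.01, 0.99] while keeping real zeros, not the repository's actual R code; the function name is ours):

    ```python
    def minmax_scale(values, lo=0.01, hi=0.99, keep_zeros=False):
        """Min-max scale to [lo, hi]; optionally map true zeros to zero,
        as is done for the Exposure indicators."""
        vmin, vmax = min(values), max(values)
        span = vmax - vmin
        out = []
        for v in values:
            if keep_zeros and v == 0:
                out.append(0.0)
            else:
                out.append(lo + (hi - lo) * (v - vmin) / span if span else lo)
        return out
    ```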

    4 — Aggregate indices
    • Vulnerability: build dimension scores (Energy, Labour, Finance, Supply Chain, Technology).
    – Within each dimension: equal‑weight mean of directionally aligned, [0.01,0.99]‑scaled indicators.
    – Dimension scores are re‑scaled to [0.01,0.99].
    • Aggregate Vulnerability: equal‑weight mean of the five dimensions.
    • TRI (Risk): combine Exposure (E) and Vulnerability (V) via a weighted geometric rule with α = 0.5 in the baseline.
    – Policy‑intuitive properties: high E & high V → high risk; imbalances penalized (non‑compensatory).
    • Output: Final data/ (main analysis tables).
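    The baseline TRI aggregation described above amounts to TRI = E^α · V^(1−α) with α = 0.5. A small illustrative sketch (the function name is ours, not the repository's):

    ```python
    def tri(exposure, vulnerability, alpha=0.5):
        """Weighted geometric aggregation of Exposure and Vulnerability.
        Non-compensatory: a low score on either factor drags the index down
        more than an equal-weight arithmetic mean would."""
        return exposure ** alpha * vulnerability ** (1 - alpha)
    ```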

    5A / 5B — Visualize results
    • 5A: maps and distribution plots for Exposure, Vulnerability, and Risk → Figures 3 & 4.
    • 5B: comparative/radar profiles for selected countries/regions/subsectors → Figures 5 & 6.
    • Outputs saved to Figures/.

    Data flow (at a glance)
    -----------------------
    Initial data → (1A–1C) Aggregated base → (2) Tidy long file → (3) Normalized indicators → (4) Composite indices → (5) Figures
    Intermediate outputs land in Derived data/ (including 2_All_data_long_READY.xlsx); final tables and figures land in Final data/ and Figures/.

    Assumptions & conventions
    -------------------------
    • Geography: EU NUTS‑2 regions; Sector: NACE Rev.2 manufacturing subsectors.
    • Equal weights by default where no evidence supports alternatives.
    • All indicators directionally aligned so that higher = greater transition difficulty.
    • Relative paths assume working directory = Code/.

    Reproducing the article
    -----------------------
    • Optionally run the codes from the Code/Create Initial Data subfolder
    • Run 1A → 5B without interruption to regenerate:
    – Figure 3: Exposure, Vulnerability, Risk maps (total manufacturing).
    – Figure 4: Vulnerability dimensions (Energy, Labour, Finance, Supply Chain, Technology).
    – Figure 5: Drivers of risk—highest vs. lowest risk regions (example: Germany & Greece).
    – Figure 6: Subsector case (e.g., basic metals) by selected regions.
    • Final tables for the paper live in Final data/. Figures export to Figures/.

    Requirements
    ------------
    • R (version per your environment).
    • Install any missing packages listed at the top of each script (e.g., install.packages("...")).

    Troubleshooting
    ---------------
    • “File not found”: check that the previous script finished and wrote its outputs to the expected folder.
    • Paths: confirm getwd() ends with /Code so relative paths resolve to ../Raw data, ../Derived data, etc.
    • Reruns: optionally clear Derived data/, Final data/, and Figures/ before a clean rebuild.

    Provenance & citation
    ---------------------
    • Inputs: Eurostat and related sources cited in the paper and headers of the scripts.
    • Methods: OECD composite‑indicator guidance; IPCC AR6 risk framing (see paper references).
    • If you use this code, please cite the article:
    A Systemic Framework for Assessing the Risk of Decarbonization to Regional Manufacturing Activities in the European Union.

  7. Altered Monocyte Phenotype in HIV-1 Infection Tends to Normalize with...

    • plos.figshare.com
    tiff
    Updated Jun 11, 2023
    Cite
    Marie R. McCausland; Steven M. Juchnowski; David A. Zidar; Daniel R. Kuritzkes; Adriana Andrade; Scott F. Sieg; Michael M. Lederman; Nicholas T. Funderburg (2023). Altered Monocyte Phenotype in HIV-1 Infection Tends to Normalize with Integrase-Inhibitor-Based Antiretroviral Therapy [Dataset]. http://doi.org/10.1371/journal.pone.0139474
    Explore at:
    Available download formats: tiff
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Marie R. McCausland; Steven M. Juchnowski; David A. Zidar; Daniel R. Kuritzkes; Adriana Andrade; Scott F. Sieg; Michael M. Lederman; Nicholas T. Funderburg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Monocytes are increasingly implicated in the inflammatory consequences of HIV-1 disease, yet their phenotype following antiretroviral therapy (ART) initiation is incompletely defined. Here, we define monocyte phenotype more completely, both prior to ART initiation and during 48 weeks of ART.
    Methods: Cryopreserved peripheral blood mononuclear cells (PBMCs) were obtained at baseline (prior to ART initiation) and at weeks 12, 24, and 48 of treatment from 29 patients participating in ACTG clinical trial A5248, an open label study of raltegravir/emtricitabine/tenofovir administration. For comparison, cryopreserved PBMCs were obtained from 15 HIV-1 uninfected donors, each of whom had at least two cardiovascular risk factors. Thawed samples were stained for monocyte subset markers (CD14 and CD16), HLA-DR, CCR2, CX3CR1, CD86, CD83, CD40, CD38, CD36, CD13, and CD163 and examined using flow cytometry.
    Results: In untreated HIV-1 infection there were perturbations in monocyte subset phenotypes: chiefly, a higher frequency and density (mean fluorescence intensity, MFI) of HLA-DR (%: p = 0.004; MFI: p = 0.0005) and CD86 (%: p = 0.012; MFI: p = 0.005) expression and a lower frequency of CCR2 (p = 0.0002) expression on all monocytes, and lower CCR2 density on inflammatory monocytes (p = 0.045), compared to the expression and density of these markers on controls' monocytes. We also report lower expression of CX3CR1 (p = 0.014) on patrolling monocytes at baseline, compared to levels seen in controls. After ART, these perturbations tended to improve, with decreasing expression and density of HLA-DR and CD86, increasing CCR2 density on inflammatory monocytes, and increasing expression and density of CX3CR1 on patrolling monocytes.
    Conclusions: In HIV-1 infected patients, ART appears to attenuate the high levels of activation (HLA-DR, CD86) and to increase expression of the chemokine receptors CCR2 and CX3CR1 on monocyte populations. Circulating monocyte phenotypes are altered in untreated infection and tend to normalize with ART; the role of these cells in the inflammatory environment of HIV-1 infection warrants further study.

  8. Methods for normalizing microbiome data: an ecological perspective

    • nde-dev.biothings.io
    • search.dataone.org
    • +1 more
    zip
    Updated Oct 30, 2018
    Cite
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger (2018). Methods for normalizing microbiome data: an ecological perspective [Dataset]. http://doi.org/10.5061/dryad.tn8qs35
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 30, 2018
    Dataset provided by
    University of New England
    James Cook University
    Authors
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description
    1. Microbiome sequencing data often need to be normalized due to differences in read depths, and recommendations for microbiome analyses generally warn against using proportions or rarefying to normalize data, instead advocating alternatives such as upper quartile, CSS, edgeR-TMM, or DESeq-VS. Those recommendations are, however, based on studies that focused on differential abundance testing and variance standardization, rather than community-level comparisons (i.e., beta diversity). Also, standardizing the within-sample variance across samples may suppress differences in species evenness, potentially distorting community-level patterns. Furthermore, the recommended methods use log transformations, which we expect to exaggerate the importance of differences among rare OTUs while suppressing the importance of differences among common OTUs.
    2. We tested these theoretical predictions via simulations and a real-world data set.
    3. Proportions and rarefying produced more accurate comparisons among communities and were the only methods that fully normalized read depths across samples. Additionally, upper quartile, CSS, edgeR-TMM, and DESeq-VS often masked differences among communities when common OTUs differed, and they produced false positives when rare OTUs differed.
    4. Based on our simulations, normalizing via proportions may be superior to other commonly used methods for comparing ecological communities.
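    The two normalizations the authors favour are simple to state; a few-line Python illustration (not the paper's code; function names are ours):

    ```python
    import random

    def to_proportions(counts):
        """Normalize OTU counts to relative abundances (they sum to 1)."""
        total = sum(counts)
        return [c / total for c in counts]

    def rarefy(counts, depth, seed=0):
        """Randomly subsample reads without replacement to a fixed depth,
        so every sample ends up with the same read count."""
        pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
        sample = random.Random(seed).sample(pool, depth)
        return [sample.count(otu) for otu in range(len(counts))]
    ```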
  9. m

    Data and code for: Contagion risk prediction with Chart Graph Convolutional...

    • data.mendeley.com
    Updated Jun 5, 2025
    Cite
    Zhensong Chen (2025). Data and code for: Contagion risk prediction with Chart Graph Convolutional Network: Evidence from Chinese stock market [Dataset]. http://doi.org/10.17632/6xy9d4bp28.1
    Explore at:
    Dataset updated
    Jun 5, 2025
    Authors
    Zhensong Chen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset accompanies the study “Contagion risk prediction with Chart Graph Convolutional Network: Evidence from Chinese stock market”, which proposes a framework for contagion risk prediction by comprehensively mining the features of technical charts and technical indicators. The utilized data include the closing prices of 28 sectors in the Shenwan primary industry index, the closing price of the CSI-300 Index, and eight classes of trading indicators: Turnover Rate, Price-to-Earnings Ratio, Trading Volume, Relative Strength Index, Moving Average Convergence Divergence, Moving Average, Bollinger Bands, and Stochastic Oscillator. The sample period is from 5 Jan 2007 to 30 Dec 2022. The closing prices of the 28 sectors are downloaded from the Choice database. The closing price of the CSI-300 Index and the eight classes of trading indicators are downloaded from the Wind database. This dataset includes two raw data files, one predefined temporary file, and eighteen code files, which are described as follows: Sector_data.csv stores the closing prices of the 28 sectors. CSI_300_data.csv includes the closing price of the CSI-300 Index and the eight classes of trading indicators. DCC_temp.csv is a predefined temporary file used to store correlation results. Descriptive_code.py is used to calculate the statistical results. ADF Test.py is used to test the stationarity of the data. Min-max normalization.py is used to standardize the data. ADCC-GJR-GARCH.R is used to calculate dynamic conditional correlations between sectors. MST_figure.py is used to construct a complex network that illustrates the inter-sector relationships. Correlation.py is used to calculate inter-industry correlations. Corr_up.py, corr_mid.py and corr_down.py are used to calculate dynamic correlations in upstream, midstream, and downstream sectors.
Centrality.py is used to quantify the importance or influence of nodes within a network, particularly across distinct upstream, midstream, and downstream sectors. Averaging_corr_over_a_5-day_period.py calculates 5-day rolling averages of correlation and centrality metrics to quantify contagion risk on a weekly cycle. Convert technical charts using PIP and VG methods.py extracts significant nodes, converts them into graphical representations, and saves them in Daily Importance Score.csv, Daily Threshold Matrix.csv, and Daily Technical Indicators.csv. Convert_CSV_to_TXT.py converts Daily Importance Score.csv, Daily Threshold Matrix.csv, and Daily Technical Indicators.csv into TXT files for later use. Four files in the folder Generating and normalizing the subgraphs are used to generate subgraphs and then normalize them; receptive_field.py serves as the main program, which calls the other three files. The stock_graph_indicator.py calculates topological structure data for subsequent use. Predictive_model.py takes normalized subgraphs and Y-values defined by contagion risk as inputs and performs parameter tuning to achieve optimal results.
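The Min-max normalization.py step presumably rescales each series to a common [0, 1] range before model training. A minimal sketch of that transformation, using made-up price data rather than the Choice/Wind series:

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale each column of x to the [0, 1] range."""
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return (x - x_min) / (x_max - x_min)

# Hypothetical closing-price series for three sectors over five days.
prices = np.array([
    [10.0, 100.0, 55.0],
    [12.0, 110.0, 50.0],
    [11.0, 105.0, 60.0],
    [13.0, 120.0, 52.0],
    [14.0, 115.0, 58.0],
])

scaled = min_max_normalize(prices)
print(scaled.min(axis=0), scaled.max(axis=0))  # each column spans [0, 1]
```

Per-column scaling keeps sectors with very different price levels on a comparable scale without altering the shape of each series.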

  10. f

    Table_2_Comparison of Normalization Methods for Analysis of TempO-Seq...

    • datasetcatalog.nlm.nih.gov
    Updated Jun 23, 2020
    + more versions
    Cite
    Paules, Richard S.; Ramaiahgari, Sreenivasa C.; Ferguson, Stephen S.; Auerbach, Scott S.; Bushel, Pierre R. (2020). Table_2_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.xlsx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000579048
    Explore at:
    Dataset updated
    Jun 23, 2020
    Authors
    Paules, Richard S.; Ramaiahgari, Sreenivasa C.; Ferguson, Stephen S.; Auerbach, Scott S.; Bushel, Pierre R.
    Description

    Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven fold-change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between treated vs. untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90 except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite having an assumption of the majority of genes being unchanged, the DESeq2 scaling factors normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TC). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
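Upper Quartile normalization, the best performer above, can be sketched as follows. The count matrix is simulated and the rescaling convention (factors averaged to 1) is an assumption for illustration, not the authors' exact code:

```python
import numpy as np

def upper_quartile_normalize(counts: np.ndarray) -> np.ndarray:
    """Scale each sample (column) by the 75th percentile of its nonzero counts."""
    n_samples = counts.shape[1]
    factors = np.empty(n_samples)
    for j in range(n_samples):
        nonzero = counts[counts[:, j] > 0, j]
        factors[j] = np.percentile(nonzero, 75)
    # rescale so the factors average to 1, keeping values on a count-like scale
    factors /= factors.mean()
    return counts / factors

# Hypothetical count matrix: rows = genes, columns = samples with
# deliberately different sequencing depths.
rng = np.random.default_rng(1)
counts = rng.poisson(lam=[50.0, 100.0, 200.0], size=(500, 3)).astype(float)

normalized = upper_quartile_normalize(counts)
```

After normalization, every sample shares the same upper-quartile value, which removes depth differences without assuming anything about genes below the 75th percentile.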

