19 datasets found
  1. LEMming: A Linear Error Model to Normalize Parallel Quantitative Real-Time PCR (qPCR) Data as an Alternative to Reference Gene Based Methods

    • plos.figshare.com
    pdf
    Updated Jun 2, 2023
    Ronny Feuer; Sebastian Vlaic; Janine Arlt; Oliver Sawodny; Uta Dahmen; Ulrich M. Zanger; Maria Thomas (2023). LEMming: A Linear Error Model to Normalize Parallel Quantitative Real-Time PCR (qPCR) Data as an Alternative to Reference Gene Based Methods [Dataset]. http://doi.org/10.1371/journal.pone.0135852
    Available download formats: pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ronny Feuer; Sebastian Vlaic; Janine Arlt; Oliver Sawodny; Uta Dahmen; Ulrich M. Zanger; Maria Thomas
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Gene expression analysis is an essential part of biological and medical investigations. Quantitative real-time PCR (qPCR) is characterized by excellent sensitivity, dynamic range and reproducibility, and is still regarded as the gold standard for quantifying transcript abundance. Parallelization of qPCR, such as on the microfluidic TaqMan Fluidigm Biomark platform, enables evaluation of multiple transcripts in samples treated under various conditions. Despite advanced technologies, correct evaluation of the measurements remains challenging. The most widely used methods for evaluating and calculating gene expression data are geNorm and ΔΔCt, respectively. They rely on one or several stable reference genes (RGs) for normalization, thus potentially causing biased results. We therefore applied multivariable regression with a tailored error model to overcome the necessity of stable RGs.

    Results: We developed an RG-independent data normalization approach based on a tailored linear error model for parallel qPCR data, called LEMming. It uses the assumption that the mean Ct values within samples of similarly treated groups are equal. Performance of LEMming was evaluated in three data sets with different stability patterns of RGs and compared to the results of geNorm normalization. Data set 1 showed that both methods give similar results if stable RGs are available. Data set 2 included RGs which are stable according to geNorm criteria, but which became differentially expressed in normalized data evaluated by a t-test. geNorm-normalized data showed an effect of a shifted mean per gene per condition, whereas LEMming-normalized data did not. Comparing the decrease in standard deviation from raw data under geNorm and under LEMming, the latter was superior. In data set 3, stable RGs were available according to geNorm's average expression stability and pairwise variation, but t-tests of the raw data contradicted this. Normalization with RGs resulted in distorted data contradicting the literature, while LEMming-normalized data did not.

    Conclusions: If RGs are coexpressed but are not independent of the experimental conditions, the stability criteria based on inter- and intragroup variation fail. The linear error model developed here, LEMming, overcomes the dependency on RGs for parallel qPCR measurements, besides resolving biases of both technical and biological nature in qPCR. However, to distinguish systematic errors per treated group from a global treatment effect, an additional measurement is needed; quantification of total cDNA content per sample helps to identify such systematic errors.
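
    The group-mean assumption can be illustrated with a toy R sketch. This is only a mean-centering illustration of that assumption (the gene names, sample names and two-group design are invented here); it is not the multivariable regression and error model that LEMming actually fits.

    # Illustrative sketch only -- not the published LEMming model.
    # Idea: within a treatment group, the mean Ct of every sample is assumed equal,
    # so per-sample offsets from the group mean can be removed without reference genes.
    set.seed(1)
    ct <- matrix(rnorm(5 * 6, mean = 25, sd = 2), nrow = 5, ncol = 6,
                 dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6)))
    group <- c("ctrl", "ctrl", "ctrl", "treat", "treat", "treat")  # hypothetical design

    normalize_by_group_mean <- function(ct, group) {
      for (g in unique(group)) {
        idx <- which(group == g)
        sample_means <- colMeans(ct[, idx, drop = FALSE])
        # shift each sample so all samples in the group share the same mean Ct
        ct[, idx] <- sweep(ct[, idx, drop = FALSE], 2, sample_means - mean(sample_means))
      }
      ct
    }

    ct_norm <- normalize_by_group_mean(ct, group)
    colMeans(ct_norm)   # equal within each treatment group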

  2. Normalization techniques for PARAFAC modeling of urine metabolomics data

    • data.niaid.nih.gov
    xml
    Updated May 11, 2017
    Radana Karlikova (2017). Normalization techniques for PARAFAC modeling of urine metabolomics data [Dataset]. https://data.niaid.nih.gov/resources?id=mtbls290
    Available download formats: xml
    Dataset updated
    May 11, 2017
    Dataset provided by
    IMTM, Faculty of Medicine and Dentistry, Palacky University Olomouc, Hnevotinska 5, 775 15 Olomouc, Czech Republic
    Authors
    Radana Karlikova
    Variables measured
    Sample type, Metabolomics, Sample collection time
    Description

    One of the body fluids often used in metabolomics studies is urine. The peak intensities of metabolites in urine are affected by the urine history of an individual, resulting in dilution differences; the data therefore require normalization to correct for such differences. Two normalization techniques are commonly applied to urine samples prior to further statistical analysis. First, AUC normalization standardizes the area under the curve (AUC) of the signals within a sample to the median, the mean, or any other suitable representation of the amount of dilution. The second approach uses specific end-product metabolites such as creatinine, and all intensities within a sample are expressed relative to the creatinine intensity. Another way of looking at urine metabolomics data is to recognize that the ratios between peak intensities are the information-carrying features. This opens up possibilities to use another class of data analysis techniques designed to deal with such ratios: compositional data analysis. In this approach special transformations are defined to deal with the ratio problem; in essence, it comes down to using a distance measure other than the Euclidean distance used in the conventional analysis of metabolomics data. We illustrate the use of this type of approach in combination with three-way methods (i.e. PARAFAC), to be used in cases where samples of some biological material are measured at multiple time points. The aim of the paper is to develop PARAFAC modeling of three-way metabolomics data in the context of compositional data and to compare this with standard normalization techniques for the specific case of urine metabolomics data.
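
    As a concrete illustration of the first (AUC) approach described above, here is a minimal R sketch that rescales every sample to the median total signal; the matrix and its dimensions are invented for the example, and this is not the PARAFAC/compositional workflow of the paper.

    # Toy sketch of AUC (total-signal) normalization for a samples x metabolites
    # intensity matrix.
    set.seed(2)
    intensities <- matrix(rexp(4 * 10, rate = 1), nrow = 4,
                          dimnames = list(paste0("urine", 1:4), paste0("met", 1:10)))
    auc    <- rowSums(intensities)                 # per-sample "area under the curve"
    scaled <- intensities / auc * median(auc)      # rescale every sample to the median AUC
    rowSums(scaled)                                # all samples now share the same total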

  3. Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.XLSX

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jun 3, 2023
    Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach (2023). Table_1_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.XLSX [Dataset]. http://doi.org/10.3389/fgene.2020.00594.s001
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how these methods perform on TempO-Seq data. We simulated count data for two groups (treated vs. untreated) at seven fold-change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq, and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between the treated and untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90, except for the no-change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite its assumption that the majority of genes are unchanged, the DESeq2 scaling-factors normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TC). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
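
    For readers who want to see what upper-quartile scaling does in practice, below is a minimal R sketch on a made-up genes x samples count matrix; it illustrates UQ scaling in general and is not the exact simulation or limma workflow benchmarked in the paper.

    # Toy upper-quartile (UQ) scaling of a count matrix (probe and sample names invented).
    set.seed(3)
    counts <- matrix(rnbinom(200 * 4, mu = 50, size = 2), nrow = 200,
                     dimnames = list(paste0("probe", 1:200), paste0("s", 1:4)))
    uq <- apply(counts, 2, function(x) quantile(x[x > 0], 0.75))   # per-sample upper quartile
    counts_uq <- sweep(counts, 2, uq / mean(uq), FUN = "/")        # rescale to a common UQ
    apply(counts_uq, 2, function(x) quantile(x[x > 0], 0.75))      # equalized across samples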

  4. Naturalistic Neuroimaging Database

    • openneuro.org
    Updated Apr 20, 2021
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper (2021). Naturalistic Neuroimaging Database [Dataset]. http://doi.org/10.18112/openneuro.ds002837.v2.0.0
    Dataset updated
    Apr 20, 2021
    Dataset provided by
    OpenNeuro, https://openneuro.org/
    Authors
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Overview

    • The Naturalistic Neuroimaging Database (NNDb v2.0) contains datasets from 86 human participants doing the NIH Toolbox and then watching one of 10 full-length movies during functional magnetic resonance imaging (fMRI). The participants were all right-handed, native English speakers, with no history of neurological/psychiatric illnesses, with no hearing impairments, unimpaired or corrected vision and taking no medication. Each movie was stopped in 40-50 minute intervals or when participants asked for a break, resulting in 2-6 runs of BOLD-fMRI. A 10 minute high-resolution defaced T1-weighted anatomical MRI scan (MPRAGE) is also provided.
    • The NNDb V2.0 is now on Neuroscout, a platform for fast and flexible re-analysis of (naturalistic) fMRI studies. See: https://neuroscout.org/

    v2.0 Changes

    • Overview
      • We have replaced our own preprocessing pipeline with that implemented in AFNI’s afni_proc.py, thus changing only the derivative files. This introduces a fix for an issue with our normalization (i.e., scaling) step and modernizes and standardizes the preprocessing applied to the NNDb derivative files. We have done a bit of testing and have found that results in both pipelines are quite similar in terms of the resulting spatial patterns of activity but with the benefit that the afni_proc.py results are 'cleaner' and statistically more robust.
    • Normalization

      • Emily Finn and Clare Grall at Dartmouth, and Rick Reynolds and Paul Taylor at AFNI, discovered and showed us that the normalization procedure we used for the derivative files was less than ideal for timeseries runs of varying lengths. Specifically, the 3dDetrend flag -normalize makes 'the sum-of-squares equal to 1'. We had not thought through the implication that the resulting normalized timeseries amplitudes are affected by run length, increasing as run length decreases (and maybe this should go in 3dDetrend’s help text). To demonstrate this, I wrote an R version of 3dDetrend’s -normalize so you can see for yourselves by running the following code:
      # Generate a resting state (rs) timeseries (ts)
      # Install / load package to make fake fMRI ts
      # install.packages("neuRosim")
      library(neuRosim)
      # Generate a ts
      ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1)
      # 3dDetrend -normalize
      # R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1"
      # Do for the full timeseries
      ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2));
      # Do this again for a shorter version of the same timeseries
      ts.shorter.length <- length(ts.normalised.long)/4
      ts.normalised.short <- (ts.rs[1:ts.shorter.length] - mean(ts.rs[1:ts.shorter.length])) / sqrt(sum((ts.rs[1:ts.shorter.length] - mean(ts.rs[1:ts.shorter.length]))^2));
      # The summaries show that the amplitudes (quartiles and extremes) are larger for the shorter timeseries
      summary(ts.normalised.long)
      summary(ts.normalised.short)
      # Plot results for the long and short ts
      # Truncate the longer ts for plotting only
      ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length]
      # Give the plot a title
      title <- "3dDetrend -normalize for long (blue) and short (red) timeseries";
      plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short)));
      # Add zero line
      lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey');
      # 3dDetrend -normalize -polort 0 for long timeseries
      lines(ts.normalised.long.made.shorter, col='blue');
      # 3dDetrend -normalize -polort 0 for short timeseries
      lines(ts.normalised.short, col='red');
      
    • Standardization/modernization

      • The above individuals also encouraged us to implement the afni_proc.py script over our own pipeline. It introduces at least three additional improvements: First, we now use Bob’s @SSwarper to align our anatomical files with an MNI template (now MNI152_2009_template_SSW.nii.gz) and this, in turn, integrates nicely into the afni_proc.py pipeline. This seems to result in a generally better or more consistent alignment, though this is only a qualitative observation. Second, all the transformations / interpolations and detrending are now done in fewer steps compared to our pipeline. This is preferable because, e.g., there is less chance of inadvertently reintroducing noise back into the timeseries (see Lindquist, Geuter, Wager, & Caffo 2019). Finally, many groups are advocating using tools like fMRIPrep or afni_proc.py to increase standardization of analysis practices in our neuroimaging community. This presumably results in less error, less heterogeneity and more interpretability of results across studies. Along these lines, the quality control (‘QC’) html pages generated by afni_proc.py are a real help in assessing data quality and almost a joy to use.
    • New afni_proc.py command line

      • The following is the afni_proc.py command line that we used to generate blurred and censored timeseries files. The afni_proc.py tool comes with extensive help and examples. As such, you can quickly understand our preprocessing decisions by scrutinising the below. Specifically, the following command is most similar to Example 11 for ‘Resting state analysis’ in the help file (see https://afni.nimh.nih.gov/pub/dist/doc/program_help/afni_proc.py.html):

        afni_proc.py \
          -subj_id "$sub_id_name_1" \
          -blocks despike tshift align tlrc volreg mask blur scale regress \
          -radial_correlate_blocks tcat volreg \
          -copy_anat anatomical_warped/anatSS.1.nii.gz \
          -anat_has_skull no \
          -anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \
          -anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
          -anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
          -anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \
          -anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \
          -anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \
          -anat_follower_erode fsvent fswm \
          -dsets media_?.nii.gz \
          -tcat_remove_first_trs 8 \
          -tshift_opts_ts -tpattern alt+z2 \
          -align_opts_aea -cost lpc+ZZ -giant_move -check_flip \
          -tlrc_base "$basedset" \
          -tlrc_NL_warp \
          -tlrc_NL_warped_dsets \
            anatomical_warped/anatQQ.1.nii.gz \
            anatomical_warped/anatQQ.1.aff12.1D \
            anatomical_warped/anatQQ.1_WARP.nii.gz \
          -volreg_align_to MIN_OUTLIER \
          -volreg_post_vr_allin yes \
          -volreg_pvra_base_index MIN_OUTLIER \
          -volreg_align_e2a \
          -volreg_tlrc_warp \
          -mask_opts_automask -clfrac 0.10 \
          -mask_epi_anat yes \
          -blur_to_fwhm -blur_size $blur \
          -regress_motion_per_run \
          -regress_ROI_PC fsvent 3 \
          -regress_ROI_PC_per_run fsvent \
          -regress_make_corr_vols aeseg fsvent \
          -regress_anaticor_fast \
          -regress_anaticor_label fswm \
          -regress_censor_motion 0.3 \
          -regress_censor_outliers 0.1 \
          -regress_apply_mot_types demean deriv \
          -regress_est_blur_epits \
          -regress_est_blur_errts \
          -regress_run_clustsim no \
          -regress_polort 2 \
          -regress_bandpass 0.01 1 \
          -html_review_style pythonic

        We used similar command lines to generate ‘blurred and not censored’ and the ‘not blurred and not censored’ timeseries files (described more fully below). We will provide the code used to make all derivative files available on our github site (https://github.com/lab-lab/nndb).

      We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, with the average being ~40 minutes, but this number can be variable (thus leading to the above issue with 3dDetrend’s -normalize). A discussion on the AFNI message board with one of our team (starting here: https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256) led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.

      Which timeseries file you use is up to you, but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul’s own words:

      • Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice).
      • Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere).
      • For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data).
      • For censored data:
        • Performing ISC requires the users to unionize the censoring patterns during the correlation calculation.
        • If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might do for naturalistic tasks still), then standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and, say, only about 10-15% of the data is censored.

      In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.

    • Effect on results

      • From numerous tests on our own analyses, we have qualitatively found that results using our old vs the new afni_proc.py preprocessing pipeline do not change all that much in terms of general spatial patterns. There is, however, an
  5. Data and models for "Center-fixing of tropical cyclones using uncertainty-aware deep learning applied to high-temporal-resolution geostationary satellite imagery" by Lagerquist et al.

    • zenodo.org
    nc, tar
    Updated Apr 1, 2025
    Ryan Lagerquist; Ryan Lagerquist (2025). Data and models for "Center-fixing of tropical cyclones using uncertainty-aware deep learning applied to high-temporal-resolution geostationary satellite imagery" by Lagerquist et al. [Dataset]. http://doi.org/10.5281/zenodo.15116855
    Available download formats: tar, nc
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Ryan Lagerquist; Ryan Lagerquist
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The file geocenter_models.tar contains all models comprising the GeoCenter ensemble: 3 convolutional neural networks (CNNs), 3 isotonic-regression files (one for correcting each CNN’s mean estimate), and 3 more isotonic-regression files (one for correcting each CNN’s ensemble spread). Each model is found in a subdirectory whose name indicates which infrared (IR) wavelengths are used as input to the CNN. For example:

    • wavelengths-microns=3.900-7.340-13.300/model.weights.h5: An HDF5 file containing the trained CNN that uses data from bands 7, 10, 16 (corresponding to 3.9, 7.34, and 13.3 microns on the GOES ABI imager). The trained CNN can always be read by neural_net_utils.read_model() in the ml4tccf library (https://doi.org/10.5281/zenodo.15116854).

    • wavelengths-microns=3.900-7.340-13.300/model_metadata.p: A Pickle file containing metadata for the trained CNN. This file is needed to read the CNN itself with neural_net_utils.read_model(). Otherwise, you will probably never need to access this metafile directly.

    • wavelengths-microns=3.900-7.340-13.300/isotonic_regression/isotonic_regression.dill: A Dill file containing isotonic-regression models used to bias-correct the ensemble mean from the same CNN. The trained isotonic-regression models can always be read by scalar_isotonic_regression.read_file() in the ml4tccf library. Note that there are technically two isotonic-regression models for every CNN’s ensemble mean: one that bias-corrects the x-coordinate of the TC-center, another that bias-corrects the y-coordinate.

    • wavelengths-microns=3.900-7.340-13.300/uncertainty_calibration/uncertainty_calibration.dill: A Dill file containing isotonic-regression models used to bias-correct the ensemble spread from the same CNN. In the ml4tccf code, I make a distinction between “isotonic_regression” (correcting the ensemble mean) and “uncertainty_calibration” (correcting the ensemble spread), but note that both models are isotonic regression and use the sklearn.isotonic.IsotonicRegression class. The trained uncertainty-calibration models can always be read by scalar_uncertainty_calibration.read_file() in the ml4tccf library. Again, note that there are technically two uncertainty-calibration models per CNN: one for spread in the x-coordinate, one for spread in the y-coordinate.

    As mentioned above, every trained CNN can be read by neural_net_utils.read_model(). Also, every trained CNN can be applied to new data (inference mode) by neural_net_utils.apply_model(). The input argument model_object should be the object returned by neural_net_utils.read_model(), and I suggest setting num_examples_per_batch = 10 to avoid out-of-memory errors. The only other input argument is predictor_matrices, which is a list of two numpy arrays.

    The first numpy array contains IR imagery centered at the first-guess TC center and should have dimensions S (number of TC samples) x 300 (grid rows) x 300 (grid columns) x 9 (lag times) x 3 (wavelengths). Lag times should be in the following order: 240, 210, 180, 150, 120, 90, 60, 30, 0 min ago. Wavelengths should be in the order indicated by the subdirectory name. The array itself should contain normalized brightness temperatures at the given lag times and wavelengths, following the grid specifications laid out in the journal paper (a plate carrée grid with 2-km spacing). The original IR data (brightness temperatures) must be normalized to z-scores using the same normalization parameters as in the journal paper, i.e., those based on the training data. See details below.

    The second numpy array in predictor_matrices contains ATCF scalars and should have dimensions S (number of TC samples) x 9 (variables). The variables must be in the order: absolute latitude, cosine of longitude, sine of longitude, TC intensity, minimum central pressure, tropical flag, subtropical flag, extratropical flag, disturbance flag. The journal paper contains details on all these variables in one table. These variables must come from A-deck files at the second-most recent synoptic time. Like the IR data, these ATCF scalars must be normalized to z-scores using the same normalization parameters as in the journal paper. See details below.

    Once you have predictions (estimated TC-center locations) from a CNN, you can bias-correct these predictions. To read the isotonic-regression model for the given CNN’s ensemble mean, use scalar_isotonic_regression.read_file() in the ml4tccf library. To apply the same model, use scalar_isotonic_regression.apply_models(). For the CNN’s ensemble spread, use scalar_uncertainty_calibration.read_file() and scalar_uncertainty_calibration.apply_models().

    To normalize the IR data, you will need the file ir_satellite_normalization_params.tar included with this dataset. Within the tar file is a single zarr file. You can read the zarr file with normalization.read_file() in the ml4tccf library; then you can normalize new data with normalization.normalize_data().

    To normalize the ATCF data, you will need the file a_deck_normalization_params.nc included with this dataset. This is a NetCDF file, containing the full set of training values for all 5 ATCF variables that are normalized (the binary storm-type flags are not normalized). You can read this file using any of the standard Python methods for reading NetCDF files, such as xarray.open_dataset(). To normalize new ATCF data, you can use the method normalization._normalize_one_variable(), where the argument actual_values_training is the list of training values from a_deck_normalization_params.nc for the given variable, while actual_values_new is the list of values to be normalized (currently in physical units, to be converted to z-score units).

  6. Data from: Citation data of arXiv eprints and the associated quantitatively-and-temporally normalised impact metrics

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 7, 2024
    Hitoshi Koshiba (2024). Citation data of arXiv eprints and the associated quantitatively-and-temporally normalised impact metrics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5803961
    Dataset updated
    Jan 7, 2024
    Dataset provided by
    Keisuke Okamura
    Hitoshi Koshiba
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data collection

    This dataset contains information on the eprints posted on arXiv from its launch in 1991 until the end of 2019 (1,589,006 unique eprints), plus the data on their citations and the associated impact metrics. Here, eprints include preprints, conference proceedings, book chapters, data sets and commentary, i.e. every electronic material that has been posted on arXiv.

    The content and metadata of the arXiv eprints were retrieved from the arXiv API (https://arxiv.org/help/api/) as of 21st January 2020, where the metadata included data of the eprint’s title, author, abstract, subject category and the arXiv ID (the arXiv’s original eprint identifier). In addition, the associated citation data were derived from the Semantic Scholar API (https://api.semanticscholar.org/) from 24th January 2020 to 7th February 2020, containing the citation information in and out of the arXiv eprints and their published versions (if applicable). Here, whether an eprint has been published in a journal or other means is assumed to be inferrable, albeit indirectly, from the status of the digital object identifier (DOI) assignment. It is also assumed that if an arXiv eprint received cpre and cpub citations until the data retrieval date (7th February 2020) before and after it is assigned a DOI, respectively, then the citation count of this eprint is recorded in the Semantic Scholar dataset as cpre + cpub. Both the arXiv API and the Semantic Scholar datasets contained the arXiv ID as metadata, which served as a key variable to merge the two datasets.

    The classification of research disciplines is based on that described in the arXiv.org website (https://arxiv.org/help/stats/2020_by_area/). There, the arXiv subject categories are aggregated into several disciplines, of which we restrict our attention to the following six disciplines: Astrophysics (‘astro-ph’), Computer Science (‘comp-sci’), Condensed Matter Physics (‘cond-mat’), High Energy Physics (‘hep’), Mathematics (‘math’) and Other Physics (‘oth-phys’), which collectively accounted for 98% of all the eprints. Those eprints tagged to multiple arXiv disciplines were counted independently for each discipline. Due to this overlapping feature, the current dataset contains a cumulative total of 2,011,216 eprints.

    Some general statistics and visualisations per research discipline are provided in the original article (Okamura, to appear), where the validity and limitations associated with the dataset are also discussed.

    Description of columns (variables)

    arxiv_id : arXiv ID

    category : Research discipline

    pre_year : Year of posting v1 on arXiv

    pub_year : Year of DOI acquisition

    c_tot : No. of citations acquired during 1991–2019

    c_pre : No. of citations acquired before and including the year of DOI acquisition

    c_pub : No. of citations acquired after the year of DOI acquisition

    c_yyyy (yyyy = 1991, …, 2019) : No. of citations acquired in the year yyyy (with ‘yyyy’ running from 1991 to 2019)

    gamma : The quantitatively-and-temporally normalised citation index

    gamma_star : The quantitatively-and-temporally standardised citation index

    Note: The definition of the quantitatively-and-temporally normalised citation index (γ; ‘gamma’) and that of the standardised citation index (γ*; ‘gamma_star’) are provided in the original article (Okamura, to appear). Both indices can be used to compare the citational impact of papers/eprints published in different research disciplines at different times.

    Data files

    A comma-separated values file (‘arXiv_impact.csv’) and a Stata file (‘arXiv_impact.dta’) are provided, both containing the same information.
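
    A minimal R sketch for working with the CSV release is shown below; it assumes arXiv_impact.csv is in the working directory and uses the column names documented above. The final check assumes the yearly c_yyyy columns sum to c_tot, which is an inference from the column descriptions rather than a documented guarantee.

    # Load the CSV release and summarise the normalised citation index by discipline
    dat <- read.csv("arXiv_impact.csv", stringsAsFactors = FALSE)

    # Median gamma (quantitatively-and-temporally normalised citation index) per discipline
    aggregate(gamma ~ category, data = dat, FUN = median)

    # Sanity check: yearly citation columns vs. the 1991-2019 total (assumed relationship)
    yearly_cols <- grep("^c_(19|20)\\d{2}$", names(dat), value = TRUE)
    all.equal(rowSums(dat[, yearly_cols]), dat$c_tot)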

  7. RNAdecayCafe: a uniformly reprocessed atlas of human RNA half-lives across 12 cell lines

    • zenodo.org
    application/gzip, bin, csv
    Updated Jul 2, 2025
    Isaac Vock; Isaac Vock (2025). RNAdecayCafe: a uniformly reprocessed atlas of human RNA half-lives across 12 cell lines [Dataset]. http://doi.org/10.5281/zenodo.15785218
    Available download formats: csv, bin, application/gzip
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Isaac Vock; Isaac Vock
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RNA half-life estimates from uniformly reprocessed/reanalyzed, published, high quality nucleotide recoding RNA-seq (NR-seq; namely SLAM-seq and TimeLapse-seq) datasets. 12 human cell lines are represented. Data can be browsed at this website.

    Analysis notes:

    • Data was processed using fastq2EZbakR. All config files used are provided in fastq2EZbakR_config.tar.gz (as well as some for data not included in the final RNAdecayCafe due to QC issues). Some general notes:
      • Multi-mapping reads were filtered out completely. It is difficult to do anything accurate/intelligent with such reads (see discussion here for instance), so it is better to just remove them entirely. This does mean that some classes of features rich in repetitive sequences will be underrepresented in this database.
      • Adapters were trimmed. For 3'-end data, 12 additional nucleotides were trimmed from the 5' end of the reads, as suggested by the developers of the popular Quant-seq kit, and as is done by default in SLAMDUNK. Quality-score end trimming and polyX trimming are also done for all samples.
    • Data was analyzed using EZbakR. Some general notes:
      • If -s4U data was included, this data was used to infer a global pold to stabilize pnew estimation (see Methods here for a brief discussion). Dropout was also corrected using a previously developed strategy now implemented in EZbakR's CorrectDropout() function.
      • Half-lives are estimated on a gene-level. That is, all reads that map to exonic regions of a gene (i.e., regions that are exonic in at least one annotated isoform) are combined and used to estimate a half-life for that gene. Thus, you should think of these half-life estimates as a weighted average over all isoforms expressed from that gene, weighted by the relative abundances of those isoforms. Future releases may include isoform-resolution estimates as well, given that EZbakR can now perform this type of analysis.
    • A unique feature of RNAdecayCafe is that it includes what I am referring to as "dropout normalized" half-life and kdeg estimates. As not all datasets analyzed include -s4U data, dropout correction is not possible for all samples. This can lead to global biases in the average time scale of half-lives that are unlikely to represent real biology (that is, two different K562 datasets may have median half-life estimates of 4 hours and 15 hours). To address this problem and facilitate comparison across datasets, I developed a strategy (implemented in EZbakR's NormalizeForDropout() function) that uses a model of dropout to normalize estimates with respect to a low-dropout sample.
      • These "donorm" estimates will often be a more accurate reflection of rate constants and half-lives in a given cell line.
      • This strategy can normalize out real global differences in turnover kinetics though, so interpret these values with care.

    Relevant data provided in this repository are as follows:

    1. hg38_Strict.gtf: annotation used for analysis, filtered similarly to what is described here.
    2. AvgKdegs_genes_v1.csv: Table of cell-line average half-lives and degradation rate constants (kdegs). Average log(kdeg)'s are calculated for all samples from a given cell line, weighting by the uncertainty in the log(kdeg) estimate. Columns in this table are as follows:
      1. feature_ID: Gene ID (symbol) from hg38_Strict.gtf
      2. cell_line: Human cell line for which averages are calculated
      3. avg_log_kdeg: Weighted log(kdeg) average
      4. avg_donorm_log_kdeg: Weighted dropout normalized log(kdeg) average.
      5. avg_log_RPKM_total: Average log(RPKM) value from total RNA data. A value of exactly 0 means that there was no total RNA data for this cell line (i.e., all data was 3'-end data).
      6. avg_log_RPKM_3pend: Average log(RPKM) value from 3'-end data. Technically no length normalization is performed as this is 3'-end data, so it is really a log(RPM). A value of exactly 0 means that there was no 3'-end data for this cell line (i.e., all data was for total RNA).
      7. avg_kdeg: e^avg_log_kdeg
      8. avg_donorm_kdeg: e^avg_donorm_log_kdeg
      9. avg_halflife: log(2)/avg_kdeg; can be thought of as average lifetime of the RNA.
      10. avg_donorm_halflife: log(2)/avg_donorm_kdeg
      11. avg_RPKM_total: e^avg_log_RPKM_total
      12. avg_RPKM_3pend: e^avg_log_RPKM_3pend.
    3. FeatureDetails_gene_v1.csv: Table of details about each gene measured; information comes from hg38_Strict.gtf and the corresponding hg38 genome FASTA file.
      1. seqnames: chromosome name
      2. strand: strand on which gene is transcribed
      3. start: genomic start position for gene (most 5'-end coordinate; will be location of TES for - strand genes).
      4. end: end position for gene
      5. type: all "gene" for now, as all analyses are currently gene-level average half-life calculations
      6. exon_length: length of union of exons for a given gene. A read is considered exonic, and thus used for half-life estimation, if it exclusively overlaps with the region defined by the union of all annotated exons for that gene.
      7. exon_GC_fraction: fraction of nucleotides in union of exons that are Gs or Cs.
      8. end_GC_fraction: fraction of nucleotides in last 1000 nts of 3'end of transcript that are Gs or Cs. Useful for assessing GC biases in 3'-end data.
      9. feature_ID: Gene ID (symbol) from hg38_Strict.gtf
    4. SampleDetails_v1.csv: Table of details about all samples represented in RNAdecayCafe
      1. sample: SRA accession ID for sample
      2. dataset: Citation-esque summary of the dataset of origin
      3. pnew: EZbakR estimated T-to-C mutation rate in reads from new (labeled) RNA. You can see these blogs (here and here) for some intuition as to how to interpret these. More technical explanations of the models involved can be found here and here.
      4. pold: EZbakR estimated T-to-C mutation rate in reads from old (unlabeled) RNA. Same citations for pnew apply. Best samples are those with the largest gap between the pold and pnew; can think of this like a signal-to-noise ratio
      5. label_time: How long (in hours) were cells labeled with s4U for?
      6. cell_line: Cell line used for that sample.
      7. threePseq: TRUE or FALSE; TRUE if 3'-end sequencing was used.
      8. total_reads: Total number of aligned, exonic reads in the sample.
      9. median_halflife: Median, unnormalized half-life estimate. Differences between cell lines could represent real biology, but could also be evidence of dropout (see here and here for discussion of this phenomenon).
    5. RateConstants_gene_v1.csv
      1. sample: SRA accession ID for sample.
      2. kdeg: e ^ log_kdeg
      3. halflife: log(2) / kdeg. Can be thought of as the average lifetime of the RNA.
      4. donorm_kdeg: e ^ donorm_log_kdeg
      5. donorm_halflife: log(2) / donorm_kdeg
      6. log_kdeg: log degradation rate constant estimated by EZbakR
      7. donorm_log_kdeg: dropout normalized log degradation rate constant
      8. reads: number of reads that contributed to estimates
      9. donorm_reads: dropout normalization corrected read count
      10. feature_ID: Gene ID (symbol) from hg38_Strict.gtf
    6. RNAdecayCafe_database_v1.rds: compressed RDS file that stores a list containing the above 4 tables in the following entries:
      1. kdegs = RateConstants_gene_v1.csv
      2. sample_metadata = SampleDetails_v1.csv
      3. feature_metadata = FeatureDetails_v1.csv
      4. average_kdegs = AvgKdegs_gene_v1.csv
    7. RNAdecayCafe_v1_onetable.csv: Inner joining of all but the averages table in this database. Thus, is one mega table containing all sample-specific estimates, sample metadata, and feature information.
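
    A minimal R usage sketch for the files listed above follows; it assumes the .rds file is in the working directory and relies only on the documented list entries and column names.

    # Load the bundled database and pull out the sample-level rate constants
    db    <- readRDS("RNAdecayCafe_database_v1.rds")   # list: kdegs, sample_metadata, feature_metadata, average_kdegs
    kdegs <- db$kdegs

    # Half-lives are documented above as log(2) / kdeg; recomputing them is a quick consistency check
    recomputed_halflife <- log(2) / kdegs$kdeg
    head(cbind(reported = kdegs$halflife, recomputed = recomputed_halflife))

    # Join sample-level estimates with sample metadata (similar to the provided one-table CSV)
    onetable <- merge(kdegs, db$sample_metadata, by = "sample")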

    Datasets included:

    1. Finkel et al. 2021 (Calu3 cells; PMID: 35313595)
    2. Harada et al. 2022 (MV411 cells; PMID: 35301220)
    3. Ietswaart et al. 2024 (K562 cells; PMID: 38964322); whole-cell data used
    4. Luo et al. 2020 (HEK293T cells; PMID: 33357462)
    5. Mabin et al. 2025 (HEK293T cells; PMID: 40161772); only dataset for which data is not yet publicly available

  8. Normalized Water Quality Data

    • figshare.com
    txt
    Updated Feb 22, 2022
    Dov Stekel (2022). Normalized Water Quality Data [Dataset]. http://doi.org/10.6084/m9.figshare.19213386.v1
    Available download formats: txt
    Dataset updated
    Feb 22, 2022
    Dataset provided by
    Figshare, http://figshare.com/
    Authors
    Dov Stekel
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Water quality data. These data have been normalised to their means over the time period, so that each normalised series has a mean of 100.
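
    A minimal R sketch of the normalisation described above, using invented values: each series is divided by its mean over the time period and rescaled so that the normalised mean is 100.

    raw        <- c(3.1, 2.8, 3.6, 4.0, 2.9)   # hypothetical concentrations over time
    normalised <- raw / mean(raw) * 100
    mean(normalised)                           # 100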

  9. Assessing the impact of hints in learning formal specification: Research artifact

    • data.niaid.nih.gov
    Updated Jan 29, 2024
    Margolis, Iara (2024). Assessing the impact of hints in learning formal specification: Research artifact [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10450608
    Dataset updated
    Jan 29, 2024
    Dataset provided by
    Cunha, Alcino
    Campos, José Creissac
    Margolis, Iara
    Macedo, Nuno
    Sousa, Emanuel
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This artifact accompanies the SEET@ICSE article "Assessing the impact of hints in learning formal specification", which reports on a user study investigating the impact of different types of automated hints while learning a formal specification language, both in terms of immediate performance and learning retention, and in terms of the students' emotional response. This research artifact provides all the material required to replicate the study (except for the proprietary questionnaires used to assess emotional response and user experience), as well as the collected data and the data analysis scripts used for the discussion in the paper.

    Dataset

    The artifact contains the resources described below.

    Experiment resources

    The resources needed for replicating the experiment, namely in directory experiment:

    alloy_sheet_pt.pdf: the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment. The sheet was passed in Portuguese due to the population of the experiment.

    alloy_sheet_en.pdf: a version of the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment, translated into English.

    docker-compose.yml: a Docker Compose configuration file to launch Alloy4Fun populated with the tasks in directory data/experiment for the 2 sessions of the experiment.

    api and meteor: directories with source files for building and launching the Alloy4Fun platform for the study.

    Experiment data

    The task database used in our application of the experiment, namely in directory data/experiment:

    Model.json, Instance.json, and Link.json: JSON files used to populate Alloy4Fun with the tasks for the 2 sessions of the experiment.

    identifiers.txt: the list of all (104) available participant identifiers that can participate in the experiment.

    Collected data

    Data collected in the application of the experiment as a simple one-factor randomised experiment in 2 sessions involving 85 undergraduate students majoring in CSE. The experiment was validated by the Ethics Committee for Research in Social and Human Sciences of the Ethics Council of the University of Minho, where the experiment took place. Data is shared in the shape of JSON and CSV files with a header row, namely in directory data/results:

    data_sessions.json: data collected from task-solving in the 2 sessions of the experiment, used to calculate variables productivity (PROD1 and PROD2, between 0 and 12 solved tasks) and efficiency (EFF1 and EFF2, between 0 and 1).

    data_socio.csv: data collected from socio-demographic questionnaire in the 1st session of the experiment, namely:

    participant identification: participant's unique identifier (ID);

    socio-demographic information: participant's age (AGE), sex (SEX, 1 through 4 for female, male, prefer not to disclose, and other, respectively), and average academic grade (GRADE, from 0 to 20, NA denotes preference not to disclose).

    data_emo.csv: detailed data collected from the emotional questionnaire in the 2 sessions of the experiment, namely:

    participant identification: participant's unique identifier (ID) and the assigned treatment (column HINT, either N, L, E or D);

    detailed emotional response data: the differential in the 5-point Likert scale for each of the 14 measured emotions in the 2 sessions, ranging from -5 to -1 if decreased, 0 if maintained, from 1 to 5 if increased, or NA denoting failure to submit the questionnaire. Half of the emotions are positive (Admiration1 and Admiration2, Desire1 and Desire2, Hope1 and Hope2, Fascination1 and Fascination2, Joy1 and Joy2, Satisfaction1 and Satisfaction2, and Pride1 and Pride2), and half are negative (Anger1 and Anger2, Boredom1 and Boredom2, Contempt1 and Contempt2, Disgust1 and Disgust2, Fear1 and Fear2, Sadness1 and Sadness2, and Shame1 and Shame2). This detailed data was used to compute the aggregate data in data_emo_aggregate.csv and in the detailed discussion in Section 6 of the paper.

    data_umux.csv: data collected from the user experience questionnaires in the 2 sessions of the experiment, namely:

    participant identification: participant's unique identifier (ID);

    user experience data: summarised user experience data from the UMUX surveys (UMUX1 and UMUX2, as a usability metric ranging from 0 to 100).

    participants.txt: the list of participant identifiers that have registered for the experiment.

    Analysis scripts

    The analysis scripts required to replicate the analysis of the results of the experiment as reported in the paper, namely in directory analysis:

    analysis.r: An R script to analyse the data in the provided CSV files; each performed analysis is documented within the file itself.

    requirements.r: An R script to install the required libraries for the analysis script.

    normalize_task.r: A Python script to normalize the task JSON data from file data_sessions.json into the CSV format required by the analysis script.

    normalize_emo.r: A Python script to compute the aggregate emotional response in the CSV format required by the analysis script from the detailed emotional response data in the CSV format of data_emo.csv.

    Dockerfile: Docker script to automate the analysis script from the collected data.

    Setup

    To replicate the experiment and the analysis of the results, only Docker is required.

    If you wish to manually replicate the experiment and collect your own data, you'll need to install:

    A modified version of the Alloy4Fun platform, which is built in the Meteor web framework. This version of Alloy4Fun is publicly available in branch study of its repository at https://github.com/haslab/Alloy4Fun/tree/study.

    If you wish to manually replicate the analysis of the data collected in our experiment, you'll need to install:

    Python to manipulate the JSON data collected in the experiment. Python is freely available for download at https://www.python.org/downloads/, with distributions for most platforms.

    R software for the analysis scripts. R is freely available for download at https://cran.r-project.org/mirrors.html, with binary distributions available for Windows, Linux and Mac.

    Usage

    Experiment replication

    This section describes how to replicate our user study experiment, and collect data about how different hints impact the performance of participants.

    To launch the Alloy4Fun platform populated with tasks for each session, just run the following commands from the root directory of the artifact. The Meteor server may take a few minutes to launch, wait for the "Started your app" message to show.

    cd experiment
    docker-compose up

    This will launch Alloy4Fun at http://localhost:3000. The tasks are accessed through permalinks assigned to each participant. The experiment allows for up to 104 participants, and the list of available identifiers is given in file identifiers.txt. The group of each participant is determined by the last character of the identifier, either N, L, E or D. The task database can be consulted in directory data/experiment, in Alloy4Fun JSON files.

    In the 1st session, each participant was given one permalink that gives access to 12 sequential tasks. The permalink is simply the participant's identifier, so participant 0CAN would just access http://localhost:3000/0CAN. The next task is available after a correct submission to the current task or when a time-out occurs (5mins). Each participant was assigned to a different treatment group, so depending on the permalink different kinds of hints are provided. Below are 4 permalinks, each for each hint group:

    Group N (no hints): http://localhost:3000/0CAN

    Group L (error locations): http://localhost:3000/CA0L

    Group E (counter-example): http://localhost:3000/350E

    Group D (error description): http://localhost:3000/27AD

    In the 2nd session, as in the 1st session, each permalink gave access to 12 sequential tasks, and the next task is available after a correct submission or a time-out (5mins). The permalink is constructed by prepending the participant's identifier with P-. So participant 0CAN would just access http://localhost:3000/P-0CAN. In the 2nd session all participants were expected to solve the tasks without any hints provided, so the permalinks from different groups are undifferentiated.
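
    For clarity, the relationship between identifiers, treatment groups and permalinks described above can be written out as a small R sketch (the identifier 0CAN is one of the examples given; the URLs assume the local Docker deployment):

    id    <- "0CAN"
    group <- substr(id, nchar(id), nchar(id))                # last character: N, L, E or D
    session1_link <- paste0("http://localhost:3000/", id)    # 1st session permalink
    session2_link <- paste0("http://localhost:3000/P-", id)  # 2nd session prepends "P-"
    c(group = group, session1 = session1_link, session2 = session2_link)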

    Before the 1st session the participants should answer the socio-demographic questionnaire, that should ask the following information: unique identifier, age, sex, familiarity with the Alloy language, and average academic grade.

    Before and after both sessions the participants should answer the standard PrEmo 2 questionnaire. PrEmo 2 is published under an Attribution-NonCommercial-NoDerivatives 4.0 International Creative Commons licence (CC BY-NC-ND 4.0). This means that you are free to use the tool for non-commercial purposes as long as you give appropriate credit, provide a link to the license, and do not modify the original material. The original material, namely the depictions of the different emotions, can be downloaded from https://diopd.org/premo/. The questionnaire should ask for the unique user identifier, and for the attachment with each of the 14 depicted emotions, expressed in a 5-point Likert scale.

    After both sessions the participants should also answer the standard UMUX questionnaire. This questionnaire can be used freely, and should ask for the user unique identifier and answers for the standard 4 questions in a 7-point Likert scale. For information about the questions, how to implement the questionnaire, and how to compute the usability metric ranging from 0 to 100 score from the answers, please see the original paper:

    Kraig Finstad. 2010. The usability metric for user experience. Interacting with computers 22, 5 (2010), 323–327.

    Analysis of other applications of the experiment

    This section describes how to replicate the analysis of the data collected in an application of the experiment described in Experiment replication.

    The analysis script expects data in 4 CSV files,

  10. DDSP EMG dataset.xlsx

    • figshare.com
    • commons.datacite.org
    xlsx
    Updated Jul 14, 2019
    marta Cercone (2019). DDSP EMG dataset.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.8864411.v1
    Available download formats: xlsx
    Dataset updated
    Jul 14, 2019
    Dataset provided by
    figshare
    Authors
    marta Cercone
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study was performed in accordance with the PHS Policy on Humane Care and Use of Laboratory Animals, federal and state regulations, and was approved by the Institutional Animal Care and Use Committees (IACUC) of Cornell University and the Ethics and Welfare Committee at the Royal Veterinary College.

    Study design: adult horses were recruited if in good health and following evaluation of the upper airways through endoscopic exam, at rest and during exercise, either overground or on a high-speed treadmill using a wireless videoendoscope. Horses were categorized as “DDSP” affected horses if they presented with exercise-induced intermittent dorsal displacement of the soft palate consistently during multiple (n=3) exercise tests, or “control” horses if they did not experience dorsal displacement of the soft palate during exercise and had no signs compatible with DDSP, such as palatal instability during exercise or soft palate or sub-epiglottic ulcerations. Horses were instrumented with intramuscular electrodes, in one or both thyro-hyoid (TH) muscles for EMG recording, hard-wired to a wireless transmitter for remote recording implanted in the cervical area. EMG recordings were then made during an incremental exercise test based on the percentage of maximum heart rate (HRmax).

    Incremental exercise test: After surgical instrumentation, each horse performed a 4-step incremental test while recording TH electromyographic activity, heart rate, upper airway videoendoscopy, pharyngeal airway pressures, and gait frequency measurements. Horses were evaluated at exercise intensities corresponding to 50, 80, 90 and 100% of their maximum heart rate, with each speed maintained for 1 minute. Laryngeal function during the incremental test was recorded using a wireless videoendoscope (Optomed, Les Ulis, France), which was placed into the nasopharynx via the right ventral nasal meatus. Nasopharyngeal pressure was measured using a Teflon catheter (1.3 mm ID, Neoflon) inserted through the left ventral nasal meatus to the level of the left guttural pouch ostium. The catheter was attached to differential pressure transducers (Celesco LCVR, Celesco Transducers Products, Canoga Park, CA, USA) referenced to atmospheric pressure and calibrated from -70 to 70 mmHg. Occurrence of episodes of dorsal displacement of the soft palate was recorded, and the number of swallows during each exercise trial was counted for each speed interval.

    EMG recording: EMG data was recorded through a wireless transmitter device implanted subcutaneously. Two different transmitters were used: 1) TR70BB (Telemetry Research Ltd, Auckland, New Zealand) with 12-bit A/D conversion resolution, AC-coupled amplifier, -3dB point at 1.5 Hz, 2 kHz sampling frequency (n=5 horses); or 2) ELI (Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria) [23], with 12-bit A/D conversion resolution, AC-coupled amplifier, amplifier gain 1450, 1 kHz sampling frequency (n=4 horses). The EMG signal was transmitted through a receiver (TR70BB) or Bluetooth (ELI) to a data acquisition system (PowerLab 16/30 - ML880/P, ADInstruments, Bella Vista, Australia). The EMG signal was amplified with an octal bio-amplifier (Octal Bioamp, ML138, ADInstruments, Bella Vista, Australia) with a bandwidth frequency ranging from 20-1000 Hz (input impedance = 200 MΩ, common mode rejection ratio = 85 dB, gain = 1000), and transmitted to a personal computer.
    All EMG and pharyngeal pressure signals were collected at a 2000 Hz rate with LabChart 6 software (ADInstruments, Bella Vista, Australia), which allows for real-time monitoring and storage for post-processing and analysis.

    EMG signal processing: Electromyographic signals from the TH muscles were processed using two methods: 1) a classical approach to myoelectrical activity and median frequency, and 2) wavelet decomposition. For both methods, the beginning and end of recording segments including twenty consecutive breaths, at the end of each speed interval, were marked with comments in the acquisition software (LabChart). The relationship of EMG activity with phase of the respiratory cycle was determined by comparing pharyngeal pressure waveforms with the raw EMG and time-averaged EMG traces. For the classical approach, in a graphical user interface-based software (LabChart), a sixth-order Butterworth filter was applied (common mode rejection ratio, 90 dB; band pass, 20 to 1,000 Hz); the EMG signal was then amplified, full-wave rectified, and smoothed using a triangular Bartlett window (time constant: 150 ms). The digitized area under the time-averaged full-wave rectified EMG signal was calculated to define the raw mean electrical activity (MEA) in mV.s. Median power frequency (MF) of the EMG power spectrum was calculated after a Fast Fourier Transformation (1024 points, Hann cosine window processing). For the wavelet decomposition, the whole dataset including comments and comment locations was exported as .mat files for processing in MATLAB R2018a with the Signal Processing Toolbox (The MathWorks Inc, Natick, MA, USA). A custom written automated script based on Hodson-Tole & Wakeling [24] was used to first cut the .mat file into the selected 20-breath segments and subsequently process each segment. A bank of 16 wavelets with time and frequency resolution optimized for EMG was used. The center frequencies of the bank ranged from 6.9 Hz to 804.2 Hz [25]. The intensity was summed (mV2) to a total, and the intensity contribution of each wavelet was calculated across all 20 breaths for each horse, with separate results for each trial date and exercise level (80, 90, 100% of HRmax as well as the period preceding episodes of DDSP). To determine the relevant bandwidths for the analysis, a Fast Fourier transform frequency analysis was performed on the horses unaffected by DDSP from 0 to 1000 Hz in increments of 50 Hz, and the contribution of each interval was calculated in percent of the total spectrum as median and interquartile range. According to the Shannon-Nyquist sampling theorem, the relevant signal is below half the sample rate, and because we had instrumentation sampling at either 1000 Hz or 2000 Hz we chose to perform the frequency analysis up to 1000 Hz. The 0-50 Hz interval, mostly stride frequency and background noise, was excluded from further analysis. Of the remaining frequency spectrum, we included all intervals from 50-100 Hz to 450-500 Hz and excluded the remainder because they contributed less than 5% of the total amplitude.

    Data analysis: At the end of each exercise speed interval, twenty consecutive breaths were selected and analyzed as described above. To standardize MEA, MF and mV2 within and between horses and trials, and to control for different electrode sizes (i.e. different impedance and area of sampling), data were afterwards normalized to the value at 80% of HRmax (HRmax80), referred to as normalized MEA (nMEA), normalized MF (nMF) and normalized mV2 (nmV2).
    During the initial processing it became clear that the TH muscle is inconsistently activated at 50% of HRmax, so that speed level was excluded from further analysis. The endoscopy video was reviewed, and episodes of palatal displacement were marked with comments. For both the classical approach and the wavelet analysis, an EMG segment preceding and concurrent with the DDSP episode was analyzed. If multiple episodes were recorded during the same trial, only the period preceding the first palatal displacement was analyzed. In horses that had both TH muscles implanted, the average of the two sides was used for the analysis. Averaged data from multiple trials were considered for each horse.

    Descriptive data are expressed as means with standard deviation (SD). Normal distribution of the data was assessed using the Kolmogorov-Smirnov test and quantile-quantile (Q-Q) plots. To determine the frequency clusters in the EMG signal, hierarchical agglomerative clustering with a dendrogram was applied using the Matplotlib, pandas, NumPy and SciPy packages in Python (version 3.6.6), executed through Spyder (version 3.2.2) and Anaconda Navigator. Based on the frequency analysis, the wavelets included in the cluster analysis were those centered at 92.4 Hz, 128.5 Hz, 170.4 Hz, 218.1 Hz, 271.5 Hz, 330.6 Hz, 395.4 Hz and 465.9 Hz. The number of frequency clusters was set to two, based on the maximum acceleration in a scree plot and the maximum vertical distance in the dendrogram. For continuous outcome measures (number of swallows, MEA, MF, and mV2), a mixed-effects model was fitted to the data to determine the relationship between the outcome variable and relevant fixed effects (breed, sex, age, weight, speed, group), with horse as a random effect. Tukey's post hoc tests and linear contrasts were used as appropriate. Statistical analysis was performed using JMP Pro 13 (SAS Institute, Cary, NC, USA). Significance was set at P < 0.05 throughout.
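    The wavelet-band clustering step can likewise be sketched with SciPy's hierarchical clustering; the intensity matrix below is random placeholder data, and only the listed center frequencies and the choice of two clusters come from the description above.

```python
# Hierarchical agglomerative clustering of wavelet-band intensities, cut into
# two clusters, with a dendrogram for inspection. Placeholder data only.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

center_freqs = [92.4, 128.5, 170.4, 218.1, 271.5, 330.6, 395.4, 465.9]  # Hz
rng = np.random.default_rng(0)
intensity = rng.random((len(center_freqs), 20))   # bands x 20 breaths (placeholder)

Z = linkage(intensity, method="ward")             # agglomerative clustering of the bands
labels = fcluster(Z, t=2, criterion="maxclust")   # two clusters, as in the analysis
print(dict(zip(center_freqs, labels)))

dendrogram(Z, labels=[f"{f} Hz" for f in center_freqs])
plt.tight_layout()
plt.show()
```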

  11. CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for...

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Thilde Terkelsen; Anders Krogh; Elena Papaleo (2023). CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for the analysis of quantitative biological data [Dataset]. http://doi.org/10.1371/journal.pcbi.1007665
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Thilde Terkelsen; Anders Krogh; Elena Papaleo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the improvement of -omics and next-generation sequencing (NGS) methodologies, along with the lowered cost of generating these types of data, the analysis of high-throughput biological data has become standard both for forming and testing biomedical hypotheses. Our knowledge of how to normalize datasets to remove latent undesirable variances has grown extensively, making for standardized data that are easily compared between studies. Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper (https://github.com/ELELAB/CAncer-bioMarker-Prediction-Pipeline -CAMPP) intended to aid bioinformatic software-users with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. To avoid issues relating to R-package updates, a renv .lock file is provided to ensure R-package stability. Data-management includes missing value imputation, data normalization, and distributional checks. CAMPP performs (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis, and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist in streamlining bioinformatic analysis of quantitative biological data, whilst ensuring an appropriate bio-statistical framework.
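    CAMPP itself is an R wrapper, but the idea behind its elastic-net step (III), keeping only features whose coefficients survive combined L1/L2 shrinkage, can be illustrated with a short scikit-learn sketch; the data, parameters and names below are placeholders and do not reflect CAMPP's actual implementation.

```python
# Conceptual analogue of elastic-net biomarker selection on placeholder data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 200))      # 60 samples x 200 features (e.g., gene abundances)
y = rng.integers(0, 2, size=60)     # binary group label (e.g., tumour vs. normal)

X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=0.1, max_iter=5000)
model.fit(X_scaled, y)

selected = np.flatnonzero(model.coef_[0])   # features with non-zero coefficients
print(f"{selected.size} candidate biomarkers retained")
```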

  12. Left ventricular mass is underestimated in overweight children because of...

    • plos.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Hubert Krysztofiak; Marcel Młyńczak; Łukasz A. Małek; Andrzej Folga; Wojciech Braksator (2023). Left ventricular mass is underestimated in overweight children because of incorrect body size variable chosen for normalization [Dataset]. http://doi.org/10.1371/journal.pone.0217637
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Hubert Krysztofiak; Marcel Młyńczak; Łukasz A. Małek; Andrzej Folga; Wojciech Braksator
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Left ventricular mass normalization for body size is recommended, but a question remains: what is the best body size variable for this normalization—body surface area, height or lean body mass computed based on a predictive equation? Since body surface area and computed lean body mass are derivatives of body mass, normalizing for them may result in underestimation of left ventricular mass in overweight children. The aim of this study is to indicate which of the body size variables normalize left ventricular mass without underestimating it in overweight children. Methods: Left ventricular mass assessed by echocardiography, height and body mass were collected for 464 healthy boys, 5–18 years old. Lean body mass and body surface area were calculated. Left ventricular mass z-scores computed based on reference data, developed for height, body surface area and lean body mass, were compared between overweight and non-overweight children. The next step was a comparison of paired samples of expected left ventricular mass, estimated for each normalizing variable based on two allometric equations—the first developed for overweight children, the second for children of normal body mass. Results: The mean of left ventricular mass z-scores is higher in overweight children compared to non-overweight children for normative data based on height (0.36 vs. 0.00) and lower for normative data based on body surface area (-0.64 vs. 0.00). Left ventricular mass estimated normalizing for height, based on the equation for overweight children, is higher in overweight children (128.12 vs. 118.40); however, masses estimated normalizing for body surface area and lean body mass, based on equations for overweight children, are lower in overweight children (109.71 vs. 122.08 and 118.46 vs. 120.56, respectively). Conclusion: Normalization for body surface area and for computed lean body mass, but not for height, underestimates left ventricular mass in overweight children.
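    The z-score comparison described above amounts to normalizing an observed left ventricular mass against a body-size-specific expectation; a minimal sketch follows, in which the allometric coefficients and residual SD are invented placeholders rather than the study's reference equations.

```python
# Hypothetical illustration: express observed LV mass as a z-score relative to
# an allometric prediction a * size_variable**b. All coefficients are placeholders.
import math

def lvm_zscore(lvm_g, size_var, a, b, sigma):
    """z-score of observed LVM (g) against a * size_var**b, on a log scale."""
    predicted = a * size_var ** b
    return (math.log(lvm_g) - math.log(predicted)) / sigma

# Same measured LVM normalized for height (m) vs. body surface area (m^2);
# the reference values below are made up for illustration only.
print(round(lvm_zscore(120.0, 1.55, a=32.0, b=2.7, sigma=0.18), 2))
print(round(lvm_zscore(120.0, 1.60, a=75.0, b=1.25, sigma=0.18), 2))
```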

  13. Predictive Validity Data Set

    • figshare.com
    txt
    Updated Dec 18, 2022
    Cite
    Antonio Abeyta (2022). Predictive Validity Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.17030021.v1
    Available download formats: txt
    Dataset updated
    Dec 18, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Antonio Abeyta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Verbal and Quantitative Reasoning GRE scores and percentiles were collected by querying the student database for the appropriate information. Any student records that were missing data such as GRE scores or grade point average were removed from the study before the data were analyzed. The GRE scores of entering doctoral students from 2007-2012 were collected and analyzed. A total of 528 student records were reviewed. Ninety-six records were removed from the data because of missing scores. Thirty-nine of these records belonged to MD/PhD applicants who were not required to take the GRE to be reviewed for admission. Fifty-seven more records were removed because they did not have an admissions committee score in the database. After 2011, the GRE's scoring system was changed from a scale of 200-800 points per section to 130-170 points per section. As a result, 12 more records were removed because their scores were reported on the new scoring system and therefore could not be compared to the older scores based on raw score. After removal of these records from our analyses, a total of 420 student records remained, which included students that were currently enrolled, left the doctoral program without a degree, or left the doctoral program with an MS degree. To maintain consistency in the participants, we removed 100 additional records so that our analyses only considered students that had graduated with a doctoral degree. In addition, thirty-nine admissions scores were identified as outliers by statistical analysis software and removed for a final data set of 286 (see Outliers below). Outliers: We used the automated ROUT method included in the PRISM software to test the data for the presence of outliers which could skew our data. The false discovery rate for outlier detection (Q) was set to 1%. After removing the 96 students without a GRE score, 432 students were reviewed for the presence of outliers. ROUT detected 39 outliers that were removed before statistical analysis was performed. Sample: See the detailed description in the Participants section. Linear regression analysis was used to examine potential trends between GRE scores, GRE percentiles, normalized admissions scores or GPA and outcomes between selected student groups. The D'Agostino & Pearson omnibus and Shapiro-Wilk normality tests were used to test for normality regarding outcomes in the sample. The Pearson correlation coefficient was calculated to determine the relationship between GRE scores, GRE percentiles, admissions scores or GPA (undergraduate and graduate) and time to degree. Candidacy exam results were divided into students who either passed or failed the exam. A Mann-Whitney test was then used to test for statistically significant differences between mean GRE scores, percentiles, and undergraduate GPA and candidacy exam results. Other variables, such as gender, race, ethnicity, and citizenship status, were also observed within the samples. Predictive metrics: The input variables used in this study were GPA and the scores and percentiles of applicants on both the Quantitative and Verbal Reasoning GRE sections. GRE scores and percentiles were examined to normalize variances that could occur between tests. Performance metrics: The output variables used in the statistical analyses of each data set were either the amount of time it took for each student to earn their doctoral degree, or the student's candidacy examination result.
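    The reported workflow (normality checks, Pearson correlations of GRE metrics with time to degree, and a Mann-Whitney comparison by candidacy-exam outcome) maps directly onto scipy.stats; the arrays below are placeholders, and the original analysis was performed in PRISM rather than Python.

```python
# Sketch of the reported statistical tests on placeholder arrays.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
gre_score = rng.normal(155, 5, size=286)          # placeholder GRE scores
time_to_degree = rng.normal(5.5, 1.0, size=286)   # placeholder years to degree
passed_exam = rng.integers(0, 2, size=286).astype(bool)

print(stats.shapiro(time_to_degree))              # Shapiro-Wilk normality test
print(stats.pearsonr(gre_score, time_to_degree))  # correlation with time to degree
print(stats.mannwhitneyu(gre_score[passed_exam],  # passed vs. failed candidacy exam
                         gre_score[~passed_exam]))
```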

  14. PlotTwist: A web app for plotting and annotating continuous data

    • figshare.com
    • plos.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Joachim Goedhart (2023). PlotTwist: A web app for plotting and annotating continuous data [Dataset]. http://doi.org/10.1371/journal.pbio.3000581
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS Biology
    Authors
    Joachim Goedhart
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Experimental data can broadly be divided into discrete and continuous data. Continuous data are obtained from measurements performed as a function of another quantitative variable, e.g., time, length, concentration, or wavelength. The results from these types of experiments are often used to generate plots that visualize the measured variable on a continuous, quantitative scale. To simplify state-of-the-art data visualization and annotation of data from such experiments, an open-source tool was created with R/shiny that does not require coding skills to operate. The freely available web app accepts wide (spreadsheet) and tidy data and offers a range of options to normalize the data. The data from individual objects can be shown in 3 different ways: (1) lines with unique colors, (2) small multiples, and (3) heatmap-style display. In addition, the mean can be displayed with a 95% confidence interval for the visual comparison of different conditions. Several color-blind-friendly palettes are available to label the data and/or statistics. The plots can be annotated with graphical features and/or text to indicate any perturbations that are relevant. All user-defined settings can be stored for reproducibility of the data visualization. The app is dubbed PlotTwist and runs locally or online: https://huygens.science.uva.nl/PlotTwist
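    The core view PlotTwist produces, individual traces plus the mean with a 95% confidence interval, can be approximated in a few lines of matplotlib; this sketch only mirrors the concept on placeholder time courses and is unrelated to the app's R/shiny code.

```python
# PlotTwist-style plot: per-object traces plus mean with 95% CI (placeholder data).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
t = np.linspace(0, 60, 121)                                  # e.g., time in seconds
traces = np.sin(t / 10)[None, :] + rng.normal(0, 0.2, size=(8, t.size))

mean = traces.mean(axis=0)
ci95 = 1.96 * traces.std(axis=0, ddof=1) / np.sqrt(traces.shape[0])

for y in traces:
    plt.plot(t, y, alpha=0.4)                                # one line per object
plt.plot(t, mean, color="black", linewidth=2, label="mean")
plt.fill_between(t, mean - ci95, mean + ci95, color="black", alpha=0.2, label="95% CI")
plt.xlabel("time (s)")
plt.ylabel("normalized signal")
plt.legend()
plt.show()
```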

  15. De-identified data for use in analyses.

    • plos.figshare.com
    • figshare.com
    csv
    Updated Dec 19, 2024
    + more versions
    Cite
    Carly A. Busch; Margaret Barstow; Sara E. Brownell; Katelyn M. Cooper (2024). De-identified data for use in analyses. [Dataset]. http://doi.org/10.1371/journal.pmen.0000086.s008
    Available download formats: csv
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    PLOS Mental Health
    Authors
    Carly A. Busch; Margaret Barstow; Sara E. Brownell; Katelyn M. Cooper
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Depression and anxiety are among the most common mental health concerns for science and engineering (S&E) undergraduates in the United States (U.S.), and students perceive they would benefit from knowing a S&E instructor with depression or anxiety. However, it is unknown how prevalent depression and anxiety are among S&E instructors and whether instructors disclose their depression or anxiety to their undergraduates. These identities are unique because they are concealable stigmatized identities (CSIs), meaning they can be kept hidden and carry negative stereotypes. To address these gaps, we surveyed 2013 S&E faculty instructors across U.S. very high research activity doctoral-granting institutions. The survey assessed the extent to which they had and revealed depression or anxiety to undergraduates, why they chose to reveal or conceal their depression or anxiety, and the benefits of revealing depression or anxiety. These items were developed based on prior studies exploring why individuals conceal or reveal CSIs including mental health conditions. Of the university S&E instructors surveyed, 23.9% (n = 482) reported having depression and 32.8% (n = 661) reported having anxiety. Instructors who are women, white, Millennials, or LGBTQ+ are more likely to report depression or anxiety than their counterparts. Very few participants revealed their depression (5.4%) or anxiety (8.3%) to undergraduates. Instructors reported concealing their depression and anxiety because they do not typically disclose to others or because it is not relevant to course content. Instructors anticipated that undergraduates would benefit from disclosure because it would normalize struggling with mental health and provide an example of someone with depression and anxiety who is successful in S&E. Despite undergraduates reporting a need for role models in academic S&E who struggle with mental health and depression/anxiety being relatively common among U.S. S&E instructors, our study found that instructors rarely reveal these identities to their undergraduates.

  16. Data from: An automatic gain control circuit to improve ECG acquisition

    • scielo.figshare.com
    jpeg
    Updated Jun 2, 2023
    Cite
    Marco Rovetta; João Fernando Refosco Baggio; Raimes Moraes (2023). An automatic gain control circuit to improve ECG acquisition [Dataset]. http://doi.org/10.6084/m9.figshare.5668756.v1
    Available download formats: jpeg
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    SciELO journals
    Authors
    Marco Rovetta; João Fernando Refosco Baggio; Raimes Moraes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract. Introduction: Long-term electrocardiogram (ECG) recordings are widely employed to assist the diagnosis of cardiac and sleep disorders. However, variability of the ECG amplitude during the recordings hampers the detection of QRS complexes by algorithms. This work presents a simple electronic circuit to automatically normalize the ECG amplitude, improving its sampling by analog-to-digital converters (ADCs). Methods: The proposed circuit consists of an analog divider that normalizes the ECG amplitude using its absolute peak value as reference. The reference value is obtained by means of a full-wave rectifier and a peak voltage detector. The circuit and the tasks of its different stages are described. Results: An example of the circuit's performance for a bradycardia ECG signal (40 bpm) is presented; the signal has its amplitude suddenly halved and later restored. For the amplitude drop, the signal is automatically normalized after 5 heartbeats; for the amplitude increase, it is promptly normalized. Conclusion: The proposed circuit adjusts the ECG amplitude to the input voltage range of the ADC, avoiding degradation of the signal-to-noise ratio of the sampled waveform and allowing better performance of processing algorithms.
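    The circuit's behaviour, dividing the ECG by the output of a peak detector fed from a full-wave rectifier, can be mimicked digitally; the synthetic waveform, sampling rate and decay constant below are assumptions for illustration, not values from the paper.

```python
# Digital mimic of the described AGC: rectify, track the peak with a slowly
# decaying peak detector, and divide the signal by that peak. Placeholder signal.
import numpy as np

fs = 500                                        # Hz (assumed sampling rate)
t = np.arange(0, 20, 1 / fs)
ecg = np.abs(np.sin(np.pi * t / 1.5)) ** 63     # crude 40 bpm spike-train stand-in
ecg[t > 10] *= 0.5                              # amplitude suddenly halved, as in the example

decay = np.exp(-1 / (fs * 2.0))                 # peak detector with ~2 s time constant (assumed)
peak = 1e-3
agc_out = np.empty_like(ecg)
for i, x in enumerate(ecg):
    rectified = abs(x)                          # full-wave rectifier stage
    peak = max(rectified, peak * decay)         # peak voltage detector
    agc_out[i] = x / peak                       # analog-divider stage
# agc_out returns toward full scale within a few beats after the amplitude drop.
```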

  17. Nanostring expression assay for positionally conserved lncRNAs

    • figshare.com
    xlsx
    Updated May 30, 2023
    Cite
    Tommaso Leonardi; Paulo P Amaral; Anton Enright; Tony Kouzarides (2023). Nanostring expression assay for positionally conserved lncRNAs [Dataset]. http://doi.org/10.6084/m9.figshare.5853777.v1
    Available download formats: xlsx
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Tommaso Leonardi; Paulo P Amaral; Anton Enright; Tony Kouzarides
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains expression data for positionally conserved lncRNAs (pcRNAs) measured using the Nanostring assay. Sheet A: annotation of the probes used in the assay. Sheet B: list of samples tested in the assay. Sheet C: normalised expression of the tested human and mouse pcRNAs and associated coding genes. Methods: We designed probes to detect 50 pairs of pcRNAs and corresponding coding genes in human and mouse. The probes were designed according to the Nanostring guidelines and to maximize their specificity, and the panel included 9 house-keeping genes for normalization (ALAS1, B2M, CLTC, GAPDH, GUSB, HPRT, PGK1, TDB, TUBB). The raw count data were first normalized by Nanostring Technologies with the nSolver software using a two-step protocol: first, data were normalized to internal positive controls, then to the geometric mean of the house-keeping genes. The normalised data were then imported into R for further analysis.
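    The two-step normalization described above, scaling each sample to its positive controls and then to the geometric mean of the house-keeping genes, is straightforward to express in Python; the count tables below are random placeholders, and the original processing was done in nSolver.

```python
# Sketch of a two-step, geometric-mean-based normalization (placeholder counts).
import numpy as np
import pandas as pd

housekeeping = ["ALAS1", "B2M", "CLTC", "GAPDH", "GUSB",
                "HPRT", "PGK1", "TDB", "TUBB"]   # names as listed above

def geomean(df):
    return np.exp(np.log(df).mean(axis=0))       # per-sample geometric mean

def nanostring_like_normalize(counts, pos_controls):
    """counts, pos_controls: genes x samples DataFrames of raw counts."""
    # Step 1: scale each sample by its positive-control geometric mean.
    pos_factor = geomean(pos_controls).mean() / geomean(pos_controls)
    counts = counts.mul(pos_factor, axis=1)
    # Step 2: scale each sample by the geometric mean of the house-keeping genes.
    hk = counts.loc[housekeeping]
    hk_factor = geomean(hk).mean() / geomean(hk)
    return counts.mul(hk_factor, axis=1)

# Example with random placeholder data:
rng = np.random.default_rng(3)
genes = housekeeping + [f"pcRNA_{i}" for i in range(5)]
raw = pd.DataFrame(rng.integers(50, 5000, size=(len(genes), 4)),
                   index=genes, columns=[f"sample_{j}" for j in range(4)])
pos = pd.DataFrame(rng.integers(1000, 2000, size=(6, 4)), columns=raw.columns)
normalized = nanostring_like_normalize(raw, pos)
```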

  18. Comparison of the expression of each reference gene in HT and non-diseased...

    • plos.figshare.com
    xls
    Updated Jun 7, 2023
    Cite
    Philipp Strauss; Håvard Mikkelsen; Jessica Furriol (2023). Comparison of the expression of each reference gene in HT and non-diseased control biopsies. [Dataset]. http://doi.org/10.1371/journal.pone.0259373.t003
    Available download formats: xls
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Philipp Strauss; Håvard Mikkelsen; Jessica Furriol
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Table 3a displays the p-values from the qPCR experiments. Data were analyzed with the Mann-Whitney test (asymptotic significance, 2-tailed). None of the comparisons yielded statistically significant results. Table 3b shows the fold change (FC) differences and Pearson's correlations in the expression of the selected reference genes. Fold changes are represented as mean ± SD log FC.

  19. Descriptive statistics for the normalised path length lambda...

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Hanneke de Waal; Cornelis J. Stam; Marieke M. Lansbergen; Rico L. Wieggers; Patrick J. G. H. Kamphuis; Philip Scheltens; Fernando Maestú; Elisabeth C. W. van Straaten (2023). Descriptive statistics for the normalised path length lambda (intent-to-treat population). [Dataset]. http://doi.org/10.1371/journal.pone.0086558.t004
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Hanneke de Waal; Cornelis J. Stam; Marieke M. Lansbergen; Rico L. Wieggers; Patrick J. G. H. Kamphuis; Philip Scheltens; Fernando Maestú; Elisabeth C. W. van Straaten
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note. Data are means (standard deviation). ¹P-values were based on a mixed model for repeated measures (2 degrees of freedom contrast) with post-baseline measurements as an outcome.
