Metagenomic time-course studies provide valuable insights into the dynamics of microbial systems and have become increasingly popular alongside the reduction in costs of next-generation sequencing technologies. Normalization is a common but critical preprocessing step before proceeding with downstream analysis. To the best of our knowledge, currently there is no reported method to appropriately normalize microbial time-series data. We propose TimeNorm, a novel normalization method that considers the compositional property and time dependency in time-course microbiome data. It is the first method designed for normalizing time-series data within the same time point (intra-time normalization) and across time points (bridge normalization), separately. Intra-time normalization normalizes microbial samples under the same condition based on common dominant features. Bridge normalization detects and utilizes a group of most stable features across two adjacent time points for normalization. Through comprehensive simulation studies and application to a real study, we demonstrate that TimeNorm outperforms existing normalization methods and boosts the power of downstream differential abundance analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Reference genes used in normalizing qRT-PCR data are critical for the accuracy of gene expression analysis. However, many traditional reference genes used in zebrafish early development are not appropriate because of their variable expression levels during embryogenesis. In the present study, we used our previous RNA-Seq dataset to identify novel reference genes suitable for gene expression analysis during zebrafish early developmental stages. We first selected 197 most stably expressed genes from an RNA-Seq dataset (29,291 genes in total), according to the ratio of their maximum to minimum RPKM values. Among the 197 genes, 4 genes with moderate expression levels and the least variation throughout 9 developmental stages were identified as candidate reference genes. Using four independent statistical algorithms (delta-CT, geNorm, BestKeeper and NormFinder), the stability of qRT-PCR expression of these candidates was then evaluated and compared to that of actb1 and actb2, two commonly used zebrafish reference genes. Stability rankings showed that two genes, namely mobk13 (mob4) and lsm12b, were more stable than actb1 and actb2 in most cases. To further test the suitability of mobk13 and lsm12b as novel reference genes, they were used to normalize three well-studied target genes. The results showed that mobk13 and lsm12b were more suitable than actb1 and actb2 with respect to zebrafish early development. We recommend mobk13 and lsm12b as new optimal reference genes for zebrafish qRT-PCR analysis during embryogenesis and early larval stages.
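The gene-screening step described above is essentially a ranking by expression stability. A minimal sketch of that step is shown below; the RPKM matrix, file name, and column layout are assumptions for illustration, not the actual dataset.

import pandas as pd

# Hypothetical RPKM matrix: rows = genes, columns = the 9 developmental stages.
rpkm = pd.read_csv("rpkm_by_stage.csv", index_col=0)

# Ignore genes with zero RPKM at any stage to avoid division by zero.
expressed = rpkm[(rpkm > 0).all(axis=1)]

# Stability score: ratio of maximum to minimum RPKM across stages (lower = more stable).
max_min_ratio = expressed.max(axis=1) / expressed.min(axis=1)

# Keep the most stable genes, e.g. the 197 smallest ratios as in the study,
# which would then be screened further for moderate expression levels.
candidates = max_min_ratio.nsmallest(197)
print(candidates.head())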
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Genome-wide analysis of gene expression or protein binding patterns using different array- or sequencing-based technologies is now routinely performed to compare different populations, such as treatment and reference groups. It is often necessary to normalize the data obtained to remove technical variation introduced in the course of conducting experimental work, but standard normalization techniques are not capable of eliminating technical bias in cases where the distribution of the truly altered variables is skewed, i.e. when a large fraction of the variables are either positively or negatively affected by the treatment. However, several types of experiments are likely to generate such skewed distributions, including ChIP-chip experiments for the study of chromatin, gene expression experiments for the study of apoptosis, and SNP studies of copy number variation in normal and tumour tissues. A preliminary study using spike-in array data established that the capacity of an experiment to identify altered variables and generate unbiased estimates of the fold change decreases as the fraction of altered variables and the skewness increase. We propose the following work-flow for analyzing high-dimensional experiments with regions of altered variables: (1) Pre-process raw data using one of the standard normalization techniques. (2) Investigate whether the distribution of the altered variables is skewed. (3) If the distribution is not believed to be skewed, no additional normalization is needed; otherwise, re-normalize the data using a novel HMM-assisted normalization procedure. (4) Perform downstream analysis. Here, ChIP-chip data and simulated data were used to evaluate the performance of the work-flow. It was found that skewed distributions can be detected by using the novel DSE-test (Detection of Skewed Experiments). Furthermore, applying the HMM-assisted normalization to experiments where the distribution of the truly altered variables is skewed results in considerably higher sensitivity and lower bias than can be attained using standard and invariant normalization methods.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The tar file contains two directories: data and models. Within "data," there are 4 subdirectories: "training" (the clean training data, without perturbations), "training_all_perturbed_for_uq" (the lightly perturbed training data), "validation_all_perturbed_for_uq" (the moderately perturbed validation data), and "testing_all_perturbed_for_uq" (the heavily perturbed testing data). The data in these directories are unnormalized. The subdirectories "training" and "training_all_perturbed_for_uq" each contain a normalization file. These normalization files contain the parameters used to normalize the data (from physical units to z-scores) for Experiment 1 and Experiment 2, respectively. To do the normalization, you can use the script normalize_examples.py in the code library (ml4rt) with the argument input_normalization_file_name set to one of these two file paths. The other arguments should be as follows (a full example invocation is sketched after this list):
--uniformize=1
--predictor_norm_type_string="z_score"
--vector_target_norm_type_string=""
--scalar_target_norm_type_string=""
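Putting the pieces together, a full invocation might look roughly like the following. This is a sketch only: the normalization file path is a placeholder for one of the two paths mentioned above, and the script may require additional input/output arguments not shown here (see the ml4rt documentation).
python normalize_examples.py \
    --input_normalization_file_name="PATH/TO/normalization_file" \
    --uniformize=1 \
    --predictor_norm_type_string="z_score" \
    --vector_target_norm_type_string="" \
    --scalar_target_norm_type_string=""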
Within the directory "models," there are 6 subdirectories: for the BNN-only models trained with clean and lightly perturbed data, for the CRPS-only models trained with clean and lightly perturbed data, and for the BNN/CRPS models trained with clean and lightly perturbed data. To read the models into Python, you can use the method neural_net.read_model in the ml4rt library.
THE FOLLOWING COMMENTS ARE TAKEN FROM THE PI N COMPILATION OF R.L. KELLY. THEY ARE THAT COMPILATION'S COMPLETE SET OF COMMENTS FOR PAPERS RELATED TO THE SAME EXPERIMENT (DESIGNATED BUSZA69) AS THE CURRENT PAPER. (THE IDENTIFIER PRECEDING THE REFERENCE AND COMMENT FOR EACH PAPER IS FOR CROSS-REFERENCING WITHIN THESE COMMENTS ONLY AND DOES NOT NECESSARILY AGREE WITH THE SHORT CODE USED ELSEWHERE IN THE PRESENT COMPILATION.) /// BELLAMY65 [E. H. BELLAMY, PROC. ROY. SOC. (LONDON) 289, 509 (1965)] -- /// BUSZA67 [W. BUSZA, NC 52A, 331 (1967)] -- PI- P DCS FROM 2K ELASTIC EVENTS AT EACH OF 5 MOMENTA BETWEEN 1.72 AND 2.46 GEV/C. DONE AT NIMROD WITH OPTICAL SPARK CHAMBERS. THE APPARATUS IS DESCRIBED IN BELLAMY65, THE RESULTS IN BUSZA67. /// BUSZA69 [W. BUSZA, PR 180, 1339 (1969)] -- PI+ P DCS AT 10 MOMENTA BETWEEN 1.72 AND 2.80 GEV/C, AND PI- P DCS AT 5 MOMENTA BETWEEN 2.17 AND 2.80 GEV/C. THE DATA REPORTED IN BUSZA67 ARE ALSO REPEATED HERE. THE NEW MEASUREMENTS WERE DONE WITH AN IMPROVED VERSION OF THE APPARATUS USED BY BUSZA67. THE PI- DATA (INCLUDING BUSZA67) ARE NORMALIZED TO FORWARD DISPERSION RELATIONS; THE PI+ DATA HAS ITS OWN EXPERIMENTAL NORMALIZATION BUT NO NE IS GIVEN. WE HAVE INCREASED THE ERROR OF THE MOST FORWARD PI+ POINT AT 1.72 GEV/C BECAUSE OF AN AMBIGUOUS FOOTNOTE CONCERNING THIS POINT. /// COMMENTS FROM LOVELACE71 COMPILATION OF THESE DATA -- LOVELACE71 CLAIMS SOME USE WAS MADE OF FORWARD DISPERSION RELATIONS TO NORMALIZE THE PI+ DATA AS WELL AS THE PI-. THE FOLLOWING NORMALIZATION ERRORS AND RENORMALIZATION FACTORS ARE RECOMMENDED FOR THE PI+ P AND PI- P DIFFERENTIAL CROSS SECTIONS -- PLAB=1720 MEV/C -- NE(PI+ P)=INFIN, NE(PI- P)=INFIN. PLAB=1890 MEV/C -- RF(PI+ P)=1.245, RF(PI- P)=0.941. PLAB=2070 MEV/C -- NE(PI+ P)=INFIN, RF(PI- P)=1.224. PLAB=2170 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=0.1. PLAB=2270 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=INFIN. PLAB=2360 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=0.1. PLAB=2460 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=INFIN. PLAB=2560 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=0.1. PLAB=2650 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=0.1. PLAB=2800 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=0.1. /// COMMENTS ON MODIFICATIONS TO LOVELACE71 COMPILATION BY KELLY -- WE HAVE TAKEN ALL PI- NES TO BE INFINITE, AND ALL PI+ NES TO BE UNKNOWN. ALSO ONE MINOR MISTAKE IN THE PI- (PI+) DATA AT 2.36 (2.65) GEV/C HAS BEEN CORRECTED. DATA ARE UNNORMALIZED OR NORMALIZED TO OTHER DATA.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Normalization
# Generate a resting state (rs) timeseries (ts)
# Install / load package to make fake fMRI ts
# install.packages("neuRosim")
library(neuRosim)
# Generate a ts
ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1)
# 3dDetrend -normalize
# R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1"
# Do for the full timeseries
ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2));
# Do this again for a shorter version of the same timeseries
ts.shorter.length <- length(ts.normalised.long)/4
ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2));
# By looking at the summaries, it can be seen that the median values become larger
summary(ts.normalised.long)
summary(ts.normalised.short)
# Plot results for the long and short ts
# Truncate the longer ts for plotting only
ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length]
# Give the plot a title
title <- "3dDetrend -normalize for long (blue) and short (red) timeseries";
plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short)));
# Add zero line
lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey');
# 3dDetrend -normalize -polort 0 for long timeseries
lines(ts.normalised.long.made.shorter, col='blue');
# 3dDetrend -normalize -polort 0 for short timeseries
lines(ts.normalised.short, col='red');
Standardization/modernization
New afni_proc.py command line
afni_proc.py \
-subj_id "$sub_id_name_1" \
-blocks despike tshift align tlrc volreg mask blur scale regress \
-radial_correlate_blocks tcat volreg \
-copy_anat anatomical_warped/anatSS.1.nii.gz \
-anat_has_skull no \
-anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \
-anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
-anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
-anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \
-anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \
-anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \
-anat_follower_erode fsvent fswm \
-dsets media_?.nii.gz \
-tcat_remove_first_trs 8 \
-tshift_opts_ts -tpattern alt+z2 \
-align_opts_aea -cost lpc+ZZ -giant_move -check_flip \
-tlrc_base "$basedset" \
-tlrc_NL_warp \
-tlrc_NL_warped_dsets \
anatomical_warped/anatQQ.1.nii.gz \
anatomical_warped/anatQQ.1.aff12.1D \
anatomical_warped/anatQQ.1_WARP.nii.gz \
-volreg_align_to MIN_OUTLIER \
-volreg_post_vr_allin yes \
-volreg_pvra_base_index MIN_OUTLIER \
-volreg_align_e2a \
-volreg_tlrc_warp \
-mask_opts_automask -clfrac 0.10 \
-mask_epi_anat yes \
-blur_to_fwhm -blur_size $blur \
-regress_motion_per_run \
-regress_ROI_PC fsvent 3 \
-regress_ROI_PC_per_run fsvent \
-regress_make_corr_vols aeseg fsvent \
-regress_anaticor_fast \
-regress_anaticor_label fswm \
-regress_censor_motion 0.3 \
-regress_censor_outliers 0.1 \
-regress_apply_mot_types demean deriv \
-regress_est_blur_epits \
-regress_est_blur_errts \
-regress_run_clustsim no \
-regress_polort 2 \
-regress_bandpass 0.01 1 \
-html_review_style pythonic
We used similar command lines to generate the ‘blurred and not censored’ and the ‘not blurred and not censored’ timeseries files (described more fully below). We will provide the code used to make all derivative files available on our github site (https://github.com/lab-lab/nndb). We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, averaging ~40 minutes but variable in length, which leads to the issue with 3dDetrend’s -normalize illustrated above. A discussion on the AFNI message board with one of our team (starting here: https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256) led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.
Which timeseries file you use is up to you, but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul’s own words:
• Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice).
• Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere).
• For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data).
• For censored data:
  • Performing ISC requires the users to unionize the censoring patterns during the correlation calculation.
  • If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might still do for naturalistic tasks), standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and only up to about 10-15% of the data is censored.
In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.
Effect on results
The technological advances in mass spectrometry allow us to collect more comprehensive data with higher quality and increasing speed. With the rapidly increasing amount of data generated, the need for streamlining analyses becomes more apparent. Proteomics data is known to be often affected by systematic bias from unknown sources, and failing to adequately normalize the data can lead to erroneous conclusions. To allow researchers to easily evaluate and compare different normalization methods via a user-friendly interface, we have developed “proteiNorm”. The current implementation of proteiNorm accommodates preliminary filters on the peptide and sample levels, followed by an evaluation of several popular normalization methods and visualization of missing values. The user then selects an adequate normalization method and one of several imputation methods for the subsequent comparison of different differential expression methods and estimation of statistical power. The application of proteiNorm and the interpretation of its results are demonstrated on two tandem mass tag multiplex (TMT6plex and TMT10plex) and one label-free spike-in mass spectrometry example data sets. The three data sets reveal how normalization methods perform differently on different experimental designs and demonstrate the need to evaluate normalization methods for each mass spectrometry experiment. With proteiNorm, we provide a user-friendly tool to identify an adequate normalization method and to select an appropriate method for differential expression analysis.
THE FOLLOWING COMMENTS ARE TAKEN FROM THE PI N COMPILATION OF R.L. KELLY. THEY ARE THAT COMPILATION'S COMPLETE SET OF COMMENTS FOR PAPERS RELATED TO THE SAME EXPERIMENT (DESIGNATED NEWCOMB63) AS THE CURRENT PAPER. (THE IDENTIFIER PRECEDING THE REFERENCE AND COMMENT FOR EACH PAPER IS FOR CROSS-REFERENCING WITHIN THESE COMMENTS ONLY AND DOES NOT NECESSARILY AGREE WITH THE SHORT CODE USED ELSEWHERE IN THE PRESENT COMPILATION.) /// NEWCOMB63 [P. C. A. NEWCOMB, PR 132, 1283 (1963).] -- PI+ P DCS AT 725 MEV/C FROM 1245 ELASTIC EVENTS IN THE 15 INCH LRL HBC AT THE BEVATRON. DATA PRESENTED AS A TABLE OF NUMBERS OF EVENTS AND A HISTOGRAM NORMALIZED TO A TOTAL CS OF 16.1+/-0.8 MB. THE MB RECORDED HERE INCLUDES THE SPREAD IN BEAM MOMENTUM OVER THE FIDUCIAL VOLUME. /// NEWCOMB63T [P. C. A. NEWCOMB, UCB THESIS, UCRL-10563, 1963.] -- A LARGER VERSION OF THE HISTOGRAM IS GIVEN HERE. WE USED THE NORMALIZATION READ FROM THIS HISTOGRAM, 1 EVENT/.1 COS(THETA) INTERVAL = .0141 MB/STER, TO NORMALIZE THE TABLE IN NEWCOMB63. /// COMMENTS ON MODIFICATIONS TO LOVELACE71 COMPILATION BY KELLY -- WE NORMALIZED THE TABLE IN NEWCOMB63 AS DESCRIBED ABOVE, RESULTING IN ONLY MINOR DIFFERENCES FROM THE LOVELACE71 VERSION, WHICH WAS APPARENTLY READ DIRECTLY FROM THE HISTOGRAM IN NEWCOMB63. DATA ARE UNNORMALIZED OR NORMALIZED TO OTHER DATA.
ALL DATA IN THIS RECORD ARE REDUNDANT, I.E., THEY WERE OBTAINED DIRECTLY FROM OTHER DATA IN THIS FILE, USUALLY BY EXTRAPOLATION OR INTEGRATION. THE FOLLOWING COMMENTS ARE TAKEN FROM THE PI N COMPILATION OF R.L. KELLY. THEY ARE THAT COMPILATION'S COMPLETE SET OF COMMENTS FOR PAPERS RELATED TO THE SAME EXPERIMENT (DESIGNATED BUSZA69) AS THE CURRENT PAPER. (THE IDENTIFIER PRECEDING THE REFERENCE AND COMMENT FOR EACH PAPER IS FOR CROSS-REFERENCING WITHIN THESE COMMENTS ONLY AND DOES NOT NECESSARILY AGREE WITH THE SHORT CODE USED ELSEWHERE IN THE PRESENT COMPILATION.) /// BELLAMY65 [E. H. BELLAMY, PROC. ROY. SOC. (LONDON) 289, 509 (1965)] -- /// BUSZA67 [W. BUSZA, NC 52A, 331 (1967)] -- PI- P DCS FROM 2K ELASTIC EVENTS AT EACH OF 5 MOMENTA BETWEEN 1.72 AND 2.46 GEV/C. DONE AT NIMROD WITH OPTICAL SPARK CHAMBERS. THE APPARATUS IS DESCRIBED IN BELLAMY65, THE RESULTS IN BUSZA67. /// BUSZA69 [W. BUSZA, PR 180, 1339 (1969)] -- PI+ P DCS AT 10 MOMENTA BETWEEN 1.72 AND 2.80 GEV/C, AND PI- P DCS AT 5 MOMENTA BETWEEN 2.17 AND 2.80 GEV/C. THE DATA REPORTED IN BUSZA67 ARE ALSO REPEATED HERE. THE NEW MEASUREMENTS WERE DONE WITH AN IMPROVED VERSION OF THE APPARATUS USED BY BUSZA67. THE PI- DATA (INCLUDING BUSZA67) ARE NORMALIZED TO FORWARD DISPERSION RELATIONS; THE PI+ DATA HAS ITS OWN EXPERIMENTAL NORMALIZATION BUT NO NE IS GIVEN. WE HAVE INCREASED THE ERROR OF THE MOST FORWARD PI+ POINT AT 1.72 GEV/C BECAUSE OF AN AMBIGUOUS FOOTNOTE CONCERNING THIS POINT. /// COMMENTS FROM LOVELACE71 COMPILATION OF THESE DATA -- LOVELACE71 CLAIMS SOME USE WAS MADE OF FORWARD DISPERSION RELATIONS TO NORMALIZE THE PI+ DATA AS WELL AS THE PI-. THE FOLLOWING NORMALIZATION ERRORS AND RENORMALIZATION FACTORS ARE RECOMMENDED FOR THE PI+ P AND PI- P DIFFERENTIAL CROSS SECTIONS -- PLAB=1720 MEV/C -- NE(PI+ P)=INFIN, NE(PI- P)=INFIN. PLAB=1890 MEV/C -- RF(PI+ P)=1.245, RF(PI- P)=0.941. PLAB=2070 MEV/C -- NE(PI+ P)=INFIN, RF(PI- P)=1.224. PLAB=2170 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=0.1. PLAB=2270 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=INFIN. PLAB=2360 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=0.1. PLAB=2460 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=INFIN. PLAB=2560 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=0.1. PLAB=2650 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=0.1. PLAB=2800 MEV/C -- NE(PI+ P)=0.1, NE(PI- P)=0.1. /// COMMENTS ON MODIFICATIONS TO LOVELACE71 COMPILATION BY KELLY -- WE HAVE TAKEN ALL PI- NES TO BE INFINITE, AND ALL PI+ NES TO BE UNKNOWN. ALSO ONE MINOR MISTAKE IN THE PI- (PI+) DATA AT 2.36 (2.65) GEV/C HAS BEEN CORRECTED.
The hurricane risk index is simply the product of the cumulative hurricane strikes per coastal county and the CDC Overall Social Vulnerability Index (SVI) for that county. We normalize the hurricane strike data to match the SVI data classification scheme (i.e., maximum value of 1); using the raw or normalized strike values, however, has no impact on the spatial pattern of the risk index. A risk index of 1 therefore indicates that the county has the highest hurricane strike count of all counties and is the most vulnerable county in the nation according to the SVI. Because the analysis spans multiple states, we use the ‘United States’ SVI dataset at the county level. Values of the index are unevenly distributed, so we classify intervals using the Jenks method; the first break at 0.08 is roughly equal to the median index value. For counties north of North Carolina, the low hurricane risk is driven mostly by the low number of hurricane strikes. The vast majority of these counties fall in the lowest risk category, and any in the second-lowest category are there because of high social vulnerability.
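As a concrete illustration of the arithmetic, the sketch below computes the index from made-up county values; the column names and numbers are illustrative, not the actual strike or SVI datasets.

import pandas as pd

# Hypothetical inputs: cumulative hurricane strikes and SVI per coastal county.
counties = pd.DataFrame({
    "county": ["A", "B", "C"],
    "strikes": [12, 3, 0],   # cumulative hurricane strikes
    "svi": [0.9, 0.4, 0.7],  # CDC Overall Social Vulnerability Index (0-1)
})

# Rescale strikes so the maximum is 1, matching the SVI classification scheme.
counties["strikes_norm"] = counties["strikes"] / counties["strikes"].max()

# Risk index: product of normalized strikes and social vulnerability.
counties["risk_index"] = counties["strikes_norm"] * counties["svi"]
print(counties)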
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
KITAB Text Reuse Data

KITAB is funded by the European Research Council under the European Union’s Horizon 2020 research and innovation programme, awarded to the KITAB project (Grant Agreement No. 772989, PI Sarah Bowen Savant), hosted at Aga Khan University, London. In addition, it has received funding from the Qatar National Library to aid in the adaptation of the passim algorithm for Arabic.

KITAB’s text reuse data is generated by running passim on the OpenITI corpus (DOI: 10.5281/zenodo.3082463). Each version is the output of a separate run, and the version number corresponds to the corpus releases. To prepare the corpus for a passim run, we normalize the texts, remove most of the non-Arabic characters, and then chunk the texts into passages 300 words in length (word boundaries are determined by the non-Arabic characters, including white space). The chunks, called milestones, are identified by unique ids. This dataset represents the reuse cases that have been identified among milestones. The text reuse dataset consists of folders for each book. Each folder includes CSV files of the text reuse cases (alignments) between the corresponding book and all other books with which passim has found instances of reuse. The files have the below naming convention, using the book ids:
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
KITAB Text Reuse Data

KITAB’s text reuse data is generated by running passim on the OpenITI corpus (DOI: 10.5281/zenodo.3082463). Each version is the output of a separate run. To prepare the corpus for a passim run, we chunk texts into passages of 300 tokens (~words) in length. Also, we normalize texts and remove all non-Arabic characters. The chunks, called milestones, are identified by unique ids. This dataset represents the reuse cases that have been identified among milestones. The dataset contains folders for each book. Each folder includes alignment files between that book and all other books with which passim has found instances of reuse. The reuse cases between a pair of books are represented as a list of records. Each record is an alignment that shows a pair of matched passages between two books together with statistics, such as the algorithm score, and contextual information, such as the start and end positions of aligned passages so that one can find those passages in the books. A description of the alignment fields is given in the release notes.

For each dataset, we generate statistical data on the alignments between the book pairs. The data is published in an application that facilitates search, filtering, and visualizations. The link to the corresponding application is given in the release notes.

KITAB is funded by the European Research Council under the European Union’s Horizon 2020 research and innovation programme, awarded to the KITAB project (Grant Agreement No. 772989, PI Sarah Bowen Savant), hosted at Aga Khan University, London. In addition, it has received funding from the Qatar National Library to aid in the adaptation of the passim algorithm for Arabic.

Note on Release Numbering: Version 2020.1.1, where 2020 is the year of the release, the first dotted number (.1) is the ordinal release number in 2020, and the second dotted number (.1) is the overall release number. The first dotted number will reset every year, while the second one will continue increasing.

Note: The very first release of the KITAB text reuse data (2019.1.1) is published here as it was too big to publish on Zenodo. To receive more information on the complete datasets please contact us via kitab-project@outlook.com (or other team members). Future releases may include part of the generated data if the size of the whole data is too big to publish on Zenodo. However, the data is open access for anyone to use. We provide detailed information on the datasets in the corresponding release notes.
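For orientation, the milestone-chunking step described in both releases might look roughly like the sketch below. Assumptions: the Arabic-character handling here is a crude stand-in for the actual OpenITI/passim preprocessing, and the milestone id scheme is purely illustrative.

import re

def chunk_into_milestones(text, book_id, size=300):
    """Split a pre-normalized text into ~300-word milestones with unique ids.

    Rough sketch only: real KITAB preprocessing removes most non-Arabic
    characters before chunking; here any run of non-Arabic characters
    (including white space) is simply treated as a word boundary.
    """
    words = [w for w in re.split(r"[^\u0600-\u06FF]+", text) if w]
    milestones = {}
    for i in range(0, len(words), size):
        ms_id = f"{book_id}.ms{i // size + 1}"  # illustrative id scheme
        milestones[ms_id] = " ".join(words[i:i + size])
    return milestones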
UniCourt simplifies access to structured court records with our Court Data API, so you can search court cases via API, get real-time alerts with webhooks, streamline your account management, and get bulk access to the AI normalized court data you need.
Search Court Cases with APIs
• Leverage UniCourt’s easy API integrations to search state and federal (PACER) court records directly from your own internal applications and systems.
• Access the docket entries and case details you need on the parties, attorneys, law firms, and judges involved in litigation.
• Conduct the same detailed case searches you can in our app with our APIs and easily narrow your search results using our jurisdiction, case type, and case status filters.
• Use our Related Cases API to search for and download all of the court data for consolidated cases from the Judicial Panel on Multidistrict Litigation, as well as associated civil and criminal cases from U.S. District Courts.
Get Real-Time Alerts with Webhooks
• UniCourt’s webhooks provide you with industry leading automation tools for real-time push notifications to your internal applications for all your case tracking needs.
• Get daily court data feeds with new case results for your automated court searches pushed directly to your applications in a structured format.
• Use our custom search file webhook to search for and track thousands of entities at once and receive your results packaged into a custom CSV file.
• Avoid making multiple API calls to figure out if a case has updates or not and remove the need to continuously check the status of large document orders and updates.
Bulk Access to Court Data
• UniCourt downloads thousands of new cases every day from state and federal courts, and we structure them, normalize them with our AI, and make them accessible in bulk via our Court Data API.
• Our rapidly growing CrowdSourced Library™ provides you with a massive free repository of 100+ million court cases, tens of millions of court documents, and billions of docket entries all at your fingertips.
• Leverage your bulk access to AI normalized court data that’s been enriched with other public data sets to build your own analytics, competitive intelligence, and machine learning models.
Streamlined Account Management
• Easily manage your UniCourt account with information on your billing cycle and billing usage delivered to you via API.
• Eliminate the requirement of logging in to your account to get a list of all of your invoices and use our APIs to directly download the invoices you need.
• Get detailed data on which cases are being tracked by the users for your account and access all of the related tracking schedules for cases your users are tracking.
• Gather complete information on the saved searches being run by your account, including the search parameters, filters, and much more.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description:
This mmWave dataset is used for fitness activity identification. The dataset (FA Dataset) contains 14 common daily fitness activities. The data are captured by the TI AWR1642 mmWave radar. The dataset can be used by fellow researchers to reproduce the original work or to further explore other machine-learning problems in the domain of mmWave signals (a loading sketch is given after the activity table below).
Format: .png
Section 1: Device Configuration
Section 2: Data Format
We provide our mmWave data as heatmaps for this dataset. The data files are in the .png format. The details are shown in the following:
Section 3: Experimental Setup
Section 4: Data Description
14 common daily activities and their corresponding files
File Name Activity Type File Name Activity Type
FA1 Crunches FA8 Squats
FA2 Elbow plank and reach FA9 Burpees
FA3 Leg raise FA10 Chest squeezes
FA4 Lunges FA11 High knees
FA5 Mountain climber FA12 Side leg raise
FA6 Punches FA13 Side to side chops
FA7 Push ups FA14 Turning kicks
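To get started with the heatmaps in Python, something like the sketch below can load them into arrays for model training. The directory layout and file naming are assumptions for illustration; adjust the paths to match the released data.

import glob
import os

import numpy as np
from PIL import Image

# Hypothetical layout: one folder per activity (FA1 ... FA14) containing .png heatmaps,
# all assumed to share the same resolution.
data_root = "FA_Dataset"
images, labels = [], []
for label in range(1, 15):
    for path in sorted(glob.glob(os.path.join(data_root, f"FA{label}", "*.png"))):
        img = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
        images.append(img)
        labels.append(label - 1)  # class index 0-13

X = np.stack(images)  # shape: (num_samples, height, width)
y = np.array(labels)
print(X.shape, y.shape)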
Section 5: Raw Data and Data Processing Algorithms
Section 6: Citations
If your paper is related to our work, please cite our papers as follows.
https://ieeexplore.ieee.org/document/9868878/
Xie, Yucheng, Ruizhe Jiang, Xiaonan Guo, Yan Wang, Jerry Cheng, and Yingying Chen. "mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave." In 2022 International Conference on Computer Communications and Networks (ICCCN), pp. 1-10. IEEE, 2022.
Bibtex:
@inproceedings{xie2022mmfit,
title={mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave},
author={Xie, Yucheng and Jiang, Ruizhe and Guo, Xiaonan and Wang, Yan and Cheng, Jerry and Chen, Yingying},
booktitle={2022 International Conference on Computer Communications and Networks (ICCCN)},
pages={1--10},
year={2022},
organization={IEEE}
}
In this experiment we wanted to compare tumor and healthy status in breast cancer. To achieve this goal we performed a mass-spectrometry-based proteomic analysis of two breast cancer cell lines (HCC) and one healthy cell line (HMEC) as a control. From each cell line we created three biological replicates. Observe the proteomic data matrix, where rows are proteins and columns are samples.
Please submit a working notebook with embedded figures and results.
Output files from the 8. Metadata Analysis Workflow page of the SWELTR high-temp study. In this workflow, we compared environmental metadata with microbial communities. The workflow is split into two parts.
metadata_ssu18_wf.rdata : Part 1 contains all variables and objects for the 16S rRNA analysis. To see the objects, in R run load("metadata_ssu18_wf.rdata", verbose=TRUE).
metadata_its18_wf.rdata : Part 2 contains all variables and objects for the ITS analysis. To see the objects, in R run load("metadata_its18_wf.rdata", verbose=TRUE).
Additional files:
In both workflows, we run the following steps:
1) Metadata Normality Tests: Shapiro-Wilk normality test to check whether each metadata parameter is normally distributed.
2) Normalize Parameters: use the R package bestNormalize to find and apply the best normalizing transformation.
3) Split Metadata Parameters into Groups: a) environmental and edaphic properties, b) microbial functional responses, and c) temperature adaptation properties.
4) Autocorrelation Tests: test all possible pair-wise comparisons, on both normalized and non-normalized data sets, for each group.
5) Remove autocorrelated parameters from each group.
6) Dissimilarity Correlation Tests: use Mantel tests to see whether any of the metadata groups are significantly correlated with the community data.
7) Best Subset of Variables: determine which of the metadata parameters from each group are most strongly correlated with the community data, using the bioenv function from the vegan package.
8) Distance-based Redundancy Analysis: ordination analysis of samples and metadata vector overlays using capscale, also from the vegan package.
Source code for the workflow can be found here:
https://github.com/sweltr/high-temp/blob/master/metadata.Rmd
Although the basic structure of logit-mixture models is well understood, important identification and normalization issues often get overlooked. This paper addresses issues related to the identification of parameters in logit-mixture models containing normally distributed error components associated with alternatives or nests of alternatives (normal error component logit mixture, or NECLM, models). NECLM models include special cases such as unrestricted, fixed covariance matrices; alternative-specific variances; nesting and cross-nesting structures; and some applications to panel data. A general framework is presented for determining which parameters are identified, as well as what normalization to impose, when specifying NECLM models. It is generally necessary to specify and estimate NECLM models in the levels, or structural, form. This precludes working with utility differences, which would otherwise greatly simplify the identification and normalization process. Our results show that identification is not always intuitive; for example, normalization issues present in logit-mixture models are not present in analogous probit models. To identify and properly normalize the NECLM, we introduce the equality condition, an addition to the standard order and rank conditions. The identifying conditions are worked through for a number of special cases, and our findings are demonstrated with empirical examples using both synthetic and real data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data repository provides the underlying data and neural network training scripts associated with the manuscript "A Transformer Network for High-Throughput Materials Characterization with X-ray Photoelectron Spectroscopy" by Simperl and Werner, published in the Journal of Applied Physics (2025), https://doi.org/10.1063/5.0296600.
All data files are released under the Creative Commons Attribution 4.0 International (CC-BY) license, while all code files are distributed under the MIT license.
The repository contains simulated X-ray photoelectron spectroscopy (XPS) spectra stored as hdf5 files in the zipped (h5_files.zip) folder, which was generated using the software developed by the authors. The NIST Standard Reference Database 100 – Simulation of Electron Spectra for Surface Analysis (SESSA) is freely available at https://www.nist.gov/srd/nist-standard-reference-database-100.
The neural network architecture is implemented using the PyTorch Lightning framework and is fully available within the attached materials as Transformer_SimulatedSpectra.py contained in the python_scripts.zip.
The trained model and the list of materials for the train, test and validation sets are contained in the models.zip folder.
The repository contains all the data necessary to replot the figures from the manuscript. These data are available in the form of .csv files or .h5 files for the spectra. In addition, the repository also contains a Python script (Plot_Data_Manuscript.ipynb) which is contained in the python_scripts.zip file.
The dataset and accompanying Python code files included in this repository were used to train a transformer-based neural network capable of directly inferring chemical concentrations from simulated survey X-ray photoelectron spectroscopy (XPS) spectra of bulk compounds.
The spectral dataset provided here represents the raw output from the SESSA software (version 2.2.2), prior to the normalization procedure described in the associated manuscript. This normalization step is of paramount importance for the effective training of the neural network.
The repository contains the Python scripts utilised to execute the spectral simulations and the neural network training on the Vienna Scientific Cluster (VSC5) which is part of the Austrian Scientific Computing Infrastructure (ASC). In order to obtain guidance on the proper configuration of the Command Line Interface (CLI) tools required for SESSA, users are advised to consult the official SESSA manual, which is available at the following address: https://nvlpubs.nist.gov/nistpubs/NSRDS/NIST.NSRDS.100-2024.pdf.
To run the neural network training we provided the requirements_nn_training.txt file that contains all the necessary python packages and version numbers. All other python scripts can be run locally with the python libraries listed in requirements_data_analysis.txt.
HDF5 (in zip folder): As described in the manuscript, we simulate X-ray photoelectron spectra for each of the 7,587 inorganic [1] and organic [2] materials in our dataset. To reflect realistic experimental conditions, each simulated spectrum was augmented by systematically varying parameters such as peak width, peak shift, and peak type—all configurable within the SESSA software—as well as by applying statistical Poisson noise to simulate varying signal-to-noise ratios. These modifications account for experimentally observed and material-specific spectral broadening, peak shifts, and detector-induced noise. Each material is represented by an individual HDF5 (.h5) file, named according to its chemical formula and mass density (in g/cm³). For example, the file for SiO2 with a density of 2.196 gcm-3 is named SiO2_2.196.h5. For more complex chemical formulas, such as Co(ClO4)2 with a density of 3.33 gcm-3, the file is named Co_ClO4_2_3.33.h5. Within each HDF5 file, the metadata for each spectrum is stored alongside a fixed energy axis and the corresponding intensity values. The spectral data are organized hierarchically by augmentation parameters in the following directory structure, e.g. for Ac_10.0.h5 we have SNR_0/WIDTH_0.3/SHIFT_-3.0/PEAK_gauss/Ac_10.0/. These files can be easily inspected with H5Web in Visual Studio Code or using h5py in Python or any other h5 interpretable program.
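A quick way to inspect one of these files with h5py is sketched below. The example path follows the layout given above for Ac_10.0.h5; whether the per-spectrum metadata sits in HDF5 attributes or in separate datasets should be checked against the file itself.

import h5py

# Walk the augmentation hierarchy (SNR / WIDTH / SHIFT / PEAK / material),
# printing every group and dataset in the file.
with h5py.File("Ac_10.0.h5", "r") as f:
    def show(name, obj):
        kind = "dataset" if isinstance(obj, h5py.Dataset) else "group"
        print(kind, name)
    f.visititems(show)

    # Example: open the leaf group for one augmentation setting and list
    # its contents (energy axis, intensities) and any metadata attributes.
    leaf = f["SNR_0/WIDTH_0.3/SHIFT_-3.0/PEAK_gauss/Ac_10.0"]
    print(list(leaf.keys()), dict(leaf.attrs))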
Session Files: The .ses files are SESSA-specific input files that can be directly loaded into SESSA to specify certain input parameters for the initialization (ini), the geometry (geo), and the simulation parameters (sim_para); they are required by the Python script Simulation_Script_VSC_json.py to run the simulation on the cluster.
Json Files: The two json files (MaterialsListVSC_gauss.json, MaterialsListVSC_lorentz.json) are used as the input files to the Python script Simulation_Script_VSC_json.py. These files contain all the material specific information for the SESSA simulation.
csv files: The csv files are used to generate the plots from the manuscript described in the section "Plotting Scripts".
npz files: The two .npz files (element_counts.npz, single_elements.npz) are NumPy arrays needed by the Transformer_SimulatedSpectra.py script; they contain the number of each single element in the dataset and an array of each single element present, respectively.
There is one Python file that sets up the communication with SESSA:
Simulation_Script_VSC_json.py: This script uses the functions of the VSC_function.py script (therefore needs to be placed in the same directory as this script) and can be called with the following command:
python3 Simulation_Script_VSC_json.py MaterialsListVSC_gauss.json 0
It simulates the spectrum for the material at index 0 in the .json file and with the corresponding parameters specified in the .json file.
It is important that before running this script the following paths need to be specified:
To run SESSA on a computing cluster it is important to have a working Xvfb (virtual frame buffer) or a similar tool available to which any graphical output from SESSA can be written to.
Before running the training script it is important to normalize the data such that the squared integral of the spectrum is 1, as described in the manuscript and implemented in the code normalize_spectra.py.
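Read literally, the condition rescales each spectrum so that the integral of its squared intensity over the energy axis equals 1; the sketch below implements that reading (an assumption about the exact convention; normalize_spectra.py in this repository is the authoritative implementation).

import numpy as np

def normalize_spectrum(energy, intensity):
    """Rescale so that the integral of the squared spectrum over energy is 1."""
    norm = np.sqrt(np.trapz(intensity ** 2, x=energy))
    return intensity / norm

# Example with a dummy spectrum on a uniform energy grid.
energy = np.linspace(0.0, 1500.0, 3001)              # eV, illustrative grid
intensity = np.exp(-((energy - 500.0) / 20.0) ** 2)  # dummy Gaussian peak
normalized = normalize_spectrum(energy, intensity)
print(np.trapz(normalized ** 2, x=energy))           # ~1.0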
For the neural network training we use the Transformer_SimulatedSpectra.py where the external functions used are specified in external_functions.py. This script contains the full description of the neural network architecture, the hyperparameter tuning and the Wandb logging.
In the models.zip folder the fully trained network final_trained_model.ckpt presented in the manuscript is available as well as the list of training, validation and testing materials (test_materials_list.pt, train_materials_list.pt, val_materials_list.pt) where the corresponding spectra are extracted from the hdf5 files. The file types .ckpt and .pt can be read in by using the pytorch specific load functions in Python, e.g.
torch.load("train_materials_list.pt")
normalize_spectra.py: To run this script properly it is important to set up a python environment with the necessary libraries specified in the requirements_data_analysis.txt file. Then it can be called with
python3 normalize_spectra.py
where it is important to specify the path to the .h5 files containing the unnormalized spectra.
Transformer_SimulatedSpectra.py: To run this script properly on the cluster it is important to set up a python environment with the necessary libraries specified in the requirements_nn_training.txt file. This script also relies on the files external_functions.py, single_elements.npz, and element_counts.npz (which should be placed in the same directory as the Python script). This is important for creating the datasets for training, validation, and testing and ensures that all the single elements appear in the testing set. You can call this script (on the cluster) within a slurm script to start the GPU training.
python3 Transformer_SimulatedSpectra.py
It is important that before running this script the following paths need to be specified: