100+ datasets found

Scaling with ranked subsampling (SRS) algorithm for the normalization of...
repository.soilwise-he.eu
Updated Jul 1, 2020
Cite
(2020). Scaling with ranked subsampling (SRS) algorithm for the normalization of species count data. [Dataset]. https://repository.soilwise-he.eu/cat/collections/metadata:main/items/4b2b65c6-ff50-4669-99cc-ace343de3548
Explore at:
Dataset updated
Jul 1, 2020
Description
Scaling with ranked subsampling (SRS) is an algorithm for the normalization of species count data in ecology. So far, SRS has been applied successfully to microbial community data. SRS is now available on CRAN: https://CRAN.R-project.org/package=SRS. An implementation of SRS in R is also available for download: https://metadata.bonares.de/smartEditor/rest/upload/ID_7049_2020_05_13_SRS_function_v1_0_R.zip

SRS consists of two steps. In the first step, the counts for all OTUs (operational taxonomic units) are divided by a scaling factor chosen in such a way that the sum of the scaled counts (Cscaled, with integer or non-integer values) equals Cmin. In the second step, the non-integer count values are converted into integers by an algorithm that we dub ranked subsampling. The scaled count Cscaled for each OTU is split into the integer part Cint, obtained by truncating the digits after the decimal separator (Cint = floor(Cscaled)), and the fractional part Cfrac (Cfrac = Cscaled - Cint). Since ΣCint ≤ Cmin, an additional ∆C = Cmin - ΣCint counts have to be added to the library to reach the total count of Cmin. This is achieved as follows. OTUs are ranked in descending order of their Cfrac values. Beginning with the OTU of the highest rank, a single count per OTU is added to the normalized library until the total number of added counts reaches ∆C and the sum of all counts in the normalized library equals Cmin. When the lowest Cfrac involved in picking ∆C counts is shared by several OTUs, the OTUs used for adding a single count to the library are selected in the order of their Cint values. This selection minimizes the effect of normalization on the relative frequencies of OTUs. OTUs with identical Cfrac as well as Cint are sampled randomly without replacement.
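The two steps above can be sketched in Python for a single sample. This is an illustrative sketch only, not the SRS package's code: the function name `srs_sample` and its parameters are hypothetical, integer arithmetic stands in for the fractional parts (the remainder of `count * c_min / total` ranks OTUs exactly as Cfrac would), and the tie-break on Cint is assumed to run from the largest value down, since the description does not state the direction.

```python
import random


def srs_sample(counts, c_min, seed=None):
    """Normalize one sample's OTU counts to a total of c_min (illustrative sketch)."""
    total = sum(counts)
    if not 0 < c_min <= total:
        raise ValueError("c_min must be positive and not exceed the sample total")
    rng = random.Random(seed)

    # Step 1: scale by c_min / total. Working with integers keeps this exact:
    # quotient = Cint, remainder / total = Cfrac.
    c_int = [(c * c_min) // total for c in counts]
    remainder = [(c * c_min) % total for c in counts]

    # Step 2: ranked subsampling. delta_c counts are still missing.
    delta_c = c_min - sum(c_int)

    # Rank by Cfrac (descending), then by Cint (descending: an assumption),
    # then randomly among OTUs that tie on both.
    order = sorted(
        range(len(counts)),
        key=lambda k: (-remainder[k], -c_int[k], rng.random()),
    )
    for k in order[:delta_c]:
        c_int[k] += 1  # one extra count per selected OTU
    return c_int
```

For example, `srs_sample([5, 3, 2], 7)` scales the counts to 3.5, 2.1 and 1.4, truncates them to 3, 2 and 1, and gives the single missing count to the first OTU, returning `[4, 2, 1]`.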
GC/MS Simulated Data Sets normalized using median scaling
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
+ more versions
Cite
Scholtens, Denise (2023). GC/MS Simulated Data Sets normalized using median scaling [Dataset]. http://doi.org/10.7910/DVN/OYOLXD
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/OYOLXD
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Scholtens, Denise
Description
1000 simulated data sets stored in a list of R dataframes used in support of Reisetter et al. (submitted) 'Mixture model normalization for non-targeted gas chromatography / mass spectrometry metabolomics data'. These are results after normalization using median scaling as described in Reisetter et al.
Binary classification using a confusion matrix.
plos.figshare.com
xls
Updated Dec 6, 2024
Cite
Chantha Wongoutong (2024). Binary classification using a confusion matrix. [Dataset]. http://doi.org/10.1371/journal.pone.0310839.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0310839.t002
Dataset updated
Dec 6, 2024
Dataset provided by
PLOS ONE
Authors
Chantha Wongoutong
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Despite the popularity of k-means clustering, feature scaling before applying it is an essential yet often neglected step. In this study, applying one of five feature-scaling methods (Z-score, Min-Max normalization, Percentile transformation, Maximum absolute scaling, or RobustScaler) beforehand was compared with using the raw (i.e., non-scaled) data to analyze datasets having features with different or the same units via k-means clustering. The results of an experimental study show that, for features with different units, scaling them before k-means clustering provided better accuracy, precision, recall, and F-score values than using the raw data. Meanwhile, when features in the dataset had the same unit, scaling them beforehand provided similar results to using the raw data. Thus, scaling the features beforehand is a very important step for datasets with different units, as it improves the clustering results and accuracy. Of the five feature-scaling methods applied to the datasets with different units, Z-score standardization and Percentile transformation provided similar performances that were superior to those of the other methods and of the raw data. Although Maximum absolute scaling performed slightly better than the other scaling methods and the raw data when the dataset contained features with the same unit, the improvement was not significant.
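Four of the scaling methods named above can be sketched for a single feature column using only the Python standard library. This is a minimal illustration, not the study's code: the function names are hypothetical, the Z-score uses the population standard deviation as an assumption, and RobustScaler is approximated as (x - median) / IQR with quartiles from `statistics.quantiles`. Percentile transformation is omitted, and a constant column would cause a division by zero.

```python
import statistics


def z_score(xs):
    """Center on the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]


def min_max(xs):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def max_abs(xs):
    """Divide by the largest absolute value, so results lie in [-1, 1]."""
    m = max(abs(x) for x in xs)
    return [x / m for x in xs]


def robust(xs):
    """Center on the median and divide by the interquartile range."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - med) / (q3 - q1) for x in xs]
```

Each function would be applied column by column before clustering, so that a feature measured in large units does not dominate the Euclidean distances that k-means relies on.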
The performance results for k-means clustering and testing the hypothesis...
plos.figshare.com
xls
Updated Dec 6, 2024
Cite
Chantha Wongoutong (2024). The performance results for k-means clustering and testing the hypothesis for homogeneity between the true grouped data and feature scaling on datasets containing features with the same unit. [Dataset]. http://doi.org/10.1371/journal.pone.0310839.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0310839.t004
Dataset updated
Dec 6, 2024
Dataset provided by
PLOS ONE
Authors
Chantha Wongoutong
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The performance results for k-means clustering and testing the hypothesis for homogeneity between the true grouped data and feature scaling on datasets containing features with the same unit.
MFCCs Feature Scaling Images for Multi-class Human Action Analysis : A...
researchdata.edu.au
data.mendeley.com
Updated 2023
Cite
Naveed Akhtar; Syed Mohammed Shamsul Islam; Douglas Chai; Muhammad Bilal Shaikh; Computer Science and Software Engineering (2023). MFCCs Feature Scaling Images for Multi-class Human Action Analysis : A Benchmark Dataset [Dataset]. http://doi.org/10.17632/6D8V9JMVGM.1
Explore at:
Unique identifier
https://doi.org/10.17632/6D8V9JMVGM.1
Dataset updated
2023
Dataset provided by
Mendeley Ltd.
The University of Western Australia
Authors
Naveed Akhtar; Syed Mohammed Shamsul Islam; Douglas Chai; Muhammad Bilal Shaikh; Computer Science and Software Engineering
Description
This dataset comprises an array of Mel Frequency Cepstral Coefficients (MFCCs) that have undergone feature scaling, representing a variety of human actions. Feature scaling, or data normalization, is a preprocessing technique used to standardize the range of features in the dataset. For MFCCs, this process helps ensure all coefficients contribute equally to the learning process, preventing features with larger scales from overshadowing those with smaller scales.

In this dataset, the audio signals correspond to diverse human actions such as walking, running, jumping, and dancing. The MFCCs are calculated via a series of signal processing stages, which capture key characteristics of the audio signal in a manner that closely aligns with human auditory perception. The coefficients are then standardized or scaled using methods such as MinMax Scaling or Standardization, thereby normalizing their range. Each normalized MFCC vector corresponds to a segment of the audio signal.

The dataset is meticulously designed for tasks including human action recognition, classification, segmentation, and detection based on auditory cues. It serves as an essential resource for training and evaluating machine learning models focused on interpreting human actions from audio signals. This dataset proves particularly beneficial for researchers and practitioners in fields such as signal processing, computer vision, and machine learning, who aim to craft algorithms for human action analysis leveraging audio signals.
Scaling with ranked subsampling (SRS) algorithm for the normalization of...
search.dataone.org
Updated Mar 21, 2025
Cite
BonaRes Repository (2025). Scaling with ranked subsampling (SRS) algorithm for the normalization of species count data. [Dataset]. https://search.dataone.org/view/sha256%3Ade8d2ac9fc1bd11fd1258fccecff0deb00fb68af3ca54856f4a72db4abdbf061
Explore at:
Dataset updated
Mar 21, 2025
Dataset provided by
BonaRes Repository
Description
Scaling with ranked subsampling (SRS) algorithm for the normalization of species count data.
R script to reproduce "Improved normalization of species count data in...
search.dataone.org
Updated Mar 21, 2025
+ more versions
Cite
BonaRes Repository (2025). R script to reproduce "Improved normalization of species count data in ecology by scaling with ranked subsampling (SRS): application to microbial communities". [Dataset]. https://search.dataone.org/view/sha256%3Aa934b23425b0e7e7d9d4278f89745fc842e75fdfe8b47de25c797034dadc1f51
Explore at:
Dataset updated
Mar 21, 2025
Dataset provided by
BonaRes Repository
Description
R script to reproduce "Improved normalization of species count data in ecology by scaling with ranked subsampling (SRS): application to microbial communities".
Supply Chain Management (Normalized)
dataverse.harvard.edu
Updated May 6, 2025
Cite
Diomar Anez; Dimar Anez (2025). Supply Chain Management (Normalized) [Dataset]. http://doi.org/10.7910/DVN/WNB7AY
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/WNB7AY
Dataset updated
May 6, 2025
Dataset provided by
Harvard Dataverse
Authors
Diomar Anez; Dimar Anez
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset provides processed and normalized/standardized indices for the management tool group 'Supply Chain Management' (SCM), including related concepts like Supply Chain Integration. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding SCM dataset in the Management Tool Source Data (Raw Extracts) Dataverse. Data Files and Processing Methodologies: Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI) Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "supply chain management" + "supply chain logistics" + "supply chain". Processing: None. The dataset utilizes the original Google Trends index, which is base-100 normalized against the peak search interest for the specified terms and period. Output Metric: Monthly Normalized RSI (Base 100). Frequency: Monthly. Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Supply Chain Management + Supply Chain Integration + Supply Chain. Processing: The annual relative frequency series was normalized by setting the year with the maximum value to 100 and scaling all other values (years) proportionally. Output Metric: Annual Normalized Relative Frequency Index (Base 100). Frequency: Annual. Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index Input Data: Absolute monthly publication counts matching SCM-related keywords [("supply chain management" OR ...) 
AND ("management" OR ...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly publication counts in Crossref. Data deduplicated via DOIs. Processing: For each month, the relative share of SCM-related publications (SCM Count / Total Crossref Count for that month) was calculated. This monthly relative share series was then normalized by setting the month with the maximum relative share to 100 and scaling all other months proportionally. Output Metric: Monthly Normalized Relative Publication Share Index (Base 100). Frequency: Monthly. Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index Input Data: Original usability percentages (%) from Bain surveys for specific years: Supply Chain Integration (1999, 2000, 2002); Supply Chain Management (2004, 2006, 2008, 2010, 2012, 2014, 2017, 2022). Processing: Semantic Grouping: Data points for "Supply Chain Integration" and "Supply Chain Management" were treated as a single conceptual series for SCM. Normalization: The combined series of original usability percentages was normalized relative to its own highest observed historical value across all included years (Max % = 100). Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak). Frequency: Biennial (Approx.). Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years: Supply Chain Integration (1999, 2000, 2002); Supply Chain Management (2004, 2006, 2008, 2010, 2012, 2014, 2017, 2022). Processing: Semantic Grouping: Data points for "Supply Chain Integration" and "Supply Chain Management" were treated as a single conceptual series for SCM. Standardization (Z-scores): Original scores (X) were standardized using Z = (X - μ) / σ, with μ=3.0 and σ≈0.891609. Index Scale Transformation: Z-scores were transformed via Index = 50 + (Z * 22).
Output Metric: Biennial Standardized Satisfaction Index (Center=50, Range≈[1,100]). Frequency: Biennial (Approx.). File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding SCM dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
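The peak-to-100 normalization described for the GB_, CR_ and BU_ files can be sketched as follows. This is a hedged illustration rather than the authors' processing code: the function names `base100` and `share_index` are hypothetical, and the inputs are plain Python lists of numbers.

```python
def base100(series):
    """Scale a series so that its maximum becomes 100 and the rest follow proportionally."""
    peak = max(series)
    # Dividing first guarantees that the peak maps to exactly 100.0.
    return [(value / peak) * 100 for value in series]


def share_index(topic_counts, total_counts):
    """Relative publication share per period, then normalized to a base-100 index."""
    shares = [topic / total for topic, total in zip(topic_counts, total_counts)]
    return base100(shares)
```

For the Crossref-style index, `share_index` first divides the topic count by the total count for each period, so growth in overall publication volume does not by itself raise the index.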
Business Process Reengineering (Normalized)
dataverse.harvard.edu
Updated May 6, 2025
Cite
Diomar Anez; Dimar Anez (2025). Business Process Reengineering (Normalized) [Dataset]. http://doi.org/10.7910/DVN/QBP0E9
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/QBP0E9
Dataset updated
May 6, 2025
Dataset provided by
Harvard Dataverse
Authors
Diomar Anez; Dimar Anez
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset provides processed and normalized/standardized indices for the management tool 'Business Process Reengineering' (BPR). Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding BPR dataset in the Management Tool Source Data (Raw Extracts) Dataverse. Data Files and Processing Methodologies: Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI) Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "business process reengineering" + "process reengineering" + "reengineering management". Processing: None. The dataset utilizes the original Google Trends index, which is base-100 normalized against the peak search interest for the specified terms and period. Output Metric: Monthly Normalized RSI (Base 100). Frequency: Monthly. Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Reengineering + Business Process Reengineering + Process Reengineering. Processing: The annual relative frequency series was normalized by setting the year with the maximum value to 100 and scaling all other values (years) proportionally. Output Metric: Annual Normalized Relative Frequency Index (Base 100). Frequency: Annual. Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index Input Data: Absolute monthly publication counts matching BPR-related keywords [("business process reengineering" OR ...) AND ("management" OR ...) 
- see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly publication counts in Crossref. Data deduplicated via DOIs. Processing: For each month, the relative share of BPR-related publications (BPR Count / Total Crossref Count for that month) was calculated. This monthly relative share series was then normalized by setting the month with the maximum relative share to 100 and scaling all other months proportionally. Output Metric: Monthly Normalized Relative Publication Share Index (Base 100). Frequency: Monthly. Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index Input Data: Original usability percentages (%) from Bain surveys for specific years: Reengineering (1993, 1996, 2000, 2002); Business Process Reengineering (2004, 2006, 2008, 2010, 2012, 2014, 2017, 2022). Processing: Semantic Grouping: Data points for "Reengineering" and "Business Process Reengineering" were treated as a single conceptual series for BPR. Normalization: The combined series of original usability percentages was normalized relative to its own highest observed historical value across all included years (Max % = 100). Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak). Frequency: Biennial (Approx.). Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years: Reengineering (1993, 1996, 2000, 2002); Business Process Reengineering (2004, 2006, 2008, 2010, 2012, 2014, 2017, 2022). Processing: Semantic Grouping: Data points for "Reengineering" and "Business Process Reengineering" were treated as a single conceptual series for BPR.
Standardization (Z-scores): Original scores (X) were standardized using Z = (X - μ) / σ, with a theoretically defined neutral mean μ=3.0 and an estimated pooled population standard deviation σ≈0.891609 (calculated across all tools/years relative to μ=3.0). Index Scale Transformation: Z-scores were transformed to an intuitive index via: Index = 50 + (Z * 22). This scale centers theoretical neutrality (original score: 3.0) at 50 and maps the approximate range [1, 5] to [≈1, ≈100]. Output Metric: Biennial Standardized Satisfaction Index (Center=50, Range≈[1,100]). Frequency: Biennial (Approx.). File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding BPR dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
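The satisfaction index transformation can be checked with a few lines of Python. This sketch uses only the constants stated in the description (neutral mean 3.0, pooled standard deviation 0.891609, scale factor 22, center 50); the names `MU`, `SIGMA` and `satisfaction_index` are hypothetical.

```python
MU = 3.0          # theoretically defined neutral mean, from the description
SIGMA = 0.891609  # estimated pooled standard deviation, from the description


def satisfaction_index(score):
    """Standardize a 1-5 satisfaction score and map it onto the 50-centered index."""
    z = (score - MU) / SIGMA
    return 50 + z * 22
```

A neutral score of 3.0 maps to exactly 50, while the scale endpoints 1 and 5 map to roughly 0.65 and 99.35, which is consistent with the approximate [1, 100] range stated for the index.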
WikiMed and PubMedDS: Two large-scale datasets for medical concept...
data.niaid.nih.gov
zenodo.org
Updated Dec 4, 2021
Cite
Dutt, Ritam (2021). WikiMed and PubMedDS: Two large-scale datasets for medical concept extraction and normalization research [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5753475
Explore at:
Dataset updated
Dec 4, 2021
Dataset provided by
Dutt, Ritam
Vashishth, Shikhar
Newman-Griffis, Denis
Joshi, Rishabh
Rosé, Carolyn P
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Two large-scale, automatically-created datasets of medical concept mentions, linked to the Unified Medical Language System (UMLS).

WikiMed

Derived from Wikipedia data. Mappings of Wikipedia page identifiers to UMLS Concept Unique Identifiers (CUIs) were extracted by crosswalking Wikipedia, Wikidata, Freebase, and the NCBI Taxonomy to reach existing mappings to UMLS CUIs. This created a 1:1 mapping of approximately 60,500 Wikipedia pages to UMLS CUIs. Links to these pages were then extracted as mentions of the corresponding UMLS CUIs.

WikiMed contains:

393,618 Wikipedia page texts

1,067,083 mentions of medical concepts

57,739 unique UMLS CUIs

Manual evaluation of 100 random samples of WikiMed found 91% accuracy in the automatic annotations at the level of UMLS CUIs, and 95% accuracy in terms of semantic type.

PubMedDS

Derived from biomedical literature abstracts from PubMed. Mentions were automatically identified using distant supervision based on Medical Subject Heading (MeSH) headers assigned to the papers in PubMed, and recognition of medical concept mentions using the high-performance scispaCy model. MeSH header codes are included as well as their mappings to UMLS CUIs.

PubMedDS contains:

13,197,430 abstract texts

57,943,354 medical concept mentions

44,881 unique UMLS CUIs

Comparison with existing manually-annotated datasets (NCBI Disease Corpus, BioCDR, and MedMentions) found 75-90% precision in automatic annotations. Please note this dataset is not a comprehensive annotation of medical concept mentions in these abstracts (only mentions located through distant supervision from MeSH headers were included), but is intended as data for concept normalization research.

Due to its size, PubMedDS is distributed as 30 individual files of approximately 1.5 million mentions each.

Data format

Both datasets use JSON format with one document per line. Each document has the following structure:

{
  "_id": "A unique identifier of each document",
  "text": "Contains text over which mentions are ",
  "title": "Title of Wikipedia/PubMed Article",
  "split": "[Not in PubMedDS] Dataset split: ",
  "mentions": [
    {
      "mention": "Surface form of the mention",
      "start_offset": "Character offset indicating start of the mention",
      "end_offset": "Character offset indicating end of the mention",
      "link_id": "UMLS CUI. In case of multiple CUIs, they are concatenated using '|', i.e., CUI1|CUI2|..."
    },
    {}
  ]
}
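Reading this one-document-per-line format could look like the sketch below. It is an assumption-laden illustration, not an official loader: the function name `iter_mentions` is hypothetical, offsets are assumed to be integers or numeric strings, and empty mention objects are skipped.

```python
import json


def iter_mentions(lines):
    """Yield (doc_id, surface_form, start, end, cuis) for every mention in JSON-lines input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        doc = json.loads(line)
        for mention in doc.get("mentions", []):
            if not mention:
                continue  # skip empty placeholder objects
            # Multiple CUIs are concatenated with '|', per the format description.
            cuis = mention["link_id"].split("|")
            yield (
                doc["_id"],
                mention["mention"],
                int(mention["start_offset"]),
                int(mention["end_offset"]),
                cuis,
            )
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, which avoids loading a whole multi-gigabyte PubMedDS file into memory.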
Comparison of the average performance metric values for k-means clustering...
plos.figshare.com
xls
Updated Dec 6, 2024
Cite
Chantha Wongoutong (2024). Comparison of the average performance metric values for k-means clustering of datasets having features with different (D1–D5) or the same (S1–S5) units. [Dataset]. http://doi.org/10.1371/journal.pone.0310839.t005
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0310839.t005
Dataset updated
Dec 6, 2024
Dataset provided by
PLOS ONE
Authors
Chantha Wongoutong
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of the average performance metric values for k-means clustering of datasets having features with different (D1–D5) or the same (S1–S5) units.
Table_2_Comparison of Normalization Methods for Analysis of TempO-Seq...
figshare.com
frontiersin.figshare.com
xlsx
Updated Jun 2, 2023
+ more versions
Cite
Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach (2023). Table_2_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.xlsx [Dataset]. http://doi.org/10.3389/fgene.2020.00594.s002
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fgene.2020.00594.s002
Dataset updated
Jun 2, 2023
Dataset provided by
Frontiers
Authors
Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven-fold change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between treated vs. untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90 except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite having an assumption of the majority of genes being unchanged, the DESeq2 scaling factors normalization method performed reasonably well as did simple normalization procedures counts per million (CPM) and total counts (TCs). These results suggest that for two class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
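Two of the simpler normalization procedures mentioned above, counts per million (CPM) and upper quartile (UQ) scaling, can be sketched for a single sample. This is a simplified illustration and not the paper's pipeline: the function names are hypothetical, and the UQ variant shown (dividing by the 75th percentile of the sample's non-zero counts) is an assumption, since implementations differ in how they filter genes and rescale the factors.

```python
import statistics


def cpm(sample_counts):
    """Counts per million: each count divided by the library size, times one million."""
    library_size = sum(sample_counts)
    return [count / library_size * 1_000_000 for count in sample_counts]


def upper_quartile_scale(sample_counts):
    """Divide counts by the 75th percentile of the sample's non-zero counts (simplified)."""
    nonzero = [count for count in sample_counts if count > 0]
    upper_quartile = statistics.quantiles(nonzero, n=4, method="inclusive")[2]
    return [count / upper_quartile for count in sample_counts]
```

CPM corrects only for sequencing depth, whereas UQ scaling anchors each sample on its upper quartile and is therefore less influenced by a few very highly expressed genes.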
Processed Data for Manuscript "ADTnorm: Robust Integration of Single-cell...
zenodo.org
zip
Updated May 21, 2025
Cite
Ye Zheng; Ye Zheng (2025). Processed Data for Manuscript "ADTnorm: Robust Integration of Single-cell Protein Measurement across CITE-seq Datasets" [Dataset]. http://doi.org/10.5281/zenodo.15477967
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15477967
Dataset updated
May 21, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ye Zheng; Ye Zheng
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) enables paired measurement of surface protein and mRNA expression in single cells using antibodies conjugated to oligonucleotide tags. Due to the high copy number of surface protein molecules, sequencing antibody-derived tags (ADTs) allows for robust protein detection, improving cell-type identification. However, variability in antibody staining leads to batch effects in the ADT expression, obscuring biological variation, reducing interpretability, and obstructing cross-study analyses. Here, we present ADTnorm, a normalization and integration method designed explicitly for ADT abundance. Benchmarking against 14 existing scaling and normalization methods, we show that ADTnorm accurately aligns populations with negative- and positive-expression of surface protein markers across 13 public datasets, effectively removing technical variation across batches and improving cell-type separation. ADTnorm enables efficient integration of public CITE-seq datasets, each with unique experimental designs, paving the way for atlas-level analyses. Beyond normalization, ADTnorm includes built-in utilities to aid in automated threshold-gating as well as assessment of antibody staining quality for titration optimization and antibody panel selection. Applying ADTnorm to an antibody titration study, a published COVID-19 CITE-seq dataset, and a human hematopoietic progenitors study allowed for identifying previously undetected phenotype-associated markers, illustrating a broad utility in biological applications.
VGG-16 with batch normalization
kaggle.com
zip
Updated Dec 15, 2017
+ more versions
Cite
PyTorch (2017). VGG-16 with batch normalization [Dataset]. https://www.kaggle.com/pytorch/vgg16bn
Explore at:
Available download formats: zip (514090274 bytes)
Dataset updated
Dec 15, 2017
Dataset authored and provided by
PyTorch
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
VGG-16

Very Deep Convolutional Networks for Large-Scale Image Recognition

In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

Authors: Karen Simonyan, Andrew Zisserman
https://arxiv.org/abs/1409.1556

VGG Architectures

VGG Architecture (image): https://imgur.com/uLXrKxe.jpg

What is a Pre-trained Model?

A pre-trained model has been previously trained on a dataset and contains the weights and biases that represent the features of whichever dataset it was trained on. Learned features are often transferable to different data. For example, a model trained on a large dataset of bird images will contain learned features, like edges or horizontal lines, that would be transferable to your dataset.

Why use a Pre-trained Model?

Pre-trained models are beneficial to us for many reasons. By using a pre-trained model you are saving time. Someone else has already spent the time and compute resources to learn a lot of features and your model will likely benefit from it.
Change Management (Normalized)
dataverse.harvard.edu
Updated May 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Diomar Anez; Dimar Anez (2025). Change Management (Normalized) [Dataset]. http://doi.org/10.7910/DVN/J5KRBS
Unique identifier
https://doi.org/10.7910/DVN/J5KRBS
Dataset updated
May 6, 2025
Dataset provided by
Harvard Dataverse
Authors
Diomar Anez; Dimar Anez
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset provides processed and normalized/standardized indices for the management tool 'Change Management' (often encompassing Change Management Programs). Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Change Management dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

Data Files and Processing Methodologies:

Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI)
Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "change management programs" + "change management" + "change management business".
Processing: None. Utilizes the original base-100 normalized Google Trends index.
Output Metric: Monthly Normalized RSI (Base 100).
Frequency: Monthly.

Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency
Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Change Management Programs + Change Management.
Processing: Annual relative frequency series normalized (peak year = 100).
Output Metric: Annual Normalized Relative Frequency Index (Base 100).
Frequency: Annual.

Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index
Input Data: Absolute monthly publication counts matching Change Management-related keywords [("change management programs" OR ...) AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications. Deduplicated via DOIs.
Processing: Monthly relative share calculated (Change Mgmt Count / Total Count). Monthly relative share series normalized (peak month's share = 100).
Output Metric: Monthly Normalized Relative Publication Share Index (Base 100).
Frequency: Monthly.

Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index
Input Data: Original usability percentages (%) from Bain surveys for specific years: Change Management Programs (2002, 2004, 2010, 2012, 2014, 2017, 2022).
Processing: Normalization: Original usability percentages normalized relative to their historical peak (Max % = 100).
Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak).
Frequency: Biennial (Approx.).

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index
Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years: Change Management Programs (2002-2022).
Processing: Standardization (Z-scores): Using Z = (X - 3.0) / 0.891609. Index Scale Transformation: Index = 50 + (Z * 22).
Output Metric: Biennial Standardized Satisfaction Index (Center=50, Range ≈ [1,100]).
Frequency: Biennial (Approx.).

File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Change Management dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
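The two-step satisfaction transformation stated in the description (Z = (X - 3.0) / 0.891609, then Index = 50 + Z * 22) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' processing code: the function name, the constant names, and the input range check are assumptions made here, and the description does not say how the divisor 0.891609 was derived.

```python
# Constants copied from the dataset description. The names are illustrative.
SCALE_CENTER = 3.0      # midpoint of the 1-5 satisfaction scale
SCALE_DIVISOR = 0.891609  # divisor given in the description (derivation not stated)
INDEX_CENTER = 50.0
INDEX_SPREAD = 22.0


def satisfaction_index(score: float) -> float:
    """Map a 1-5 average satisfaction score to the standardized index."""
    # Assumption: scores outside the 1-5 survey scale are rejected.
    if not 1.0 <= score <= 5.0:
        raise ValueError("score must lie on the 1-5 scale")
    z = (score - SCALE_CENTER) / SCALE_DIVISOR
    return INDEX_CENTER + z * INDEX_SPREAD


if __name__ == "__main__":
    for s in (1.0, 3.0, 4.2, 5.0):
        print(s, round(satisfaction_index(s), 2))
```

Under these formulas a score of 3.0 maps to exactly 50, and the scale extremes 1.0 and 5.0 map to roughly 0.65 and 99.35, which matches the stated center of 50 and the approximate [1,100] range.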
Customer Segmentation (Normalized)
dataverse.harvard.edu
Updated May 6, 2025
Cite
Diomar Anez; Dimar Anez (2025). Customer Segmentation (Normalized) [Dataset]. http://doi.org/10.7910/DVN/1RLQBY
Unique identifier
https://doi.org/10.7910/DVN/1RLQBY
Dataset updated
May 6, 2025
Dataset provided by
Harvard Dataverse
Authors
Diomar Anez; Dimar Anez
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset provides processed and normalized/standardized indices for the management tool 'Customer Segmentation', including the closely related concept of Market Segmentation. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Customer Segmentation dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

Data Files and Processing Methodologies:

Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI)
Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "customer segmentation" + "market segmentation" + "customer segmentation marketing".
Processing: None. Utilizes the original base-100 normalized Google Trends index.
Output Metric: Monthly Normalized RSI (Base 100).
Frequency: Monthly.

Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency
Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Customer Segmentation + Market Segmentation.
Processing: Annual relative frequency series normalized (peak year = 100).
Output Metric: Annual Normalized Relative Frequency Index (Base 100).
Frequency: Annual.

Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index
Input Data: Absolute monthly publication counts matching Customer Segmentation-related keywords [("customer segmentation" OR ...) AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications. Deduplicated via DOIs.
Processing: Monthly relative share calculated (Segmentation Count / Total Count). Monthly relative share series normalized (peak month's share = 100).
Output Metric: Monthly Normalized Relative Publication Share Index (Base 100).
Frequency: Monthly.

Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index
Input Data: Original usability percentages (%) from Bain surveys for specific years: Customer Segmentation (1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2017). Note: Not reported in 2022 survey data.
Processing: Normalization: Original usability percentages normalized relative to their historical peak (Max % = 100).
Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak).
Frequency: Biennial (Approx.).

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index
Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years: Customer Segmentation (1999-2017). Note: Not reported in 2022 survey data.
Processing: Standardization (Z-scores): Using Z = (X - 3.0) / 0.891609. Index Scale Transformation: Index = 50 + (Z * 22).
Output Metric: Biennial Standardized Satisfaction Index (Center=50, Range ≈ [1,100]).
Frequency: Biennial (Approx.).

File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Customer Segmentation dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
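The "normalized relative to historical peak (Max % = 100)" step used for the usability percentages can be illustrated with a short Python sketch. The function name and the sample percentages below are hypothetical and are not taken from the Bain surveys; only the rule itself (divide by the series maximum, multiply by 100) comes from the description.

```python
def normalize_to_peak(series: dict[int, float]) -> dict[int, float]:
    """Rescale a year -> value series so that its maximum becomes 100."""
    if not series:
        raise ValueError("series is empty")
    peak = max(series.values())
    # Assumption: a series whose peak is zero or negative cannot be indexed.
    if peak <= 0:
        raise ValueError("peak must be positive")
    return {year: 100.0 * value / peak for year, value in series.items()}


if __name__ == "__main__":
    # Hypothetical usability percentages, for illustration only.
    usability = {2002: 12.0, 2004: 30.0, 2006: 24.0}
    print(normalize_to_peak(usability))
```

Because every value is divided by the same peak, ratios between years are preserved and only the scale changes.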
30 m-scale Annual Global Normalized Difference Urban Index Datasets from...
scidb.cn
Updated Jan 13, 2023
Cite
Di Liu; Qingling Zhang (2023). 30 m-scale Annual Global Normalized Difference Urban Index Datasets from 2000 to 2021 [Dataset]. http://doi.org/10.57760/sciencedb.07081
Unique identifier
https://doi.org/10.57760/sciencedb.07081
Dataset updated
Jan 13, 2023
Dataset provided by
Science Data Bank
Authors
Di Liu; Qingling Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Urban areas play a very important role in global climate change. There is an increasing interest in comprehending global urban areas with adequate geographic details for global climate change mitigation. Accurate and frequent urban area information is fundamental to comprehending urbanization processes and land use/cover change, as well as the impact of global climate and environmental change. Defense Meteorological Satellite Program/Operational Line Scan System (DMSP/OLS) night-light (NTL) imagery contributes powerfully to the spatial characterization of global cities; however, its application potential is seriously limited by its coarse resolution. In this paper, we generate an annual Normalized Difference Urban Index (NDUI) to characterize global urban areas at 30 m resolution from 2000 to 2021 by combining Landsat-5, -7, and -8 Normalized Difference Vegetation Index (NDVI) composites and DMSP/OLS NTL images on the Google Earth Engine (GEE) platform. With the capability to delineate urban boundaries and, at the same time, to present sufficient spatial details within urban areas, the NDUI datasets have the potential for urbanization studies at regional and global scales.
Study comparing scaling with ranked subsampling (SRS) and rarefying for the...
search.dataone.org
Updated Mar 21, 2025
Cite
BonaRes Repository (2025). Study comparing scaling with ranked subsampling (SRS) and rarefying for the normalization of species count data [Dataset]. https://search.dataone.org/view/sha256%3Aaebd3b305a7c3e99931a960ae7b540813075528f5d73dbbff839d8cf8476a98f
Dataset updated
Mar 21, 2025
Dataset provided by
BonaRes Repository
Description
Study comparing scaling with ranked subsampling (SRS) and rarefying for the normalization of species count data.
Core Competencies (Normalized)
dataverse.harvard.edu
Updated May 6, 2025
Cite
Diomar Anez; Dimar Anez (2025). Core Competencies (Normalized) [Dataset]. http://doi.org/10.7910/DVN/Y67KP1
Unique identifier
https://doi.org/10.7910/DVN/Y67KP1
Dataset updated
May 6, 2025
Dataset provided by
Harvard Dataverse
Authors
Diomar Anez; Dimar Anez
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset provides processed and normalized/standardized indices for the management tool 'Core Competencies' (also Core Competence). Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Core Competencies dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

Data Files and Processing Methodologies:

Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI)
Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "core competencies" + "core competence strategy".
Processing: None. Utilizes the original base-100 normalized Google Trends index.
Output Metric: Monthly Normalized RSI (Base 100).
Frequency: Monthly.

Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency
Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Core Competencies + Core Competence.
Processing: Annual relative frequency series normalized (peak year = 100).
Output Metric: Annual Normalized Relative Frequency Index (Base 100).
Frequency: Annual.

Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index
Input Data: Absolute monthly publication counts matching Core Competencies-related keywords [("core competencies" OR ...) AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications. Deduplicated via DOIs.
Processing: Monthly relative share calculated (Core Competencies Count / Total Count). Monthly relative share series normalized (peak month's share = 100).
Output Metric: Monthly Normalized Relative Publication Share Index (Base 100).
Frequency: Monthly.

Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index
Input Data: Original usability percentages (%) from Bain surveys for specific years: Core Competencies (1993, 1996, 1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2017). Note: Not reported in 2022 survey data.
Processing: Normalization: Original usability percentages normalized relative to their historical peak (Max % = 100).
Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak).
Frequency: Biennial (Approx.).

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index
Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years: Core Competencies (1993-2017). Note: Not reported in 2022 survey data.
Processing: Standardization (Z-scores): Using Z = (X - 3.0) / 0.891609. Index Scale Transformation: Index = 50 + (Z * 22).
Output Metric: Biennial Standardized Satisfaction Index (Center=50, Range ≈ [1,100]).
Frequency: Biennial (Approx.).

File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Core Competencies dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
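The Crossref processing described above has two stages: a monthly relative share (tool count divided by total count), then rescaling so that the peak month's share equals 100. A rough Python sketch follows. The function name and sample counts are hypothetical, and treating months with a zero total as a share of 0.0 is an assumption, since the description does not say how such months are handled.

```python
def publication_share_index(tool_counts: list[int],
                            total_counts: list[int]) -> list[float]:
    """Return a base-100 index of monthly relative publication share."""
    if len(tool_counts) != len(total_counts):
        raise ValueError("both series must cover the same months")
    # Stage 1: relative share per month.
    # Assumption: a month with no publications at all gets a share of 0.0.
    shares = [c / t if t > 0 else 0.0
              for c, t in zip(tool_counts, total_counts)]
    # Stage 2: rescale so that the peak month's share equals 100.
    peak = max(shares, default=0.0)
    if peak <= 0:
        raise ValueError("no month has a positive share")
    return [100.0 * s / peak for s in shares]


if __name__ == "__main__":
    # Hypothetical monthly counts, for illustration only.
    print(publication_share_index([2, 6, 3], [100, 200, 300]))
```

Note that the peak is taken over shares, not raw counts: in the sample, the second month has the highest share (6 of 200) and therefore defines the value 100.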
Benchmarking (Normalized)
dataverse.harvard.edu
Updated May 6, 2025
Cite
Diomar Anez; Dimar Anez (2025). Benchmarking (Normalized) [Dataset]. http://doi.org/10.7910/DVN/VW7AAX
Unique identifier
https://doi.org/10.7910/DVN/VW7AAX
Dataset updated
May 6, 2025
Dataset provided by
Harvard Dataverse
Authors
Diomar Anez; Dimar Anez
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset provides processed and normalized/standardized indices for the management tool 'Benchmarking'. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Benchmarking dataset in the Management Tool Source Data (Raw Extracts) Dataverse.

Data Files and Processing Methodologies:

Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI)
Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "benchmarking" + "benchmarking management".
Processing: None. Utilizes the original base-100 normalized Google Trends index.
Output Metric: Monthly Normalized RSI (Base 100).
Frequency: Monthly.

Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency
Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Benchmarking.
Processing: Annual relative frequency series normalized (peak year = 100).
Output Metric: Annual Normalized Relative Frequency Index (Base 100).
Frequency: Annual.

Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index
Input Data: Absolute monthly publication counts matching Benchmarking-related keywords ["benchmarking" AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications. Deduplicated via DOIs.
Processing: Monthly relative share calculated (Benchmarking Count / Total Count). Monthly relative share series normalized (peak month's share = 100).
Output Metric: Monthly Normalized Relative Publication Share Index (Base 100).
Frequency: Monthly.

Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index
Input Data: Original usability percentages (%) from Bain surveys for specific years: Benchmarking (1993, 1996, 1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2017). Note: Not reported in 2022 survey data.
Processing: Normalization: Original usability percentages normalized relative to their historical peak (Max % = 100).
Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak).
Frequency: Biennial (Approx.).

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index
Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years: Benchmarking (1993-2017). Note: Not reported in 2022 survey data.
Processing: Standardization (Z-scores): Using Z = (X - 3.0) / 0.891609. Index Scale Transformation: Index = 50 + (Z * 22).
Output Metric: Biennial Standardized Satisfaction Index (Center=50, Range ≈ [1,100]).
Frequency: Biennial (Approx.).

File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_). Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Benchmarking dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.


