Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset is about: (Table S2) Single non-normalized data of electron probe analyses of the Lipari obsidian reference standard. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.859554 for more information.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Permafrost degradation influences the morphology, biogeochemical cycling and hydrology of Arctic landscapes over a range of time scales. To reconstruct temporal patterns of early to late Holocene permafrost and thermokarst dynamics, site-specific palaeo-records are needed. Here we present a multi-proxy study of a 350-cm-long permafrost core from a drained lake basin on the northern Seward Peninsula, Alaska, revealing Lateglacial to Holocene thermokarst lake dynamics in a central location of Beringia. Use of radiocarbon dating, micropalaeontology (ostracods and testaceans), sedimentology (grain-size analyses, magnetic susceptibility, tephra analyses), geochemistry (total nitrogen and carbon, total organic carbon, d13Corg) and stable water isotopes (d18O, dD, d excess) of ground ice allowed the reconstruction of several distinct thermokarst lake phases. These include a pre-lacustrine environment at the base of the core characterized by the Devil Mountain Maar tephra (22 800±280 cal. a BP, Unit A), which has vertically subsided in places due to subsequent development of a deep thermokarst lake that initiated around 11 800 cal. a BP (Unit B). At about 9000 cal. a BP this lake transitioned from a stable depositional environment to a very dynamic lake system (Unit C) characterized by fluctuating lake levels, potentially intermediate wetland development, and expansion and erosion of shore deposits. Complete drainage of this lake occurred at 1060 cal. a BP, including post-drainage sediment freezing from the top down to 154 cm and gradual accumulation of terrestrial peat (Unit D), as well as uniform upward talik refreezing. This core-based reconstruction of multiple thermokarst lake generations since 11 800 cal. a BP improves our understanding of the temporal scales of thermokarst lake development from initiation to drainage, demonstrates complex landscape evolution in the ice-rich permafrost regions of Central Beringia during the Lateglacial and Holocene, and enhances our understanding of biogeochemical cycles in thermokarst-affected regions of the Arctic.
1000 simulated data sets stored in a list of R dataframes used in support of Reisetter et al. (submitted) 'Mixture model normalization for non-targeted gas chromatography / mass spectrometry metabolomics data'. These are simulated data sets that include batch effects and data truncation and are not yet normalized.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Background
The Infinium EPIC array measures the methylation status of > 850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe type bias as well as other issues such as background and dye bias.
Methods
This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearson's correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data.
Results
The method we define as SeSAMe 2, which consists of the application of the regular SeSAMe pipeline with an additional round of QC, pOOBAH masking, was found to be the best-performing normalization method, while quantile-based methods were found to be the worst performing methods. Whole-array Pearson's correlations were found to be high. However, in agreement with previous studies, a substantial proportion of the probes on the EPIC array showed poor reproducibility (ICC < 0.50). The majority of poor-performing probes have beta values close to either 0 or 1, and relatively low standard deviations. These results suggest that probe reliability is largely the result of limited biological variation rather than technical measurement variation. Importantly, normalizing the data with SeSAMe 2 dramatically improved ICC estimates, with the proportion of probes with ICC values > 0.50 increasing from 45.18% (raw data) to 61.35% (SeSAMe 2).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data used in the programs 8075DegRise_CO2NMDPSIRR_V5.bas (DOI 10.5281/zenodo.1418561) and 8075DegRise_NoNMDP_V1.bas (DOI 10.5281/zenodo.1419629).
All data is entered one item per line. The order of the entries is:
First entry – The total number of sea surface temperature entries.
Second entry – The number of entries for all other data.
Third entry – The offset from the first entry of a set of data to the year 1880. This is the same for all sets of data except sea surface temperature which is based at 1880 and never changes.
The nonzeroed non-normalized sea surface temperature anomalies; the number of entries is specified in the First entry.
The nonzeroed non-normalized North Magnetic Dip Pole kilometers moved from the previous year; the number of entries is specified in the Second entry.
The nonzeroed non-normalized solar irradiation average for this year; the number of entries is specified in the Second entry.
The nonzeroed non-normalized CO2 ppm average for this year; the number of entries is specified in the Second entry.
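The layout above can be read with a short parser. The following Python sketch is illustrative only and is not the code used in the .bas programs: the function and field names are hypothetical, and it assumes that the three header entries are integers, that the data blocks follow in the order listed, and that blank lines can be skipped.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ClimateInput:
    """Hypothetical container for the four data blocks described above."""
    offset_to_1880: int
    sea_surface_temp: List[float]
    dip_pole_km: List[float]
    solar_irradiance: List[float]
    co2_ppm: List[float]


def parse_entries(lines: Iterable[str]) -> ClimateInput:
    # One item per line; blank lines are ignored (an assumption).
    values = [ln.strip() for ln in lines if ln.strip()]
    n_sst = int(values[0])     # First entry: number of SST values
    n_other = int(values[1])   # Second entry: number of values in every other block
    offset = int(values[2])    # Third entry: offset to the year 1880
    pos = 3

    def take(count: int) -> List[float]:
        nonlocal pos
        chunk = [float(v) for v in values[pos:pos + count]]
        if len(chunk) != count:
            raise ValueError("file ended before all declared entries were read")
        pos += count
        return chunk

    sst = take(n_sst)
    dip = take(n_other)
    solar = take(n_other)
    co2 = take(n_other)
    return ClimateInput(offset, sst, dip, solar, co2)
```

To read a file, pass an open file handle, for example `parse_entries(open(path))`, where `path` points to a local copy of the data file.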
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Efforts to obtain a comprehensive genome sequence for rainbow trout are ongoing and will be complemented by transcriptome information that will enhance genome assembly and annotation. Previously, transcriptome reference sequences were reported using data from different sources. Although the previous work added a great wealth of sequences, a complete and well-annotated transcriptome is still needed. In addition, gene expression in different tissues was not completely addressed in the previous studies. In this study, non-normalized cDNA libraries were sequenced from 13 different tissues of a single doubled haploid rainbow trout from the same source used for the rainbow trout genome sequence. A total of ~1.167 billion paired-end reads were de novo assembled using the Trinity RNA-Seq assembler yielding 474,524 contigs > 500 base-pairs. Of them, 287,593 had homologies to the NCBI non-redundant protein database. The longest contig of each cluster was selected as a reference, yielding 44,990 representative contigs. A total of 4,146 contigs (9.2%), including 710 full-length sequences, did not match any mRNA sequences in the current rainbow trout genome reference. Mapping reads to the reference genome identified an additional 11,843 transcripts not annotated in the genome. A digital gene expression atlas revealed 7,678 housekeeping and 4,021 tissue-specific genes. Expression of about 16,000–32,000 genes (35–71% of the identified genes) accounted for basic and specialized functions of each tissue. White muscle and stomach had the least complex transcriptomes, with high percentages of their total mRNA contributed by a small number of genes. Brain, testis and intestine, in contrast, had complex transcriptomes, with large numbers of genes involved in their expression patterns. This study provides comprehensive de novo transcriptome information that is suitable for functional and comparative genomics studies in rainbow trout, including annotation of the genome.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Danish similarity dataset is a gold standard resource for evaluation of Danish word embedding models. The dataset consists of 99 word pairs rated by 38 human judges according to their semantic similarity, i.e. the extent to which the two words are similar in meaning, in a normalized 0-1 range. Note that this dataset provides a way of measuring similarity rather than relatedness/association. Description of files included in this material: (Note: In both of the included files, rows correspond to items (word pairs) and columns to properties of each item.) All_sims_da.csv: Contains the non-normalized mean similarity scores over all judges, along with the non-normalized scores given by each of the 38 judges on the scale 0-6, where 0 is given to the most dissimilar items and 6 to the most similar items. Gold_sims_da.csv: Contains the similarity gold standard for each item, which is the normalized mean similarity score for a given item over all judges. Scores are normalized to a 0-1 range, where 0 denotes the minimum degree of similarity and 1 denotes the maximum degree of similarity.
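The relationship between the two files can be illustrated with a small Python sketch. It assumes that normalization is a linear rescaling of the mean judge score by the scale maximum of 6; the description does not state the exact formula, so this is an assumption, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence

SCALE_MAX = 6  # upper end of the 0-6 rating scale described above


def gold_score(judge_scores: Sequence[float]) -> float:
    """Mean judge score rescaled to 0-1 (assumed: division by the scale maximum)."""
    for score in judge_scores:
        if not 0 <= score <= SCALE_MAX:
            raise ValueError(f"score {score} is outside the 0-{SCALE_MAX} scale")
    return mean(judge_scores) / SCALE_MAX
```

Under this assumption, a word pair rated 0, 3 and 6 by three judges would receive a normalized score of 0.5.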
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
1000 simulated data sets stored in a list of R dataframes used in support of Reisetter et al. (submitted) 'Mixture model normalization for non-targeted gas chromatography / mass spectrometry metabolomics data'. These are results after normalization using quantile normalization (Bolstad et al. 2003).
The zip-file contains supplementary files (normalized data sets and R-codes) to reproduce the analyses presented in the paper "Use of pre-transformation to cope with extreme values in important candidate features" by Boulesteix, Guillemot & Sauerbrei (Biometrical Journal, 2011). The raw data (CEL-files) are publicly available and described in the following papers:
- Ancona et al, 2006. On the statistical assessment of classifiers using DNA microarray data. BMC Bioinformatics 7, 387.
- Miller et al, 2005. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proceedings of the National Academy of Sciences 102, 13550–13555.
- Minn et al, 2005. Genes that mediate breast cancer metastasis to lung. Nature 436, 518–524.
- Pawitan et al, 2005. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Research 7, R953–964.
- Scherzer et al, 2007. Molecular markers of early Parkinson's disease based on gene expression in blood. Proceedings of the National Academy of Sciences 104, 955–960.
- Singh et al, 2002. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1, 203–209.
- Sotiriou et al, 2006. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. Journal of the National Cancer Institute 98, 262–272.
- Tang et al, 2009. Gene-expression profiling of peripheral blood mononuclear cells in sepsis. Critical Care Medicine 37, 882–888.
- Wang et al, 2005. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671–679.
- Irizarry, 2003. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 31 (4), e15.
- Irizarry et al, 2006. Comparison of Affymetrix GeneChip expression measures. Bioinformatics 22 (7), 789–794.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This is a database for feature representation of ESM2, which includes Swiss data, Swiss normalized data, original TrEMBL data, original TrEMBL normalized data, non-homology TrEMBL data and Table S10. Non-homologous TrEMBL normalized data can be created by extracting Entry ID from the non-homologous TrEMBL data and then extracting the corresponding feature representation from the original TrEMBL normalized data. Figure S4 (eos) and Figure S5 (eos) supplement the histogram and scatter plots of feature eos in the corresponding Figure S4 and Figure S5. Figure S6 and Figure S8 are the results of GO annotation enrichment. The GO gene set is a grouped protein dataset used for GO annotation enrichment. Figure S7 is a silhouette score plot. For specific usage of the dataset, please refer to GitHub. The RF_model files are pickle files for different RF models, which can be used for dataset inference and interpretable analysis. Among these models, the AA_count model and feature_all model have more complex feature inputs. Therefore, we provide the Swiss training dataset as a reference for feature arrangement. The feature order for other models is simply from 0 to 1279.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset is about: X-ray fluorescence (XRF) scanning data (raw data, not normalized) from IODP Site U1387, 0-200 mcd. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.831133 for more information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about: Raw (not normalized) X-ray fluorescence (XRF) scanning data of IODP Hole 318-U1357B. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.933380 for more information.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides processed and normalized/standardized indices for the management tool 'Benchmarking'. Derived from five distinct raw data sources, these indices are specifically designed for comparative longitudinal analysis, enabling the examination of trends and relationships across different empirical domains (web search, literature, academic publishing, and executive adoption). The data presented here represent transformed versions of the original source data, aimed at achieving metric comparability. Users requiring the unprocessed source data should consult the corresponding Benchmarking dataset in the Management Tool Source Data (Raw Extracts) Dataverse.
Data Files and Processing Methodologies:
Google Trends File (Prefix: GT_): Normalized Relative Search Interest (RSI)
Input Data: Native monthly RSI values from Google Trends (Jan 2004 - Jan 2025) for the query "benchmarking" + "benchmarking management".
Processing: None. Utilizes the original base-100 normalized Google Trends index.
Output Metric: Monthly Normalized RSI (Base 100).
Frequency: Monthly.
Google Books Ngram Viewer File (Prefix: GB_): Normalized Relative Frequency
Input Data: Annual relative frequency values from Google Books Ngram Viewer (1950-2022, English corpus, no smoothing) for the query Benchmarking.
Processing: Annual relative frequency series normalized (peak year = 100).
Output Metric: Annual Normalized Relative Frequency Index (Base 100).
Frequency: Annual.
Crossref.org File (Prefix: CR_): Normalized Relative Publication Share Index
Input Data: Absolute monthly publication counts matching Benchmarking-related keywords ["benchmarking" AND (...) - see raw data for full query] in titles/abstracts (1950-2025), alongside total monthly Crossref publications. Deduplicated via DOIs.
Processing: Monthly relative share calculated (Benchmarking Count / Total Count). Monthly relative share series normalized (peak month's share = 100).
Output Metric: Monthly Normalized Relative Publication Share Index (Base 100).
Frequency: Monthly.
Bain & Co. Survey - Usability File (Prefix: BU_): Normalized Usability Index
Input Data: Original usability percentages (%) from Bain surveys for specific years: Benchmarking (1993, 1996, 1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2017). Note: Not reported in 2022 survey data.
Processing: Normalization: Original usability percentages normalized relative to their historical peak (Max % = 100).
Output Metric: Biennial Estimated Normalized Usability Index (Base 100 relative to historical peak).
Frequency: Biennial (Approx.).
Bain & Co. Survey - Satisfaction File (Prefix: BS_): Standardized Satisfaction Index
Input Data: Original average satisfaction scores (1-5 scale) from Bain surveys for specific years: Benchmarking (1993-2017). Note: Not reported in 2022 survey data.
Processing: Standardization (Z-scores): Using Z = (X - 3.0) / 0.891609. Index Scale Transformation: Index = 50 + (Z * 22).
Output Metric: Biennial Standardized Satisfaction Index (Center=50, Range ≈ [1,100]).
Frequency: Biennial (Approx.).
File Naming Convention: Files generally follow the pattern: PREFIX_Tool_Processed.csv or similar, where the PREFIX indicates the data source (GT_, GB_, CR_, BU_, BS_).
Consult the parent Dataverse description (Management Tool Comparative Indices) for general context and the methodological disclaimer. For original extraction details (specific keywords, URLs, etc.), refer to the corresponding Benchmarking dataset in the Raw Extracts Dataverse. Comprehensive project documentation provides full details on all processing steps.
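The two transformations described above (peak normalization to base 100, and the Z-score based satisfaction index) can be sketched in Python as follows. This is an illustrative sketch, not the project's processing code: the function names are hypothetical, and only the constants 3.0, 0.891609, 50 and 22 are taken from the description.

```python
from typing import List, Sequence


def peak_normalize(series: Sequence[float]) -> List[float]:
    """Rescale a series so that its maximum value equals 100."""
    peak = max(series)
    if peak <= 0:
        raise ValueError("peak must be positive to normalize")
    return [100.0 * value / peak for value in series]


def relative_share(counts: Sequence[float], totals: Sequence[float]) -> List[float]:
    """Monthly share (tool count / total count), as described for the CR_ file."""
    return [count / total for count, total in zip(counts, totals)]


def satisfaction_index(score: float) -> float:
    """Index = 50 + 22 * Z, with Z = (X - 3.0) / 0.891609, as stated for the BS_ file."""
    z = (score - 3.0) / 0.891609
    return 50.0 + z * 22.0
```

For example, `peak_normalize(relative_share(counts, totals))` would chain the two Crossref steps, and a satisfaction score of 3.0 maps to the index center of 50.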
This dataset includes all of the data needed to validate and/or reproduce the manganese calibration model described in the manuscript. The reference database contains metadata for the new Mn-bearing standards, minerals, and mixtures that are > 2.9 wt.% MnO. In addition, the files include the MnO composition data for all standards used, non-normalized spectral data, mean peak area spectrum, results of outlier determination, RMSECV data, regression vectors, and Test Set predictions.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data set includes the water flow and nutrient inputs (total nitrogen and phosphorus) of monitored river catchments and unmonitored areas as well as the nutrient inputs from point sources (municipal, industrial and aquaculture) that are directly discharging into the sea. The water flow and nutrient input data are available as actual (non-normalized) and flow-normalized values. The data covers the whole Baltic Sea drainage area and the years 1995-2022. Together the riverine and point source inputs provide the total land-based load of nutrients into the Baltic Sea. The data consists of several Excel spreadsheets providing background info and actual load values. The spatial delineation of these catchments can be accessed through the metadata record for PLC Assessment data river catchments (https://metadata.helcom.fi/geonetwork/srv/eng/catalog.search#/metadata/d7b5404a-b3e1-4fe6-be35-a12055736783). The tabular and the spatial data include the catchments that have a full time series of data for 1995-2022, and the nutrient loads of rivers with incomplete time series are added to the adjacent unmonitored areas. The non-normalized data is used in the HELCOM Baltic Sea Environmental Fact Sheet (BSEFS) on Waterborne nitrogen and phosphorus inputs and water flow to the Baltic Sea that is published annually (https://helcom.fi/baltic-sea-trends/environment-fact-sheets/eutrophication/). The normalized data is used for the inputs of nutrients core indicator (https://indicators.helcom.fi/indicator/inputs-of-nutrients/) as well as the NIC assessment (https://helcom.fi/baltic-sea-action-plan/nutrient-reduction-scheme/national-nutrient-input-ceilings/). The assessment data set is based on the reporting by HELCOM Contracting Parties within the Pollution Load Compilation framework (PLC).
The assessment dataset is produced by the Baltic Nest Institute (BNI), Stockholm University together with the Danish Centre for Environment and Energy (DCE), Aarhus University from the original reported data. For a more detailed description of the spreadsheets and included data, please scroll down to lineage.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fig4C Raw Non-Mean Normalized Luciferase Data GlnRSLUC dadv1
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
DOI retrieved: 2014
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fig3C Raw Non-Mean Normalized Luciferase Data AspRSLUC dncu00275
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The National Pollutant Release Inventory (NPRI) is Canada's public inventory of pollutant releases (to air, water and land), disposals and transfers for recycling. This database contains the full NPRI dataset from 1993 to the current reporting year. To help you navigate, a Microsoft Word file provides information on the database’s structure and schema. The database is available in Microsoft Access format (accdb). The data are in normalized or “list” format and are optimized for pivot table analyses. The data are also available in a CSV format : https://open.canada.ca/data/en/dataset/40e01423-7728-429c-ac9d-2954385ccdfb. Please consult the following resources to enhance your analysis: - Guide on using and Interpreting NPRI Data: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/using-interpreting-data.html - Access additional data from the NPRI, including datasets and mapping products: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/tools-resources-data/exploredata.html Supplemental Information This data is also available in non-proprietary CSV format on the Bulk Data page. http://open.canada.ca/data/en/dataset/40e01423-7728-429c-ac9d-2954385ccdfb These files contain data from 1993 to the latest reporting year available. These datasets are in normalized or ‘list’ format and are optimized for pivot table analyses. Supporting Projects: National Pollutant Release Inventory (NPRI)
This visualization product displays the cigarette related items abundance of marine macro-litter (> 2.5cm) per beach per year from non-MSFD monitoring surveys, research & cleaning operations without UNEP-MARLIN data.
EMODnet Chemistry included the collection of marine litter in its 3rd phase. Since the beginning of 2018, data of beach litter have been gathered and processed in the EMODnet Chemistry Marine Litter Database (MLDB).
The harmonization of all the data has been the most challenging task considering the heterogeneity of the data sources, sampling protocols and reference lists used on a European scale.
Preliminary processing was necessary to harmonize all the data:
Exclusion of OSPAR 1000 protocol: this follows the approach of OSPAR, which no longer includes these data in the monitoring;
Selection of surveys from non-MSFD monitoring, cleaning and research operations;
Exclusion of beaches without coordinates;
Selection of plastic bags related items only. The list of selected items is attached to this metadata. This list was created using EU Marine Beach Litter Baselines, the European Threshold Value for Macro Litter on Coastlines and the Joint list of litter categories for marine macro-litter monitoring from JRC (these three documents are attached to this metadata);
Exclusion of surveys without associated length;
Normalization of survey lengths to 100m & 1 survey / year: in some cases, the survey length was not 100m, so to allow the abundance of litter from different beaches to be compared, a normalization is applied using this formula:
Number of plastic bags related items of the survey (normalized by 100 m) = Number of plastic bags related items of the survey x (100 / survey length)
Then, this normalized number of plastic bags related items is summed to obtain the total normalized number of plastic bags related items for each survey. Finally, the median abundance of plastic bags related items for each beach and year is calculated from these normalized abundances of plastic bags related items per survey.
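The normalization, per-survey summation and per-beach median described above can be sketched in Python as follows. This is a minimal illustration and not the EMODnet processing code: the record layout (beach, year, survey identifier, item count, survey length in metres) and the function names are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (beach, year, survey_id, item_count, survey_length_m)
Record = Tuple[str, int, str, float, float]


def normalize_count(item_count: float, survey_length_m: float) -> float:
    """Number of items normalized to 100 m: count x (100 / survey length)."""
    if survey_length_m <= 0:
        raise ValueError("survey length must be positive")
    return item_count * (100.0 / survey_length_m)


def median_abundance(records: Iterable[Record]) -> Dict[Tuple[str, int], float]:
    # Step 1: sum the normalized counts of all selected item types per survey.
    per_survey = defaultdict(float)
    for beach, year, survey_id, count, length in records:
        per_survey[(beach, year, survey_id)] += normalize_count(count, length)

    # Step 2: take the median of the survey totals for each beach and year.
    per_beach_year = defaultdict(list)
    for (beach, year, _survey_id), total in per_survey.items():
        per_beach_year[(beach, year)].append(total)
    return {key: median(totals) for key, totals in per_beach_year.items()}
```

With this layout, a survey of 50 m that recorded 5 items in total contributes 10 items per 100 m to the median for its beach and year.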
Percentiles 50, 75, 95 & 99 have been calculated taking into account plastic bags related items from other data sources for all years.
More information is available in the attached documents.
Warning: the absence of data on the map does not necessarily mean that they do not exist, but that no information has been entered in the Marine Litter Database for this area.