68 datasets found
  1. Raw seq data quality control

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Aug 12, 2019
    Cite
    Kadobianskyi, Mykola; Schulze, Lisanne; Judkewitz, Benjamin; Schuelke, Markus (2019). Raw seq data quality control [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000115706
    Dataset updated
    Aug 12, 2019
    Authors
    Kadobianskyi, Mykola; Schulze, Lisanne; Judkewitz, Benjamin; Schuelke, Markus
    Description

    ZIP archive with FastQC-generated quality reports for the short-read libraries used in the assembly and annotation

  2. Data from: RawBeans: a simple, vendor independent, raw-data quality control...

    • ebi.ac.uk
    • data.niaid.nih.gov
    • +2more
    Updated Nov 3, 2021
    Cite
    Yishai Levin (2021). RawBeans: a simple, vendor independent, raw-data quality control tool [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD022816
    Dataset updated
    Nov 3, 2021
    Authors
    Yishai Levin
    Variables measured
    Proteomics
    Description

    Every laboratory performing mass spectrometry based proteomics strives to generate high quality data. Among the many factors that influence the outcome of any experiment in proteomics is the performance of the LC-MS system, which should be monitored continuously. This process is termed quality control (QC). We present an easy-to-use, rapid tool, which produces a visual, HTML-based report that includes the key parameters needed to monitor LC-MS system performance. The tool, named RawBeans, can generate a report for individual files, or for a set of samples from a whole experiment. We anticipate it will help proteomics users and experts evaluate raw data quality, independent of data processing. The tool is available here: https://bitbucket.org/incpm/prot-qc/downloads.

  3. Quality Assurance and Quality Control (QA/QC) of Meteorological Time Series...

    • osti.gov
    • dataone.org
    • +1more
    Updated Dec 31, 2020
    Cite
    Environmental System Science Data Infrastructure for a Virtual Ecosystem (2020). Quality Assurance and Quality Control (QA/QC) of Meteorological Time Series Data for Billy Barr, East River, Colorado USA [Dataset]. http://doi.org/10.15485/1823516
    Dataset updated
    Dec 31, 2020
    Dataset provided by
    Office of Science (http://www.er.doe.gov/)
    Environmental System Science Data Infrastructure for a Virtual Ecosystem
    Area covered
    Colorado, United States, East River
    Description

    A comprehensive Quality Assurance (QA) and Quality Control (QC) statistical framework consists of three major phases. Phase 1: preliminary exploration of the raw data sets, including time formatting and combining datasets of different lengths and different time intervals. Phase 2: QA of the datasets, including detecting and flagging duplicates, outliers, and extreme values. Phase 3: development of time series of a desired frequency, imputation of missing values, visualization, and a final statistical summary. The time series data collected at the Billy Barr meteorological station (East River Watershed, Colorado) were analyzed. The developed statistical framework is suitable for both real-time and post-data-collection QA/QC analysis of meteorological datasets.

    This data package includes three files. The first is an Excel file converted to CSV format (Billy_Barr_raw_qaqc.csv) that contains the raw meteorological data, i.e., the input data used for the QA/QC analysis. The second CSV file (Billy_Barr_1hr.csv) contains the QA/QC-processed and flagged meteorological data, i.e., the output data from the QA/QC analysis. The last file (QAQC_Billy_Barr_2021-03-22.R) is a script written in R that implements the QA/QC and flagging process. The CSV files are included to provide the input and output of the R script.
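    The flagging and aggregation phases described in this entry can be illustrated with a minimal Python sketch. It is not the R script from the data package: the function names, the flag labels, and the median-absolute-deviation outlier rule with threshold `k` are all assumptions chosen for illustration, and imputation of missing values is left out.

```python
from datetime import datetime
from statistics import mean, median


def flag_records(records, k=5.0):
    """Flag (timestamp, value) pairs as 'duplicate', 'outlier', or 'ok'.

    The outlier rule (distance from the median greater than k times the
    median absolute deviation) is an illustrative assumption.
    """
    values = [value for _, value in records]
    med = median(values)
    mad = median([abs(value - med) for value in values])
    seen = set()
    flagged = []
    for timestamp, value in records:
        if timestamp in seen:
            flag = "duplicate"          # repeated timestamp
        elif mad > 0 and abs(value - med) > k * mad:
            flag = "outlier"            # far from the bulk of the data
        else:
            flag = "ok"
        seen.add(timestamp)
        flagged.append((timestamp, value, flag))
    return flagged


def hourly_means(flagged):
    """Average the 'ok' values within each clock hour."""
    buckets = {}
    for timestamp, value, flag in flagged:
        if flag == "ok":
            hour = timestamp.replace(minute=0, second=0, microsecond=0)
            buckets.setdefault(hour, []).append(value)
    return {hour: mean(vals) for hour, vals in sorted(buckets.items())}
```

    Keeping the flags next to the values, rather than deleting suspect rows, mirrors the entry's distinction between raw input and flagged output files.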

  4. Lidar - ND Halo Scanning Doppler, Boardman - Raw Data

    • catalog.data.gov
    • data.openei.org
    • +2more
    Updated Apr 26, 2022
    Cite
    Wind Energy Technologies Office (WETO) (2022). Lidar - ND Halo Scanning Doppler, Boardman - Raw Data [Dataset]. https://catalog.data.gov/dataset/lidar-hilflows-llnl-zephir300-mop-processed-data
    Dataset updated
    Apr 26, 2022
    Dataset provided by
    Wind Energy Technologies Office (WETO)
    Description

    Overview

    The University of Notre Dame (ND) scanning lidar dataset used for the WFIP2 Campaign is provided. The raw dataset contains the radial velocity and backscatter measurements along with the beam location and other lidar parameters in the header.

    Data Details

    1) A Halo Photonics scanning lidar, owned by ND, was deployed and operated from 12/17/2015 to 02/09/2016. On 02/09/2016, this lidar was replaced by a Halo Photonics scanning lidar owned by the Army Research Lab (ARL).
    2) For information on the scanning patterns, refer to the attached "ReadMe" file.
    3) Data period from 12/15/2015 to 02/09/2016: One data file per day (24 hours). The file name of each daily data file has {boardman} as {optionalfields}. For example: lidar.z07.00.20150414.143000.boardman.csm.
    4) Data period after 02/09/2016: One scan file every 15 minutes, one stare file, and one background file every hour. File names have the following {optionalfields}: {background_boardman} for background files; {scan_boardman} for scan files; and {stare_boardman} for stare files. For example:
       - lidar.z07.00.20150414.143000.background_boardman
       - lidar.z07.00.20150414.143000.scan_boardman
       - lidar.z07.00.20150414.143000.stare_boardman
    5) Site information:
       - Site: Boardman, OR
       - Latitude: 45.816185° N
       - Longitude: 119.811766° W
       - Elevation (meters): 112.0

    Data Quality

    Raw data: no quality control (QC) is applied.

    Uncertainty

    The lidar measurements' uncertainty varies with the range of the measurements. Please refer to Pearson et al. (2009) for more details.

    Constraints

    1) Because of the change of lidars, the data were downloaded in different formats. Hence, the raw (unfiltered) data are primarily in two formats: .csm and .hpl.
    2) The data were downloaded every hour or every 15 minutes. Hence, the datasets are not concatenated for continuous scans.
    3) A lidar offset of +195 deg (to True North) was added to the azimuthal angles from the ND scanning lidars, spanning 12/17/2015 until 02/09/2016. Later, this was corrected for the data from 02/09/2016 as the lidar aligned to True North.
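    The file-naming convention quoted in this entry can be unpacked with a short Python sketch. The function name and the returned keys are hypothetical, and the meaning of the second and third dot-separated fields ("z07", "00") is not stated in the entry, so the sketch leaves them uninterpreted.

```python
from datetime import datetime


def parse_lidar_filename(name):
    """Split a name such as lidar.z07.00.20150414.143000.scan_boardman.

    Returns the timestamp, the file kind taken from the optional field
    (None for the daily files that carry only the site name), the site,
    and the extension if one is present.
    """
    parts = name.split(".")
    if len(parts) < 6 or parts[0] != "lidar":
        raise ValueError(f"unexpected file name: {name}")
    # parts[1] and parts[2] are kept as-is; their meaning is not documented here.
    date, time, optional = parts[3], parts[4], parts[5]
    extension = parts[6] if len(parts) > 6 else ""
    kind, _, site = optional.rpartition("_")
    return {
        "timestamp": datetime.strptime(date + time, "%Y%m%d%H%M%S"),
        "kind": kind or None,
        "site": site,
        "extension": extension,
    }
```

    For the daily example file, `parse_lidar_filename("lidar.z07.00.20150414.143000.boardman.csm")` yields kind `None`, site `"boardman"`, and extension `"csm"`.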

  5. Surface Meteorological Station - PNNL 10m Sonic, Physics site-5 - Raw Data

    • catalog.data.gov
    • data.openei.org
    • +1more
    Updated Aug 7, 2021
    Cite
    Wind Energy Technologies Office (WETO) (2021). Surface Meteorological Station - PNNL 10m Sonic, Physics site-5 - Raw Data [Dataset]. https://catalog.data.gov/dataset/surface-meteorological-station-esrl-short-tower-condon-reviewed-data
    Dataset updated
    Aug 7, 2021
    Dataset provided by
    Wind Energy Technologies Office (WETO)
    Description

    Overview

    This dataset provides fast response wind and virtual sonic temperature data.

    Data Details

    Each meteorological (met) station has one sonic anemometer (Gill R3-50, omnidirectional) mounted on top of a 10-m tower. Sensor verticality (within a degree) has been verified by the analog inclinometer mounted on the base plate alongside the sonic anemometer. The sonic anemometer has been oriented to magnetic North. The serial data stream is transmitted via radio link (9XTend RF modem by MaxStream) to the data acquisition computer housed in a temperature-controlled enclosure at the base of the 80-m tower. The original data were stored in flat ASCII files in 30-min pieces (".00." level). The current version of the data is ".a0." level. All evidently erroneous and/or broken lines were marked as bad and/or replaced with a "baddata" placeholder, the housekeeping data were stripped off, and the data were split into 5-min portions with no internal time stamp. The data have been prepared for processing with EddyPro and stored in ASCII comma-delimited files formatted as follows:

    u,v,w,T,qc

    where:
    - "u, v, w" are the three wind components (m/s)
    - "T" is the sonic virtual temperature (C)
    - "qc" is the basic quality control code: 0 - OK, 1 - sonic bad data code, 2 - broken data line, and 3 - missed line.

    The baddata placeholder is 99.99.

    NOTE: No attempt has been made to fill gaps in the data.

    Data Quality

    Includes raw data with basic quality control (QC) applied. All housekeeping fields removed.
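    The u,v,w,T,qc layout described in this entry can be read with a few lines of Python. This is a sketch under stated assumptions: the files are taken to have no header line, the function name and output keys are hypothetical, and the derived horizontal wind speed is added only for illustration.

```python
import csv
import io
import math

BAD_VALUE = 99.99  # "baddata" placeholder named in the dataset description


def read_good_samples(text):
    """Return the samples with qc code 0 and no placeholder values."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        u, v, w, temp = (float(x) for x in row[:4])
        qc = int(row[4])
        has_placeholder = any(
            math.isclose(x, BAD_VALUE) for x in (u, v, w, temp)
        )
        if qc != 0 or has_placeholder:
            continue  # qc codes 1-3 mark bad, broken, or missed lines
        samples.append(
            {"u": u, "v": v, "w": w, "T": temp, "speed": math.hypot(u, v)}
        )
    return samples
```

    Because the 5-min files carry no internal time stamp, any timing has to come from the file name and the line position, which this sketch does not attempt.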

  6. Pyrosequencing metrics from raw data before and after standard MG-RAST...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Marcio C. Costa; Luis G. Arroyo; Emma Allen-Vercoe; Henry R. Stämpfli; Peter T. Kim; Amy Sturgeon; J. Scott Weese (2023). Pyrosequencing metrics from raw data before and after standard MG-RAST quality control (QC) filters. [Dataset]. http://doi.org/10.1371/journal.pone.0041484.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Marcio C. Costa; Luis G. Arroyo; Emma Allen-Vercoe; Henry R. Stämpfli; Peter T. Kim; Amy Sturgeon; J. Scott Weese
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Total number of sequences, number of base pairs and the mean length of sequences (bp) present in the original fasta file before and after MG-RAST standard quality control filters. Means and standard deviations (±SD) among healthy horses and horses with colitis are also presented.

  7. Details of raw data and quality control used for assembly of the H. avenae...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 6, 2014
    Cite
    Shukla, Rohit N.; Thakur, Prasoon Kumar; Jones, Michael G. K.; Gantasala, Nagavara Prasad; Kumar, Mukesh; Rao, Uma; Roychowdhury, Tanmoy; Banakar, Prakash (2014). Details of raw data and quality control used for assembly of the H. avenae transcriptome. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001220693
    Dataset updated
    May 6, 2014
    Authors
    Shukla, Rohit N.; Thakur, Prasoon Kumar; Jones, Michael G. K.; Gantasala, Nagavara Prasad; Kumar, Mukesh; Rao, Uma; Roychowdhury, Tanmoy; Banakar, Prakash
    Description

    Details of raw data and quality control used for assembly of the H. avenae transcriptome.

  8. Additional file 5 of seqQscorer: automated quality control of...

    • springernature.figshare.com
    xlsx
    Updated Jun 12, 2023
    Cite
    Steffen Albrecht; Maximilian Sprang; Miguel A. Andrade-Navarro; Jean-Fred Fontaine (2023). Additional file 5 of seqQscorer: automated quality control of next-generation sequencing data using machine learning [Dataset]. http://doi.org/10.6084/m9.figshare.14173812.v1
    Available download formats: xlsx
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Steffen Albrecht; Maximilian Sprang; Miguel A. Andrade-Navarro; Jean-Fred Fontaine
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 5. Table of models tuned using the grid search. Dataset: species-subset-layout, generic: all data. Feature sets: RAW (raw data), MAP (genome mapping), LOC (genomic localization), TSS (transcription start sites profile). Feature Selection: method-percentage (percentage of retained features), chi-square (chi2), recursive feature elimination (RFE). Algorithm Parameters: relevant to a scikit-learn implementation.

  9. Partial raw data or bioinformatics analysis result.txt

    • figshare.com
    txt
    Updated Nov 1, 2021
    Cite
    ma zhiming; Haiying Fu (2021). Partial raw data or bioinformatics analysis result.txt [Dataset]. http://doi.org/10.6084/m9.figshare.16902844.v4
    Available download formats: txt
    Dataset updated
    Nov 1, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    ma zhiming; Haiying Fu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Text files for detailed patients’ clinical characteristics, quality control of each sample, and detected gene mutation points and fusion genes of samples

  10. Peat chemistry: Raw data and quality control, Seba Beach peat samples used...

    • borealisdata.ca
    • search.dataone.org
    Updated Sep 10, 2020
    Cite
    Aneta Bieniada (2020). Peat chemistry: Raw data and quality control, Seba Beach peat samples used for microbial analyses (a). Peat chemistry: Raw data and quality control, Seba Beach peat samples used for microcosms (b). [Dataset]. http://doi.org/10.5683/SP2/R6FCJV
    Dataset updated
    Sep 10, 2020
    Dataset provided by
    Borealis
    Authors
    Aneta Bieniada
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Seba Beach
    Description

    Peat samples collected in summer 2016 and 2017 at Seba Beach horticultural peat complex in Alberta. Peat chemistry: Raw data and quality control, Seba Beach peat samples used for microbial analyses (a). Peat chemistry: Raw data and quality control, Seba Beach peat samples used for microcosms (b).

  11. Sorel, QC OTT Parsivel2 Disdrometer Data

    • data.ucar.edu
    • ckanprod.data-commons.k8s.ucar.edu
    ascii
    Updated Oct 7, 2025
    Cite
    Daniel Michelson (2025). Sorel, QC OTT Parsivel2 Disdrometer Data [Dataset]. http://doi.org/10.26023/2ATQ-PPKD-D80W
    Available download formats: ascii
    Dataset updated
    Oct 7, 2025
    Authors
    Daniel Michelson
    Time period covered
    Oct 26, 2021 - May 3, 2022
    Description

    This dataset contains the raw data from the OTT Parsivel2 laser disdrometer sited at Sorel-Tracy, QC for the Winter Precipitation Type Research Multi-Scale Experiment (WINTRE-MIX). Data was collected between October 20, 2021 and May 03, 2022. The Parsivel utilizes a horizontal beam of light to detect particle sizes and fall speeds and deduce precipitation types, accumulations and visibility. Sorel-Tracy is located at the northern end of the Champlain Valley, on the southern shore of the St. Lawrence River northeast of Montreal. The site was in a small park to the east of the Richelieu River. Several other instruments were also stationed at the site and will be available from the WINTRE-MIX data archive.

  12. Radar - ANL Wind Profiler with RASS, Yakima - Raw Data

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Apr 26, 2022
    Cite
    Wind Energy Technologies Office (WETO) (2022). Radar - ANL Wind Profiler with RASS, Yakima - Raw Data [Dataset]. https://catalog.data.gov/dataset/microwave-radiometer-esrl-radiometrics-mwr-wasco-airport-raw-data
    Dataset updated
    Apr 26, 2022
    Dataset provided by
    Wind Energy Technologies Office (WETO)
    Description

    Overview

    Winds

    A radar wind profiler measures the Doppler shift of electromagnetic energy scattered back from atmospheric turbulence and hydrometeors along 3-5 vertical and off-vertical point beam directions. Back-scattered signal strength and radial-component velocities are remotely sensed along all beam directions and combined to derive the horizontal wind field over the radar. These data typically are sampled and averaged hourly and usually have 6-m and/or 100-m vertical resolutions up to 4 km for the 915 MHz and 8 km for the 449 MHz systems.

    Temperature

    To measure atmospheric temperature, a radio acoustic sounding system (RASS) is used in conjunction with the wind profiler. These data typically are sampled and averaged for five minutes each hour and have a 60-m vertical resolution up to 1.5 km for the 915 MHz and 60-m up to 3.5 km for the 449 MHz.

    Data Details

    Spectra data are stored in two daily files, a header (file names contain "H") and a data (file names contain "D") file. The (H)eader files are made up of binary data records containing information about the operational parameters of the profiler, while (D)ata files, also composed of binary data records, contain the spectra data collected by the profiler, i.e. spectral values for each spectral bin for every range gate.

    Data Quality

    Various quality control (QC) algorithms developed over the years process data in real time on the radar software layer. These algorithms act on time-series, spectra, moment, and consensus data layers that are persisted in various forms. For a detailed description, refer to the attached QC document: 915 and 449 MHz Radar Wind Profilers and RASS QC.

    Uncertainty

    The uncertainty is defined by the spacing of the spectral bin.

  13. Hydroinformatics Instruction Module Example Code: Sensor Data Quality...

    • hydroshare.org
    • beta.hydroshare.org
    • +1more
    zip
    Updated Mar 3, 2022
    Cite
    Amber Spackman Jones (2022). Hydroinformatics Instruction Module Example Code: Sensor Data Quality Control with pyhydroqc [Dataset]. https://www.hydroshare.org/resource/451c4f9697654b1682d87ee619cd7924
    Available download formats: zip (159.5 MB)
    Dataset updated
    Mar 3, 2022
    Dataset provided by
    HydroShare
    Authors
    Amber Spackman Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This resource contains Jupyter Notebooks with examples for conducting quality control post processing for in situ aquatic sensor data. The code uses the Python pyhydroqc package. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.

    This resource consists of 3 example notebooks and associated data files.

    Notebooks:
    1. Example 1: Import and plot data
    2. Example 2: Perform rules-based quality control
    3. Example 3: Perform model-based quality control (ARIMA)

    Data files: Data files are available for 6 aquatic sites in the Logan River Observatory. Each file contains data for one site for a single year. The files are named according to monitoring site (FranklinBasin, TonyGrove, WaterLab, MainStreet, Mendon, BlackSmithFork) and year. The files were sourced by querying the Logan River Observatory relational database, and equivalent data could be obtained from the LRO website or on HydroShare. Additional information on sites, variables, and methods can be found on the LRO website (http://lrodata.usu.edu/tsa/) or HydroShare (https://www.hydroshare.org/search/?q=logan%20river%20observatory). Each file has the same structure: it is indexed by a datetime column (Mountain Standard Time) and has three columns for each variable. Variable abbreviations and units are:
    - temp: water temperature, degrees C
    - cond: specific conductance, μS/cm
    - ph: pH, standard units
    - do: dissolved oxygen, mg/L
    - turb: turbidity, NTU
    - stage: stage height, cm

    For each variable, there are 3 columns:
    - Raw data value measured by the sensor (column header is the variable abbreviation).
    - Technician quality controlled (corrected) value (column header is the variable abbreviation appended with '_cor').
    - Technician labels/qualifiers (column header is the variable abbreviation appended with '_qual').
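    The three-column layout per variable lends itself to a quick comparison of raw and technician-corrected values. The sketch below uses only the Python standard library rather than pyhydroqc; the function names are hypothetical, and treating empty or "nan" cells as missing is an assumption about how gaps are encoded.

```python
import csv
import io
from collections import Counter


def to_float(cell):
    """Convert a CSV cell to float, treating empty or 'nan' cells as missing."""
    cell = cell.strip()
    if cell == "" or cell.lower() == "nan":
        return None
    return float(cell)


def summarize_corrections(csv_text, var):
    """Count rows where <var>_cor differs from <var>, and tally <var>_qual labels."""
    rows = 0
    changed = 0
    qualifiers = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows += 1
        if to_float(row[var]) != to_float(row[var + "_cor"]):
            changed += 1  # the technician altered or removed the raw value
        label = row[var + "_qual"].strip()
        if label:
            qualifiers[label] += 1
    return {"rows": rows, "changed": changed, "qualifiers": dict(qualifiers)}
```

    A call such as `summarize_corrections(text, "temp")` gives a first impression of how much of a series the technicians touched before any rules-based or model-based detection is compared against it.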

  14. Sodar - ANL Wind Profiler, Yakima - Raw Data

    • gimi9.com
    Updated Oct 6, 2016
    Cite
    (2016). Sodar - ANL Wind Profiler, Yakima - Raw Data | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_shortwave-longwave-radiometer-esrl-radsys-rufus-reviewed-data
    Dataset updated
    Oct 6, 2016
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    To provide lower-level wind speed and direction measurements.

    Data Quality

    Raw data are plotted, visually inspected on a daily basis, and uploaded to DAP every 15 minutes. Manual QA/QC procedures will be applied to the data for apparent outliers/physically impossible values. QC-performed data will be uploaded to DAP by the deadline, with a "qc" identifier appended to the file names.

  15. Radar - ARL Wind Profiler with RASS, Boardman - Raw Data

    • s.cnmilf.com
    • data.openei.org
    • +4more
    Updated Apr 26, 2022
    Cite
    Wind Energy Technologies Office (WETO) (2022). Radar - ARL Wind Profiler with RASS, Boardman - Raw Data [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/provider-university-of-washington-raw-data
    Dataset updated
    Apr 26, 2022
    Dataset provided by
    Wind Energy Technologies Office (WETO)
    Description

    Overview

    Winds

    A radar wind profiler measures the Doppler shift of electromagnetic energy scattered back from atmospheric turbulence and hydrometeors along 3-5 vertical and off-vertical point beam directions. Back-scattered signal strength and radial-component velocities are remotely sensed along all beam directions and combined to derive the horizontal wind field over the radar. These data typically are sampled and averaged hourly and usually have 6-m and/or 100-m vertical resolutions up to 4 km for the 915 MHz and 8 km for the 449 MHz systems.

    Temperature

    To measure atmospheric temperature, a radio acoustic sounding system (RASS) is used in conjunction with the wind profiler. These data typically are sampled and averaged for five minutes each hour and have a 60-m vertical resolution up to 1.5 km for the 915 MHz and 60-m up to 3.5 km for the 449 MHz.

    Data Quality

    Various quality control (QC) algorithms developed over the years process data in real time on the radar software layer. These algorithms act on time-series, spectra, moment, and consensus data layers that are persisted in various forms. For a detailed description, refer to the attached QC document: 915 and 449 MHz Radar Wind Profilers and RASS QC.

  16. Torque magnetometry AMS data, Raw Data and Data Quality Control...

    • zenodo.org
    bin, pdf
    Updated Dec 20, 2024
    Cite
    Clémentine Hamelin (2024). Torque magnetometry AMS data, Raw Data and Data Quality Control Visualizations, Entia Dome Amphibolite samples (2017 sample collection, unoriented) [Dataset]. http://doi.org/10.5281/zenodo.14537696
    Available download formats: bin, pdf
    Dataset updated
    Dec 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Clémentine Hamelin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    High-Field AMS Torque Magnetometry data (Raw Data) and data quality control visualization plots associated with the JGR Solid Earth manuscript entitled "Magnetic and Crystallographic Fabric Analyses of Amphibolite: A Proposed Methodology Applied to a Migmatite Dome".

  17. Data from: A systematic evaluation of normalization methods and probe...

    • data.niaid.nih.gov
    • dataone.org
    • +2more
    zip
    Updated May 30, 2023
    Cite
    H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra (2023). A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data [Dataset]. http://doi.org/10.5061/dryad.cnp5hqc7v
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Universidade de São Paulo
    Hospital for Sick Children
    University of Toronto
    Authors
    H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Background

    The Infinium EPIC array measures the methylation status of > 850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe type bias as well as other issues such as background and dye bias.

    Methods

    This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearson’s correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data.

    Results

    The method we define as SeSAMe 2, which consists of the application of the regular SeSAMe pipeline with an additional round of QC, pOOBAH masking, was found to be the best-performing normalization method, while quantile-based methods were found to be the worst performing methods. Whole-array Pearson’s correlations were found to be high. However, in agreement with previous studies, a substantial proportion of the probes on the EPIC array showed poor reproducibility (ICC < 0.50). The majority of poor-performing probes have beta values close to either 0 or 1, and relatively low standard deviations. These results suggest that probe reliability is largely the result of limited biological variation rather than technical measurement variation. Importantly, normalizing the data with SeSAMe 2 dramatically improved ICC estimates, with the proportion of probes with ICC values > 0.50 increasing from 45.18% (raw data) to 61.35% (SeSAMe 2).

    Methods

    Study Participants and Samples

    The whole blood samples were obtained from the Health, Well-being and Aging (Saúde, Bem-estar e Envelhecimento, SABE) study cohort. SABE is a cohort of census-withdrawn elderly from the city of São Paulo, Brazil, followed up every five years since the year 2000, with DNA first collected in 2010. Samples from 24 elderly adults were collected at two time points for a total of 48 samples. The first time point is the 2010 collection wave, performed from 2010 to 2012, and the second time point was set in 2020 in a COVID-19 monitoring project (9±0.71 years apart). The 24 individuals were 67.41±5.52 years of age (mean ± standard deviation) at time point one and 76.41±6.17 at time point two, and comprised 13 men and 11 women.

    All individuals enrolled in the SABE cohort provided written consent, and the ethic protocols were approved by local and national institutional review boards COEP/FSP/USP OF.COEP/23/10, CONEP 2044/2014, CEP HIAE 1263-10, University of Toronto RIS 39685.

    Blood Collection and Processing

    Genomic DNA was extracted from whole peripheral blood samples collected in EDTA tubes. DNA extraction and purification followed manufacturer’s recommended protocols, using Qiagen AutoPure LS kit with Gentra automated extraction (first time point) or manual extraction (second time point), due to discontinuation of the equipment but using the same commercial reagents. DNA was quantified using Nanodrop spectrometer and diluted to 50ng/uL. To assess the reproducibility of the EPIC array, we also obtained technical replicates for 16 out of the 48 samples, for a total of 64 samples submitted for further analyses. Whole Genome Sequencing data is also available for the samples described above.

    Characterization of DNA Methylation using the EPIC array

    Approximately 1,000ng of human genomic DNA was used for bisulphite conversion. Methylation status was evaluated using the MethylationEPIC array at The Centre for Applied Genomics (TCAG, Hospital for Sick Children, Toronto, Ontario, Canada), following protocols recommended by Illumina (San Diego, California, USA).

    Processing and Analysis of DNA Methylation Data

    The R/Bioconductor packages Meffil (version 1.1.0), RnBeads (version 2.6.0), minfi (version 1.34.0) and wateRmelon (version 1.32.0) were used to import, process and perform quality control (QC) analyses on the methylation data. Starting with the 64 samples, we first used Meffil to infer the sex of each sample and compared the inferred sex to the reported sex. Utilizing the 59 SNP probes that are available as part of the EPIC array, we calculated concordance between the methylation intensities of the samples and the corresponding genotype calls extracted from their WGS data. We then performed comprehensive sample-level and probe-level QC using the RnBeads QC pipeline. Specifically, we (1) removed probes if their target sequences overlap with a SNP at any base, (2) removed known cross-reactive probes, (3) used the iterative Greedycut algorithm to filter out samples and probes, using a detection p-value threshold of 0.01, and (4) removed probes if more than 5% of the samples had a missing value. Since RnBeads does not have a function to perform probe filtering based on bead number, we used the wateRmelon package to extract bead numbers from the IDAT files and calculated the proportion of samples with bead number < 3. Probes with more than 5% of samples having low bead number (< 3) were removed. For the comparison of normalization methods, we also computed detection p-values using the empirical distribution of out-of-band probes with the pOOBAH() function in the SeSAMe (version 1.14.2) R package, with a p-value threshold of 0.05 and the combine.neg parameter set to TRUE. In the scenario where pOOBAH filtering was carried out, it was done in parallel with the previously mentioned QC steps, and the resulting probes flagged in both analyses were combined and removed from the data.
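    Two of the probe-level rules in this paragraph (more than 5% of samples with a missing value, and more than 5% of samples with bead number < 3) are simple threshold filters. The Python sketch below illustrates only those two rules on plain dictionaries; it is not the RnBeads or wateRmelon implementation, and the function names and data layout are hypothetical.

```python
def failing_fraction(values, is_bad):
    """Fraction of per-sample values for which is_bad(value) is true."""
    return sum(1 for value in values if is_bad(value)) / len(values)


def filter_probes(beta, beads, max_fraction=0.05, min_beads=3):
    """Return the probe IDs that pass both threshold rules.

    beta:  {probe_id: [beta value or None per sample]}
    beads: {probe_id: [bead count per sample]}
    A probe is dropped if more than max_fraction of samples are missing,
    or if more than max_fraction of samples have fewer than min_beads beads.
    """
    kept = []
    for probe in beta:
        missing = failing_fraction(beta[probe], lambda v: v is None)
        low_bead = failing_fraction(beads[probe], lambda n: n < min_beads)
        if missing > max_fraction or low_bead > max_fraction:
            continue
        kept.append(probe)
    return kept
```

    Both comparisons are strict, so a probe with exactly 5% failing samples is retained, matching the "more than 5%" wording.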

    Normalization Methods Evaluated

    The normalization methods compared in this study were implemented using different R/Bioconductor packages and are summarized in Figure 1. All data were read into the R workspace as RGChannelSet objects using minfi’s read.metharray.exp() function. One sample that was flagged during QC was removed, and further normalization steps were carried out on the remaining set of 63 samples. Prior to all normalizations with minfi, probes that did not pass QC were removed. Noob, SWAN, Quantile, Funnorm and Illumina normalizations were implemented using minfi. BMIQ normalization was implemented with ChAMP (version 2.26.0), using as input the raw data produced by minfi’s preprocessRaw() function. In the combination of Noob with BMIQ (Noob+BMIQ), BMIQ normalization was carried out using minfi’s Noob-normalized data as input. Noob normalization was also implemented with SeSAMe, using a nonlinear dye bias correction. For SeSAMe normalization, two scenarios were tested. For both, the inputs were unmasked SigDF sets converted from minfi’s RGChannelSet objects. In the first, which we call “SeSAMe 1”, SeSAMe’s pOOBAH masking was not executed, and the only probes filtered out of the dataset prior to normalization were the ones that did not pass QC in the previous analyses. In the second scenario, which we call “SeSAMe 2”, pOOBAH masking was carried out on the unfiltered dataset, and masked probes were removed. This was followed by removal of the probes that did not pass the previous QC and had not already been removed by pOOBAH. SeSAMe 2 therefore has two rounds of probe removal. Noob normalization with nonlinear dye bias correction was then carried out on the filtered dataset. Methods were then compared by subsetting the 16 replicated samples and evaluating the effect of each normalization method on the absolute difference of beta values (|Δβ|) between replicated samples.
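The replicate comparison at the end of this description can be illustrated with a short Python sketch. It assumes a beta-value matrix with probes in rows and samples in columns, and it summarizes each replicate pair by the mean absolute difference across probes. The source does not state which summary statistic was used, so the mean and the function name `replicate_abs_diff` are assumptions made for illustration.

```python
import numpy as np


def replicate_abs_diff(beta, replicate_pairs):
    """Mean absolute beta difference for each pair of replicated samples.

    beta            : (n_probes, n_samples) array of beta values in [0, 1].
    replicate_pairs : iterable of (column_a, column_b) index pairs.
    Probes that are NaN in either replicate are ignored for that pair.
    """
    beta = np.asarray(beta, dtype=float)
    result = {}
    for a, b in replicate_pairs:
        abs_diff = np.abs(beta[:, a] - beta[:, b])
        result[(a, b)] = float(np.nanmean(abs_diff))
    return result


# Toy example: two probes, one replicate pair (columns 0 and 1).
toy_beta = np.array([[0.1, 0.3],
                     [0.5, 0.4]])
toy_result = replicate_abs_diff(toy_beta, [(0, 1)])
```

Running the same function on the matrices produced by each normalization method gives one value per replicate pair and method; under this summary, a smaller value means the replicates agree more closely after normalization.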

  18. Z

    Raw data used in AC. Raclariu-Manolica et al. "DNA metabarcoding for quality...

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 1, 2021
    Cite
    AC Raclariu-Manolica; JA Anmarkrud; M Kierczak; N Rafati; L Thorbek; A Schrøder-Nielsen; HJ De Boer (2021). Raw data used in AC. Raclariu-Manolica et al. "DNA metabarcoding for quality control of basil, oregano and paprika." [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4730043
    Explore at:
    Dataset updated
    May 1, 2021
    Dataset provided by
    University of Oslo
    Authors
    AC Raclariu-Manolica; JA Anmarkrud; M Kierczak; N Rafati; L Thorbek; A Schrøder-Nielsen; HJ De Boer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data used in:

    AC. Raclariu-Manolica, JA Anmarkrud, M Kierczak, N Rafati, L Thorbek, A Schrøder-Nielsen and HJ De Boer. "DNA metabarcoding for quality control of basil, oregano and paprika."

    Files:

    1/ mapping.txt - mapping between the file.fastq.gz--barcode pair and the sample-replicate pair

    2/ barcodes.txt - contains barcode sequences for demultiplexing reads

    3/ primers.txt - contains forward and reverse primer sequences for ITS1 and ITS2

  19. u

    Data from: Fiscal Year 2020 Supplemental Nutrition Assistance Program...

    • agdatacommons.nal.usda.gov
    txt
    Updated Nov 21, 2025
    + more versions
    Cite
    Kathryn Cronquist; Brett Eiffes; Natalie Reid; Mia Monkovic (2025). Fiscal Year 2020 Supplemental Nutrition Assistance Program Quality Control Database [Dataset]. http://doi.org/10.15482/USDA.ADC/1528542
    Explore at:
    Available download formats: txt
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Kathryn Cronquist; Brett Eiffes; Natalie Reid; Mia Monkovic
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The Supplemental Nutrition Assistance Program (SNAP) is the largest of the domestic nutrition assistance programs administered by the Food and Nutrition Service (FNS) of the U.S. Department of Agriculture (USDA), providing millions of Americans with the means to purchase food for a nutritious diet. During fiscal year (FY) 2020, SNAP served an average of 39.9 million people monthly and paid out $74.2 billion in benefits, which includes the cost of emergency allotments to supplement SNAP benefits due to the COVID-19 public health emergency. In response to legislative adjustments to program rules and changes in economic and demographic trends, the characteristics of SNAP participants and households and the size of the SNAP caseload change over time. To quantify these changes or estimate the effect of adjustments to program rules on the current SNAP caseload, FNS relies on data from the SNAP Quality Control (QC) database. This database is an edited version of the raw data file of monthly case reviews conducted by State SNAP agencies to assess the accuracy of eligibility determinations and benefit calculations for each State’s SNAP caseload. The COVID-19 public health emergency resulted in an incomplete FY 2020 sample in the raw data file. FNS granted States temporary waivers on conducting QC reviews starting in March 2020. Very few States collected QC data from March 2020 through May 2020. Most States opted to conduct QC reviews from June 2020 through September 2020, although FNS was unable to provide its usual level of oversight of the sampling procedures. Furthermore, monthly State samples for this time period were often smaller than usual. This dataset includes separate SNAP QC files for FY 2020. The first covers the “pre-pandemic” period of October 2019 through February 2020. The second covers the “waiver” period of June 2020 through September 2020 for the 47 States and territories that provided sufficient data for at least one of those months. 
    Resources in this dataset:

    • Resource Title: Fiscal Year 2020 Supplemental Nutrition Assistance Program Quality Control Database (Period 2). File Name: qc_pub_fy2020_per2.csv

    • Resource Title: Fiscal Year 2020 Supplemental Nutrition Assistance Program Quality Control Database (Period 1). File Name: qc_pub_fy2020_per1.csv

    • Resource Title: Technical Documentation for the Fiscal Year 2020 Supplemental Nutrition Assistance Program Quality Control Database and the QC Minimodel. File Name: FY2020TechDoc.pdf

  20. c

    Data from: Radar - ANL Wind Profiler with RASS, Goldendale - Raw Data

    • s.cnmilf.com
    • data.openei.org
    • +3more
    Updated Apr 26, 2022
    Cite
    Wind Energy Technologies Office (WETO) (2022). Radar - ANL Wind Profiler with RASS, Goldendale - Raw Data [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/microwave-radiometer-esrl-radiometrics-mwr-troutdale-raw-data
    Explore at:
    Dataset updated
    Apr 26, 2022
    Dataset provided by
    Wind Energy Technologies Office (WETO)
    Description

    Overview

    Winds

    A radar wind profiler measures the Doppler shift of electromagnetic energy scattered back from atmospheric turbulence and hydrometeors along 3-5 vertical and off-vertical point beam directions. Back-scattered signal strength and radial-component velocities are remotely sensed along all beam directions and combined to derive the horizontal wind field over the radar. These data typically are sampled and averaged hourly and usually have 6-m and/or 100-m vertical resolutions up to 4 km for the 915 MHz and 8 km for the 449 MHz systems.

    Temperature

    To measure atmospheric temperature, a radio acoustic sounding system (RASS) is used in conjunction with the wind profiler. These data typically are sampled and averaged for five minutes each hour and have a 60-m vertical resolution up to 1.5 km for the 915 MHz system and up to 3.5 km for the 449 MHz system.

    Data Details

    Spectra data are stored in two daily files, a header file (file names contain "H") and a data file (file names contain "D"). The (H)eader files are made up of binary data records containing information about the operational parameters of the profiler, while the (D)ata files, also composed of binary data records, contain the spectra data collected by the profiler, i.e. spectral values for each spectral bin for every range gate.

    Data Quality

    Various quality control (QC) algorithms developed over the years process data in real time on the radar software layer. These algorithms act on the time-series, spectra, moment, and consensus data layers, which are persisted in various forms. For a detailed description, refer to the attached QC document: 915 and 449 MHz Radar Wind Profilers and RASS QC.

    Uncertainty

    The uncertainty is defined by the spacing of the spectral bin.
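To make the link between Doppler shift, radial velocity and spectral-bin spacing concrete, here is a minimal Python sketch based on the standard two-way Doppler relation v = λ·f_d / 2. The function names, the sign convention (a positive shift gives a positive velocity) and the 10 Hz example value are illustrative assumptions. They are not taken from the profiler's software or from its QC document.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum


def wavelength_m(radar_freq_hz):
    """Radar wavelength in metres for a given transmit frequency."""
    return SPEED_OF_LIGHT_M_S / radar_freq_hz


def radial_velocity_m_s(doppler_shift_hz, radar_freq_hz):
    """Radial velocity implied by a Doppler shift.

    Uses the two-way relation v = wavelength * f_d / 2; the sign of the
    result simply follows the sign of the shift (assumed convention).
    """
    return wavelength_m(radar_freq_hz) * doppler_shift_hz / 2.0


def velocity_per_bin_m_s(bin_spacing_hz, radar_freq_hz):
    """Velocity width of one spectral bin, given its spacing in Hz."""
    return radial_velocity_m_s(bin_spacing_hz, radar_freq_hz)


# Hypothetical 10 Hz Doppler shift seen by the two profiler frequencies.
v_915 = radial_velocity_m_s(10.0, 915e6)
v_449 = radial_velocity_m_s(10.0, 449e6)
```

Because the wavelength is longer at 449 MHz than at 915 MHz, the same spacing in Hz corresponds to a larger velocity step per spectral bin on the 449 MHz system.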
