70 datasets found

FlowRepository
bioregistry.io
Updated Apr 30, 2021
Cite
(2021). FlowRepository [Dataset]. http://identifiers.org/re3data:r3d100011280
Explore at:
Unique identifier
https://identifiers.org/re3data:r3d100011280
Dataset updated
Apr 30, 2021
Description
FlowRepository is a database of flow cytometry experiments where you can query and download data collected and annotated according to the MIFlowCyt standard. It is primarily used as a data deposition place for experimental findings published in peer-reviewed journals in the flow cytometry field.
FLOWRepository
dknet.org
neuinfo.org
Updated Aug 14, 2024
Cite
(2024). FLOWRepository [Dataset]. http://identifiers.org/RRID:SCR_013779
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_013779
Dataset updated
Aug 14, 2024
Description
A database of flow cytometry experiments where users can query and download data collected and annotated according to the MIFlowCyt data standard.
Downsampled data from FlowRepository: FR-FCM-Z3WR
figshare.com
csv
Updated Dec 2, 2024
Cite
Daniel Tyrrell (2024). Downsampled data from FlowRepository: FR-FCM-Z3WR [Dataset]. http://doi.org/10.6084/m9.figshare.27940719.v1
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.6084/m9.figshare.27940719.v1
Dataset updated
Dec 2, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Daniel Tyrrell
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Spectral flow cytometry provides greater insight into cellular heterogeneity by simultaneously measuring up to 50 markers. However, analyzing such high-dimensional (HD) data with a traditional manual gating strategy is complex. To address this gap, we developed CAFE, an open-source Python-based web application with a graphical user interface. Built with Streamlit, CAFE incorporates libraries such as Scanpy for single-cell analysis, Pandas and PyArrow for efficient data handling, and Matplotlib, Seaborn, and Plotly for creating customizable figures. Its toolset includes density-based down-sampling, dimensionality reduction, batch correction, Leiden-based clustering, and cluster merging and annotation. Using CAFE, we analyzed a human PBMC dataset of 350,000 cells and identified 16 distinct cell clusters. CAFE can generate publication-ready figures in real time via interactive slider controls and dropdown menus, eliminating the need for coding expertise and making HD data analysis accessible to all. CAFE is licensed under MIT and is freely available at https://github.com/mhbsiam/cafe.
Flow Cytometry Bioinformatics
plos.figshare.com
datasetcatalog.nlm.nih.gov
html
Updated Jun 1, 2023
Cite
Kieran O'Neill; Nima Aghaeepour; Josef Špidlen; Ryan Brinkman (2023). Flow Cytometry Bioinformatics [Dataset]. http://doi.org/10.1371/journal.pcbi.1003365
Explore at:
Available download formats: html
Unique identifier
https://doi.org/10.1371/journal.pcbi.1003365
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Kieran O'Neill; Nima Aghaeepour; Josef Špidlen; Ryan Brinkman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Flow cytometry bioinformatics is the application of bioinformatics to flow cytometry data, which involves storing, retrieving, organizing, and analyzing flow cytometry data using extensive computational resources and tools. Flow cytometry bioinformatics requires extensive use of and contributes to the development of techniques from computational statistics and machine learning. Flow cytometry and related methods allow the quantification of multiple independent biomarkers on large numbers of single cells. The rapid growth in the multidimensionality and throughput of flow cytometry data, particularly in the 2000s, has led to the creation of a variety of computational analysis methods, data standards, and public databases for the sharing of results. Computational methods exist to assist in the preprocessing of flow cytometry data, identifying cell populations within it, matching those cell populations across samples, and performing diagnosis and discovery using the results of previous steps. For preprocessing, this includes compensating for spectral overlap, transforming data onto scales conducive to visualization and analysis, assessing data for quality, and normalizing data across samples and experiments. For population identification, tools are available to aid traditional manual identification of populations in two-dimensional scatter plots (gating), to use dimensionality reduction to aid gating, and to find populations automatically in higher dimensional space in a variety of ways. It is also possible to characterize data in more comprehensive ways, such as the density-guided binary space partitioning technique known as probability binning, or by combinatorial gating. 
Finally, diagnosis using flow cytometry data can be aided by supervised learning techniques, and discovery of new cell types of biological importance by high-throughput statistical methods, as part of pipelines incorporating all of the aforementioned methods. Open standards, data, and software are also key parts of flow cytometry bioinformatics. Data standards include the widely adopted Flow Cytometry Standard (FCS), which defines how data from cytometers should be stored, as well as several new standards under development by the International Society for Advancement of Cytometry (ISAC) to aid in storing more detailed information about experimental design and analytical steps. Open data is slowly growing with the opening of the CytoBank database in 2010 and FlowRepository in 2012, both of which allow users to freely distribute their data, and the latter of which has been recommended by ISAC as the preferred repository for MIFlowCyt-compliant data. Open software is most widely available in the form of a suite of Bioconductor packages, but is also available for web execution on the GenePattern platform.
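The "compensating for spectral overlap" step named in this description can be illustrated with a minimal sketch. It assumes a two-detector case and the common convention that observed readings equal true signals multiplied by a spillover matrix; the matrix values, signal values, and function names are hypothetical and not taken from any dataset in this listing.

```python
def invert_2x2(m):
    """Return the inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def compensate(event, spillover):
    """Recover per-dye signals from two detector readings.

    Assumes observed = true @ spillover, where spillover[i][j] is the
    fraction of dye i's signal that appears in detector j.
    """
    inv = invert_2x2(spillover)
    return [
        event[0] * inv[0][0] + event[1] * inv[1][0],
        event[0] * inv[0][1] + event[1] * inv[1][1],
    ]


# Hypothetical spillover: 10% of dye 0 leaks into detector 1,
# and 5% of dye 1 leaks into detector 0.
SPILLOVER = [[1.0, 0.10], [0.05, 1.0]]

# Simulate an observed event from known true signals, then undo the spillover.
true_signal = [1000.0, 200.0]
observed = [
    true_signal[0] * SPILLOVER[0][0] + true_signal[1] * SPILLOVER[1][0],
    true_signal[0] * SPILLOVER[0][1] + true_signal[1] * SPILLOVER[1][1],
]
recovered = compensate(observed, SPILLOVER)
```

Real panels use larger matrices, usually estimated from single-stained controls, and a linear algebra library would replace the hand-written inverse.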
Single-cell datasets for distribution-based sketching
zenodo.org
bin
Updated May 14, 2022
Cite
Vishal Athreya Baskaran; Jolene Ranek; Siyuan Shan; Natalie Stanley; Junier Oliva (2022). Single-cell datasets for distribution-based sketching [Dataset]. http://doi.org/10.5281/zenodo.6546964
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.6546964
Dataset updated
May 14, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Vishal Athreya Baskaran; Jolene Ranek; Siyuan Shan; Natalie Stanley; Junier Oliva
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Contains preprocessed single-cell data for sketching single-cell samples. Preprocessed adata objects can be accessed using the 'read_h5ad' function in Scanpy.

The HIV Vaccine Trials Network (HVTN) Flow Cytometry dataset (hvtn_preprocessed.h5ad) was originally downloaded from the Flow Repository under Repository ID FR-FCM-ZZZV (http://flowrepository.org/id/FR-FCM-ZZZV).

The preeclampsia CyTOF dataset (preeclampsia_preprocessed.h5ad) was originally downloaded from the Flow Repository under Repository ID FR-FCM-ZYRQ (http://flowrepository.org/id/FR-FCM-ZYRQ).

The NK-Cell CyTOF dataset (nk_cell_preprocessed.h5ad) from Ref. (https://www.nature.com/articles/ncomms14825) was originally downloaded from (https://github.com/eiriniar/CellCnn).

The multiple sclerosis (MS) single-cell RNA sequencing dataset of peripheral blood samples (ms_preprocessed.h5ad) from Ref. (https://www.nature.com/articles/s41467-019-14118-w) was originally accessed from the Gene Expression Omnibus using the accession code GSE138266 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE138266).
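The two FlowRepository links above share one URL shape, which a small helper can reproduce. This is a sketch only: the ID pattern (the prefix `FR-FCM-` followed by four alphanumeric characters) is an assumption inferred from the IDs shown in this listing, and the function and dictionary names are hypothetical.

```python
import re

# Assumption: repository IDs look like "FR-FCM-" plus four alphanumerics,
# as in the examples shown in this listing.
FLOWREPO_ID = re.compile(r"FR-FCM-[A-Za-z0-9]{4}")


def flowrepository_url(repo_id):
    """Build the landing-page URL for a FlowRepository ID."""
    if not FLOWREPO_ID.fullmatch(repo_id):
        raise ValueError(f"not a FlowRepository ID: {repo_id!r}")
    return f"http://flowrepository.org/id/{repo_id}"


# File names and IDs as stated in the description above.
SOURCES = {
    "hvtn_preprocessed.h5ad": "FR-FCM-ZZZV",
    "preeclampsia_preprocessed.h5ad": "FR-FCM-ZYRQ",
}
urls = {name: flowrepository_url(rid) for name, rid in SOURCES.items()}
```

Validating the ID first keeps accession codes from other archives, such as the GEO code GSE138266, from being turned into FlowRepository links by mistake.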
Data from: Stochastic Regression and Peak Delineation with Flow Cytometry...
catalog.data.gov
datasets.ai
Updated Mar 12, 2024
Cite
National Institute of Standards and Technology (2024). Stochastic Regression and Peak Delineation with Flow Cytometry Data [Dataset]. https://catalog.data.gov/dataset/stochastic-regression-and-peak-delineation-with-flow-cytometry-data
Explore at:
Dataset updated
Mar 12, 2024
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
This data repository contains the original files (.fcs) of flow cytometry experiments. The data were used to demonstrate the use of stochastic regression to quantify subpopulations of cells that have distinctly different genome copies per cell within a heterogeneous population of Escherichia coli (E. coli) cells. This new approach gives estimates of signal and noise; the former is used for analysis, and the latter is used to quantify uncertainty. By separating these two components, the signal and noise can be compared independently to evaluate measurement quality across different experimental conditions. The files contain experiments from a single stock of Escherichia coli cells that was diluted to different concentrations, stained with Hoechst 33342, and acquired on a CytoFLEX LX under the same acquisition conditions. "Control_Hoechst" is a biological control sample stained only with Hoechst. "RainbowBeads" is a control of hard-dyed fluorescent beads with 8 distinct peaks of known fluorescent intensities per manufacturer documentation. "Test_double" indicates test samples with double fluorescent probe staining; the fractional number (e.g. 0.7) indicates the dilution factor from the stock, and the integer at the end represents the technical replicate. The downloaded Exp_20230921_1_Cyto-A-journal.zip file contains 14 files in .fcs format, which require suitable software to read and analyze (e.g., FCS Express).
Cross-platform cytometry benchmark data
zenodo.org
Updated Jan 1, 2028
Cite
Gunther Glehr; James A Hutchinson (2028). Cross-platform cytometry benchmark data [Dataset]. http://doi.org/10.5281/zenodo.17094078
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.17094078
Dataset updated
Jan 1, 2028
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Gunther Glehr; James A Hutchinson
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Sep 10, 2025
Measurement technique

Overall methods

Patients and Ethics

Blood samples were collected from healthy donors at the University Hospital Regensburg, Germany, the University Hospital Marqués de Valdecilla, Santander, Spain, and Germans Trias I Pujol Research Institute (IGTP), Badalona, Spain. The study was conducted in multiple phases: R1 and R2 in Regensburg (July 18, 2023 to January 23, 2024, and May 15, 2024 to July 16, 2024), SAN in Santander (April 29, 2024 to May 28, 2024), FORT in Regensburg (January 14, 2025 to January 24, 2025) and BAD in Badalona (March 18, 2025 to March 19, 2025). The study was approved by the Ethics Committees of the University of Regensburg (22-2780-101), Hospital Universitario Marqués de Valdecilla (CS24-116; 2024_6) and IGTP (PI-23-272). The study was conducted in accordance with the principles of the Declaration of Helsinki and all other relevant national and international laws and guidelines. All donors provided written, fully informed consent to sample collection and publication of anonymized results. Complete descriptions of each cohort (Supplementary Note 1) and clinical investigations (Supplementary Note 2) are provided as Supplementary Material.

Flow cytometry measurements

Step-by-step protocols can be accessed at Protocol Exchange. Briefly, blood was collected into EDTA-vacutainers by peripheral venepuncture and then delivered to the responsible lab at ambient temperature. Samples were stored at 4°C for up to 4 h before processing. Whole blood samples were stained as previously described using the DURAClone IM T Cell Subsets Tube (Beckman Coulter, B53328) and single-staining controls. DURAClone IM T cell subsets compensation tubes were run every two weeks. Daily quality control (QC) checks were run on all cytometers using Flow-Check Pro Fluorospheres (Beckman Coulter, A63493) to ensure proper function.

Data were collected in Regensburg using a Navios™ cytometer running Navios™ Cytometry List Mode Acquisition Analysis Software, Version 1.3 (Beckman Coulter), a CytoFLEX LX™ cytometer running CytExpert, Version 2.4.0.28 (Beckman Coulter), or a BD LSRFortessa X-20 running BD FACSDiva software v9.0. In Santander, data were collected using a DxFLEX cytometer running CytExpert for DxFLEX, Version 2.2.0.7. In Badalona, data were collected using either a Cytek Aurora™ 5L spectral flow cytometer running SpectroFlo® software (Cytek Biosciences), Version 3.1.0, or a BD LSRFortessa 4L flow cytometer running BD FACSDiva™ Software (BD Biosciences), Version 6.2. Settings for each cytometer were established by independent experienced operators without exchange of reference samples, calibration materials or example data.

Data were pre-processed by a single experienced, blinded operator who performed: (1) sample-wise manual recompensation; (2) manual gating; and (3) rescaling with a suitable arcsinh cofactor. Data were then exported as FCS files for upload to the ImmPort repository (Accession_ID). These FCS files contain the uncompensated data and three compensations: (1) single-stain compensation for each sample; (2) compensation from DURAClone IM T Cell Subsets compensation tubes; and (3) manually recompensated single-stain compensation for each sample. The manual gatings per sample are provided as flowWorkspace gating sets in h5 format and should be applied to the manually recompensated data. An example gating strategy is provided. In addition, we provide manually recompensated, pre-gated T cell FCS files for all fully stained samples.

Flow cytometry analyses

All computations were performed in R using the Bioconductor packages flowCore and flowWorkspace, alongside our own convenience tools, cytobench, cycompare and otcyto. We gathered FCS files from all instruments, and carried out pre-processing, clustering, classification, and optimal transport analyses.

Pre-processing included applying compensation (or unmixing in the case of spectral cytometry), manual gating to identify CD45+ CD3+ singlet T cells in panel samples and lymphocytes in single-stained controls, arcsinh transformation of fluorescence intensities using manually selected cofactors, random cell subsampling to standardise sample sizes and optionally CytoNorm for normalisation. Alternatively, we applied data relativisation as an alignment strategy before arcsinh transformation.

For reproducibility, R1 and SAN donors were randomly assigned to training, validation, and test sets only once. This donor-level split was preserved for all downstream analyses and serves as the foundation for model development and evaluation. Ideally, future benchmarking efforts using our data should report on those splits for direct comparability of results.
Description
This repository contains the data presented in our original article, "Superior Precision of Clinical Predictions after CD3-relativisation to Align Flow Cytometry Data," including raw and processed FCS files from 482 samples. This dataset captures information about T cell distributions in healthy human donors through standardised flow cytometry measurements made in four internationally collaborating laboratories over 17 months using 6 different cytometers. A subset of 329 samples was split for parallel measurements on at least 2 instruments. This repository also reports donor-level clinical and demographic information, as well as QC files.

To start analysing the data, we suggest downloading CORE.06-1_rel.asinhCD3, which contains the CD3-relativised, arcsinh-transformed data of the CORE studies R1, R2 and SAN. In addition, download patientdata.zip, which contains all necessary anonymized patient information.

See technical note below if you want to use CORE.02-COMPENSATED.7z!

Cohort overview

CORE:

Our core dataset comprises 459 samples from 358 unique donors, which are organized into three main cohorts (R1, R2 and SAN). R1 includes 254 samples from 153 unique donors, which were analysed in Regensburg. Clinical and demographic variables are recorded. Within R1, 101 repeated samples were collected from 60 donors to test the biological stability of T cell subset distributions over time. The first samples taken from each R1 donor were randomised into training (50), validation (50) and test (53) sets. R2 is a prospective dataset collected 4 months after R1 that includes 52 samples from unique donors, who are not represented in R1. R1 and R2 samples were split after staining and measured in parallel with a Navios™ (Navios) and CytoFLEX LX™ (LX) cytometer. SAN comprises 153 samples from unique donors that were measured with a DxFlex™ (DxFLEX) cytometer in Santander. As with R1, SAN samples were randomised into training (50), validation (50) and test (53) sets.
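The one-time donor-level randomisation into training (50), validation (50) and test (53) sets can be sketched as follows. This only illustrates the idea: the donor IDs, the seed, and the function name are hypothetical, and the sketch does not reproduce the authors' actual assignment, which should be taken from the published data when comparing results.

```python
import random


def split_donors(donor_ids, sizes=(50, 50, 53), seed=0):
    """Assign each donor to exactly one of three sets, reproducibly.

    Splitting by donor (not by sample) keeps repeated samples from the
    same donor out of different sets.
    """
    unique = sorted(set(donor_ids))
    if sum(sizes) != len(unique):
        raise ValueError("set sizes must add up to the number of donors")
    rng = random.Random(seed)  # fixed seed so the split is made only once
    rng.shuffle(unique)
    n_train, n_val, _ = sizes
    return {
        "training": unique[:n_train],
        "validation": unique[n_train:n_train + n_val],
        "test": unique[n_train + n_val:],
    }


# Hypothetical donor IDs standing in for the 153 donors of one cohort.
donors = [f"donor_{i:03d}" for i in range(153)]
splits = split_donors(donors)
```

Storing the resulting assignment next to the data, instead of re-running the shuffle, is the safer way to keep the split stable across analyses.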

EXTENDED:

Our extended dataset incorporates two cohorts, FORT and BAD. FORT comprises 14 samples from unique donors that were split into equal parts, then measured in parallel using 3 cytometers, namely a Navios™ and CytoFLEX LX™ from Beckman Coulter, and an LSRFortessa™ (LSR) from Becton Dickinson. The BAD cohort incorporates samples from 9 unique donors that were split after staining and measured in parallel using an LSRFortessa™ and a Cytek Aurora™ (CA) spectral cytometer.

Processing overview

For each cohort, we report (a subset of) the data at the following processing stages, denoted as, e.g., CORE.01_RAW. The subsets -1 and -2 contain different staining panels, each gated to the relevant cells. Subset -1 is the one most analyses will need: it contains the samples stained with all colors together and gated to T cells based on CD3+.

01_RAW: Raw, untransformed data for full and single stained samples. Only marker renaming was performed to harmonize measurements from all cytometers.

02_COMPENSATED: Compensated FCS files, using by-sample manually curated spillover matrices. Only the BAD cohort uses the unmodified device compensations.

03-1_gatedCD3: Singlets/CD45+ Leukocytes/CD3+ T cell gated samples. Only `...12-panel.fcs` are included.

03-2_gatedLympho: Lymphocyte gated samples based on forward and side scatter. All single stained and unstained samples (`...01-CD3-FITC.fcs` through `...11-none.fcs`) are included.

04-1_asinhCD3 and 04-2_asinhLympho: Data from 03-1_gatedCD3 or 03-2_gatedLympho after arcsinh transformation, with different manually optimized arcsinh-cofactors per cytometer.

05-1_relativizedCD3 and 05-2_relativizedLympho: Data from 03-1_gatedCD3 or 03-2_gatedLympho after applying sample-wise relativisation.

06-1_rel.asinhCD3 and 06-2_rel.asinhLympho: Data from 05-1_relativizedCD3 and 05-2_relativizedLympho after applying one global arcsinh transformation
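The arcsinh step used in stages 04 and 06 divides each intensity by a cofactor before applying the inverse hyperbolic sine. A minimal sketch follows; the cofactor values and intensities are hypothetical placeholders, since the listing does not state the cofactors that were actually chosen.

```python
import math

# Hypothetical per-cytometer cofactors; the real values are not given here.
COFACTORS = {"Navios": 150.0, "LX": 1500.0}


def arcsinh_transform(values, cofactor):
    """Apply asinh(x / cofactor) to each fluorescence intensity."""
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [math.asinh(v / cofactor) for v in values]


# Made-up intensities, including a negative value as can occur after compensation.
raw = [-50.0, 0.0, 150.0, 15000.0]
transformed = arcsinh_transform(raw, COFACTORS["Navios"])
```

The transform is roughly linear for intensities well below the cofactor and roughly logarithmic well above it, which is why the cofactor is tuned per instrument in stage 04, whereas stage 06 applies a single global transformation to relativised data.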

Patient information

Patient information can be found in patientdata.zip, which contains the following files:

BAD_pheno_processed.csv: BAD cohort

FORT_pheno_processed.csv: FORT cohort

pheno_full_processed.csv: Complete CORE cohort

R1_pheno_first_processed.csv: R1 samples from a donor's first presentation

R1_pheno_processed.csv: All R1 samples+patients

R2_pheno_processed.csv: R2 cohort

SAN_pheno_processed.csv: SAN cohort
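The files listed above could be read straight from the archive along these lines. This is a sketch under assumptions: the CSVs are taken to be comma-delimited UTF-8 with a header row, the folder layout inside patientdata.zip is unknown (so members are matched by base name), and the cohort keys and function name are hypothetical.

```python
import csv
import io
import zipfile

# File names as listed above; the dictionary keys are illustrative labels.
COHORT_FILES = {
    "BAD": "BAD_pheno_processed.csv",
    "FORT": "FORT_pheno_processed.csv",
    "CORE": "pheno_full_processed.csv",
    "R1_first": "R1_pheno_first_processed.csv",
    "R1": "R1_pheno_processed.csv",
    "R2": "R2_pheno_processed.csv",
    "SAN": "SAN_pheno_processed.csv",
}


def read_pheno(archive, cohort):
    """Return one cohort's rows as dictionaries keyed by column name.

    `archive` may be a path or a file-like object holding the zip.
    """
    wanted = COHORT_FILES[cohort]
    with zipfile.ZipFile(archive) as zf:
        # Match on the base name because the internal folder layout is unknown.
        matches = [n for n in zf.namelist() if n.rsplit("/", 1)[-1] == wanted]
        if not matches:
            raise FileNotFoundError(wanted)
        with zf.open(matches[0]) as fh:
            text = io.TextIOWrapper(fh, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

For example, `read_pheno("patientdata.zip", "R2")` would return the R2 rows, provided the archive matches these assumptions.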

Gatings

For CORE and FORT cohort samples the gating strategies are manually curated for each sample on compensated, untransformed files - for full, single and unstained samples. They are supplied as flowWorkspace GatingSets and we have applied them sample by sample. The gating strategy is always the same, just the gate positions have been curated.

For BAD samples, we have one gating strategy for all samples per cytometer.
Supplementary Information Files for Current trends in flow cytometry...
repository.lboro.ac.uk
docx
Updated May 30, 2023
Cite
Melissa Cheung; Jonathan Campbell; Liam Whitby; Rob Thomas; Julian Braybrook; Jon Petzing (2023). Supplementary Information Files for Current trends in flow cytometry automated data analysis software [Dataset]. http://doi.org/10.17028/rd.lboro.15363474.v1
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.17028/rd.lboro.15363474.v1
Dataset updated
May 30, 2023
Dataset provided by
Loughborough University
Authors
Melissa Cheung; Jonathan Campbell; Liam Whitby; Rob Thomas; Julian Braybrook; Jon Petzing
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Supplementary Information Files for Current trends in flow cytometry automated data analysis software

Automated flow cytometry (FC) data analysis tools for cell population identification and characterisation are increasingly being used in academic, biotechnology, pharmaceutical and clinical laboratories. These computational methods are designed to overcome reproducibility and process bottleneck issues in manual gating; however, the take-up of these tools remains (anecdotally) low.

Here, we performed a comprehensive literature survey of state-of-the-art computational tools typically published by research, clinical, and biomanufacturing laboratories for automated FC data analysis and identified popular tools based on literature citation counts. Dimensionality reduction methods ranked highly, such as generic t-distributed stochastic neighbour embedding (t-SNE) and its initial Matlab-based implementation for cytometry data, viSNE. Software with graphical user interfaces also ranked highly, including PhenoGraph, SPADE1, FlowSOM and Citrus, with unsupervised learning methods outnumbering supervised learning methods, and algorithm type popularity spread across K-Means, hierarchical, density-based, model-based, and other classes of clustering algorithms.

Additionally, to illustrate actual use within clinical spaces alongside frequent citations, a survey issued by UK NEQAS Leucocyte Immunophenotyping to identify software usage trends among clinical laboratories was completed. The survey revealed that 53% of laboratories have not yet taken up automated cell population identification methods, though amongst those that have, Infinicyt software is the most frequently identified. Survey respondents considered data output quality to be the most important factor when using automated FC data analysis software, followed by software speed and level of technical support.

This review found differences in software usage between biomedical institutions, with tools for discovery, data exploration and visualisation more popular in academia, whereas automated tools for specialised targeted analysis that apply supervised learning methods were more used in clinical settings.
fsc_files_Fc7_Salmon
figshare.com
bin
Updated Nov 8, 2022
Cite
Madeleine Gundersen (2022). fsc_files_Fc7_Salmon [Dataset]. http://doi.org/10.6084/m9.figshare.21518922.v1
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.21518922.v1
Dataset updated
Nov 8, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Madeleine Gundersen
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
fsc files and metadata for analysis

This repository contains the raw fsc files for the experiment investigating the effect of phage therapy and antibiotics on treating Flavobacterium columnare Fc7 infections in Atlantic salmon fry. The bacterial communities in the water were sampled (1 mL) right before and immediately after adding Fc7 and treatment. Throughout the experiment, the water was sampled after treatment, and water exchanges were performed at 0.5, 1, 2, 4, 6, 8 and 10 DPI. The water samples were fixed through a 15-minute incubation in 0.1% glutaraldehyde before being snap-frozen using liquid nitrogen. The bacterial density was quantified using flow cytometry (Attune NxT, ThermoFisher). Data were collected using the blue laser (488 nm) with detection in BL1 and BL3 using a BL1 threshold of 1500-3000 (depending on the sample). Instrument voltages were FSC 320 V, SSC 260 V, BL1 320 V and BL3 350 V. Samples were acquired by running 160 µL of sample at a flow rate of 100 µL/min. Samples were vortexed before being sampled and the zip was cleaned between approximately every 6th sample. The water samples were stained with SYBR Green I (2x), which emits a green fluorescent signal when it binds to DNA. Samples were incubated with the stain for 15 minutes in the dark at 37 °C before analysis. Samples were diluted in 0.2 µm filtered PBS (1x) to obtain stable sample acquisition and dilute background noise present in the sample. 0.2 µm filtered PBS and 0.2 µm filtered fish water were used as negative controls and the pure strains of Fc7 as positive controls.
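The acquisition settings above (160 µL at 100 µL/min) imply a run time per sample, and an event count can be converted to a density in the usual way. The sketch below shows that arithmetic. The event count and dilution factor are made-up illustration values, and the density formula is the generic counts-per-volume calculation, not necessarily the one used by the dataset's author.

```python
def acquisition_seconds(volume_ul, flow_rate_ul_per_min):
    """Time needed to run a given volume at a given flow rate."""
    return volume_ul * 60.0 / flow_rate_ul_per_min


def cells_per_ml(event_count, volume_ul, dilution_factor=1.0):
    """Convert gated events in an acquired volume to cells per mL.

    dilution_factor is how many times the sample was diluted before
    acquisition (1.0 means undiluted).
    """
    return event_count / volume_ul * 1000.0 * dilution_factor


# Settings stated in the description: 160 µL at 100 µL/min.
run_time = acquisition_seconds(160, 100)

# Hypothetical numbers: 32,000 gated events from a 10-fold diluted sample.
density = cells_per_ml(event_count=32000, volume_ul=160, dilution_factor=10)
```

With the stated settings each acquisition takes 96 seconds; the hypothetical count would correspond to 2 million cells per mL.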
Supplementary information files for Assessment of Automated Flow Cytometry...
repository.lboro.ac.uk
pdf
Updated May 31, 2023
Cite
Melissa Cheung; Jonathan Campbell; Rob Thomas; Julian Braybrook; Jon Petzing (2023). Supplementary information files for Assessment of Automated Flow Cytometry Data Analysis Tools within Cell and Gene Therapy Manufacturing [Dataset]. http://doi.org/10.17028/rd.lboro.19794640.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.17028/rd.lboro.19794640.v1
Dataset updated
May 31, 2023
Dataset provided by
Loughborough University
Authors
Melissa Cheung; Jonathan Campbell; Rob Thomas; Julian Braybrook; Jon Petzing
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Supplementary information files for article Assessment of Automated Flow Cytometry Data Analysis Tools within Cell and Gene Therapy Manufacturing

Flow cytometry is widely used within the manufacturing of cell and gene therapies to measure and characterise cells. Conventional manual data analysis relies heavily on operator judgement, presenting a major source of variation that can adversely impact the quality and predictive potential of therapies given to patients. Computational tools have the capacity to minimise operator variation and bias in flow cytometry data analysis; however, in many cases, confidence in these technologies has yet to be fully established mirrored by aspects of regulatory concern. Here, we employed synthetic flow cytometry datasets containing controlled population characteristics of separation, and normal/skew distributions to investigate the accuracy and reproducibility of six cell population identification tools, each of which implement different unsupervised clustering algorithms: Flock2, flowMeans, FlowSOM, PhenoGraph, SPADE3 and SWIFT (density-based, k-means, self-organising map, k-nearest neighbour, deterministic k-means, and model-based clustering, respectively). We found that outputs from software analysing the same reference synthetic dataset vary considerably and accuracy deteriorates as the cluster separation index falls below zero. Consequently, as clusters begin to merge, the flowMeans and Flock2 software platforms struggle to identify target clusters more than other platforms. Moreover, the presence of skewed cell populations resulted in poor performance from SWIFT, though FlowSOM, PhenoGraph and SPADE3 were relatively unaffected in comparison. These findings illustrate how novel flow cytometry synthetic datasets can be utilised to validate a range of automated cell identification methods, leading to enhanced confidence in the data quality of automated cell characterisations and enumerations.
Additional file 10 of Stable, fluorescent markers for tracking synthetic...
springernature.figshare.com
xlsx
Updated Aug 15, 2024
Cite
Beatriz Jorrin; Timothy L. Haskett; Hayley E. Knights; Anna Martyn; Thomas J Underwood; Jessic Dolliver; Raphael Ledermann; Philip S. Poole (2024). Additional file 10 of Stable, fluorescent markers for tracking synthetic communities and assembly dynamics [Dataset]. http://doi.org/10.6084/m9.figshare.26716909.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.26716909.v1
Dataset updated
Aug 15, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Beatriz Jorrin; Timothy L. Haskett; Hayley E. Knights; Anna Martyn; Thomas J Underwood; Jessic Dolliver; Raphael Ledermann; Philip S. Poole
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 10: Table S5. Flow repository codes for flow cytometry data used in this study
Assessment of Equine Autoimmune Thrombocytopenia (EAT) by flow cytometry -...
healthdata.gov
csv, xlsx, xml
Updated Sep 15, 2025
Cite
(2025). Assessment of Equine Autoimmune Thrombocytopenia (EAT) by flow cytometry - 5km3-3x66 - Archive Repository [Dataset]. https://healthdata.gov/dataset/Assessment-of-Equine-Autoimmune-Thrombocytopenia-E/82fa-kdj3
Explore at:
Available download formats: csv, xlsx, xml
Dataset updated
Sep 15, 2025
Description
This dataset tracks the updates made on the dataset "Assessment of Equine Autoimmune Thrombocytopenia (EAT) by flow cytometry" as a repository for previous versions of the data and metadata.
Repository for PPVI paper
figshare.com
application/x-gzip
Updated Aug 30, 2023
Cite
Tyler Barbero (2023). Repository for PPVI paper [Dataset]. http://doi.org/10.6084/m9.figshare.24061677.v1
Explore at:
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.24061677.v1
Dataset updated
Aug 30, 2023
Dataset provided by
figshare
Authors
Tyler Barbero
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The tar.gz file includes the original piecewise potential vorticity inversion code, plotting scripts, and post-processed data.
Flow virometry for water-quality assessment: Protocol optimization for a...
data-staging.niaid.nih.gov
datasetcatalog.nlm.nih.gov
zip
Updated Jan 4, 2023
Cite
Hannah Safford; Heather Bischel (2023). Flow virometry for water-quality assessment: Protocol optimization for a model virus and automation of data analysis [Dataset]. http://doi.org/10.25338/B8PW6X
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.25338/B8PW6X
Dataset updated
Jan 4, 2023
Dataset provided by
University of California, Davis
Authors
Hannah Safford; Heather Bischel
License
https://spdx.org/licenses/CC0-1.0.html
Description
Flow virometry (FVM) can support advanced water treatment and reuse by delivering near real-time information about viral water quality. But maximizing the potential of FVM in water treatment and reuse applications requires protocols to facilitate data validation and interlaboratory comparison, as well as approaches to protocol design that extend the suite of viruses that FVM can feasibly and efficiently monitor. In the npj Clean Water article “Flow virometry for water-quality assessment: Protocol optimization for a model virus and automation of data analysis,” we address these needs by first optimizing a sample-preparation protocol for a model virus (T4 bacteriophage) using a fractional factorial experimental design. We then compare manual and algorithmic methods of analyzing complex FCM data collected by applying the optimized protocol to (i) a clean solution spiked with a variety of biological and non-biological viral surrogates [mixed-target experiment], and (ii) tertiary treated wastewater effluent spiked with T4 bacteriophage and two sizes of fluorescent polystyrene beads [environmental spike experiment]. This repository contains the FCM data used to develop the optimized protocol and to test the two analytical methods.

Methods

All data were collected by analyzing a 10-mL volume of the sample in question using the 488 nm (blue) solid-state laser, the lowest possible instrument flowrate (5 mL/min), and a FITC = 800 threshold on a NovoCyte 2070V Flow Cytometer coupled with a NovoSampler Pro autosampler (Agilent). Green fluorescence (FITC) intensity was collected at 530 ± 30 nm; forward and side scatter (FSC and SSC) intensities were collected as well. For the optimization experiments, 10 mL of an unstained control was run after each sample. The instrument was flushed in between each sample and control by running 150 mL of 1x NovoClean solution (Agilent) followed by 150 mL of MQ water through the SIP at the highest instrument flow rate (120 mL/min).
Instrument performance was ensured by performing the instrument’s built-in quality control (QC) test at least monthly. The FCM data were exported directly to .fcs (the standard format for flow cytometry/virometry data) files. All of the raw .fcs files used for the optimization experiments, mixed-target experiments, and environmental spike experiments are provided in this repository. For the mixed-target and environmental spike experiments, these .fcs files were then manually gated and exported to .csv files for use in downstream, algorithmically assisted analysis. Each of these .csv files is provided in this repository as well.
Datasets for manuscript "A data engineering framework for chemical flow...
catalog.data.gov
gimi9.com
Updated Nov 7, 2021
U.S. EPA Office of Research and Development (ORD) (2021). Datasets for manuscript "A data engineering framework for chemical flow analysis of industrial pollution abatement operations" [Dataset]. https://catalog.data.gov/dataset/datasets-for-manuscript-a-data-engineering-framework-for-chemical-flow-analysis-of-industr
Explore at:
Dataset updated
Nov 7, 2021
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
The EPA GitHub repository PAU4Chem, as described in its README.md file, contains Python scripts written to build the PAU dataset modules (technologies, capital and operating costs, and chemical prices) for tracking chemical flow transfers, estimating releases, and identifying potential occupational exposure scenarios in pollution abatement units (PAUs). These PAUs are employed for on-site chemical end-of-life management. The folder datasets contains the outputs for each framework step. The file Chemicals_in_categories.csv contains the chemicals for the TRI chemical categories. The EPA GitHub repository PAU_case_study, as described in its readme.md entry, contains the Python scripts to run the manuscript case study for designing the PAUs, the data-driven models, and the decision-making module for chemicals of concern and tracking flow transfers at the end-of-life stage. The data were obtained by means of data engineering using different publicly available databases. The properties of chemicals were obtained using the GitHub repository Properties_Scraper, while the PAU dataset was built using the repository PAU4Chem. Finally, the EPA GitHub repository Properties_Scraper contains a Python script to gather, in bulk, information about exposure limits and physical properties from different publicly available sources: EPA, NOAA, OSHA, and the Institute for Occupational Safety and Health of the German Social Accident Insurance (IFA). All GitHub repositories also describe the Python libraries required for running their code, how to use them, the output files obtained after running the Python script modules, and the corresponding EPA Disclaimer. This dataset is associated with the following publication: Hernandez-Betancur, J.D., M. Martin, and G.J. Ruiz-Mercado. A data engineering framework for on-site end-of-life industrial operations. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA, 327: 129514, (2021).
The ORFIUS complex regulates ORC2 localization at replication origins (Flow...
dataone.org
datadryad.org
Updated Jul 26, 2025
Sarah Hill (2025). The ORFIUS complex regulates ORC2 localization at replication origins (Flow cytometry) [Dataset]. http://doi.org/10.5061/dryad.sn02v6x89
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.sn02v6x89
Dataset updated
Jul 26, 2025
Dataset provided by
Dryad Digital Repository
Authors
Sarah Hill
Time period covered
Jan 1, 2023
Description
In this work, we use bromodeoxyuridine (BrdU)/propidium iodide cell cycle flow cytometry profiling to demonstrate that treatment of ovarian cancer cell lines with gene-specific siRNAs, shRNAs, or drugs leads to minimal or no cell cycle arrest which might otherwise influence observed phenotypes. All raw FCS files for flow cytometry data presented in Figures 4E, 6G, S3A, S4C, S5B, S5F, S8C, S8D, S8E, S9C, S10C, and S12B are available here.

The dataset was collected by treating different cell lines with either siRNAs or shRNAs targeting specific genes or drugs which perturb replication origins, treating with BrdU, and staining the cells with anti-BrdU antibodies and propidium iodide. The cells were then analyzed by flow cytometry. We are including here for each experiment FCS files for each individual sample run. siRNA transfection: 100,000 cells were plated in either a well of a 6-well plate or a 6 cm dish on day zero. On days one and two, cells were transfected with siRNAs by mixing 10 pmol of the appropriate siRNA with Lipofectamine RNAi/Max (LifeTech Cat. # 13778150) in Opti-MEM media (Gibco Cat. # 31985070) and adding the mixture directly to the cells. Transfected cells were utilized for various assays on day three or later. Inducible hairpin experiments: OVCAR8 cells stably expressing control or BRD1 or HBO1 specific inducible hairpins along with either empty vector or shRNA-resistant BRD1 or HBO1 were assessed. Cel...

Any flow cytometry analysis software will open these FCS files. This would include FACS Diva or FlowJo.

# Flow cytometry data for Yang and Hill ORFIUS complex manuscript

In this work, we use bromodeoxyuridine (BrdU)/propidium iodide cell cycle flow cytometry profiling to demonstrate that treatment of ovarian cancer cell lines with gene specific siRNAs, shRNAs or drugs leads to minimal or no cell cycle arrest which might otherwise influence observed phenotypes. All raw FCS files for flow cytometry data presented in Figures 4E, 6G, S3A, S4C, S5B, S5F, S8C, S8D, S8E, S9C, S10C, and S12B are available here.

Description of the data and file structure

The dataset was collected by treating different cell lines with either siRNAs or shRNAs targeting specific genes or drugs which perturb replication origins, treating with BrdU, and staining the cells with anti-BrdU antibodies and propidium iodide. The cells were then analyzed by flow cytometry. We are including here for each experiment FCS files for each individual sample run. The data is organized by Figure with one folder each for Fig...
Dataset for "Wind speed inference from environmental flow-structure...
purl.stanford.edu
Cardona, Jennifer L; Bouman, Katherine L; Dabiri, John O, Dataset for "Wind speed inference from environmental flow-structure interactions", Flow, 2021. [Dataset]. https://purl.stanford.edu/tp480sx4819
Explore at:
Authors
Cardona, Jennifer L; Bouman, Katherine L; Dabiri, John O
Description
This repository contains video data of flexible cantilevered cylinders and trees in wind tunnel experiments. These data were used to produce the results shown in Cardona and Dabiri 2020, "Wind speed inference from environmental flow-structure interactions". Cardona JL, Bouman KL, and Dabiri JO (2021). "Wind speed inference from environmental flow–structure interactions," Flow. https://doi.org/10.1017/flo.2021.3
Datasets for manuscript "Integrating data engineering and process systems...
catalog.data.gov
gimi9.com
Updated Oct 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2025). Datasets for manuscript "Integrating data engineering and process systems engineering for end-of-life chemical flow analysis" [Dataset]. https://catalog.data.gov/dataset/datasets-for-manuscript-integrating-data-engineering-and-process-systems-engineering-for-e
Explore at:
Dataset updated
Oct 10, 2025
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
The GitHub repository https://github.com/jodhernandezbe/TRI4PLADS/tree/v1.0.0 is publicly available and referenced in the supplementary information. This GitHub repository describes the computational framework overview, software requirements, model use, model output, and disclaimer. This repository presents a multi-scale framework that combines data engineering with process systems engineering (PSE) to enhance the precision of chemical flow analysis (CFA) at the end-of-life (EoL) stage. The focus is on chemicals used in plastic manufacturing, tracing their flows through the supply chain and EoL pathways. Additionally, this study examines potential discharges from material recovery facilities to publicly owned treatment works (POTW) facilities, recognizing their relevance to human and environmental health. Tracking these discharges is critical, as industrial EoL material transfers to POTWs can interfere with biological treatment processes, leading to unintended environmental chemical releases. By integrating data-driven methodologies with mechanistic modeling, this framework supports the identification, quantification, and regulatory assessment of chemical discharges, providing a science-based foundation for industrial and policy decision-making in sustainable material and water management. The attached file "CoU - Metadata File.xlsx" contains the datasets to build Figure 3 and describe a qualitative flow diagram of methyl methacrylate from manufacturing to potential consumer products generated from the Chemical Conditions of Use Locator methodology (https://doi.org/10.1111/jiec.13626). The attached file "MMA POTW Dataset.xlsx" contains the datasets needed to run the Chemical Tracker and Exposure Assessor in Publicly Owned Treatment Works Model (ChemTEAPOTW) as described in the GitHub repository https://github.com/gruizmer/ChemTEAPOTW.
The attached file "Plastic Data-Calculations-Assumptions.docx" contains all calculations and assumptions used to estimate the methyl methacrylate (MMA) releases from plastic recycling. Finally, users can generate Figures 4 and 5 by following the step-by-step process described in the main GitHub repository for the MMA case study. This dataset is associated with the following publication: Hernandez-Betancur, J.D., J.D. Chea, D. Perez, and G.J. Ruiz-Mercado. Integrating data engineering and process systems engineering for end-of-life chemical flow analysis. COMPUTERS AND CHEMICAL ENGINEERING. Elsevier Science Ltd, New York, NY, USA, 204: 109414, (2026).
Supplementary data for the manuscript "Flow and Entrainment Mechanisms...
data.niaid.nih.gov
Updated Aug 26, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
George Constantinescu; Hao Wu; Jie Zeng (2020). Supplementary data for the manuscript "Flow and Entrainment Mechanisms around a Freshwater Mussel Aligned with the Incoming Flow" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4000304
Explore at:
Dataset updated
Aug 26, 2020
Dataset provided by
South Florida Water Management District
The University of Iowa
Authors
George Constantinescu; Hao Wu; Jie Zeng
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository is associated with the Water Resources Research (WRR) manuscript "Flow and Entrainment Mechanisms around a Freshwater Mussel Aligned with the Incoming Flow" and contains the data files for that manuscript.
A model for detecting the effects of vibration on peripheral blood flow -...
healthdata.gov
csv, xlsx, xml
Updated Jul 16, 2025
(2025). A model for detecting the effects of vibration on peripheral blood flow - t3fb-9dnk - Archive Repository [Dataset]. https://healthdata.gov/dataset/A-model-for-detecting-the-effects-of-vibration-on-/isf8-d8pt
Explore at:
Available download formats: csv, xlsx, xml
Dataset updated
Jul 16, 2025
Description
This dataset tracks the updates made on the dataset "A model for detecting the effects of vibration on peripheral blood flow" as a repository for previous versions of the data and metadata.
