6 datasets found
  1. Expression of 97 surface markers and RNA (transcriptome wide) in 13165 cells...

    • figshare.com
    application/gzip
    Updated Jun 2, 2023
    Cite
    Lars Velten; Sergio Triana; Simon Haas; Lea Jopp-Saile; Dominik Vonficht; Malte Paulsen (2023). Expression of 97 surface markers and RNA (transcriptome wide) in 13165 cells from a healthy young bone marrow donor [Dataset]. http://doi.org/10.6084/m9.figshare.13397987.v4
    Available download formats: application/gzip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lars Velten; Sergio Triana; Simon Haas; Lea Jopp-Saile; Dominik Vonficht; Malte Paulsen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Seurat v3 object

    ASSAYS
    • AB: Antibody expression data
    • RNA: mRNA expression data
    • BOTH: Concatenated mRNA and antibody expression matrices

    DIMENSIONALITY REDUCTION
    • MOFA: Multi-Omics Factor Analysis to integrate AB and RNA data. MOFA served as input for clustering and further dimensionality reduction.
    • MOFAUMAP: UMAP performed on MOFA dimensions. Display used in the manuscript.
    • MOFATSNE: t-SNE performed on MOFA dimensions.
    • Projected: Data was projected onto the MOFAUMAP coordinates of the reference dataset.

    METADATA
    • ct: Projected cell type (cell type labels from the reference dataset are used).
    • Idents(object) uses an unsupervised clustering performed on this dataset.

    For the reference dataset, see https://doi.org/10.6084/m9.figshare.13397651.v2

    Changelog
    • v3: Compared to the previous version of the file, projected UMAP coordinates and projected cell type labels were added. Also, neighborhood graphs and normalized data are now contained in the object.
    • v4: Objects were slimmed to correspond to the information described in our study. Data now only contains relevant dimensionality reductions and metadata columns; unused RNA and antibody targets were excluded from the objects.

  2. CellTracksColab - breast cancer cell dataset

    • zenodo.org
    zip
    Updated May 25, 2024
    + more versions
    Cite
    Guillaume Jacquemet; Estibaliz Gómez-de-Mariscal; Hanna Grobe; Joanna Pylvänäinen; Laura Xénard; Ricardo Henriques; Tinevez (ORCID 0000-0002-0998-4718) (2024). CellTracksColab - breast cancer cell dataset [Dataset]. http://doi.org/10.5281/zenodo.11282716
    Available download formats: zip
    Dataset updated
    May 25, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Guillaume Jacquemet; Estibaliz Gómez-de-Mariscal; Hanna Grobe; Joanna Pylvänäinen; Laura Xénard; Ricardo Henriques; Tinevez (ORCID 0000-0002-0998-4718)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset used in the manuscript "CellTracksColab—A platform for compiling, analyzing, and exploring tracking data"

    This Zenodo archive contains:

    • The raw video (Raw zip files)
    • The tracking files as XML and CSV files (Tracks.zip)
    • The masks used to identify the edges of the monolayer (Monolayer_edges.zip)
    • The CellTracksColab dataframes storing the dataset (CellTracksColab_results.zip)
    • The CellTracksColab outputs used to make the figures in the paper (CellTracksColab_results.zip)

    In brief:

    In this experiment, approximately 50,000 shCTRL or shMYO10 lifeact-RFP DCIS.COM cells were seeded into one well of an ibidi culture-insert 2 well pre-placed in a µ-Slide 8 well. The cells were cultured for 24 hours, after which the culture insert was removed to create a wound-healing assay setup. When appropriate, a fibrillar collagen gel (PureCol EZ Gel) was applied over the cells and allowed to polymerize for 30 minutes at 37°C. Standard culture media was added to all wells, and the cells were left to migrate/invade for two days. Before live cell imaging, the cells were treated with 0.5 µM SiR-DNA (SiR-Hoechst, Tetu-bio) for two hours. Imaging was performed over 14 hours using a Marianas spinning-disk confocal microscope system. This system included a Yokogawa CSU-W1 scanning unit mounted on an inverted Zeiss Axio Observer Z1 microscope (Intelligent Imaging Innovations, Inc.). Imaging was conducted using a 20x (NA 0.8) air Plan Apochromat objective (Zeiss), and images were captured at 10-minute intervals.

    Cell tracking was conducted using Fiji and TrackMate. The StarDist detector was employed to detect nuclei using the StarDist versatile model. Tracks were created using the Kalman tracker (a maximum frame gap of 1, a Kalman search radius of 20 µm, and a linking maximum distance of 15 µm). After tracking, tracks were kept only if they contained more than six spots, ensuring a significant amount of data per track, and if the total distance traveled by the cell was greater than 89 µm.
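    The post-tracking filter described above could be sketched as follows. This is only an illustration, not the authors' code: it assumes each track is a plain dict whose keys follow the metric names listed in this description (NUMBER_SPOTS, Total Distance Traveled), and the helper names are hypothetical.

```python
# Thresholds quoted in the dataset description.
MIN_SPOTS_EXCLUSIVE = 6          # a track must contain MORE than six spots
MIN_DISTANCE_UM_EXCLUSIVE = 89.0  # total distance must be GREATER than 89 µm


def keep_track(track):
    """Return True if a track passes both filters (both are strict inequalities)."""
    return (
        track["NUMBER_SPOTS"] > MIN_SPOTS_EXCLUSIVE
        and track["Total Distance Traveled"] > MIN_DISTANCE_UM_EXCLUSIVE
    )


def filter_tracks(tracks):
    """Keep only the tracks that satisfy keep_track, preserving their order."""
    return [track for track in tracks if keep_track(track)]
```

    Both comparisons are strict, so a track with exactly six spots or exactly 89 µm of travel is dropped.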

    In CellTracksColab, we conducted a dimensionality reduction analysis employing Uniform Manifold Approximation and Projection (UMAP). The UMAP settings were as follows: number of neighbors (n_neighbors) set to 10, minimum distance (min_dist) to 0.5, and number of dimensions (n_dimension) to 2. This analysis utilized an array of track metrics, including:

    NUMBER_SPOTS, NUMBER_GAPS, NUMBER_SPLITS, NUMBER_MERGES, NUMBER_COMPLEX, LONGEST_GAP, TRACK_DISPLACEMENT, TRACK_MEAN_QUALITY, MAX_DISTANCE_TRAVELED, CONFINEMENT_RATIO, MEAN_STRAIGHT_LINE_SPEED, LINEARITY_OF_FORWARD_PROGRESSION, MEAN_DIRECTIONAL_CHANGE_RATE, Track Duration, Mean Speed, Median Speed, Max Speed, Min Speed, Speed Standard Deviation, Total Distance Traveled, Directionality, Tortuosity, Total Turning Angle, Spatial Coverage, MEAN_MEAN_INTENSITY_CH1, MEAN_MEDIAN_INTENSITY_CH1, MEAN_MIN_INTENSITY_CH1, MEAN_MAX_INTENSITY_CH1, MEAN_TOTAL_INTENSITY_CH1, MEAN_STD_INTENSITY_CH1, MEAN_CONTRAST_CH1, MEAN_SNR_CH1, MEAN_ELLIPSE_X0, MEAN_ELLIPSE_Y0, MEAN_ELLIPSE_MAJOR, MEAN_ELLIPSE_MINOR, MEAN_ELLIPSE_THETA, MEAN_ELLIPSE_ASPECTRATIO, MEAN_AREA, MEAN_PERIMETER, MEAN_CIRCULARITY, MEAN_SOLIDITY, MEAN_SHAPE_INDEX, MEDIAN_MEAN_INTENSITY_CH1, MEDIAN_MEDIAN_INTENSITY_CH1, MEDIAN_MIN_INTENSITY_CH1, MEDIAN_MAX_INTENSITY_CH1, MEDIAN_TOTAL_INTENSITY_CH1, MEDIAN_STD_INTENSITY_CH1, MEDIAN_CONTRAST_CH1, MEDIAN_SNR_CH1, MEDIAN_ELLIPSE_X0, MEDIAN_ELLIPSE_Y0, MEDIAN_ELLIPSE_MAJOR, MEDIAN_ELLIPSE_MINOR, MEDIAN_ELLIPSE_THETA, MEDIAN_ELLIPSE_ASPECTRATIO, MEDIAN_AREA, MEDIAN_PERIMETER, MEDIAN_CIRCULARITY, MEDIAN_SOLIDITY, MEDIAN_SHAPE_INDEX, STD_MEAN_INTENSITY_CH1, STD_MEDIAN_INTENSITY_CH1, STD_MIN_INTENSITY_CH1, STD_MAX_INTENSITY_CH1, STD_TOTAL_INTENSITY_CH1, STD_STD_INTENSITY_CH1, STD_CONTRAST_CH1, STD_SNR_CH1, STD_ELLIPSE_X0, STD_ELLIPSE_Y0, STD_ELLIPSE_MAJOR, STD_ELLIPSE_MINOR, STD_ELLIPSE_THETA, STD_ELLIPSE_ASPECTRATIO, STD_AREA, STD_PERIMETER, STD_CIRCULARITY, STD_SOLIDITY, STD_SHAPE_INDEX, MIN_MEAN_INTENSITY_CH1, MIN_MEDIAN_INTENSITY_CH1, MIN_MIN_INTENSITY_CH1, MIN_MAX_INTENSITY_CH1, MIN_TOTAL_INTENSITY_CH1, MIN_STD_INTENSITY_CH1, MIN_CONTRAST_CH1, MIN_SNR_CH1, MIN_ELLIPSE_X0, MIN_ELLIPSE_Y0, MIN_ELLIPSE_MAJOR, MIN_ELLIPSE_MINOR, MIN_ELLIPSE_THETA, MIN_ELLIPSE_ASPECTRATIO, MIN_AREA, MIN_PERIMETER, MIN_CIRCULARITY, MIN_SOLIDITY, MIN_SHAPE_INDEX, MAX_MEAN_INTENSITY_CH1, 
    MAX_MEDIAN_INTENSITY_CH1, MAX_MIN_INTENSITY_CH1, MAX_MAX_INTENSITY_CH1, MAX_TOTAL_INTENSITY_CH1, MAX_STD_INTENSITY_CH1, MAX_CONTRAST_CH1, MAX_SNR_CH1, MAX_ELLIPSE_X0, MAX_ELLIPSE_Y0, MAX_ELLIPSE_MAJOR, MAX_ELLIPSE_MINOR, MAX_ELLIPSE_THETA, MAX_ELLIPSE_ASPECTRATIO, MAX_AREA, MAX_PERIMETER, MAX_CIRCULARITY, MAX_SOLIDITY, MAX_SHAPE_INDEX, MaxDistance_edge, MinDistance_edge, StartDistance_edge, EndDistance_edge, MedianDistance_edge, StdDevDistance_edge, DirectionMovement_edge, AvgRateChange_edge, PercentageChange_edge, TrendSlope_edge
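    As a rough sketch of how the UMAP settings quoted above map onto the umap-learn package: the description's n_dimension corresponds to umap-learn's n_components. The helper names and the dict-per-track layout are assumptions made for illustration, and whether CellTracksColab rescales the metrics beforehand is not stated here, so no scaling is applied.

```python
# Settings from the description, expressed with umap-learn's parameter names.
UMAP_SETTINGS = {"n_neighbors": 10, "min_dist": 0.5, "n_components": 2}


def build_feature_matrix(tracks, metric_names):
    """Turn per-track dicts into one row of floats per track, in metric order."""
    return [[float(track[name]) for name in metric_names] for track in tracks]


def embed_tracks(feature_matrix):
    """Project tracks to 2D. Requires the third-party umap-learn package."""
    import umap  # imported lazily so the helper above works without it

    reducer = umap.UMAP(**UMAP_SETTINGS)
    return reducer.fit_transform(feature_matrix)
```

    With n_neighbors set to 10, the input needs to contain comfortably more than ten tracks for the embedding to be meaningful.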

    Subsequently, clustering analysis was performed using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). The parameters included clustering_data_source set to UMAP, min_samples at 20, min_cluster_size at 200, and the metric employed was Canberra.
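    For readers unfamiliar with the Canberra metric named above, a minimal pure-Python version is sketched below. It follows the usual definition (sum of |x_i - y_i| / (|x_i| + |y_i|), with 0/0 terms counted as 0) and is not taken from CellTracksColab. The settings dict uses the parameter names of the third-party hdbscan package and is included only as an assumed mapping of the values quoted in the description.

```python
# Values quoted in the description; key names follow the hdbscan package (assumption).
HDBSCAN_SETTINGS = {"min_samples": 20, "min_cluster_size": 200, "metric": "canberra"}


def canberra(x, y):
    """Canberra distance between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        denominator = abs(a) + abs(b)
        if denominator:  # a term with a == b == 0 contributes nothing
            total += abs(a - b) / denominator
    return total
```

    Each coordinate contributes at most 1, so in the two-dimensional UMAP space the distance between two points lies between 0 and 2.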

  3. Data from: Data related to Panzer: A Machine Learning Based Approach to...

    • darus.uni-stuttgart.de
    Updated Nov 27, 2024
    Cite
    Tim Panzer (2024). Data related to Panzer: A Machine Learning Based Approach to Analyze Supersecondary Structures of Proteins [Dataset]. http://doi.org/10.18419/DARUS-4576
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    DaRUS
    Authors
    Tim Panzer
    License

    https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4576

    Time period covered
    Nov 1, 1976 - Feb 29, 2024
    Dataset funded by
    DFG
    Description

    This entry contains the data used for the bachelor thesis, which investigated how embeddings can be used to analyze supersecondary structures.

    Abstract of the thesis: This thesis analyzes the behavior of supersecondary structures in the context of embeddings. For this purpose, data from the Protein Topology Graph Library (PTGL) was annotated with embeddings. This resulted in a structured graph database, which will be used for future work and analyses. In addition, different projections into two-dimensional space were made to analyze how the embeddings behave there.

    • 1_data_retrival.ipynb: Jupyter Notebook with the download process of the graph files from the Protein Topology Graph Library (https://ptgl.uni-frankfurt.de). The downloaded .gml files can also be found in graph_files.zip. They form graphs that represent the relationships of supersecondary structures in the proteins and are the data basis for further analyses.
    • 2_data_storage_and_embeddings.ipynb: Jupyter Notebook that processes these graph files and enters them into a graph database. The sequences of the supersecondary and secondary structures from the PTGL can be found in fastas.zip. The embeddings were calculated using the ESM model of the Facebook Research group (huggingface.co/facebook/esm2_t12_35M_UR50D) and can be found in three .h5 files; they are added to the database afterwards. The whole process in this notebook serves to build up the database, which can then be searched using Cypher queries.
    • 3_data_science.ipynb: Jupyter Notebook in which different visualizations and analyses are carried out with the help of UMAP.

    For the installation of all dependencies, it is recommended to create a Conda environment and then install all packages there. To use the project, PyEED should be installed using the snapshot of the original repository (source repository: https://github.com/PyEED/pyeed). The best way to install PyEED is to execute the following command in the pyeed_BT folder:

    pip install -e .

    The dependencies can also be installed by using poetry and the .toml file. In addition, seaborn, h5py and umap-learn are required. These can be installed using the following commands:

    pip install h5py==3.12.1
    pip install seaborn==0.13.2 umap-learn==0.5.7
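    After installation, one way to confirm that the environment matches the pinned versions is a small check based on the standard-library importlib.metadata module. This is a sketch only and is not part of the thesis material; the function name check_pins is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the installation instructions.
PINNED = {"h5py": "3.12.1", "seaborn": "0.13.2", "umap-learn": "0.5.7"}


def check_pins(pinned):
    """Return {package: (wanted, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_pins(PINNED).items():
        print(f"{name}: wanted {wanted}, found {installed}")
```

    An empty result means every pinned package is installed at the expected version.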

  4. Expression of 197 surface markers and 462 mRNAs in 15281 cells from blood...

    • figshare.com
    application/gzip
    Updated Feb 23, 2024
    Cite
    Lars Velten; Sergio Triana; Simon Haas; Dominik Vonficht; Lea Jopp-Saile; Malte Paulsen (2024). Expression of 197 surface markers and 462 mRNAs in 15281 cells from blood and bone marrow from a young healthy donor [Dataset]. http://doi.org/10.6084/m9.figshare.13398065.v4
    Available download formats: application/gzip
    Dataset updated
    Feb 23, 2024
    Dataset provided by
    figshare
    Authors
    Lars Velten; Sergio Triana; Simon Haas; Dominik Vonficht; Lea Jopp-Saile; Malte Paulsen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Seurat v3 object

    ASSAYS
    • AB: Antibody expression data
    • RNA: mRNA expression data
    • BOTH: Antibody and mRNA expression matrices concatenated

    DIMENSIONALITY REDUCTION
    • Projected: Data was projected onto the MOFAUMAP coordinates of the reference dataset.

    METADATA
    • Batch: Sample1 (Bone marrow) or Sample2 (Blood)
    • ct: Projected cell type (cell type labels from the reference dataset are used).
    • Idents(object) uses an unsupervised clustering performed on this dataset.

    For the reference dataset, see https://doi.org/10.6084/m9.figshare.13397651.v2

    Changelog
    • v2: Compared to the previous version of the file, projected UMAP coordinates and projected cell type labels were added.
    • v3: Objects were slimmed to correspond to the information described in our study. Data now only contains relevant dimensionality reductions and metadata columns; unused RNA and antibody targets were excluded from the objects.
    • v4: Added back the Batch information, which was dropped from v2 to v3.

  5. Material for manifold learning techniques comparison on benchmark dataset

    • springernature.figshare.com
    application/x-gzip
    Updated Jul 5, 2024
    Cite
    Elodie Laine; Valentin Lombard; Sergei Grudinin (2024). Material for manifold learning techniques comparison on benchmark dataset [Dataset]. http://doi.org/10.6084/m9.figshare.25112459.v1
    Available download formats: application/x-gzip
    Dataset updated
    Jul 5, 2024
    Dataset provided by
    figshare
    Authors
    Elodie Laine; Valentin Lombard; Sergei Grudinin
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This archive contains the restricted 10-ensemble benchmark and the scripts used in the manifold learning techniques assessment. Files related to an ensemble are prefixed with ID1_ID2_, where ID1 is the first member in alphabetical order, and ID2 is the reference for the structural alignment.

    The archive includes the following for each member of the benchmark:

    • A _mm.pdb file containing the ensemble's conformations.
    • A _aln.fa file, which is the multiple sequence alignment of the ensemble.
    • A _rmsd.txt file with all pairwise root mean squared deviations (RMSD) of the ensemble.
    • A _raw_coords_ca.bin file with the raw coordinates in binary format.
    • A _raw_coords_ca_mask.bin file with the gap coordinates in binary format.
    • A _features_pca.csv file detailing the positions of each sample in the ensemble's principal component space.
    • A _dist_to_hull.csv file with the ID of each ensemble member, their label in the clustering in the PC space, and the squared distance of this sample to the convex hull formed by members of the other clusters.
    • A _pca_errors.csv file containing the same information as the _dist_to_hull.csv file, plus the PCA reconstruction error, measured as the RMSD between the predicted and ground truth structures. The prediction of a sample is done by fitting the PCA to all clusters except the one being evaluated.
    • Three _XXX_kcpa_errors.json files with the kPCA reconstruction errors for each ensemble member, measured as the RMSD between the predicted and ground truth structures, using kPCA at different sigma and alpha parameters from the grid search. The XXX indicates the kernel used. The prediction of a sample is done by fitting the kPCA to all clusters except the one being evaluated.
    • A _umap_errors.json file with the UMAP reconstruction errors for each ensemble member, measured as the RMSD between the predicted and ground truth structures, using UMAP at different n_neigh and min_dist parameters from the grid search. The prediction of a sample is done by fitting the UMAP to all clusters except the one being evaluated. UMAP could be run only on a subset of the ensembles.
    • A _rbf_kpca_default_sigma.json file containing the kPCA reconstruction errors for each ensemble member, measured as the RMSD between the predicted and ground truth structures, using kPCA with RBF kernel at the default alpha and sigma parameters. The prediction of a sample is done by fitting the kPCA to all clusters except the one being evaluated.
    • A _rbf_kpca_errors_real.json file with the kPCA reconstruction errors for each ensemble member, measured as the RMSD between the predicted and ground truth structures, using kPCA with RBF kernel with a predicted optimal sigma parameter and alpha parameters of 1.0, 1e-5, and 1e-6. The prediction of a sample is done by fitting the kPCA to all clusters except the one being evaluated.

    The scripts used to generate the convex hull and for the PCA-kPCA comparison are as follows:

    • dist_to_hull.py computes the coordinates in the PC space of each member, divides the members into clusters, and computes the distance of each member to the convex hull formed by members of the other clusters in the PC space. This script uses polytope_app.cpp with a Python binding to compute the squared distance of each member to the convex hull.
    • polytope_module.so is the compiled C++ module called by the Python script.
    • interpol_apase.py computes the interpolation in the ATPase latent space, and outputs the .pdb files of the trajectories.
    • pca_kpca.py calculates the reconstruction error for PCA, kPCA, and UMAP for each ensemble member by fitting the PCA, kPCA, or UMAP to all members of other clusters, excluding the cluster of the member currently being evaluated.

    A procheck folder contains summary tables of the procheck analysis on original and reconstructed structures.

    The stats.csv file contains descriptive information about the benchmark. Please consult the related documentation to understand the meaning of each column in this file.
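    The evaluation scheme described above (fit on all clusters except one, then score the held-out members by RMSD) can be illustrated with the sketch below. It is not the archive's pca_kpca.py: the function names are hypothetical, structures are assumed to be lists of (x, y, z) tuples already expressed in a common frame, and no superposition is performed.

```python
import math


def rmsd(predicted, truth):
    """RMSD between two conformations given as equal-length lists of (x, y, z)."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("structures must be non-empty and of equal length")
    squared = sum(
        (p - t) ** 2
        for atom_p, atom_t in zip(predicted, truth)
        for p, t in zip(atom_p, atom_t)
    )
    return math.sqrt(squared / len(truth))


def leave_one_cluster_out(labels):
    """Yield (held_out_label, train_indices, test_indices) for each cluster.

    The model would be fitted on train_indices and evaluated on test_indices.
    """
    for held_out in sorted(set(labels)):
        train = [i for i, label in enumerate(labels) if label != held_out]
        test = [i for i, label in enumerate(labels) if label == held_out]
        yield held_out, train, test
```

    Holding out whole clusters rather than single samples means each prediction is made for a conformation whose close neighbors were not seen during fitting.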

  6. EPI-Clone dataset X.1 : Targeted DNAm+DNA+RNA-seq from CD34+ BM cells of a...

    • figshare.com
    application/gzip
    Updated Mar 12, 2025
    Cite
    Lars Velten (2025). EPI-Clone dataset X.1 : Targeted DNAm+DNA+RNA-seq from CD34+ BM cells of a healthy donor [Dataset]. http://doi.org/10.6084/m9.figshare.27991574.v2
    Available download formats: application/gzip
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lars Velten
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supports parts of the EPI-Clone manuscript. Here, targeted single-cell methylation profiling (scTAMseq) was combined with targeted RNA-seq from the same cells (SDR-seq) to profile CD34+ cells from the bone marrow of a healthy 51-year-old male individual.

    The dataset is a Seurat (v5) object with the following assays, reductions and metadata:

    ASSAYS
    • RNA: RNA expression data for 120 target genes
    • DNAm: DNA methylation data, containing binary observations (0: amplicon not observed, i.e. dropout or absence of DNA methylation; 1: amplicon observed, i.e. DNA methylation). See the paper on scTAMseq.

    DIMENSIONALITY REDUCTION
    • pca, dynapca: PCA performed on all methylome data, or on consensus dynamic CpGs only
    • umap, dynaumap: UMAP computed on all methylome data, or on consensus dynamic CpGs only
    • projected: Methylome data projected on the reference CD34+ UMAP coordinates (add DOI!)
    • rnapca: PCA performed on RNA data
    • rnaumap: UMAP performed on RNA data

    For strategies to obtain dimensionality reductions that reflect clonal identity, please see the GitHub page accompanying the manuscript.

    METADATA
    • nFeature_RNA, nFeature_DNAm, nFeature_NonHhaI: Number of RNA, DNAm and genotyping amplicons observed
    • projected.cluster: Cell type, according to DNA methylation based projection on the CD34+ reference
    • CellType_rna: Cell type annotation according to RNA expression
    • CountsChrY: Y chromosome read counts
    • EPIclone_id: Clone, according to the EPI-Clone algorithm


Expression of 97 surface markers and RNA (transcriptome wide) in 13165 cells from a healthy young bone marrow donor

3 scholarly articles cite this dataset (View in Google Scholar)