Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Seurat v3 object
ASSAYS: AB: Antibody expression data RNA: mRNA expression data BOTH: Concatenated mRNA and antibody expression matrices
DIMENSIONALITY REDUCTION MOFA: Multi-OMICS factor analysis to integrate AB and RNA data. MOFA served as input for clustering and further dimensionality reduction. MOFAUMAP: UMAP performed on MOFA dimensions. Display used in the manuscript.
MOFATSNE: t-SNE performed on MOFA dimensions. Projected: Data was projected on the reference dataset MOFAUMAP coordinates.
METADATA ct: Projected cell type (cell type labels from the reference dataset are used). Idents(object) uses an unsupervised clustering performed on this dataset.
For the reference dataset, see https://doi.org/10.6084/m9.figshare.13397651.v2
Changelog v3: Compared to the previous version of the file, projected UMAP coordinates and projected cell type labels were added. Also, neighborhood graphs and normalized data are now contained in the object. v4: Objects were slimmed to correspond to the information described in our study. Data now only contains relevant dimensionality reductions and metadata columns; unused RNA and antibody targets were excluded from the objects.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset used in the manuscript "CellTracksColab—A platform for compiling, analyzing, and exploring tracking data"
This Zenodo archive contains:
In brief:
In this experiment, approximately 50,000 shCTRL or shMYO10 lifeact-RFP DCIS.COM cells were seeded into one well of an ibidi culture-insert 2 well pre-placed in a µ-Slide 8 well. The cells were cultured for 24 hours, after which the culture insert was removed to create a wound-healing assay setup. When appropriate, a fibrillar collagen gel (PureCol EZ Gel) was applied over the cells and allowed to polymerize for 30 minutes at 37°C. Standard culture medium was added to all wells, and the cells were left to migrate/invade for two days. Before live cell imaging, the cells were treated with 0.5 µM SiR-DNA (SiR-Hoechst, Tebu-bio) for two hours. Imaging was performed over 14 hours using a Marianas spinning-disk confocal microscope system. This system included a Yokogawa CSU-W1 scanning unit mounted on an inverted Zeiss Axio Observer Z1 microscope (Intelligent Imaging Innovations, Inc.). Imaging was conducted using a 20x (NA 0.8) air Plan Apochromat objective (Zeiss), and images were captured at 10-minute intervals.
Cell tracking was conducted using Fiji and TrackMate. Nuclei were detected with the StarDist detector using the StarDist versatile model. Tracks were created using the Kalman tracker (a maximum frame gap of 1, a Kalman search radius of 20 µm, and a linking maximum distance of 15 µm). After tracking, tracks were filtered to keep only those that contained more than six spots, so that each track carried enough data, and whose total distance traveled was greater than 89 µm.
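The post-tracking filter described above can be illustrated with a minimal sketch. It assumes each track is a list of (x, y) positions in µm ordered by frame; the function names and the example tracks are hypothetical and are not taken from TrackMate or CellTracksColab.

```python
import math

def total_distance(points):
    """Sum of step lengths between consecutive spots (same unit as the input, here µm)."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def keep_track(points, min_spots=6, min_total_distance_um=89.0):
    """Keep a track only if it has more than `min_spots` spots and has
    traveled more than `min_total_distance_um` in total."""
    return len(points) > min_spots and total_distance(points) > min_total_distance_um

# Hypothetical example tracks (positions in µm), for illustration only.
tracks = {
    "fast_long": [(15.0 * i, 0.0) for i in range(8)],   # 8 spots, 105 µm traveled
    "too_short": [(30.0 * i, 0.0) for i in range(5)],   # 5 spots only
    "too_slow": [(5.0 * i, 0.0) for i in range(10)],    # 10 spots, 45 µm traveled
}

kept = sorted(name for name, pts in tracks.items() if keep_track(pts))
print(kept)
```

Only the first track passes both thresholds; the other two fail on spot count and on total distance, respectively.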
In CellTracksColab, we conducted a dimensionality reduction analysis employing Uniform Manifold Approximation and Projection (UMAP). The UMAP settings were as follows: number of neighbors (n_neighbors) set to 10, minimum distance (min_dist) to 0.5, and number of dimensions (n_dimension) to 2. This analysis utilized an array of track metrics, including:
NUMBER_SPOTS, NUMBER_GAPS, NUMBER_SPLITS, NUMBER_MERGES, NUMBER_COMPLEX, LONGEST_GAP, TRACK_DISPLACEMENT, TRACK_MEAN_QUALITY, MAX_DISTANCE_TRAVELED, CONFINEMENT_RATIO, MEAN_STRAIGHT_LINE_SPEED, LINEARITY_OF_FORWARD_PROGRESSION, MEAN_DIRECTIONAL_CHANGE_RATE, Track Duration, Mean Speed, Median Speed, Max Speed, Min Speed, Speed Standard Deviation, Total Distance Traveled, Directionality, Tortuosity, Total Turning Angle, Spatial Coverage, MEAN_MEAN_INTENSITY_CH1, MEAN_MEDIAN_INTENSITY_CH1, MEAN_MIN_INTENSITY_CH1, MEAN_MAX_INTENSITY_CH1, MEAN_TOTAL_INTENSITY_CH1, MEAN_STD_INTENSITY_CH1, MEAN_CONTRAST_CH1, MEAN_SNR_CH1, MEAN_ELLIPSE_X0, MEAN_ELLIPSE_Y0, MEAN_ELLIPSE_MAJOR, MEAN_ELLIPSE_MINOR, MEAN_ELLIPSE_THETA, MEAN_ELLIPSE_ASPECTRATIO, MEAN_AREA, MEAN_PERIMETER, MEAN_CIRCULARITY, MEAN_SOLIDITY, MEAN_SHAPE_INDEX, MEDIAN_MEAN_INTENSITY_CH1, MEDIAN_MEDIAN_INTENSITY_CH1, MEDIAN_MIN_INTENSITY_CH1, MEDIAN_MAX_INTENSITY_CH1, MEDIAN_TOTAL_INTENSITY_CH1, MEDIAN_STD_INTENSITY_CH1, MEDIAN_CONTRAST_CH1, MEDIAN_SNR_CH1, MEDIAN_ELLIPSE_X0, MEDIAN_ELLIPSE_Y0, MEDIAN_ELLIPSE_MAJOR, MEDIAN_ELLIPSE_MINOR, MEDIAN_ELLIPSE_THETA, MEDIAN_ELLIPSE_ASPECTRATIO, MEDIAN_AREA, MEDIAN_PERIMETER, MEDIAN_CIRCULARITY, MEDIAN_SOLIDITY, MEDIAN_SHAPE_INDEX, STD_MEAN_INTENSITY_CH1, STD_MEDIAN_INTENSITY_CH1, STD_MIN_INTENSITY_CH1, STD_MAX_INTENSITY_CH1, STD_TOTAL_INTENSITY_CH1, STD_STD_INTENSITY_CH1, STD_CONTRAST_CH1, STD_SNR_CH1, STD_ELLIPSE_X0, STD_ELLIPSE_Y0, STD_ELLIPSE_MAJOR, STD_ELLIPSE_MINOR, STD_ELLIPSE_THETA, STD_ELLIPSE_ASPECTRATIO, STD_AREA, STD_PERIMETER, STD_CIRCULARITY, STD_SOLIDITY, STD_SHAPE_INDEX, MIN_MEAN_INTENSITY_CH1, MIN_MEDIAN_INTENSITY_CH1, MIN_MIN_INTENSITY_CH1, MIN_MAX_INTENSITY_CH1, MIN_TOTAL_INTENSITY_CH1, MIN_STD_INTENSITY_CH1, MIN_CONTRAST_CH1, MIN_SNR_CH1, MIN_ELLIPSE_X0, MIN_ELLIPSE_Y0, MIN_ELLIPSE_MAJOR, MIN_ELLIPSE_MINOR, MIN_ELLIPSE_THETA, MIN_ELLIPSE_ASPECTRATIO, MIN_AREA, MIN_PERIMETER, MIN_CIRCULARITY, MIN_SOLIDITY, MIN_SHAPE_INDEX, MAX_MEAN_INTENSITY_CH1, 
MAX_MEDIAN_INTENSITY_CH1, MAX_MIN_INTENSITY_CH1, MAX_MAX_INTENSITY_CH1, MAX_TOTAL_INTENSITY_CH1, MAX_STD_INTENSITY_CH1, MAX_CONTRAST_CH1, MAX_SNR_CH1, MAX_ELLIPSE_X0, MAX_ELLIPSE_Y0, MAX_ELLIPSE_MAJOR, MAX_ELLIPSE_MINOR, MAX_ELLIPSE_THETA, MAX_ELLIPSE_ASPECTRATIO, MAX_AREA, MAX_PERIMETER, MAX_CIRCULARITY, MAX_SOLIDITY, MAX_SHAPE_INDEX, MaxDistance_edge, MinDistance_edge, StartDistance_edge, EndDistance_edge, MedianDistance_edge, StdDevDistance_edge, DirectionMovement_edge, AvgRateChange_edge, PercentageChange_edge, TrendSlope_edge
Subsequently, clustering analysis was performed using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). The parameters included clustering_data_source set to UMAP, min_samples at 20, min_cluster_size at 200, and the metric employed was Canberra.
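For readers unfamiliar with the Canberra metric named above, the following is a minimal sketch of the distance itself: the sum over coordinates of |x_i - y_i| / (|x_i| + |y_i|), with 0/0 terms counted as 0. The settings dictionary simply restates the parameters from the text; its layout is an assumption for illustration and not the CellTracksColab interface.

```python
def canberra(x, y):
    """Canberra distance between two equal-length numeric vectors.
    Coordinates where both values are zero contribute 0 to the sum."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom:  # skip 0/0 terms
            total += abs(a - b) / denom
    return total

# Parameters as stated in the text, collected in a hypothetical dictionary.
hdbscan_settings = {
    "clustering_data_source": "UMAP",
    "min_samples": 20,
    "min_cluster_size": 200,
    "metric": "canberra",
}

# Two hypothetical points in a 2-D embedding.
print(canberra([1.0, 2.0], [3.0, 2.0]))
```

Because each term is normalized by the magnitude of the coordinates, the metric is more sensitive to differences near zero than the Euclidean distance is.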
https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4576
This entry contains the data used for the bachelor thesis, which investigated how embeddings can be used to analyze supersecondary structures. Abstract of the thesis: This thesis analyzes the behavior of supersecondary structures in the context of embeddings. For this purpose, data from the Protein Topology Graph Library was annotated with embeddings. This resulted in a structured graph database, which will be used for future work and analyses. In addition, different projections into two-dimensional space were made to analyze how the embeddings behave there.
The Jupyter Notebook 1_data_retrival.ipynb contains the download process for the graph files from the Protein Topology Graph Library (https://ptgl.uni-frankfurt.de). The downloaded .gml files can also be found in graph_files.zip. They form graphs that represent the relationships of supersecondary structures in the proteins and are the data basis for further analyses.
These graph files are then processed in the Jupyter Notebook 2_data_storage_and_embeddings.ipynb and entered into a graph database. The sequences of the supersecondary and secondary structures from the PTGL can be found in fastas.zip. The embeddings were calculated using the ESM model of the Facebook Research Group (huggingface.co/facebook/esm2_t12_35M_UR50D) and can be found in three .h5 files; they are subsequently added to the database. The whole process in this notebook serves to build up the database, which can then be searched using Cypher queries.
In the Jupyter Notebook 3_data_science.ipynb, different visualizations and analyses are carried out with the help of UMAP.
To install all dependencies, it is recommended to create a Conda environment and install all packages there. To use the project, PyEED should be installed using the snapshot of the original repository (source repository: https://github.com/PyEED/pyeed). The best way to install PyEED is to execute the pip install -e . command in the pyeed_BT folder. The dependencies can also be installed by using poetry and the .toml file. In addition, seaborn, h5py and umap-learn are required. These can be installed using the following commands:
pip install h5py==3.12.1
pip install seaborn==0.13.2 umap-learn==0.5.7
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Seurat v3 object
ASSAYS: AB: Antibody expression data RNA: mRNA expression data BOTH: Antibody and mRNA expression matrices concatenated
DIMENSIONALITY REDUCTION Projected: Data was projected on the reference dataset MOFAUMAP coordinates.
METADATA Batch: Sample1 (Bone marrow) or Sample2 (Blood) ct: Projected cell type (cell type labels from the reference dataset are used). Idents(object) uses an unsupervised clustering performed on this dataset.
For the reference dataset, see https://doi.org/10.6084/m9.figshare.13397651.v2
Changelog v2: Compared to the previous version of the file, projected UMAP coordinates and projected cell type labels were added. v3: Objects were slimmed to correspond to the information described in our study. Data now only contains relevant dimensionality reductions and metadata columns; unused RNA and antibody targets were excluded from the objects. v4: Added back the Batch information, which was dropped from v2 to v3.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This archive contains the restricted 10-ensemble benchmark and the scripts used in the assessment of manifold learning techniques. Files related to an ensemble are prefixed with ID1_ID2_, where ID1 is the first member in alphabetical order, and ID2 is the reference for the structural alignment.
The archive includes the following for each member of the benchmark:
A _mm.pdb file containing the ensemble's conformations.
A _aln.fa file, which is the multiple sequence alignment of the ensemble.
A _rmsd.txt file with all pairwise root mean squared deviations (RMSD) of the ensemble.
A _raw_coords_ca.bin file with the raw coordinates in binary format.
A _raw_coords_ca_mask.bin file with the gap coordinates in binary format.
A _features_pca.csv file detailing the positions of each sample in the ensemble's principal component space.
A _dist_to_hull.csv file with the ID of each ensemble member, their label in the clustering in the PC space, and the squared distance of this sample to the convex hull formed by members of the other clusters.
A _pca_errors.csv file containing the same information as the _dist_to_hull.csv file, plus the PCA reconstruction error, measured as the RMSD between the predicted and ground truth structures. A sample is predicted by fitting the PCA to all clusters except the one being evaluated.
Three _XXX_kcpa_errors.json files with the kPCA reconstruction errors for each ensemble member, measured as the RMSD between the predicted and ground truth structures, using kPCA at different sigma and alpha parameters from the grid search. The XXX indicates the kernel used. A sample is predicted by fitting the kPCA to all clusters except the one being evaluated.
A _umap_errors.json file with the UMAP reconstruction errors for each ensemble member, measured as the RMSD between the predicted and ground truth structures, using UMAP at different n_neigh and min_dist parameters from the grid search. A sample is predicted by fitting the UMAP to all clusters except the one being evaluated. UMAP could be run only on a subset of the ensembles.
A _rbf_kpca_default_sigma.json file containing the kPCA reconstruction errors for each ensemble member, measured as the RMSD between the predicted and ground truth structures, using kPCA with RBF kernel at the default alpha and sigma parameters. A sample is predicted by fitting the kPCA to all clusters except the one being evaluated.
A _rbf_kpca_errors_real.json file with the kPCA reconstruction errors for each ensemble member, measured as the RMSD between the predicted and ground truth structures, using kPCA with RBF kernel with a predicted optimal sigma parameter and alpha parameters of 1.0, 1e-5, and 1e-6. A sample is predicted by fitting the kPCA to all clusters except the one being evaluated.
The scripts used to generate the convex hull and for the PCA-kPCA comparison are as follows:
dist_to_hull.py computes the coordinates in the PC space of each member, divides the members into clusters, and computes the distance of each member to the convex hull formed by members of the other clusters in the PC space. This script uses polytope_app.cpp with a Python binding to compute the squared distance of each member to the convex hull. polytope_module.so is the compiled C++ module called by the Python script.
interpol_apase.py computes the interpolation in the ATPase latent space, and outputs the .pdb files of the trajectories.
pca_kpca.py calculates the reconstruction error for PCA, kPCA, and UMAP for each ensemble member by fitting the PCA, kPCA, or UMAP to all members of other clusters, excluding the cluster of the member currently being evaluated.
The archive also contains a procheck folder with summary tables of the procheck analysis on original and reconstructed structures. The stats.csv file contains descriptive information about the benchmark. Please consult the related documentation to understand the meaning of each column in this file.
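Several of the files listed above report errors as the RMSD between a predicted and a ground truth structure. As a minimal sketch of that measure, the code below computes the RMSD between two conformations given as lists of (x, y, z) coordinates. It assumes the conformations have the same number of atoms and are already superimposed on a common reference; the function names and coordinates are hypothetical and do not come from the archive's scripts.

```python
import math

def rmsd(conf_a, conf_b):
    """Root mean squared deviation between two already-superimposed
    conformations, each a list of (x, y, z) coordinates of equal length."""
    if len(conf_a) != len(conf_b) or not conf_a:
        raise ValueError("conformations must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))

def pairwise_rmsd(conformations):
    """Symmetric matrix of RMSD values between all pairs of conformations."""
    n = len(conformations)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(conformations[i], conformations[j])
    return matrix

# Hypothetical two-atom conformations, for illustration only.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(predicted, ground_truth))
```

In the leave-one-cluster-out setting described above, `predicted` would be the structure reconstructed by a model fitted without the cluster of the evaluated member.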
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports parts of the EPI-Clone manuscript. Here, targeted single-cell methylation profiling (scTAMseq) was combined with targeted RNA-seq from the same cells (SDR-seq) to profile CD34+ cells from the bone marrow of a healthy 51-year-old male individual.
The dataset is a Seurat (v5) object with the following assays, reductions and metadata:
ASSAYS:
RNA: RNA expression data for 120 target genes
DNAm: DNA methylation data, containing binary observations (0: amplicon not observed, i.e. dropout or absence of DNA methylation, 1: amplicon observed, i.e. DNA methylation). See the paper on scTAMseq
DIMENSIONALITY REDUCTION
pca, dynapca: PCA performed on all methylome data, or on consensus dynamic CpGs only
umap, dynaumap: UMAP computed on all methylome data, or on consensus dynamic CpGs only
projected: Methylome data projected on the reference CD34+ UMAP coordinate (add DOI!)
rnapca: PCA performed on RNA data
rnaumap: UMAP performed on RNA data
For strategies to obtain dimensionality reductions that reflect clonal identity, please see the GitHub page accompanying the manuscript.
METADATA
nFeature_RNA, nFeature_DNAm, nFeature_NonHhaI: Number of RNA, DNAm and genotyping amplicons observed
projected.cluster: Cell type, according to DNA methylation based projection on the CD34+ reference
CellType_rna: Cell type annotation according to RNA expression
CountsChrY: Y chromosome read counts
EPIclone_id: Clone, according to EPI-clone algorithm