11 datasets found

Data_Sheet_1_Manifold learning for fMRI time-varying functional...
frontiersin.figshare.com
docx
Updated Jul 11, 2023
Cite
Javier Gonzalez-Castillo; Isabel S. Fernandez; Ka Chun Lam; Daniel A. Handwerker; Francisco Pereira; Peter A. Bandettini (2023). Data_Sheet_1_Manifold learning for fMRI time-varying functional connectivity.docx [Dataset]. http://doi.org/10.3389/fnhum.2023.1134012.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fnhum.2023.1134012.s001
Dataset updated
Jul 11, 2023
Dataset provided by
Frontiers
Authors
Javier Gonzalez-Castillo; Isabel S. Fernandez; Ka Chun Lam; Daniel A. Handwerker; Francisco Pereira; Peter A. Bandettini
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Whole-brain functional connectivity (FC) measured with functional MRI (fMRI) evolves over time in meaningful ways at temporal scales ranging from years (e.g., development) to seconds [e.g., within-scan time-varying FC (tvFC)]. Yet, our ability to explore tvFC is severely constrained by its large dimensionality (several thousand). To overcome this difficulty, researchers often seek to generate low-dimensional representations (e.g., 2D and 3D scatter plots), hoping those will retain important aspects of the data (e.g., relationships to behavior and disease progression). Limited prior empirical work suggests that manifold learning techniques (MLTs), namely those seeking to infer a low-dimensional non-linear surface (i.e., the manifold) where most of the data lies, are good candidates for accomplishing this task. Here we explore this possibility in detail. First, we discuss why one should expect tvFC data to lie on a low-dimensional manifold. Second, we estimate the intrinsic dimension (ID; i.e., minimum number of latent dimensions) of tvFC data manifolds. Third, we describe the inner workings of three state-of-the-art MLTs: Laplacian Eigenmaps (LEs), T-distributed Stochastic Neighbor Embedding (T-SNE), and Uniform Manifold Approximation and Projection (UMAP). For each method, we empirically evaluate its ability to generate neurobiologically meaningful representations of tvFC data, as well as its robustness against hyper-parameter selection. Our results show that tvFC data has an ID that ranges between 4 and 26, and that ID varies significantly between rest and task states. We also show how all three methods can effectively capture subject identity and the task being performed: UMAP and T-SNE can capture these two levels of detail concurrently, but LE could only capture one at a time. We observed substantial variability in embedding quality across MLTs, and within each MLT as a function of hyper-parameter selection.
To help alleviate this issue, we provide heuristics that can inform future studies. Finally, we also demonstrate the importance of feature normalization when combining data across subjects and the role that temporal autocorrelation plays in the application of MLTs to tvFC data. Overall, we conclude that while MLTs can be useful to generate summary views of labeled tvFC data, their application to unlabeled data such as resting-state remains challenging.
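Two steps named in this abstract, per-subject feature normalization and Laplacian Eigenmaps (LE), can be illustrated compactly. The sketch below is a minimal NumPy version under stated assumptions, not the authors' pipeline: it assumes a matrix `X` with one row per sliding window and one column per connection, plus a `subject_ids` vector; the function names, the binary symmetrised k-nearest-neighbour graph, and the `n_neighbors` default are hypothetical illustrative choices. LE is computed by solving the generalised eigenproblem L y = λ D y through the symmetric normalised Laplacian.

```python
import numpy as np


def zscore_per_subject(X, subject_ids):
    """Z-score every feature (column) separately within each subject's rows.

    Hypothetical helper: illustrates normalizing before pooling subjects.
    """
    Xn = np.empty_like(X, dtype=float)
    for s in np.unique(subject_ids):
        rows = subject_ids == s
        mu = X[rows].mean(axis=0)
        sd = X[rows].std(axis=0)
        sd[sd == 0] = 1.0  # avoid division by zero for constant features
        Xn[rows] = (X[rows] - mu) / sd
    return Xn


def laplacian_eigenmaps(X, n_components=2, n_neighbors=10):
    """Laplacian Eigenmaps on a binary, symmetrised k-NN graph (assumed design)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between rows (windows).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    # Link each point to its k nearest neighbours, then symmetrise the graph.
    nn = np.argsort(d2, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)
    deg = W.sum(axis=1)  # every node has at least n_neighbors edges
    # Solve L y = lambda D y via L_sym = I - D^-1/2 W D^-1/2.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, evecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * evecs  # map back to generalised eigenvectors
    # Drop the trivial constant eigenvector (eigenvalue 0).
    return Y[:, 1:n_components + 1]
```

Calling `laplacian_eigenmaps(zscore_per_subject(X, subject_ids))` returns one 2D point per window. Without the per-subject step, subject-specific offsets in mean connectivity can dominate the neighbour graph. If the k-NN graph is disconnected, several eigenvalues are zero and the first coordinates merely label connected components; increasing `n_neighbors` is the usual remedy.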
Data_Sheet_1_Time-Lagged t-Distributed Stochastic Neighbor Embedding (t-SNE)...
figshare.com
pdf
Updated Jun 2, 2023
Cite
Vojtěch Spiwok; Pavel Kříž (2023). Data_Sheet_1_Time-Lagged t-Distributed Stochastic Neighbor Embedding (t-SNE) of Molecular Simulation Trajectories.PDF [Dataset]. http://doi.org/10.3389/fmolb.2020.00132.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fmolb.2020.00132.s001
Dataset updated
Jun 2, 2023
Dataset provided by
Frontiers
Authors
Vojtěch Spiwok; Pavel Kříž
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Molecular simulation trajectories represent high-dimensional data. Such data can be visualized by methods of dimensionality reduction. Non-linear dimensionality reduction methods are likely to be more efficient than linear ones because atomic motions are non-linear. Here we test the popular non-linear t-distributed Stochastic Neighbor Embedding (t-SNE) method on the analysis of trajectories of 200 ns of alanine dipeptide dynamics and 208 μs of Trp-cage folding and unfolding. Furthermore, we introduce a time-lagged variant of t-SNE in order to focus on rarely occurring transitions in the molecular system. This time-lagged t-SNE efficiently separates states according to distance in time. Using this method, it is possible to visualize key states of the studied systems (e.g., unfolded and folded protein) as well as possible kinetic traps in a two-dimensional plot. Time-lagged t-SNE is a visualization method, and other applications, such as clustering and free energy modeling, must be done with caution.
Replication Data for: Continuous Distributed Representation of Biological...
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Cite
Asgari, Ehsaneddin (2023). Replication Data for: Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics [Dataset]. http://doi.org/10.7910/DVN/JMFHTN
Unique identifier
https://doi.org/10.7910/DVN/JMFHTN
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Asgari, Ehsaneddin
Description
Users should cite: Asgari E, Mofrad MRK. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS ONE 10(11): e0141287. doi:10.1371/journal.pone.0141287. This archive also contains the family classification data used in the above-mentioned PLoS ONE paper. This data can be used as a benchmark for the family classification task.
Data_Sheet_1_Quantitative Comparison of Conventional and t-SNE-guided Gating...
frontiersin.figshare.com
pdf
Updated Jun 1, 2023
Cite
Shadi Toghi Eshghi; Amelia Au-Yeung; Chikara Takahashi; Christopher R. Bolen; Maclean N. Nyachienga; Sean P. Lear; Cherie Green; W. Rodney Mathews; William E. O'Gorman (2023). Data_Sheet_1_Quantitative Comparison of Conventional and t-SNE-guided Gating Analyses.pdf [Dataset]. http://doi.org/10.3389/fimmu.2019.01194.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fimmu.2019.01194.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Shadi Toghi Eshghi; Amelia Au-Yeung; Chikara Takahashi; Christopher R. Bolen; Maclean N. Nyachienga; Sean P. Lear; Cherie Green; W. Rodney Mathews; William E. O'Gorman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dimensionality reduction using the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm has emerged as a popular tool for visualizing high-parameter single-cell data. While this approach has obvious potential for data visualization, it remains unclear how t-SNE analysis compares to conventional manual hand-gating in stratifying and quantitating the frequency of diverse immune cell populations. We applied a comprehensive 38-parameter mass cytometry panel to human blood and compared the frequencies of 28 immune cell subsets using both conventional bivariate and t-SNE-guided manual gating. t-SNE analysis was capable of stratifying every general cellular lineage and most sub-lineages, with high correlation between conventional and t-SNE-guided cell frequency calculations. However, specific immune cell subsets delineated by the manual gating of continuous variables were not fully separated in t-SNE space, causing discrepancies in subset identification and quantification between these analytical approaches. Overall, these studies highlight the consistency between t-SNE and conventional hand-gating in stratifying general immune cell lineages, while demonstrating that particular cell subsets defined by conventional manual gating may be intermingled in t-SNE space.
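Whether two cell populations end up intermingled in a t-SNE map depends first on how t-SNE turns high-dimensional distances into neighbour probabilities. The sketch below is a minimal NumPy illustration of that standard first stage, not the software used in this study: each point gets its own Gaussian bandwidth, found by bisection so that the entropy of its neighbour distribution matches log(perplexity), and the conditionals are then symmetrised into joint probabilities. The function names, tolerance, and iteration cap are assumptions; the perplexity should stay well below the number of points, and the low-dimensional optimisation with a Student-t kernel is omitted.

```python
import numpy as np


def conditional_affinities(X, perplexity=30.0, tol=1e-5, max_iter=100):
    """Return the row-stochastic matrix P[i, j] = p_{j|i} used by t-SNE."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    target = np.log(perplexity)  # desired Shannon entropy, in nats
    P = np.zeros((n, n))
    for i in range(n):
        di = np.delete(d2[i], i)  # squared distances to all other points
        lo, hi = 0.0, np.inf
        beta = 1.0  # precision: beta = 1 / (2 * sigma_i**2)
        for _ in range(max_iter):
            # Shifting by the minimum distance leaves p unchanged but avoids underflow.
            w = np.exp(-(di - di.min()) * beta)
            p = w / w.sum()
            H = -np.sum(p * np.log(np.maximum(p, 1e-300)))
            if abs(H - target) < tol:
                break
            if H > target:  # distribution too flat: narrow the kernel
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:  # distribution too peaked: widen the kernel
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p  # p_{i|i} stays 0
    return P


def joint_affinities(P):
    """Symmetrise the conditionals: p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = P.shape[0]
    return (P + P.T) / (2.0 * n)
```

Because each bandwidth adapts to local density, the perplexity controls roughly how many effective neighbours every cell has; for example, `joint_affinities(conditional_affinities(X, perplexity=30.0))` gives a symmetric matrix summing to 1 that the embedding step then tries to reproduce in 2D.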
Dataset name, reference, dimensions and cell type composition.
plos.figshare.com
xls
Updated Dec 13, 2024
Cite
Yuta Hozumi; Guo-Wei Wei (2024). Dataset name, reference, dimensions and cell type composition. [Dataset]. http://doi.org/10.1371/journal.pone.0311791.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0311791.t001
Dataset updated
Dec 13, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Yuta Hozumi; Guo-Wei Wei
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset name, reference, dimensions and cell type composition.
Data from: Visualizing histopathologic deep learning classification and...
zenodo.org
bin
Updated Jan 24, 2020
Cite
Kevin Faust; Quin Xie; Dominick Han; Kartikay Goyle; Zoya Volynskaya; Ugljesa Djuric; Phedias Diamandis (2020). Visualizing histopathologic deep learning classification and anomaly detection using nonlinear feature space dimensionality reduction [Dataset]. http://doi.org/10.5281/zenodo.1238084
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.1238084
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Kevin Faust; Quin Xie; Dominick Han; Kartikay Goyle; Zoya Volynskaya; Ugljesa Djuric; Phedias Diamandis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Representative testing/validation whole-slide images (WSIs) used in the manuscript "Visualizing histopathologic deep learning classification and anomaly detection using nonlinear feature space dimensionality reduction".
Additional file 6 of Unravelling population structure heterogeneity within...
figshare.com
xlsx
Updated Jun 10, 2023
Cite
Melina Campos; Luisa D. P. Rona; Katie Willis; George K. Christophides; Robert M. MacCallum (2023). Additional file 6 of Unravelling population structure heterogeneity within the genome of the malaria vector Anopheles gambiae [Dataset]. http://doi.org/10.6084/m9.figshare.14752301.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.14752301.v1
Dataset updated
Jun 10, 2023
Dataset provided by
figshare
Authors
Melina Campos; Luisa D. P. Rona; Katie Willis; George K. Christophides; Robert M. MacCallum
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 6: S4 Table. Measure of consistency of the t-SNE plot.
Additional file 11 of Unravelling population structure heterogeneity within...
springernature.figshare.com
xlsx
Updated Jun 10, 2023
Cite
Melina Campos; Luisa D. P. Rona; Katie Willis; George K. Christophides; Robert M. MacCallum (2023). Additional file 11 of Unravelling population structure heterogeneity within the genome of the malaria vector Anopheles gambiae [Dataset]. http://doi.org/10.6084/m9.figshare.14752283.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.14752283.v1
Dataset updated
Jun 10, 2023
Dataset provided by
figshare
Authors
Melina Campos; Luisa D. P. Rona; Katie Willis; George K. Christophides; Robert M. MacCallum
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 11: S9 Table. GO enrichment analysis.
Table_1_Data Mining and Machine Learning Models for Predicting Drug Likeness...
figshare.com
frontiersin.figshare.com
txt
Updated May 31, 2023
Cite
Abraham Yosipof; Rita C. Guedes; Alfonso T. García-Sosa (2023). Table_1_Data Mining and Machine Learning Models for Predicting Drug Likeness and Their Disease or Organ Category.CSV [Dataset]. http://doi.org/10.3389/fchem.2018.00162.s001
Available download formats: txt
Unique identifier
https://doi.org/10.3389/fchem.2018.00162.s001
Dataset updated
May 31, 2023
Dataset provided by
Frontiers
Authors
Abraham Yosipof; Rita C. Guedes; Alfonso T. García-Sosa
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data mining approaches can uncover underlying patterns in chemical and pharmacological property space that are decisive for drug discovery and development. Two of the most common approaches are visualization and machine learning methods. Visualization methods use dimensionality reduction techniques to reduce multi-dimensional data to 2D or 3D representations with minimal loss of information. Machine learning attempts to find correlations between specific activities or classifications for a set of compounds and their features by means of recurring mathematical models. Both approaches take advantage of the different and deep relationships that can exist between features of compounds, and helpfully provide classification of compounds based on such features or, in the case of visualization methods, uncover underlying patterns in the feature space. Drug-likeness has been studied from several viewpoints, but here we provide the first implementation in chemoinformatics of the t-Distributed Stochastic Neighbor Embedding (t-SNE) method for the visualization and representation of chemical space, and the use of different machine learning methods, separately and together, to form a new ensemble learning method called AL Boost. The models obtained from AL Boost synergistically combine decision tree, random forest (RF), support vector machine (SVM), artificial neural network (ANN), k-nearest neighbors (kNN), and logistic regression models. In this work, we show that together they form a predictive model that not only improves predictive power but also decreases bias. This resulted in a correct classification rate of over 0.81, as well as higher sensitivity and specificity rates for the models. In addition, good separation and good models were also achieved for disease categories such as antineoplastic compounds and nervous system diseases, among others.
Such models can be used to guide decisions about the feature landscape of compounds and their likeness either to drugs or to other characteristics, such as one or more disease categories or organs of action of a molecule.
Collected dimension and attribute.
plos.figshare.com
xls
Updated Nov 2, 2023
Cite
Ching-Hsue Cheng; Ming-Chi Tsai; Yuan-Shao Chang (2023). Collected dimension and attribute. [Dataset]. http://doi.org/10.1371/journal.pone.0290629.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0290629.t001
Dataset updated
Nov 2, 2023
Dataset provided by
PLOS ONE
Authors
Ching-Hsue Cheng; Ming-Chi Tsai; Yuan-Shao Chang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The hotel industry is essential for tourism. With the rapid expansion of the internet, consumers looking for a hotel simply search a website for their desired keywords, and the relevant hotel information appears. To respond quickly to changing markets and consumer habits, each hotel must focus on its website information and information quality. This study proposes a novel methodology that uses rough set theory (RST), principal component analysis, t-Distributed Stochastic Neighbor Embedding (t-SNE), and attribute performance visualization to explore the relationship between hotel star ratings and hotel website information quality. The collected data are based on the star-rated hotels of the Taiwanstay website, and checklists of hotel website services are used to obtain the relevant attribute data. The results show that there are significant differences in information quality between hotels below two stars and those above four stars. The information provided by higher-star hotels was more detailed than that offered by low-star hotels. Based on the attribute performance matrix, one-star and two-star hotels have advantage attributes in landscape, reply time, restaurant information, social media, and compensation. Furthermore, three- to five-star hotels have advantage attributes in operational support, compensation, restaurant information, traffic information, and room information. These results could be provided to stakeholders as a reference.
Scripts for Analysis
figshare.com
txt
Updated Jul 18, 2018
Cite
Sneddon Lab UCSF (2018). Scripts for Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6783569.v2
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.6783569.v2
Dataset updated
Jul 18, 2018
Dataset provided by
figshare
Authors
Sneddon Lab UCSF
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Scripts used for analysis of V1 and V2 datasets:
seurat_v1.R: initialize a Seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA, clustering, and tSNE visualization. Used for v1 datasets.
merge_seurat.R: merge two or more Seurat objects into one Seurat object and perform linear regression to remove batch effects between the separate objects. Used for v1 datasets.
subcluster_seurat_v1.R: subcluster clusters of interest from a Seurat object; determine variable genes, perform regression and PCA. Used for v1 datasets.
seurat_v2.R: initialize a Seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA. Used for v2 datasets.
clustering_markers_v2.R: clustering and tSNE visualization for v2 datasets.
subcluster_seurat_v2.R: subcluster clusters of interest from a Seurat object; determine variable genes, perform regression and PCA. Used for v2 datasets.
seurat_object_analysis_v1_and_v2.R: downstream analysis and plotting functions for Seurat objects created by seurat_v1.R or seurat_v2.R.
merge_clusters.R: merge clusters that do not meet the gene threshold. Used for both v1 and v2 datasets.
prepare_for_monocle_v1.R: subcluster cells of interest and perform linear regression, but not scaling, so that normalized, regressed values can be input into Monocle with monocle_seurat_input_v1.R.
monocle_seurat_input_v1.R: Monocle script using Seurat batch-corrected values as input for the v1 merged time-course datasets.
monocle_lineage_trace.R: Monocle script using nUMI as input for the v2 lineage-traced dataset.
monocle_object_analysis.R: downstream analysis for a Monocle object (BEAM and plotting).
CCA_merging_v2.R: merge v2 endocrine datasets with canonical correlation analysis and determine the number of CCs to include in downstream analysis.
CCA_alignment_v2.R: downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.