11 datasets found
  1. Data_Sheet_1_Manifold learning for fMRI time-varying functional...

    • frontiersin.figshare.com
    docx
    Updated Jul 11, 2023
    Javier Gonzalez-Castillo; Isabel S. Fernandez; Ka Chun Lam; Daniel A. Handwerker; Francisco Pereira; Peter A. Bandettini (2023). Data_Sheet_1_Manifold learning for fMRI time-varying functional connectivity.docx [Dataset]. http://doi.org/10.3389/fnhum.2023.1134012.s001
    Available download formats
    docx
    Dataset updated
    Jul 11, 2023
    Dataset provided by
    Frontiers
    Authors
    Javier Gonzalez-Castillo; Isabel S. Fernandez; Ka Chun Lam; Daniel A. Handwerker; Francisco Pereira; Peter A. Bandettini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Whole-brain functional connectivity (FC) measured with functional MRI (fMRI) evolves over time in meaningful ways at temporal scales going from years (e.g., development) to seconds [e.g., within-scan time-varying FC (tvFC)]. Yet, our ability to explore tvFC is severely constrained by its large dimensionality (several thousands). To overcome this difficulty, researchers often seek to generate low dimensional representations (e.g., 2D and 3D scatter plots) hoping those will retain important aspects of the data (e.g., relationships to behavior and disease progression). Limited prior empirical work suggests that manifold learning techniques (MLTs)—namely those seeking to infer a low dimensional non-linear surface (i.e., the manifold) where most of the data lies—are good candidates for accomplishing this task. Here we explore this possibility in detail. First, we discuss why one should expect tvFC data to lie on a low dimensional manifold. Second, we estimate what is the intrinsic dimension (ID; i.e., minimum number of latent dimensions) of tvFC data manifolds. Third, we describe the inner workings of three state-of-the-art MLTs: Laplacian Eigenmaps (LEs), T-distributed Stochastic Neighbor Embedding (T-SNE), and Uniform Manifold Approximation and Projection (UMAP). For each method, we empirically evaluate its ability to generate neuro-biologically meaningful representations of tvFC data, as well as their robustness against hyper-parameter selection. Our results show that tvFC data has an ID that ranges between 4 and 26, and that ID varies significantly between rest and task states. We also show how all three methods can effectively capture subject identity and task being performed: UMAP and T-SNE can capture these two levels of detail concurrently, but LE could only capture one at a time. We observed substantial variability in embedding quality across MLTs, and within-MLT as a function of hyper-parameter selection. 
    To help alleviate this issue, we provide heuristics that can inform future studies. Finally, we also demonstrate the importance of feature normalization when combining data across subjects and the role that temporal autocorrelation plays in the application of MLTs to tvFC data. Overall, we conclude that while MLTs can be useful to generate summary views of labeled tvFC data, their application to unlabeled data such as resting-state remains challenging.

  2. Data_Sheet_1_Time-Lagged t-Distributed Stochastic Neighbor Embedding (t-SNE)...

    • figshare.com
    pdf
    Updated Jun 2, 2023
    Vojtěch Spiwok; Pavel Kříž (2023). Data_Sheet_1_Time-Lagged t-Distributed Stochastic Neighbor Embedding (t-SNE) of Molecular Simulation Trajectories.PDF [Dataset]. http://doi.org/10.3389/fmolb.2020.00132.s001
    Available download formats
    pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Vojtěch Spiwok; Pavel Kříž
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Molecular simulation trajectories represent high-dimensional data. Such data can be visualized by methods of dimensionality reduction. Non-linear dimensionality reduction methods are likely to be more efficient than linear ones due to the fact that motions of atoms are non-linear. Here we test a popular non-linear t-distributed Stochastic Neighbor Embedding (t-SNE) method on analysis of trajectories of 200 ns alanine dipeptide dynamics and 208 μs Trp-cage folding and unfolding. Furthermore, we introduce a time-lagged variant of t-SNE in order to focus on rarely occurring transitions in the molecular system. This time-lagged t-SNE efficiently separates states according to distance in time. Using this method it is possible to visualize key states of studied systems (e.g., unfolded and folded protein) as well as possible kinetic traps using a two-dimensional plot. Time-lagged t-SNE is a visualization method and other applications, such as clustering and free energy modeling, must be done with caution.

  3. Replication Data for: Continuous Distributed Representation of Biological...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Asgari, Ehsaneddin (2023). Replication Data for: Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics [Dataset]. http://doi.org/10.7910/DVN/JMFHTN
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Asgari, Ehsaneddin
    Description

    Users should cite: Asgari E, Mofrad MRK. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS ONE 10(11): e0141287. doi:10.1371/journal.pone.0141287. This archive also contains the family classification data that we used in the above-mentioned PLoS ONE paper. This data can be used as a benchmark for the family classification task.

  4. Data_Sheet_1_Quantitative Comparison of Conventional and t-SNE-guided Gating...

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    Shadi Toghi Eshghi; Amelia Au-Yeung; Chikara Takahashi; Christopher R. Bolen; Maclean N. Nyachienga; Sean P. Lear; Cherie Green; W. Rodney Mathews; William E. O'Gorman (2023). Data_Sheet_1_Quantitative Comparison of Conventional and t-SNE-guided Gating Analyses.pdf [Dataset]. http://doi.org/10.3389/fimmu.2019.01194.s001
    Available download formats
    pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Shadi Toghi Eshghi; Amelia Au-Yeung; Chikara Takahashi; Christopher R. Bolen; Maclean N. Nyachienga; Sean P. Lear; Cherie Green; W. Rodney Mathews; William E. O'Gorman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dimensionality reduction using the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm has emerged as a popular tool for visualizing high-parameter single-cell data. While this approach has obvious potential for data visualization it remains unclear how t-SNE analysis compares to conventional manual hand-gating in stratifying and quantitating the frequency of diverse immune cell populations. We applied a comprehensive 38-parameter mass cytometry panel to human blood and compared the frequencies of 28 immune cell subsets using both conventional bivariate and t-SNE-guided manual gating. t-SNE analysis was capable of stratifying every general cellular lineage and most sub-lineages with high correlation between conventional and t-SNE-guided cell frequency calculations. However, specific immune cell subsets delineated by the manual gating of continuous variables were not fully separated in t-SNE space thus causing discrepancies in subset identification and quantification between these analytical approaches. Overall, these studies highlight the consistency between t-SNE and conventional hand-gating in stratifying general immune cell lineages while demonstrating that particular cell subsets defined by conventional manual gating may be intermingled in t-SNE space.

  5. Dataset name, reference, dimensions and cell type composition.

    • plos.figshare.com
    xls
    Updated Dec 13, 2024
    Yuta Hozumi; Guo-Wei Wei (2024). Dataset name, reference, dimensions and cell type composition. [Dataset]. http://doi.org/10.1371/journal.pone.0311791.t001
    Available download formats
    xls
    Dataset updated
    Dec 13, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yuta Hozumi; Guo-Wei Wei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset name, reference, dimensions and cell type composition.

  6. Data from: Visualizing histopathologic deep learning classification and...

    • zenodo.org
    bin
    Updated Jan 24, 2020
    Kevin Faust; Quin Xie; Dominick Han; Kartikay Goyle; Zoya Volynskaya; Ugljesa Djuric; Phedias Diamandis (2020). Visualizing histopathologic deep learning classification and anomaly detection using nonlinear feature space dimensionality reduction [Dataset]. http://doi.org/10.5281/zenodo.1238084
    Available download formats
    bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kevin Faust; Quin Xie; Dominick Han; Kartikay Goyle; Zoya Volynskaya; Ugljesa Djuric; Phedias Diamandis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Representative Testing/Validation WSIs used in the manuscript "Visualizing histopathologic deep learning classification and anomaly detection using nonlinear feature space dimensionality reduction"

  7. Additional file 6 of Unravelling population structure heterogeneity within...

    • figshare.com
    xlsx
    Updated Jun 10, 2023
    Melina Campos; Luisa D. P. Rona; Katie Willis; George K. Christophides; Robert M. MacCallum (2023). Additional file 6 of Unravelling population structure heterogeneity within the genome of the malaria vector Anopheles gambiae [Dataset]. http://doi.org/10.6084/m9.figshare.14752301.v1
    Available download formats
    xlsx
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    figshare
    Authors
    Melina Campos; Luisa D. P. Rona; Katie Willis; George K. Christophides; Robert M. MacCallum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 6: S4 Table. Measure of consistency of the t-SNE plot.

  8. Additional file 11 of Unravelling population structure heterogeneity within...

    • springernature.figshare.com
    xlsx
    Updated Jun 10, 2023
    Cite
    Melina Campos; Luisa D. P. Rona; Katie Willis; George K. Christophides; Robert M. MacCallum (2023). Additional file 11 of Unravelling population structure heterogeneity within the genome of the malaria vector Anopheles gambiae [Dataset]. http://doi.org/10.6084/m9.figshare.14752283.v1
    Available download formats
    xlsx
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    figshare
    Authors
    Melina Campos; Luisa D. P. Rona; Katie Willis; George K. Christophides; Robert M. MacCallum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 11: S9 Table. GO enrichment analysis.

  9. Table_1_Data Mining and Machine Learning Models for Predicting Drug Likeness...

    • figshare.com
    • frontiersin.figshare.com
    txt
    Updated May 31, 2023
    Abraham Yosipof; Rita C. Guedes; Alfonso T. García-Sosa (2023). Table_1_Data Mining and Machine Learning Models for Predicting Drug Likeness and Their Disease or Organ Category.CSV [Dataset]. http://doi.org/10.3389/fchem.2018.00162.s001
    Available download formats
    txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Abraham Yosipof; Rita C. Guedes; Alfonso T. García-Sosa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data mining approaches can uncover underlying patterns in chemical and pharmacological property space decisive for drug discovery and development. Two of the most common approaches are visualization and machine learning methods. Visualization methods use dimensionality reduction techniques in order to reduce multi-dimension data into 2D or 3D representations with a minimal loss of information. Machine learning attempts to find correlations between specific activities or classifications for a set of compounds and their features by means of recurring mathematical models. Both models take advantage of the different and deep relationships that can exist between features of compounds, and helpfully provide classification of compounds based on such features or in case of visualization methods uncover underlying patterns in the feature space. Drug-likeness has been studied from several viewpoints, but here we provide the first implementation in chemoinformatics of the t-Distributed Stochastic Neighbor Embedding (t-SNE) method for the visualization and the representation of chemical space, and the use of different machine learning methods separately and together to form a new ensemble learning method called AL Boost. The models obtained from AL Boost synergistically combine decision tree, random forests (RF), support vector machine (SVM), artificial neural network (ANN), k nearest neighbors (kNN), and logistic regression models. In this work, we show that together they form a predictive model that not only improves the predictive force but also decreases bias. This resulted in a corrected classification rate of over 0.81, as well as higher sensitivity and specificity rates for the models. In addition, separation and good models were also achieved for disease categories such as antineoplastic compounds and nervous system diseases, among others. 
    Such models can be used to guide decision on the feature landscape of compounds and their likeness to either drugs or other characteristics, such as specific or multiple disease-category(ies) or organ(s) of action of a molecule.

  10. Collected dimension and attribute.

    • plos.figshare.com
    xls
    Updated Nov 2, 2023
    Ching-Hsue Cheng; Ming-Chi Tsai; Yuan-Shao Chang (2023). Collected dimension and attribute. [Dataset]. http://doi.org/10.1371/journal.pone.0290629.t001
    Available download formats
    xls
    Dataset updated
    Nov 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ching-Hsue Cheng; Ming-Chi Tsai; Yuan-Shao Chang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The hotel industry is essential for tourism. With the rapid expansion of the internet, consumers only need to search for their desired keywords on a website when trying to find a hotel, and the relevant hotel information appears. To quickly respond to the changing market and consumer habits, each hotel must focus on its website information and information quality. This study proposes a novel methodology that uses rough set theory (RST), principal component analysis, t-Distributed Stochastic Neighbor Embedding (t-SNE), and attribute performance visualization to explore the relationship between hotel star ratings and hotel website information quality. The collected data are based on the star-rated hotels of the Taiwanstay website, and the checklists of hotel website services are used to obtain the relevant attributes data. The results show that there are significant differences in information quality between hotels below two stars and those above four stars. The information quality provided by the higher star hotels was more detailed than that offered by low-star hotels. Based on the attribute performance matrix, the one-star and two-star hotels have advantage attributes in their landscape, reply time, restaurant information, social media, and compensation. Furthermore, the three-five star hotels have advantage attributes in their operational support, compensation, restaurant information, traffic information, and room information. These results could be provided to the stakeholders as a reference.

  11. Scripts for Analysis

    • figshare.com
    txt
    Updated Jul 18, 2018
    Sneddon Lab UCSF (2018). Scripts for Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6783569.v2
    Available download formats
    txt
    Dataset updated
    Jul 18, 2018
    Dataset provided by
    figshare
    Authors
    Sneddon Lab UCSF
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Scripts used for analysis of V1 and V2 Datasets.

    • seurat_v1.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, tSNE visualization. Used for v1 datasets.
    • merge_seurat.R - merge two or more seurat objects into one seurat object. Perform linear regression to remove batch effects from separate objects. Used for v1 datasets.
    • subcluster_seurat_v1.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA. Used for v1 datasets.
    • seurat_v2.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
    • clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
    • subcluster_seurat_v2.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA analysis. Used for v2 datasets.
    • seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for seurat object created by seurat_v1.R or seurat_v2.R.
    • merge_clusters.R - merge clusters that do not meet gene threshold. Used for both v1 and v2 datasets.
    • prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling in order to input normalized, regressed values into monocle with monocle_seurat_input_v1.R.
    • monocle_seurat_input_v1.R - monocle script using seurat batch corrected values as input for v1 merged timecourse datasets.
    • monocle_lineage_trace.R - monocle script using nUMI as input for v2 lineage traced dataset.
    • monocle_object_analysis.R - downstream analysis for monocle object - BEAM and plotting.
    • CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
    • CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.
