9 datasets found
  1. Data from: Fig7

    • figshare.com
    pdf
    Updated Oct 31, 2022
    Citation: heyong luo (2022). Fig7 [Dataset]. http://doi.org/10.6084/m9.figshare.21440166.v1
    Available download formats: pdf
    Dataset updated: Oct 31, 2022
    Dataset provided by: figshare (http://figshare.com/)
    Authors: heyong luo
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Heatmap of transcription levels of identified DEGs associated with NK cell cytotoxicity. Scatter plot displaying genes related to NK cell cytotoxicity. Scatter plot displaying genes related to signaling pathways. Heatmaps displaying FPKM values. Line graphs showing the transcriptional changes in genes related to the activation, migration, or tumor-killing activity of NK cells.
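    The heatmaps report FPKM (fragments per kilobase of transcript per million mapped fragments). As a minimal sketch of that standard normalization, with hypothetical counts rather than values from this dataset:

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments).

    Dividing by length in kilobases (bp / 1e3) and by library size in
    millions (fragments / 1e6) is what produces the 1e9 factor.
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript in a
# library of 20 million mapped fragments.
print(fpkm(500, 2_000, 20_000_000))  # 12.5
```

    Because FPKM divides by transcript length, values are comparable across genes within a sample, which is what a gene-by-sample heatmap needs.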

  2. Additional file 1 of ChromoMap: an R package for interactive visualization...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    html
    Updated May 31, 2023
    Citation: Lakshay Anand; Carlos M. Rodriguez Lopez (2023). Additional file 1 of ChromoMap: an R package for interactive visualization of multi-omics data and annotation of chromosomes [Dataset]. http://doi.org/10.6084/m9.figshare.18230845.v1
    Available download formats: html
    Dataset updated: May 31, 2023
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Lakshay Anand; Carlos M. Rodriguez Lopez
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Additional file 1. Example of a chromoMap interactive plot constructed using various chromoMap features, including polyploidy (used as multi-track), feature-associated data visualization (scatter and bar plots), chromosome heatmaps, and data filters (color-coded scatter points and bars). The plot shows differential gene expression in a cohort of COVID-19-positive patients and healthy individuals (NCBI Gene Expression Omnibus ID: GSE162835) [12]. Each set of five tracks labeled with the same chromosome ID (e.g., 1-22, X and Y) contains the following information, from top to bottom: (1) the number of differentially expressed genes (DEGs; FDR < 0.05) per genomic window (green boxes within the chromosome), shown as bars over the chromosome depiction; windows containing ≥ 5 DEGs are shown in yellow. (2) DEGs (FDR < 0.05) between healthy individuals and COVID-19-positive patients, visualized as a scatterplot above the chromosome depiction (genes with logFC ≥ 2 or logFC ≤ −2 are highlighted in orange); dots above the grey dashed line represent genes upregulated in COVID-19-positive patients, and the heatmap within the chromosome depiction indicates the average logFC per window. (3–4) Normalized expression of DEGs (scatterplot) and of each genomic window containing DEGs (green-scale heatmap) in (3) patients with severe/critical outcomes and (4) patients with asymptomatic/mild outcomes. (5) logFC of DEGs between healthy individuals and COVID-19-positive patients, visualized as a scatterplot color-coded by the metabolic pathway each DEG belongs to.
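    Track (1) boils down to counting significant DEGs per fixed-width genomic window, and the in-chromosome heatmap of track (2) to averaging logFC per window. A minimal pandas sketch of that bookkeeping, assuming a hypothetical table with `chrom`, `start`, `gene`, `FDR` and `logFC` columns and a window size of your choosing (the caption does not state the one used). This is not chromoMap's own code, which is an R package:

```python
import pandas as pd


def summarize_windows(degs: pd.DataFrame, window_size: int,
                      fdr_cutoff: float = 0.05, highlight_min: int = 5) -> pd.DataFrame:
    """Per-window DEG counts and mean logFC.

    `degs` is a hypothetical table with columns chrom, start (bp), gene, FDR, logFC.
    """
    sig = degs[degs["FDR"] < fdr_cutoff].copy()
    # Assign each gene to a fixed-width window from its start coordinate.
    sig["window"] = sig["start"] // window_size
    summary = (
        sig.groupby(["chrom", "window"])
        .agg(n_degs=("gene", "size"), mean_logFC=("logFC", "mean"))
        .reset_index()
    )
    # Windows with at least `highlight_min` DEGs (the yellow windows in track 1).
    summary["highlight"] = summary["n_degs"] >= highlight_min
    return summary
```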

  3. Data from: Superheat: An R Package for Creating Beautiful and Extendable...

    • tandf.figshare.com
    bin
    Updated Mar 4, 2024
    Citation: Rebecca L. Barter; Bin Yu (2024). Superheat: An R Package for Creating Beautiful and Extendable Heatmaps for Visualizing Complex Data [Dataset]. http://doi.org/10.6084/m9.figshare.6287693.v1
    Available download formats: bin
    Dataset updated: Mar 4, 2024
    Dataset provided by: Taylor & Francis (https://taylorandfrancis.com/)
    Authors: Rebecca L. Barter; Bin Yu
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics, they remain a severely underutilized visualization tool in modern data analysis. This article introduces superheat, a new R package that provides an extremely flexible and customizable platform for visualizing complex datasets. Superheat produces attractive and extendable heatmaps to which the user can add a response variable as a scatterplot, model results as boxplots, correlation information as barplots, and more. The goal of this article is two-fold: (1) to demonstrate the potential of the heatmap as a core visualization method for a range of data types, and (2) to highlight the customizability and ease of implementation of the superheat R package for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three reproducible case studies, each based on publicly available data sources.
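    Superheat itself is an R package. As a rough Python analogue of its core idea, a heatmap whose rows line up with an adjacent scatterplot of a response variable, here is a minimal matplotlib sketch; the function name and layout choices are illustrative assumptions and do not reproduce superheat's API:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np


def heatmap_with_response(X, y, row_labels=None):
    """Heatmap of matrix X with a scatterplot of response y aligned to its rows."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.shape[0] != y.shape[0]:
        raise ValueError("y needs one value per row of X")
    fig, (ax_heat, ax_side) = plt.subplots(
        1, 2, sharey=True, figsize=(7, 4), gridspec_kw={"width_ratios": [4, 1]}
    )
    im = ax_heat.imshow(X, aspect="auto", cmap="viridis")
    rows = np.arange(X.shape[0])
    ax_side.scatter(y, rows, s=15)
    ax_side.set_xlabel("response")
    if row_labels is not None:
        ax_heat.set_yticks(rows)
        ax_heat.set_yticklabels(row_labels)
    # The shared y-axis keeps each point level with its heatmap row; row 0 on top.
    ax_heat.set_ylim(X.shape[0] - 0.5, -0.5)
    fig.colorbar(im, ax=[ax_heat, ax_side])
    return fig
```

    Sharing the y-axis is what ties each scatter point to its heatmap row; the same pattern extends to bar plots or boxplots in the side panel.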

  4. Datasets and source code for a pipeline architecture for feature-based...

    • data.mendeley.com
    Updated Dec 14, 2022
    Citation: Jonatan Enes (2022). Datasets and source code for a pipeline architecture for feature-based unsupervised clustering using multivariate time series from HPC jobs [Dataset]. http://doi.org/10.17632/hgkv9cpnmn.2
    Dataset updated: Dec 14, 2022
    Authors: Jonatan Enes
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    This repository consists of two compressed files, whose contents are described below.

    --- code.tar.gz --- The source code that implements the pipeline, as well as the code and scripts needed to retrieve the time series, create the plots, and run the experiments. More specifically:

    + prepare.py and main.py ⇨
      The Python programs that implement the auxiliary and the main pipeline stages,
      respectively.
    
    + 'anomaly' and 'config' folders ⇨
      Scripts and Python files containing the configuration and some basic functions used
      to retrieve the information needed to process the data, such as the resource time
      series from OpenTSDB or the job metadata from Slurm.
    
    + 'functions' folder ⇨
      Several folders with the Python programs that implement all the pipeline stages,
      covering both the machine learning processing (e.g., extractors, aggregators, models)
      and the technical side of the pipeline (e.g., pipelines, transformers).
    
    + plotDF.py ⇨ 
      A Python program used to create the different plots presented, from the resource time 
      series to the evaluation plots.
    
    + several bash scripts ⇨ 
      Used to run the experiments using a specific configuration, whether regarding which 
      transformers are chosen and how they are parametrized, or more technical aspects 
      involving how the pipeline is executed.
    

    --- data.tar.gz --- The actual data and results, organized as follows:

    + jobs ⇨ 
      All the jobs' resource time series plots for all the experiments, with a folder used 
      for each experiment. Inside each folder all the jobs are separated according to their 
      id, containing the plots for the different system resources (e.g., User CPU, Cached memory).
    
    + plots ⇨
      The prediction plots for all the experiments, in separate folders, mainly used for
      evaluation purposes (e.g., scatter plots, heatmaps, Andrews curves, dendrograms). These
      plots are available for every predictor produced by the pipeline execution. In
      addition, for each predictor the resource time series can also be visualized grouped
      by cluster. Finally, the projections generated by the dimension reduction models and
      the detected outliers are also available for each experiment.
    
    + datasets ⇨
      The datasets used for the experiments, which include the lists of job IDs to be processed
      (CSV files), the results of each pipeline stage (e.g., features, predictions), and the
      output text files generated by several pipeline stages. Among the latter, the evaluation
      files are worth noting, as they include all the prediction scores.
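    The description names the pipeline's building blocks (feature extractors, aggregators, dimension-reduction and clustering models, chained as transformers) without showing how they fit together. A minimal sketch of feature-based clustering of multivariate job time series, assuming scikit-learn and hypothetical per-job dictionaries of resource series; the summary statistics, PCA and KMeans are illustrative choices, not necessarily the ones used in `main.py`:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def extract_features(job: dict) -> np.ndarray:
    """Summarize each resource time series of one job with simple statistics.

    `job` maps a hypothetical resource name (e.g. "user_cpu") to a 1-D array.
    """
    features = []
    for resource in sorted(job):  # fixed order so feature columns line up across jobs
        s = np.asarray(job[resource], dtype=float)
        features.extend([s.mean(), s.std(), s.min(), s.max()])
    return np.array(features)


def build_pipeline(n_clusters: int) -> Pipeline:
    return Pipeline([
        ("scale", StandardScaler()),      # put all features on a common scale
        ("reduce", PCA(n_components=2)),  # low-dimensional projection
        ("cluster", KMeans(n_clusters=n_clusters, n_init=10, random_state=0)),
    ])


def cluster_jobs(jobs: list, n_clusters: int = 2):
    X = np.vstack([extract_features(j) for j in jobs])
    pipe = build_pipeline(n_clusters)
    labels = pipe.fit_predict(X)
    projection = pipe[:-1].transform(X)  # 2-D coordinates, e.g. for a scatter plot
    return labels, projection
```

    Keeping every stage inside one `Pipeline` means the scaler and PCA fitted during clustering are reused unchanged when projecting jobs for the evaluation plots.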
    
  5. GO ENRICHMENT ANALYSIS OF CLOCK GENES USING DIFFERENTIALLY METHYLATED...

    • figshare.com
    txt
    Updated May 20, 2025
    Citation: Eamonn Mallon (2025). GO ENRICHMENT ANALYSIS OF CLOCK GENES USING DIFFERENTIALLY METHYLATED BACKGROUND [Dataset]. http://doi.org/10.6084/m9.figshare.29107943.v1
    Available download formats: txt
    Dataset updated: May 20, 2025
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Eamonn Mallon
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    SUMMARY
    This script performs Gene Ontology (GO) enrichment analysis on a set of clock-related CpGs in Nasonia vitripennis, using a background of differentially methylated genes. It identifies over-represented GO terms, applies FDR correction, and visualizes significant terms using semantic similarity metrics.

    ORIGIN
    Adapted from code by Alun Jones (see Bebane et al., 2019).

    KEY STEPS
    1. Load GO annotations for the background gene set (differentially methylated genes).
    2. Create GOFrame and GeneSetCollection objects compatible with GOstats.
    3. Load a user-defined list of clock genes.
    4. Filter the gene list to those with GO annotations.
    5. Run a hypergeometric test for enrichment across the BP, CC, and MF ontologies.
    6. Apply FDR correction (Benjamini-Hochberg).
    7. Visualize enriched Biological Process terms using:
       - Treemap
       - Scatter plot
       - Heatmap
       - Word cloud

    INPUT FILES
    - diff_backgroundGOannotations.csv (GO annotations for differentially methylated genes)
    - clock_genes.csv (list of clock-related CpGs)

    OUTPUT FILES
    - supplementary_tables_pnas.xlsx → Sheet: Table_S4_GOterms (FDR-filtered GO terms)
    - diff_erin_methylated_clock_genes_GO_treemap.png → Treemap of reduced GO terms

    SOFTWARE REQUIREMENTS
    - R packages: GOstats, GSEABase, treemap, readr, dplyr, rrvgo, openxlsx, org.Dm.eg.db

    NOTES
    - Uses the Drosophila GO database for semantic similarity.
    - Focuses on over-representation in the Biological Process ontology.

    CITATION
    Bebane, P. et al. (2019). "Neonics and bumblebees." [Insert DOI]

    CONTACT
    Eamonn Mallon, ebm3@le.ac.uk
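    Key steps 5 and 6 amount to a one-sided hypergeometric test per GO term followed by Benjamini-Hochberg adjustment. The script does this in R with GOstats; the Python sketch below reproduces only those two calculations (SciPy's `hypergeom` plus a hand-written BH step), and the term names and counts are hypothetical:

```python
import numpy as np
from scipy.stats import hypergeom


def go_enrichment_pvalue(k: int, M: int, n: int, N: int) -> float:
    """One-sided over-representation p-value, P(X >= k).

    k: study genes annotated with the GO term
    M: size of the background (universe) gene set
    n: background genes annotated with the term
    N: size of the study gene set
    """
    # sf(k - 1) = P(X > k - 1) = P(X >= k) under SciPy's hypergeom(M, n, N)
    return float(hypergeom.sf(k - 1, M, n, N))


def benjamini_hochberg(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Step-up: each adjusted value is the minimum over itself and all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical counts per term: (k, M, n, N)
counts = {"term_A": (8, 2000, 40, 60), "term_B": (2, 2000, 150, 60)}
raw = {term: go_enrichment_pvalue(*c) for term, c in counts.items()}
fdr = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
significant = [term for term, q in fdr.items() if q < 0.05]
```

    The background set enters through `M` and `n`, which is why swapping in differentially methylated genes as the universe changes which terms count as enriched.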

  6. E-commerce Sales Prediction Dataset

    • kaggle.com
    zip
    Updated Dec 14, 2024
    Citation: Nevil Dhinoja (2024). E-commerce Sales Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/nevildhinoja/e-commerce-sales-prediction-dataset/discussion
    Available download formats: zip (16700 bytes)
    Dataset updated: Dec 14, 2024
    Authors: Nevil Dhinoja
    License: CC0 1.0 (public domain dedication), https://creativecommons.org/publicdomain/zero/1.0/

    Description

    E-commerce Sales Prediction Dataset

    This repository contains a comprehensive and clean dataset for predicting e-commerce sales, tailored for data scientists, machine learning enthusiasts, and researchers. The dataset is crafted to analyze sales trends, optimize pricing strategies, and develop predictive models for sales forecasting.

    📂 Dataset Overview

    The dataset includes 1,000 records across the following features:

    Column Name        Description
    Date               The date of the sale (01-01-2023 onward).
    Product_Category   Category of the product (e.g., Electronics, Sports, Other).
    Price              Price of the product (numerical).
    Discount           Discount applied to the product (numerical).
    Customer_Segment   Buyer segment (e.g., Regular, Occasional, Other).
    Marketing_Spend    Marketing budget allocated for sales (numerical).
    Units_Sold         Number of units sold per transaction (numerical).

    📊 Data Summary

    General Properties

    Date:
      • Range: 01-01-2023 to 12-31-2023.
      • Contains 1,000 unique values, with no missing data.

    Product_Category:
      • Categories: Electronics (21%), Sports (21%), Other (58%).
      • Most common category: Electronics (21%).

    Price:
      • Range: 244 to 999.
      • Mean: 505; standard deviation: 290.
      • Most common price range: 14.59 - 113.07.

    Discount:
      • Range: 0.01% to 49.92%.
      • Mean: 24.9%; standard deviation: 14.4%.
      • Most common discount range: 0.01 - 5.00%.

    Customer_Segment:
      • Segments: Regular (35%), Occasional (34%), Other (31%).
      • Most common segment: Regular.

    Marketing_Spend:
      • Range: 2.41k to 10k.
      • Mean: 4.91k; standard deviation: 2.84k.

    Units_Sold:
      • Range: 5 to 57.
      • Mean: 29.6; standard deviation: 7.26.
      • Most common range: 24 - 34 units sold.
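    The summary figures (range, mean, standard deviation, most common range) can be recomputed from the CSV with pandas and NumPy. A minimal sketch, assuming that the "most common range" is the fullest bin of a 10-bin equal-width histogram (the description does not say how it was derived) and that the sample standard deviation is meant:

```python
import numpy as np
import pandas as pd


def summarize_numeric(s: pd.Series, bins: int = 10) -> dict:
    """Range, mean, sample std, modal histogram bin and missing count of a column."""
    values = s.dropna().to_numpy(dtype=float)
    counts, edges = np.histogram(values, bins=bins)
    i = int(np.argmax(counts))  # fullest bin (the first one wins on ties)
    return {
        "min": values.min(),
        "max": values.max(),
        "mean": values.mean(),
        "std": values.std(ddof=1),
        "most_common_range": (edges[i], edges[i + 1]),
        "missing": int(s.isna().sum()),
    }
```

    For example, `summarize_numeric(df["Price"])` on a DataFrame loaded from the dataset's CSV; the bin edges, and therefore the "most common range", shift with the `bins` setting.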

    📈 Data Visualizations

    The dataset is suitable for creating the following visualizations:
      1. Price Distribution: histogram showing the spread of prices.
      2. Discount Distribution: histogram for analyzing promotional offers.
      3. Marketing Spend Distribution: histogram for understanding marketing investment patterns.
      4. Customer Segment Distribution: bar plot of customer segments.
      5. Price vs Units Sold: scatter plot showing pricing effects on sales.
      6. Discount vs Units Sold: scatter plot exploring the impact of discounts.
      7. Marketing Spend vs Units Sold: scatter plot for assessing marketing effectiveness.
      8. Correlation Heatmap: identifies relationships between features.
      9. Pairplot: visualizes pairwise feature interactions.
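    The correlation heatmap in that list can be drawn with pandas and matplotlib alone. A minimal sketch, assuming the numeric column names from the column table and Pearson correlation (the `DataFrame.corr` default); seaborn's `heatmap` is a common alternative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

NUMERIC = ["Price", "Discount", "Marketing_Spend", "Units_Sold"]


def correlation_heatmap(df: pd.DataFrame):
    """Return the Pearson correlation matrix and an annotated heatmap figure."""
    corr = df[NUMERIC].corr()
    fig, ax = plt.subplots(figsize=(5, 4))
    im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    ticks = range(len(NUMERIC))
    ax.set_xticks(ticks)
    ax.set_xticklabels(NUMERIC, rotation=45, ha="right")
    ax.set_yticks(ticks)
    ax.set_yticklabels(NUMERIC)
    # Write each coefficient into its cell.
    for i in ticks:
        for j in ticks:
            ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center", fontsize=8)
    fig.colorbar(im, ax=ax, label="Pearson r")
    fig.tight_layout()
    return corr, fig
```

    Fixing `vmin=-1, vmax=1` keeps the colour scale symmetric, so weak correlations do not look strong just because the observed range is narrow.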

    💡 How the Data Was Created

    The dataset is synthetically generated to mimic realistic e-commerce sales trends. Below are the steps taken for data generation:

    1. Feature Engineering:

      • Identified key attributes such as product category, price, discount, and marketing spend, typically observed in e-commerce data.
      • Generated dependent features like units sold based on logical relationships.
    2. Data Simulation:

      • Python Libraries: Used NumPy and Pandas to generate and distribute values.
      • Statistical Modeling: Ensured feature distributions aligned with real-world sales data patterns.
    3. Validation:

      • Verified data consistency with no missing or invalid values.
      • Ensured logical correlations (e.g., higher discounts → increased units sold).

    Note: The dataset is synthetic and not sourced from any real-world e-commerce platform.
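    The generation steps can be sketched with NumPy and pandas. This is a hypothetical reconstruction, not the author's script: the value ranges, the category labels beyond those named in the description, and the coefficients linking `Units_Sold` to price, discount and marketing spend are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def generate_sales(n: int = 1000, seed: int = 42) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    # Random sale dates within 2023 (365 days, not a leap year).
    dates = pd.Timestamp("2023-01-01") + pd.to_timedelta(rng.integers(0, 365, n), unit="D")
    # Only Electronics, Sports, Regular and Occasional come from the description;
    # the other labels are hypothetical placeholders.
    categories = rng.choice(["Electronics", "Sports", "Clothing", "Home Decor", "Toys"], n)
    segments = rng.choice(["Regular", "Occasional", "Premium"], n)
    price = rng.uniform(10, 1000, n).round(2)
    discount = rng.uniform(0, 50, n).round(2)  # percent
    marketing = rng.uniform(100, 10_000, n).round(2)
    # Hypothetical linear relationship plus noise: discounts and marketing
    # raise sales, higher prices lower them.
    units = 20 + 0.4 * discount + 0.001 * marketing - 0.005 * price + rng.normal(0, 5, n)
    units = np.clip(np.round(units), 1, None).astype(int)
    return pd.DataFrame({
        "Date": dates,
        "Product_Category": categories,
        "Price": price,
        "Discount": discount,
        "Customer_Segment": segments,
        "Marketing_Spend": marketing,
        "Units_Sold": units,
    })


df = generate_sales()
# Validation step: no missing values, and the intended positive discount effect.
assert not df.isna().any().any()
print(df["Discount"].corr(df["Units_Sold"]))
```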

    🛠 Example Usage: Sales Prediction Model

    Here’s an example of building a predictive model using Linear Regression:


    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    
    # Load the dataset
    df = pd.read_csv('ecommerce_sales.csv')
    
    # Feature selection
    X = df[['Price', 'Discount', 'Marketing_Spend']]
    y = df['Units_Sold']
    
    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Model training
    model = LinearRegression()
    model.fit(X_train, y_train)
    
    # Predictions
    y_pred = model.predict(X_test)
    
    # Evaluation
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    print(f'Mean Squared Error: {mse:.2f}')
    print(f'R-squared: {r2:.2f}')
    
  7. Data from: Figure 3

    • figshare.com
    application/csv
    Updated Mar 24, 2024
    Citation: Min Kyung Lee (2024). Figure 3 [Dataset]. http://doi.org/10.6084/m9.figshare.25236976.v1
    Available download formats: application/csv
    Dataset updated: Mar 24, 2024
    Dataset provided by: figshare (http://figshare.com/)
    Authors: Min Kyung Lee
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Figure 3. Tumor type-specific presence of cell types. Heatmap of the proportions (%) of each cell type present in each sample. The scatter plot to the left of the heatmap indicates the median stemness level of each cell type. Horizontal tracking bars indicate the tumor type and grade of each sample. Vertical tracking bars indicate the major cell types of the nuclei. Cell types with proportions greater than 5% are labeled within their cells. ATC: Astrocytoma; EMB: Embryonal tumors; EPN: Ependymoma; GBM: Glioblastoma; GNN: Glioneuronal/neuronal tumors; NT: Non-tumor; SCH: Schwannoma.
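    The heatmap values are per-sample percentages of nuclei assigned to each cell type, with cells above 5% labeled. A minimal pandas sketch of that calculation, assuming a hypothetical one-row-per-nucleus table with `sample` and `cell_type` columns (not the authors' code):

```python
import pandas as pd


def cell_type_proportions(nuclei: pd.DataFrame, label_threshold: float = 5.0):
    """Percentage of nuclei of each cell type within each sample.

    Returns the full percentage matrix (samples x cell types) and a copy in
    which only cells above `label_threshold` percent keep a value, i.e. the
    cells that would carry a text label on the heatmap.
    """
    pct = pd.crosstab(nuclei["sample"], nuclei["cell_type"], normalize="index") * 100
    labels = pct.where(pct > label_threshold)
    return pct, labels
```

    `normalize="index"` makes each sample's row sum to 100%, so samples with very different nucleus counts remain comparable.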

  8. Theatres_Data_Version_1.0

    • kaggle.com
    zip
    Updated Aug 4, 2025
    Citation: deanesh takkallapati (2025). Theatres_Data_Version_1.0 [Dataset]. https://www.kaggle.com/datasets/deaneshtakkallapati/theatres-data-version-1-0/data
    Available download formats: zip (30479 bytes)
    Dataset updated: Aug 4, 2025
    Authors: deanesh takkallapati
    License: CC0 1.0 (public domain dedication), https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Exploratory Data Analysis (EDA) of Theatres Data in India

    Steps
      1. Load the data
      2. Check for nulls and update the data if required
      3. Perform descriptive statistics
      4. Visualize the data
         Univariate (single-column) visualization
           categorical: countplot
           continuous: histogram
         Bivariate (two-column) visualization
           continuous vs continuous: scatterplot, regplot
           categorical vs continuous: boxplot
           categorical vs categorical: crosstab, heatmap
         Multivariate (multi-column) visualization
           correlation plot
           pairplot
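    Step 4 is essentially a lookup from column types to plot types. A minimal sketch of that decision table plus the null check from step 2, assuming pandas dtypes are a good enough proxy for "categorical" versus "continuous"; the description does not list the dataset's columns, so any column names used with it are hypothetical:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def column_kind(s: pd.Series) -> str:
    # Simplification: numeric dtypes count as continuous, everything else as
    # categorical. Numeric codes or IDs may really be categorical.
    return "continuous" if is_numeric_dtype(s) else "categorical"


def suggest_plot(df: pd.DataFrame, *columns: str) -> str:
    """Map the types of one, two, or more columns to the plot named in the steps."""
    kinds = sorted(column_kind(df[c]) for c in columns)
    if len(columns) == 1:
        return "histogram" if kinds == ["continuous"] else "countplot"
    if len(columns) == 2:
        if kinds == ["continuous", "continuous"]:
            return "scatterplot/regplot"
        if kinds == ["categorical", "continuous"]:
            return "boxplot"
        return "crosstab/heatmap"
    return "correlation plot/pairplot"


def null_report(df: pd.DataFrame) -> pd.Series:
    """Step 2: number of missing values per column, largest first."""
    return df.isna().sum().sort_values(ascending=False)
```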
    
  9. Additional file 5 of The cuproptosis-related signature predicts the...

    • springernature.figshare.com
    zip
    Updated Aug 18, 2024
    Citation: Tao Chang; Yihan Wu; Xiaodong Niu; Zhiwei Guo; Jiahao Gan; Xiang Wang; Yanhui Liu; Qi Pan; Qing Mao; Yuan Yang (2024). Additional file 5 of The cuproptosis-related signature predicts the prognosis and immune microenvironments of primary diffuse gliomas: a comprehensive analysis [Dataset]. http://doi.org/10.6084/m9.figshare.26742221.v1
    Available download formats: zip
    Dataset updated: Aug 18, 2024
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Tao Chang; Yihan Wu; Xiaodong Niu; Zhiwei Guo; Jiahao Gan; Xiang Wang; Yanhui Liu; Qi Pan; Qing Mao; Yuan Yang
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Additional file 5: Fig. S5. Analysis of tumor mutation, TIDE, and immune checkpoints of tumor signature genes in gliomas. (A) Scatter plot showing that TMB was positively correlated with the CRG score. (B) Characteristics of the top 10 most frequently mutated genes and their variant classification. (C-J) Scatter plots showing the correlation of TIDE, dysfunction, exclusion, and MSI with the immune (C-F) and CRG (G-J) scores. (K) Analysis of immune checkpoints between CRG risk subgroups. (L) Heatmap illustrating the relationships among CRG risk subgroups, clinical profiles, and 22 types of immune cells. *p < 0.05; **p < 0.01; ***p < 0.001.
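    The legend uses the usual star convention for significance levels: one star for p < 0.05, two for p < 0.01, three for p < 0.001. A tiny sketch of that mapping for anyone re-annotating the plots; returning an empty string for non-significant results is an assumption, since the legend does not say how those are marked:

```python
def significance_stars(p: float) -> str:
    """Map a p-value to the star annotation used in the figure legend."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # assumption: non-significant results carry no mark
```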
