7 datasets found

Data from: Fig7
figshare.com
pdf
Updated Oct 31, 2022
Cite
heyong luo (2022). Fig7 [Dataset]. http://doi.org/10.6084/m9.figshare.21440166.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.21440166.v1
Dataset updated
Oct 31, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
heyong luo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Heatmap of transcription levels of identified DEGs associated with NK cell cytotoxicity. Scatter plot displaying genes related to NK cell cytotoxicity. Scatter plot displaying genes related to signaling pathways. Heatmaps displaying FPKM values. Line graphs showing the transcriptional changes in genes related to activation, migration, or tumor-killing activity of NK cells.
Additional file 1 of ChromoMap: an R package for interactive visualization...
springernature.figshare.com
html
Updated May 31, 2023
Cite
Lakshay Anand; Carlos M. Rodriguez Lopez (2023). Additional file 1 of ChromoMap: an R package for interactive visualization of multi-omics data and annotation of chromosomes [Dataset]. http://doi.org/10.6084/m9.figshare.18230845.v1
Explore at:
Available download formats: html
Unique identifier
https://doi.org/10.6084/m9.figshare.18230845.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Lakshay Anand; Carlos M. Rodriguez Lopez
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 1. Example of chromoMap interactive plot constructed using various features of chromoMap including polyploidy (used as multi-track), feature-associated data visualization (scatter and bar plots), chromosome heatmaps, data filters (color-coded scatter and bars). Differential gene expression in a cohort of patients positive for COVID19 and healthy individuals (NCBI Gene Expression Omnibus id: GSE162835) [12]. Each set of five tracks labeled with the same chromosome ID (e.g. 1-22, X & Y) contains the following information: From top to bottom: (1) number of differentially expressed genes (DEGs) (FDR < 0.05) (bars over the chromosome depictions) per genomic window (green boxes within the chromosome). Windows containing ≥ 5 DEGs are shown in yellow. (2) DEGs (FDR < 0.05) between healthy individuals and patients positive for COVID19 visualized as a scatterplot above the chromosome depiction (genes with logFC ≥ 2 or logFC ≤ −2 are highlighted in orange). Dots above the grey dashed line represent upregulated genes in COVID19 positive patients. Heatmap within chromosome depictions indicates the average LogFC value per window. (3–4) Normalized expression of differentially expressed genes (scatterplot) and of each genomic window containing DEG (green scale heatmap) in (3) patients with severe/critical outcomes and (4) asymptomatic/mild outcome patients. (5) logFC of DEGs between healthy individuals and patients positive for COVID19 visualized as scatter plot color-coded based on the metabolic pathway each DEG belongs to.
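The per-window summaries described above (DEG counts at FDR < 0.05, windows with at least 5 DEGs, genes with |logFC| ≥ 2, and the average logFC per window) can be illustrated with a small sketch. chromoMap itself is an R package; the Python below is not its API. The function name, the tuple layout, and the window size are all hypothetical.

```python
from collections import defaultdict

FDR_CUTOFF = 0.05          # DEG threshold quoted in the description
LOGFC_HIGHLIGHT = 2.0      # |logFC| at or above this is highlighted
MIN_DEGS_PER_WINDOW = 5    # windows with at least this many DEGs are flagged


def summarize_windows(genes, window_size):
    """Group DEGs into fixed-size genomic windows and summarize each window.

    `genes` is an iterable of (chromosome, position, fdr, logfc) tuples;
    this layout is an assumption made for the example.
    """
    windows = defaultdict(list)
    for chrom, position, fdr, logfc in genes:
        if fdr < FDR_CUTOFF:  # keep only differentially expressed genes
            windows[(chrom, position // window_size)].append(logfc)

    summary = {}
    for key, logfcs in windows.items():
        summary[key] = {
            "n_degs": len(logfcs),
            "dense": len(logfcs) >= MIN_DEGS_PER_WINDOW,
            "mean_logfc": sum(logfcs) / len(logfcs),
            "n_highlighted": sum(abs(v) >= LOGFC_HIGHLIGHT for v in logfcs),
        }
    return summary


# Hypothetical toy input, not taken from GSE162835.
toy_genes = [
    ("1", 100, 0.01, 2.5),
    ("1", 200, 0.02, -1.0),
    ("1", 300, 0.20, 3.0),   # fails the FDR cutoff, so it is ignored
    ("1", 1500, 0.001, -2.0),
]
toy_summary = summarize_windows(toy_genes, window_size=1000)
```

Each key of `toy_summary` is a (chromosome, window index) pair, which maps naturally onto one colored box in a chromosome track.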
GO ENRICHMENT ANALYSIS OF CLOCK GENES USING DIFFERENTIALLY METHYLATED...
figshare.com
txt
Updated May 20, 2025
Cite
Eamonn Mallon (2025). GO ENRICHMENT ANALYSIS OF CLOCK GENES USING DIFFERENTIALLY METHYLATED BACKGROUND [Dataset]. http://doi.org/10.6084/m9.figshare.29107943.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.29107943.v1
Dataset updated
May 20, 2025
Dataset provided by
figshare
Authors
Eamonn Mallon
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SUMMARY
This script performs Gene Ontology (GO) enrichment analysis on a set of clock-related CpGs in Nasonia vitripennis, using a background of differentially methylated genes. It identifies over-represented GO terms, applies FDR correction, and visualizes significant terms using semantic similarity metrics.

ORIGIN
Adapted from code by Alun Jones (see Bebane et al., 2019).

KEY STEPS
1. Load GO annotations for the background gene set (differentially methylated genes).
2. Create GOFrame and GeneSetCollection objects compatible with GOstats.
3. Load a user-defined list of clock genes.
4. Filter gene list to those with GO annotations.
5. Run a hypergeometric test for enrichment across BP, CC, and MF ontologies.
6. Apply FDR correction (Benjamini-Hochberg).
7. Visualize enriched Biological Process terms using:
   - Treemap
   - Scatter plot
   - Heatmap
   - Word cloud

INPUT FILES
- diff_backgroundGOannotations.csv (GO annotations for differentially methylated genes)
- clock_genes.csv (list of clock-related CpGs)

OUTPUT FILES
- supplementary_tables_pnas.xlsx → Sheet: Table_S4_GOterms (FDR-filtered GO terms)
- diff_erin_methylated_clock_genes_GO_treemap.png → Treemap of reduced GO terms

SOFTWARE REQUIREMENTS
- R packages: GOstats, GSEABase, treemap, readr, dplyr, rrvgo, openxlsx, org.Dm.eg.db

NOTES
- Uses Drosophila GO database for semantic similarity.
- Focuses on over-representation in the Biological Process ontology.

CITATION
Bebane, P. et al. (2019). "Neonics and bumblebees." [Insert DOI]

CONTACT
Eamonn Mallon
ebm3@le.ac.uk
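The hypergeometric test and the Benjamini-Hochberg correction named in the key steps can be illustrated with a minimal sketch. The script itself is described as R code built on GOstats; the Python below is not that code, only a standard-library illustration of the two statistics. The function names, GO identifiers, and counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of N,
    drawn from a background of M genes of which n carry the GO term."""
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical toy counts: term -> (annotated in background, annotated in list).
M, N = 100, 10
term_counts = {"GO:0000001": (5, 4), "GO:0000002": (20, 3), "GO:0000003": (50, 5)}
pvals = [hypergeom_sf(k, M, n, N) for n, k in term_counts.values()]
adjusted = benjamini_hochberg(pvals)
for term, p, q in zip(term_counts, pvals, adjusted):
    print(f"{term}: p={p:.4g}, FDR={q:.4g}")
```

Terms would then be kept when their adjusted value falls below the chosen FDR threshold; the threshold used by the original script is not stated in the description.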

E-commerce Sales Prediction Dataset

kaggle.com

Updated Dec 14, 2024

Cite

Nevil Dhinoja (2024). E-commerce Sales Prediction Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/10197264

Explore at:

Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)

Unique identifier

https://doi.org/10.34740/kaggle/dsv/10197264

Dataset updated

Dec 14, 2024

Dataset provided by

Kaggle

Authors

Nevil Dhinoja

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

E-commerce Sales Prediction Dataset

This repository contains a comprehensive and clean dataset for predicting e-commerce sales, tailored for data scientists, machine learning enthusiasts, and researchers. The dataset is crafted to analyze sales trends, optimize pricing strategies, and develop predictive models for sales forecasting.

📂 Dataset Overview

The dataset includes 1,000 records across the following features:

Column Name	Description
Date	The date of the sale (01-01-2023 onward).
Product_Category	Category of the product (e.g., Electronics, Sports, Other).
Price	Price of the product (numerical).
Discount	Discount applied to the product (numerical).
Customer_Segment	Buyer segment (e.g., Regular, Occasional, Other).
Marketing_Spend	Marketing budget allocated for sales (numerical).
Units_Sold	Number of units sold per transaction (numerical).

📊 Data Summary

General Properties

Date:
- Range: 01-01-2023 to 12-31-2023.
- Contains 1,000 unique values without missing data.

Product_Category:
- Categories: Electronics (21%), Sports (21%), Other (58%).
- Most common category: Electronics (21%).

Price:
- Range: From 244 to 999.
- Mean: 505, Standard Deviation: 290.
- Most common price range: 14.59 - 113.07.

Discount:
- Range: From 0.01% to 49.92%.
- Mean: 24.9%, Standard Deviation: 14.4%.
- Most common discount range: 0.01 - 5.00%.

Customer_Segment:
- Segments: Regular (35%), Occasional (34%), Other (31%).
- Most common segment: Regular.

Marketing_Spend:
- Range: From 2.41k to 10k.
- Mean: 4.91k, Standard Deviation: 2.84k.

Units_Sold:
- Range: From 5 to 57.
- Mean: 29.6, Standard Deviation: 7.26.
- Most common range: 24 - 34 units sold.

📈 Data Visualizations

The dataset is suitable for creating the following visualizations:
1. Price Distribution: Histogram to show the spread of prices.
2. Discount Distribution: Histogram to analyze promotional offers.
3. Marketing Spend Distribution: Histogram to understand marketing investment patterns.
4. Customer Segment Distribution: Bar plot of customer segments.
5. Price vs Units Sold: Scatter plot to show pricing effects on sales.
6. Discount vs Units Sold: Scatter plot to explore the impact of discounts.
7. Marketing Spend vs Units Sold: Scatter plot for marketing effectiveness.
8. Correlation Heatmap: Identify relationships between features.
9. Pairplot: Visualize pairwise feature interactions.
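The correlation heatmap in the list above is a picture of a Pearson correlation matrix. As a rough sketch of what that matrix contains, the following standard-library code computes it for a few hypothetical rows. The toy values are invented for illustration and are not taken from the dataset; only the column names come from it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict {a: {b: r_ab}} from a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy values using three of the dataset's column names.
toy = {
    "Price": [100.0, 200.0, 300.0, 400.0],
    "Discount": [40.0, 30.0, 20.0, 10.0],
    "Units_Sold": [35.0, 30.0, 28.0, 20.0],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

With the real CSV loaded into pandas, `df[['Price', 'Discount', 'Marketing_Spend', 'Units_Sold']].corr()` should produce the same kind of matrix, which a plotting library can then render as a heatmap.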

💡 How the Data Was Created

The dataset is synthetically generated to mimic realistic e-commerce sales trends. Below are the steps taken for data generation:

Feature Engineering:
- Identified key attributes such as product category, price, discount, and marketing spend, typically observed in e-commerce data.
- Generated dependent features like units sold based on logical relationships.
Data Simulation:
- Python Libraries: Used NumPy and Pandas to generate and distribute values.
- Statistical Modeling: Ensured feature distributions aligned with real-world sales data patterns.
Validation:
- Verified data consistency with no missing or invalid values.
- Ensured logical correlations (e.g., higher discounts → increased units sold).

Note: The dataset is synthetic and not sourced from any real-world e-commerce platform.
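The generation steps above can be sketched roughly as follows. This is not the author's generator: the description mentions NumPy and Pandas, whereas this sketch uses only the standard library, and every range, coefficient, and noise level in it is a hypothetical choice made for illustration.

```python
import random
from datetime import date, timedelta

CATEGORIES = ["Electronics", "Sports", "Other"]
SEGMENTS = ["Regular", "Occasional", "Other"]


def generate_rows(n_rows, seed=42):
    """Generate synthetic sales rows; all numeric choices are hypothetical."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    start = date(2023, 1, 1)
    rows = []
    for i in range(n_rows):
        discount = round(rng.uniform(0.0, 50.0), 2)
        marketing = round(rng.uniform(100.0, 10000.0), 2)
        # Assumed relationship: discount and marketing spend lift sales, plus noise.
        units = 20 + 0.2 * discount + 0.0005 * marketing + rng.gauss(0, 5)
        rows.append({
            "Date": (start + timedelta(days=i)).isoformat(),
            "Product_Category": rng.choice(CATEGORIES),
            "Price": round(rng.uniform(10.0, 1000.0), 2),
            "Discount": discount,
            "Customer_Segment": rng.choice(SEGMENTS),
            "Marketing_Spend": marketing,
            "Units_Sold": max(0, round(units)),  # never negative
        })
    return rows


rows = generate_rows(10)
```

Building the dependent column (`Units_Sold`) from the independent ones is what makes a "higher discounts, more units sold" correlation appear in the generated data.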

🛠 Example Usage: Sales Prediction Model

Here’s an example of building a predictive model using Linear Regression:

Written in python

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset
df = pd.read_csv('ecommerce_sales.csv')

# Feature selection
X = df[['Price', 'Discount', 'Marketing_Spend']]
y = df['Units_Sold']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')

Data_Sheet_1_CottonGVD: A Comprehensive Genomic Variation Database for...
frontiersin.figshare.com
pdf
Updated May 31, 2023
Cite
Zhen Peng; Hongge Li; Gaofei Sun; Panhong Dai; Xiaoli Geng; Xiao Wang; Xiaomeng Zhang; Zhengzhen Wang; Yinhua Jia; Zhaoe Pan; Baojun Chen; Xiongming Du; Shoupu He (2023). Data_Sheet_1_CottonGVD: A Comprehensive Genomic Variation Database for Cultivated Cottons.PDF [Dataset]. http://doi.org/10.3389/fpls.2021.803736.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fpls.2021.803736.s001
Dataset updated
May 31, 2023
Dataset provided by
Frontiers
Authors
Zhen Peng; Hongge Li; Gaofei Sun; Panhong Dai; Xiaoli Geng; Xiao Wang; Xiaomeng Zhang; Zhengzhen Wang; Yinhua Jia; Zhaoe Pan; Baojun Chen; Xiongming Du; Shoupu He
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cultivated cottons are the most important economic crop, producing natural fiber for the textile industry. In recent years, the genetic basis of several essential traits of cultivated cottons has been gradually elucidated by decoding their genomic variations. Although an abundance of resequencing data is publicly available, there is still no comprehensive tool for exhibiting the results of genomic variation analyses and genome-wide association studies (GWAS). To help cotton researchers use these data efficiently and conveniently, we constructed the cotton genomic variation database (CottonGVD; http://120.78.174.209/ or http://db.cngb.org/cottonGVD). This database contains the published genomic information of three cultivated cotton species, the corresponding population variations (SNP and InDel markers), and the visualized results of GWAS for major traits. Various built-in genomic tools help users retrieve, browse, and query the variations conveniently. The database also provides interactive maps (e.g., Manhattan map, scatter plot, heatmap, and linkage disequilibrium block) to exhibit GWAS and expression GWAS results. Cotton researchers can easily visualize the phenotype-associated loci they are interested in and screen for candidate genes. Moreover, CottonGVD will continue to be updated with more data and functions.
Calculations dataset of diatomic systems based on van der Waals density...
data.mendeley.com
Updated Feb 12, 2021
Cite
Kiyou Shibata (2021). Calculations dataset of diatomic systems based on van der Waals density functional method [Dataset]. http://doi.org/10.17632/yz5rrmvrgd.1
Explore at:
Unique identifier
https://doi.org/10.17632/yz5rrmvrgd.1
Dataset updated
Feb 12, 2021
Authors
Kiyou Shibata
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset provides results obtained by first-principles calculations on diatomic systems and isolated systems based on SCAN+rVV10. All diatomic systems containing atomic species from H (Z=1) to Ra (Z=88) are considered. Calculations for isolated systems are uploaded alongside those for diatomic systems so that binding energies can be evaluated.
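As a small worked example of why both kinds of calculation are needed, one common convention defines the binding energy of a diatomic system as its total energy minus the total energies of its two isolated constituents. The dataset description does not state which convention or sign it uses, so the formula, the function name, and the numbers below are assumptions for illustration only.

```python
def binding_energy(e_diatomic, e_isolated_a, e_isolated_b):
    """Binding energy under the assumed convention E(AB) - [E(A) + E(B)].

    With this sign choice a negative result means the pair is bound.
    Any consistent energy unit works (for example eV).
    """
    return e_diatomic - (e_isolated_a + e_isolated_b)


# Hypothetical total energies in eV, not values from the dataset.
e_bind = binding_energy(-6.8, -1.1, -1.1)
print(f"binding energy: {e_bind:.2f} eV")
```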

===========================

raw_vasp_output_files [zip files (diatomic_db_raw.zip, isolated_db_raw.zip)] These zip files contain raw output files (OUTCAR and vasprun.xml) of VASP calculations.

===========================

parsed_dataset [Python pickle files (diatomic_df.pickle, isolated_df.pickle) and csv files (diatomic_df.csv, isolated_df.csv)] These files contain tables of typical physical values obtained from the VASP calculations. The Python pickle files require a Python environment with pandas and pymatgen. Files "*_df.pickle" and "*_df_protocol3.pickle" contain the same data, but they were saved with Python pickle protocol 5 and 3, respectively.
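Because pickle protocol 5 is only readable by interpreters that support it (Python 3.8 and later), a loader can choose between the two file variants at run time. The sketch below shows one way to do that. The helper name is hypothetical, and the concrete file names are inferred from the "*_df.pickle" and "*_df_protocol3.pickle" patterns quoted above.

```python
import pickle


def pick_pickle_file(prefix, highest_protocol=pickle.HIGHEST_PROTOCOL):
    """Return the file variant this interpreter should be able to read.

    `prefix` is "diatomic" or "isolated"; the resulting names follow the
    patterns in the dataset description and are otherwise an assumption.
    """
    if highest_protocol >= 5:
        return f"{prefix}_df.pickle"
    return f"{prefix}_df_protocol3.pickle"


print(pick_pickle_file("diatomic"))
```

The chosen file could then be opened with `pandas.read_pickle(...)` in an environment where pandas and pymatgen are installed, as the description requires. Pickle files can execute code on load, so only files from a trusted source should be unpickled.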

===========================

codes [diatomic_parser.zip] Simple python scripts for parsing raw VASP output files and plotting heatmaps and a scatter plot.
Data from: Figure 3
figshare.com
application/csv
Updated Mar 24, 2024
Cite
Min Kyung Lee (2024). Figure 3 [Dataset]. http://doi.org/10.6084/m9.figshare.25236976.v1
Explore at:
Available download formats: application/csv
Unique identifier
https://doi.org/10.6084/m9.figshare.25236976.v1
Dataset updated
Mar 24, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Min Kyung Lee
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Figure 3. Tumor type-specific presence of cell types. Heatmap of the proportions (%) of each cell type present in each sample. Scatter plot on the left of the heatmap indicates the median stemness level of each cell type. Horizontal tracking bars indicate the tumor type and grade of each sample. Vertical tracking bars indicate the major cell types of the nuclei. The cell types with greater than 5% are labeled within each cell. ATC: Astrocytoma; EMB: Embryonal tumors; EPN: Ependymoma; GBM: Glioblastoma; GNN: Glioneuronal/neuronal tumors; NT: Non-tumor; SCH: Schwannoma.