20 datasets found

box-plot-data
kaggle.com
zip
Updated Mar 14, 2024
Cite
Mustafa Almitamy (2024). box-plot-data [Dataset]. https://www.kaggle.com/datasets/mustafaalmitamy/box-plot-data
Available download formats: zip (7450 bytes)
Dataset updated
Mar 14, 2024
Authors
Mustafa Almitamy
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Mustafa Almitamy

Released under Apache 2.0

Contents
Datasets-Box-Plot
kaggle.com
zip
Updated Mar 14, 2024
Cite
Mustafa Almitamy (2024). Datasets-Box-Plot [Dataset]. https://www.kaggle.com/mustafaalmitamy/datasets-box-plot
Available download formats: zip (220 bytes)
Dataset updated
Mar 14, 2024
Authors
Mustafa Almitamy
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Mustafa Almitamy

Released under Apache 2.0

Contents
Box plot outlier
kaggle.com
zip
Updated Jul 17, 2022
Cite
chiragksharma (2022). Box plot outlier [Dataset]. https://www.kaggle.com/datasets/chiragksharma/box-plot-outlier
Available download formats: zip (12695 bytes)
Dataset updated
Jul 17, 2022
Authors
chiragksharma
Description
Dataset

This dataset was created by chiragksharma

Contents
Box Plot Outliers
kaggle.com
zip
Updated Jul 17, 2022
+ more versions
Cite
chiragksharma (2022). Box Plot Outliers [Dataset]. https://www.kaggle.com/datasets/chiragksharma/box-plot-outliers
Available download formats: zip (12695 bytes)
Dataset updated
Jul 17, 2022
Authors
chiragksharma
Description
Dataset

This dataset was created by chiragksharma

Contents
Test-box-plots
kaggle.com
zip
Updated Jun 30, 2023
Cite
srinidhi yerabati (2023). Test-box-plots [Dataset]. https://www.kaggle.com/datasets/srinidhiyerabati/test-box-plots
Available download formats: zip (292 bytes)
Dataset updated
Jun 30, 2023
Authors
srinidhi yerabati
Description
Dataset

This dataset was created by srinidhi yerabati

Contents
box plot
kaggle.com
zip
Updated Oct 28, 2021
Cite
Bro Brother Crony420 (2021). box plot [Dataset]. https://www.kaggle.com/brobrothercrony420/box-plot
Available download formats: zip (577 bytes)
Dataset updated
Oct 28, 2021
Authors
Bro Brother Crony420
Description
Dataset

This dataset was created by Bro Brother Crony420

Contents
Box plot
kaggle.com
Updated Dec 3, 2024
Cite
Sumbal Wahid (2024). Box plot [Dataset]. https://www.kaggle.com/datasets/sumbalwahid/box-plot/discussion
Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Dec 3, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sumbal Wahid
Description
Dataset

This dataset was created by Sumbal Wahid

Released under Other (specified in description)

Contents
Boxplot
kaggle.com
zip
Updated Apr 2, 2022
Cite
İlyas Abbasov (2022). Boxplot [Dataset]. https://www.kaggle.com/datasets/lyasabbasov/boxplot
Available download formats: zip (1172721 bytes)
Dataset updated
Apr 2, 2022
Authors
İlyas Abbasov
Description
Dataset

This dataset was created by İlyas Abbasov

Contents
boxplot
kaggle.com
zip
Updated Apr 27, 2020
+ more versions
Cite
David C (2020). boxplot [Dataset]. https://www.kaggle.com/davidchen1998/boxplot
Available download formats: zip (50253 bytes)
Dataset updated
Apr 27, 2020
Authors
David C
Description
Dataset

This dataset was created by David C

Contents
akash box plot
kaggle.com
zip
Updated Aug 27, 2024
Cite
Akash Roy (2024). akash box plot [Dataset]. https://www.kaggle.com/datasets/akashcodee/akash-box-plot/suggestions
Available download formats: zip (15234 bytes)
Dataset updated
Aug 27, 2024
Authors
Akash Roy
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Akash Roy

Released under Apache 2.0

Contents
non-itp-hb-boxplot
kaggle.com
zip
Updated Jun 3, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ARITRA BRAHMA (2021). non-itp-hb-boxplot [Dataset]. https://www.kaggle.com/aritrabrahma/nonitphbboxplot
Available download formats: zip (536 bytes)
Dataset updated
Jun 3, 2021
Authors
ARITRA BRAHMA
Description
Dataset

This dataset was created by ARITRA BRAHMA

Contents
BoxPlot_All_LM
kaggle.com
zip
Updated Dec 26, 2022
Cite
Hadeer Khaled Nabil (2022). BoxPlot_All_LM [Dataset]. https://www.kaggle.com/datasets/hadeerkhalednabil/boxplot-all-lm/code
Available download formats: zip (12072 bytes)
Dataset updated
Dec 26, 2022
Authors
Hadeer Khaled Nabil
Description
Dataset

This dataset was created by Hadeer Khaled Nabil

Contents
Gráfico_Boxplot
kaggle.com
zip
Updated Dec 8, 2022
Cite
Alan Diego (2022). Gráfico_Boxplot [Dataset]. https://www.kaggle.com/alandiego/grfico-boxplot
Available download formats: zip (186690 bytes)
Dataset updated
Dec 8, 2022
Authors
Alan Diego
Description
Dataset

This dataset was created by Alan Diego

Contents

Plotly Dashboard Healthcare

kaggle.com

zip

Updated Jan 4, 2022

Cite

A SURESH (2022). Plotly Dashboard Healthcare [Dataset]. https://www.kaggle.com/datasets/sureshmecad/plotly-dashboard-healthcare

Available download formats: zip (1741234 bytes)

Dataset updated

Jan 4, 2022

Authors

A SURESH

License

CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

Description

Context

Data Visualization

Content

a. Scatter plot

  i. The web app should allow the user to select genes from the datasets and plot 2D scatter plots between two variables (expression / copy_number / chronos) for any pair of genes.

  ii. The user should be able to filter and color data points using the metadata available in the file "metadata.csv".

  iii. The visualization could be interactive: ideally the user can hover over the data points on the plot and see the relevant information (hint: visit https://plotly.com/r/ or https://plotly.com/python).

  iv. As a quick reference, the example scatter plot shows the chronos score for the TTBK2 gene against expression for the MORC2 gene, colored by the Gender/Sex column from the metadata file.

b. Boxplot/violin plot

  i. The user should be able to select a gene and a variable (expression / chronos / copy_number) and generate a boxplot to display its distribution across multiple categories, as defined by a user-selected variable (a column from the metadata file).

  ii. As an example for reference, a violin plot of the CHRONOS score for gene CCL22 is plotted and grouped by 'Lineage'.
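
The grouping step behind the boxplot/violin requirement can be sketched in plain Python. This is only an illustration under stated assumptions: the sample IDs, scores, and metadata values below are invented, and `group_values` is a hypothetical helper, not part of the dataset or of Plotly. Only the names CCL22, Lineage, and Sex come from the description.

```python
from collections import defaultdict

# Hypothetical CHRONOS scores for gene CCL22, keyed by a made-up sample ID.
chronos_ccl22 = {"S1": -0.10, "S2": 0.05, "S3": -0.30, "S4": 0.20}

# Hypothetical rows standing in for metadata.csv; only the column names
# "Lineage" and "Sex" echo the description, the values are invented.
metadata = {
    "S1": {"Lineage": "lung", "Sex": "Female"},
    "S2": {"Lineage": "blood", "Sex": "Male"},
    "S3": {"Lineage": "lung", "Sex": "Male"},
    "S4": {"Lineage": "blood", "Sex": "Female"},
}

def group_values(values, metadata, column):
    """Collect per-sample values into lists keyed by a metadata column."""
    groups = defaultdict(list)
    for sample_id, value in values.items():
        category = metadata.get(sample_id, {}).get(column)
        if category is not None:  # skip samples with no metadata entry
            groups[category].append(value)
    return dict(groups)

# One list of values per category; each list would feed one box or violin.
by_lineage = group_values(chronos_ccl22, metadata, "Lineage")
```

Each list in `by_lineage` corresponds to one box or violin; swapping `"Lineage"` for `"Sex"` regroups the same values by a different user-selected column.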


Customer Sale Dataset for Data Visualization
kaggle.com
Updated Jun 6, 2025
Cite
Atul (2025). Customer Sale Dataset for Data Visualization [Dataset]. https://www.kaggle.com/datasets/atulkgoyl/customer-sale-dataset-for-visualization
Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Jun 6, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Atul
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This synthetic dataset is designed specifically for practicing data visualization and exploratory data analysis (EDA) using popular Python libraries like Seaborn, Matplotlib, and Pandas.

Unlike most public datasets, this one includes a diverse mix of column types:

📅 Date columns (for time series and trend plots)
🔢 Numerical columns (for histograms, boxplots, scatter plots)
🏷️ Categorical columns (for bar charts, group analysis)

Whether you are a beginner learning how to visualize data or an intermediate user testing new charting techniques, this dataset offers a versatile playground.

Feel free to:

Create EDA notebooks
Practice plotting techniques
Experiment with filtering, grouping, and aggregations

🛠️ No missing values, no data cleaning needed. Just download and start exploring!

Hope you find this helpful. Looking forward to hearing from you all.
Titanic: A Voyage into the Past
kaggle.com
zip
Updated Nov 7, 2023
Cite
Asher Mehfooz (2023). Titanic: A Voyage into the Past [Dataset]. https://www.kaggle.com/datasets/ashirzaki/titanic
Available download formats: zip (22564 bytes)
Dataset updated
Nov 7, 2023
Authors
Asher Mehfooz
Description
Dataset Overview

The Titanic dataset is a widely used benchmark dataset for machine learning and data science tasks. It contains information about passengers who boarded the RMS Titanic in 1912, including their age, sex, social class, and whether they survived the sinking of the ship. The dataset is divided into two main parts:

Train.csv: This file contains information about 891 passengers who were used to train machine learning models. It includes the following features:

PassengerId: A unique identifier for each passenger
Survived: Whether the passenger survived (1) or not (0)
Pclass: The passenger's social class (1 = Upper, 2 = Middle, 3 = Lower)
Name: The passenger's name
Sex: The passenger's sex (Male or Female)
Age: The passenger's age
Sibsp: The number of siblings or spouses aboard the ship
Parch: The number of parents or children aboard the ship
Ticket: The passenger's ticket number
Fare: The passenger's fare
Cabin: The passenger's cabin number
Embarked: The port where the passenger embarked (C = Cherbourg, Q = Queenstown, S = Southampton)

Test.csv: This file contains information about 418 passengers who were not used to train machine learning models. It includes the same features as train.csv, but does not include the Survived label. The goal of machine learning models is to predict whether or not each passenger in the test.csv file survived.

Data Preparation

Before using the Titanic dataset for machine learning tasks, it is important to perform some data preparation steps. These steps may include:

Handling missing values: Some of the features in the dataset have missing values. These values can be imputed or removed, depending on the specific task.
Encoding categorical variables: Some of the features in the dataset are categorical variables, such as Pclass, Sex, and Embarked. These variables need to be encoded numerically before they can be used by machine learning algorithms.
Scaling numerical variables: Some of the features in the dataset are numerical variables, such as Age and Fare. These variables may need to be scaled to ensure that they are on the same scale.

Data Visualization

Data visualization can be a useful tool for exploring the Titanic dataset and gaining insights into the data. Some common data visualization techniques that can be used with the Titanic dataset include:

Histograms: Histograms can be used to visualize the distribution of numerical variables, such as Age and Fare.
Scatter plots: Scatter plots can be used to visualize the relationship between two numerical variables.
Box plots: Box plots can be used to visualize the distribution of a numerical variable across different categories, such as Pclass and Sex.

Machine Learning Tasks

The Titanic dataset can be used for a variety of machine learning tasks, including:

Classification: The most common task is to use the train.csv file to train a machine learning model to predict whether or not each passenger in the test.csv file survived.
Regression: The dataset can also be used to train a machine learning model to predict the fare of a passenger based on their other features.
Anomaly detection: The dataset can also be used to identify anomalies, such as passengers who are outliers in terms of their age, social class, or other features.
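
The three data preparation steps described for the Titanic dataset (imputing missing values, encoding a categorical variable, scaling a numerical variable) can be sketched with the Python standard library. This is a minimal illustration, not the dataset author's method. The rows are invented rather than taken from train.csv, `prepare` is a hypothetical helper, and median imputation, 0/1 encoding, and min-max scaling are assumed choices among several reasonable ones.

```python
from statistics import median

# Hypothetical rows using column names from the feature list; the values
# are made up for illustration and are not taken from train.csv.
rows = [
    {"Pclass": 3, "Sex": "male", "Age": 22.0, "Fare": 7.25},
    {"Pclass": 1, "Sex": "female", "Age": 38.0, "Fare": 71.0},
    {"Pclass": 3, "Sex": "female", "Age": None, "Fare": 8.0},
    {"Pclass": 1, "Sex": "male", "Age": 54.0, "Fare": 52.0},
]

def prepare(rows):
    """Impute Age with the median, encode Sex as 0/1, min-max scale Fare."""
    known_ages = [r["Age"] for r in rows if r["Age"] is not None]
    age_fill = median(known_ages)
    fares = [r["Fare"] for r in rows]
    lo, hi = min(fares), max(fares)
    span = (hi - lo) or 1.0  # avoid dividing by zero if all fares are equal
    prepared = []
    for r in rows:
        prepared.append({
            "Pclass": r["Pclass"],
            "Sex": 1 if r["Sex"].lower() == "female" else 0,
            "Age": age_fill if r["Age"] is None else r["Age"],
            "Fare": (r["Fare"] - lo) / span,
        })
    return prepared

prepared = prepare(rows)
```

Whether to impute or drop rows, and which scaling to use, depends on the task, as the description notes.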
Data Preprocessing EDA Microarray GE Data GSE5583
kaggle.com
zip
Updated Nov 29, 2025
Cite
Dr. Nagendra (2025). Data Preprocessing EDA Microarray GE Data GSE5583 [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/data-preprocessing-eda-microarray-ge-data-gse5583
Available download formats: zip (3144708 bytes)
Dataset updated
Nov 29, 2025
Authors
Dr. Nagendra
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This dataset is based on GEO series GSE5583 (source: OmicsDI).

The experiment compares gene expression profiles between wild‑type mouse embryonic stem cells (ES cells) and ES cells in which Histone deacetylase 1 (HDAC1) has been knocked out (source: OmicsDI).

The organism used is mouse (Mus musculus) (source: OmicsDI).

Microarray technology was employed to measure transcript abundance across the genome, aiming to identify putative HDAC1 target genes (source: OmicsDI).

The dataset includes processed expression data (after normalization and log2 transformation), allowing for downstream exploratory data analysis (EDA) and differential gene expression (DGE) analysis.

As part of EDA, sample‑wise distribution plots (e.g. boxplots) are provided to assess normalization across all arrays.

The dataset also includes downstream visualizations and analysis results, such as boxplots, which help in evaluating the consistency and quality of the processed data.

Researchers can use this dataset to perform differential expression analysis between HDAC1 knockout vs wild‑type ES cells, investigate epigenetic regulation, or explore downstream effects of histone deacetylation loss.

Additionally, the dataset can serve as a reference example for microarray data preprocessing, normalization, transformation (e.g. log2), and exploratory visualization workflows.

The dataset is publicly available and sourced from a trusted repository (GEO), ensuring transparency and reproducibility of the experiment.
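
As a rough sketch of the preprocessing and sample-wise boxplot check mentioned in this description, the snippet below log2-transforms intensities and computes the five numbers a boxplot would draw for each array. Everything concrete here is an assumption for illustration: the sample names, the intensity values, and the `log2_box_stats` helper are invented, and the published files are described as already normalized and log2-transformed.

```python
import math
from statistics import quantiles

# Hypothetical raw intensities for two arrays; the names and numbers are
# invented and do not come from GSE5583.
samples = {
    "WT_1": [64.0, 128.0, 256.0, 512.0, 1024.0],
    "KO_1": [64.0, 128.0, 256.0, 512.0, 2048.0],
}

def log2_box_stats(values):
    """Log2-transform the values and return the five-number summary."""
    logged = sorted(math.log2(v) for v in values)
    q1, med, q3 = quantiles(logged, n=4, method="inclusive")
    return {"min": logged[0], "q1": q1, "median": med, "q3": q3, "max": logged[-1]}

# Arrays whose medians and quartiles line up are consistent with good
# normalization; a shifted box would suggest an array needs attention.
stats = {name: log2_box_stats(vals) for name, vals in samples.items()}
```

Comparing `stats` across samples is the numeric counterpart of eyeballing side-by-side boxplots.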
Automated_Descriptive_Statistics_Pipeline R Studio
kaggle.com
zip
Updated Nov 29, 2025
Cite
Dr. Nagendra (2025). Automated_Descriptive_Statistics_Pipeline R Studio [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/automated-descriptive-statistics-pipeline-r-studio
Available download formats: zip (21548 bytes)
Dataset updated
Nov 29, 2025
Authors
Dr. Nagendra
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
• Automated parametric analysis workflow built using R Studio.
• Demonstrates core statistical analysis methods on numerical datasets.
• Includes step-by-step R scripts for performing t-tests, ANOVA, and summary statistics.
• Provides visual outputs such as boxplots and distribution plots for better interpretation.
• Designed for students, researchers, and data analysts learning statistical automation in R.
• Useful for understanding reproducible research workflows in data analysis.
• Dataset helps in teaching how to automate statistical pipelines using R programming.
Walmart Data Set
kaggle.com
zip
Updated Jan 4, 2023
Cite
Matthew Garrett Carter (2023). Walmart Data Set [Dataset]. https://www.kaggle.com/datasets/matthewgarrettcarter/walmart-data-set
Available download formats: zip (272320 bytes)
Dataset updated
Jan 4, 2023
Authors
Matthew Garrett Carter
Description
Introduction

The purpose of this project was to gain added practice in learning and demonstrating R data-analysis skills. The data set was found on Kaggle and shows sales information from the years 2010 to 2012. The weekly sales fall into two categories, holiday and non-holiday, represented by 1 and 0 in that column respectively.

The main question for this exercise was whether any factors affected weekly sales for the stores. The factors considered were temperature, fuel prices, and unemployment rates.

The following packages are required for this project:

install.packages("tidyverse")
install.packages("dplyr")
install.packages("tsibble")

The following libraries are required:

library("tidyverse")
library(readr)
library(dplyr)
library(ggplot2)
library(lubridate)
library(tsibble)

Loading the data set into RStudio:

Walmart <- read.csv("C:/Users/matth/OneDrive/Desktop/Case Study/Walmart.csv")

Data Inspection

Inspected the column names, dimensions, and structure of the data, and checked for missing values:

colnames(Walmart)
dim(Walmart)
str(Walmart)
head(Walmart)
which(is.na(Walmart$Date))
sum(is.na(Walmart))

There is NA data in the set.

Turning Store and Holiday_Flag into factors:

Walmart$Store <- as.factor(Walmart$Store)
Walmart$Holiday_Flag <- as.factor(Walmart$Holiday_Flag)

Splitting the date into year and year-week:

Walmart$week <- yearweek(as.Date(Walmart$Date, tryFormats = c("%d-%m-%Y")))  # make sure to install "tsibble"
Walmart$year <- format(as.Date(Walmart$Date, tryFormats = c("%d-%m-%Y")), "%Y")

Filtered the Holiday_Flag column to include only holiday weeks:

Walmart_Holiday <- filter(Walmart, Holiday_Flag == 1)

Filtered the Holiday_Flag column to include only non-holiday weeks:

Walmart_Non_Holiday <- filter(Walmart, Holiday_Flag == 0)

Let's review all 45 stores' weekly sales and compare them, using the Walmart dataset:

ggplot(Walmart, aes(x = Weekly_Sales, y = Store)) +
  geom_boxplot() +
  labs(title = 'Weekly Sales Across 45 Stores', x = 'Weekly sales', y = 'Store') +
  theme_bw()

Results

The boxplot shows that Store 14 had the maximum sales while Store 33 had the minimum sales.

Let's verify the results via slice_max and slice_min:

Walmart %>% slice_max(Weekly_Sales)
Walmart %>% slice_min(Weekly_Sales)

It looks like the information was correct. Let's check the mean of the Weekly_Sales column:

mean(Walmart$Weekly_Sales)

The mean for Weekly_Sales column for the Walmart dataset was 1046965.

Let's check the MIN and MAX of Weekly Sales for holiday weeks only:

ggplot(Walmart_Holiday, aes(x = Weekly_Sales, y = Store)) +
  geom_boxplot() +
  labs(title = 'Holiday Sales Across 45 Stores', x = 'Weekly sales', y = 'Store') +
  theme_bw()

Result

Store 4 had the highest weekly sales during a holiday week based on the boxplot. The boxplot shows stores 33 and 5 among the lowest holiday sales. Let's verify again with slice_max and slice_min:

Walmart_Holiday %>% slice_max(Weekly_Sales)
Walmart_Holiday %>% slice_min(Weekly_Sales)

The results match what is shown on the boxplot. Let's find the mean:

mean(Walmart_Holiday$Weekly_Sales)

The mean was 1122888.

Let's check the MIN and MAX of Weekly Sales for non-holiday weeks only:

ggplot(Walmart_Non_Holiday, aes(x = Weekly_Sales, y = Store)) +
  geom_boxplot() +
  labs(title = 'Non Holiday Sales Across 45 Stores', x = 'Weekly sales', y = 'Store') +
  theme_bw()

This matched the results for the full Walmart dataset, which included both non-holiday and holiday weeks: Store 14 had the maximum sales and Store 33 had the minimum sales. Let's verify the results and find the mean:

Walmart_Non_Holiday %>% slice_max(Weekly_Sales)
Walmart_Non_Holiday %>% slice_min(Weekly_Sales)
mean(Walmart_Non_Holiday$Weekly_Sales)

Results matched. And the mean for weekly sales was 1041256.

Which Year had the most sales?

ggplot(data = Walmart) + geom_point(mapping = aes(x=year, y=Weekly_Sales))

According to the plot, 2010 had the most sales. Let's use a boxplot to see more.

ggplot(Walmart, aes(x = year, y = Weekly_Sales)) +
  geom_boxplot() +
  labs(title = 'Weekly Sales for Years 2010 - 2012', x = 'Year', y = 'Weekly Sales')

2010 saw higher sales numbers and a higher median.

Is there any difference between sales during non-holiday weeks and holiday weeks?

Let's start with holiday weekly sales:

ggplot(Walmart_Holiday, aes(x=year, y=Weekly_Sales))+geom_boxplot()+ labs(title = 'Holiday Weekly Sales for Years ...
Dermatology Dataset (Multi-class classification)
kaggle.com
zip
Updated May 9, 2023
Cite
olcay_bolat (2023). Dermatology Dataset (Multi-class classification) [Dataset]. https://www.kaggle.com/olcaybolat1/dermatology-dataset-classification
Available download formats: zip (5257 bytes)
Dataset updated
May 9, 2023
Authors
olcay_bolat
Description
The differential diagnosis of "erythemato-squamous" diseases is a real problem in dermatology. They all share the clinical features of erythema and scaling, with minimal differences. The disorders in this group are psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis, and pityriasis rubra pilaris. Usually, a biopsy is necessary for the diagnosis, but unfortunately, these diseases share many histopathological features as well.

Patients were first evaluated clinically with 12 features. Afterward, skin samples were taken for the evaluation of 22 histopathological features. The values of the histopathological features are determined by an analysis of the samples under a microscope.

Feature Value Information

In the dataset constructed for this domain, the family history feature has the value 1 if any of these diseases has been observed in the family, and 0 otherwise. The age feature simply represents the age of the patient.

Every other feature (clinical and histopathological) was given a degree in the range of 0 to 3. Here, 0 indicates that the feature was not present, 3 indicates the largest amount possible, and 1 and 2 indicate the relative intermediate values.

Exploration Ideas

Distribution of each attribute: Explore the distribution of each attribute (column) in the dataset. You can use histograms or boxplots to visualize the distribution of each attribute and look for any patterns or outliers.

Correlation analysis: Use correlation matrices to explore the relationship between the different attributes in the dataset. This can help identify which attributes are most closely related to each other and may be useful in predicting the class labels.

Missing values analysis: Investigate the missing values in the Age attribute, which are represented with '?' in the dataset. Determine the proportion of missing values and evaluate whether imputation is needed.

Class distribution: Explore the distribution of the class labels in the dataset. You can use bar plots to visualize the number of instances for each class, and determine whether the dataset is balanced or imbalanced.

Feature engineering: Consider creating new features that may be useful in predicting the class labels. For example, you could create a feature that combines the presence of specific clinical attributes or histopathological attributes.

Outlier detection: Explore the presence of any outliers in the dataset. Outliers can skew the distribution of the data and impact the performance of machine learning models. You can use boxplots or scatterplots to visualize the distribution of each attribute and identify any potential outliers.
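
Two of the exploration ideas above, measuring the share of '?' markers in the Age attribute and flagging outliers the way a boxplot would, can be sketched with the Python standard library. This is an illustration under stated assumptions: the ages below are invented, `missing_share` and `iqr_outliers` are hypothetical helpers, and the 1.5 × IQR fence is the common boxplot whisker convention rather than something the dataset description specifies.

```python
from statistics import quantiles

# Hypothetical Age column as raw strings; '?' marks a missing value as
# described above. The values are made up for illustration.
raw_ages = ["34", "27", "?", "45", "52", "?", "30", "41", "38", "95"]

def missing_share(values, marker="?"):
    """Fraction of entries equal to the missing-value marker."""
    return sum(v == marker for v in values) / len(values)

def iqr_outliers(numbers, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual boxplot fences."""
    q1, _, q3 = quantiles(numbers, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in numbers if x < low or x > high]

ages = [float(v) for v in raw_ages if v != "?"]
share = missing_share(raw_ages)
outliers = iqr_outliers(ages)
```

A flagged value is only a candidate for review; whether it is an error or a genuine extreme case is a judgment about the data.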