20 datasets found
  1. box-plot-data

    • kaggle.com
    zip
    Updated Mar 14, 2024
    Cite
    Mustafa Almitamy (2024). box-plot-data [Dataset]. https://www.kaggle.com/datasets/mustafaalmitamy/box-plot-data
    Explore at:
    Available download formats: zip (7450 bytes)
    Dataset updated
    Mar 14, 2024
    Authors
    Mustafa Almitamy
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Mustafa Almitamy

    Released under Apache 2.0

    Contents

  2. Datasets-Box-Plot

    • kaggle.com
    zip
    Updated Mar 14, 2024
    Cite
    Mustafa Almitamy (2024). Datasets-Box-Plot [Dataset]. https://www.kaggle.com/mustafaalmitamy/datasets-box-plot
    Explore at:
    Available download formats: zip (220 bytes)
    Dataset updated
    Mar 14, 2024
    Authors
    Mustafa Almitamy
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Mustafa Almitamy

    Released under Apache 2.0

    Contents

  3. Box plot outlier

    • kaggle.com
    zip
    Updated Jul 17, 2022
    Cite
    chiragksharma (2022). Box plot outlier [Dataset]. https://www.kaggle.com/datasets/chiragksharma/box-plot-outlier
    Explore at:
    Available download formats: zip (12695 bytes)
    Dataset updated
    Jul 17, 2022
    Authors
    chiragksharma
    Description

    Dataset

    This dataset was created by chiragksharma

    Contents

  4. Box Plot Outliers

    • kaggle.com
    zip
    Updated Jul 17, 2022
    + more versions
    Cite
    chiragksharma (2022). Box Plot Outliers [Dataset]. https://www.kaggle.com/datasets/chiragksharma/box-plot-outliers
    Explore at:
    Available download formats: zip (12695 bytes)
    Dataset updated
    Jul 17, 2022
    Authors
    chiragksharma
    Description

    Dataset

    This dataset was created by chiragksharma

    Contents

  5. Test-box-plots

    • kaggle.com
    zip
    Updated Jun 30, 2023
    Cite
    srinidhi yerabati (2023). Test-box-plots [Dataset]. https://www.kaggle.com/datasets/srinidhiyerabati/test-box-plots
    Explore at:
    Available download formats: zip (292 bytes)
    Dataset updated
    Jun 30, 2023
    Authors
    srinidhi yerabati
    Description

    Dataset

    This dataset was created by srinidhi yerabati

    Contents

  6. box plot

    • kaggle.com
    zip
    Updated Oct 28, 2021
    Cite
    Bro Brother Crony420 (2021). box plot [Dataset]. https://www.kaggle.com/brobrothercrony420/box-plot
    Explore at:
    Available download formats: zip (577 bytes)
    Dataset updated
    Oct 28, 2021
    Authors
    Bro Brother Crony420
    Description

    Dataset

    This dataset was created by Bro Brother Crony420

    Contents

  7. Box plot

    • kaggle.com
    Updated Dec 3, 2024
    Cite
    Sumbal Wahid (2024). Box plot [Dataset]. https://www.kaggle.com/datasets/sumbalwahid/box-plot/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sumbal Wahid
    Description

    Dataset

    This dataset was created by Sumbal Wahid

    Released under Other (specified in description)

    Contents

  8. Boxplot

    • kaggle.com
    zip
    Updated Apr 2, 2022
    Cite
    İlyas Abbasov (2022). Boxplot [Dataset]. https://www.kaggle.com/datasets/lyasabbasov/boxplot
    Explore at:
    Available download formats: zip (1172721 bytes)
    Dataset updated
    Apr 2, 2022
    Authors
    İlyas Abbasov
    Description

    Dataset

    This dataset was created by İlyas Abbasov

    Contents

  9. boxplot

    • kaggle.com
    zip
    Updated Apr 27, 2020
    + more versions
    Cite
    David C (2020). boxplot [Dataset]. https://www.kaggle.com/davidchen1998/boxplot
    Explore at:
    Available download formats: zip (50253 bytes)
    Dataset updated
    Apr 27, 2020
    Authors
    David C
    Description

    Dataset

    This dataset was created by David C

    Contents

  10. akash box plot

    • kaggle.com
    zip
    Updated Aug 27, 2024
    Cite
    Akash Roy (2024). akash box plot [Dataset]. https://www.kaggle.com/datasets/akashcodee/akash-box-plot/suggestions
    Explore at:
    Available download formats: zip (15234 bytes)
    Dataset updated
    Aug 27, 2024
    Authors
    Akash Roy
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Akash Roy

    Released under Apache 2.0

    Contents

  11. non-itp-hb-boxplot

    • kaggle.com
    zip
    Updated Jun 3, 2021
    Cite
    ARITRA BRAHMA (2021). non-itp-hb-boxplot [Dataset]. https://www.kaggle.com/aritrabrahma/nonitphbboxplot
    Explore at:
    Available download formats: zip (536 bytes)
    Dataset updated
    Jun 3, 2021
    Authors
    ARITRA BRAHMA
    Description

    Dataset

    This dataset was created by ARITRA BRAHMA

    Contents

  12. BoxPlot_All_LM

    • kaggle.com
    zip
    Updated Dec 26, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hadeer Khaled Nabil (2022). BoxPlot_All_LM [Dataset]. https://www.kaggle.com/datasets/hadeerkhalednabil/boxplot-all-lm/code
    Explore at:
    Available download formats: zip (12072 bytes)
    Dataset updated
    Dec 26, 2022
    Authors
    Hadeer Khaled Nabil
    Description

    Dataset

    This dataset was created by Hadeer Khaled Nabil

    Contents

  13. Gráfico_Boxplot

    • kaggle.com
    zip
    Updated Dec 8, 2022
    Cite
    Alan Diego (2022). Gráfico_Boxplot [Dataset]. https://www.kaggle.com/alandiego/grfico-boxplot
    Explore at:
    Available download formats: zip (186690 bytes)
    Dataset updated
    Dec 8, 2022
    Authors
    Alan Diego
    Description

    Dataset

    This dataset was created by Alan Diego

    Contents

  14. Plotly Dashboard Healthcare

    • kaggle.com
    zip
    Updated Jan 4, 2022
    Cite
    A SURESH (2022). Plotly Dashboard Healthcare [Dataset]. https://www.kaggle.com/datasets/sureshmecad/plotly-dashboard-healthcare
    Explore at:
    Available download formats: zip (1741234 bytes)
    Dataset updated
    Jan 4, 2022
    Authors
    A SURESH
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Data Visualization

    Content

    a. Scatter plot

      i. The webapp should allow the user to select genes from datasets and plot 2D scatter plots between 2 variables (expression/copy_number/chronos) for any pair of genes.

      ii. The user should be able to filter and color data points using metadata information available in the file “metadata.csv”.

      iii. The visualization could be interactive: it would be great if the user could hover over the data points on the plot and get the relevant information (hint: visit https://plotly.com/r/, https://plotly.com/python).

      iv. Here is a quick reference for you. The scatter plot is between the chronos score for the TTBK2 gene and the expression for the MORC2 gene, with coloring defined by the Gender/Sex column from the metadata file.
    

    b. Boxplot/violin plot

      i. The user should be able to select a gene and a variable (expression / chronos / copy_number) and generate a boxplot to display its distribution across multiple categories as defined by a user-selected variable (a column from the metadata file).

      ii. Here is an example for your reference, where a violin plot of the CHRONOS score for gene CCL22 is plotted and grouped by ‘Lineage’.
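
    The grouping step behind the boxplot requirement above can be sketched with the Python standard library alone. This is a minimal illustration, not the dataset author's method: the function name, the row layout, and the sample values are hypothetical, and a real webapp would more likely hand the grouped values to a plotting library such as Plotly.

```python
from collections import defaultdict
from statistics import quantiles


def boxplot_stats_by_group(rows, value_key, group_key):
    """Group numeric values by a category and return box-plot statistics per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(float(row[value_key]))

    stats = {}
    for name, values in groups.items():
        values.sort()
        if len(values) >= 2:
            # Quartiles with linear interpolation between data points
            q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        else:
            # quantiles() needs at least two points; reuse the single value
            q1 = q2 = q3 = values[0]
        stats[name] = {
            "min": values[0], "q1": q1, "median": q2, "q3": q3, "max": values[-1],
        }
    return stats


# Hypothetical rows: one gene's chronos score joined with a metadata column
rows = [
    {"lineage": "lung", "chronos": -1.0},
    {"lineage": "lung", "chronos": -0.5},
    {"lineage": "lung", "chronos": 0.0},
    {"lineage": "lung", "chronos": 0.5},
    {"lineage": "lung", "chronos": 1.0},
    {"lineage": "skin", "chronos": 0.0},
    {"lineage": "skin", "chronos": 1.0},
]
stats = boxplot_stats_by_group(rows, "chronos", "lineage")
```

    Each entry in `stats` holds the five numbers a box plot draws for that category, so the same dictionary could feed either a boxplot or the summary table beside it.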
    


  15. Customer Sale Dataset for Data Visualization

    • kaggle.com
    Updated Jun 6, 2025
    Cite
    Atul (2025). Customer Sale Dataset for Data Visualization [Dataset]. https://www.kaggle.com/datasets/atulkgoyl/customer-sale-dataset-for-visualization
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Atul
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This synthetic dataset is designed specifically for practicing data visualization and exploratory data analysis (EDA) using popular Python libraries like Seaborn, Matplotlib, and Pandas.

    Unlike most public datasets, this one includes a diverse mix of column types:

    📅 Date columns (for time series and trend plots)
    🔢 Numerical columns (for histograms, boxplots, scatter plots)
    🏷️ Categorical columns (for bar charts, group analysis)

    Whether you are a beginner learning how to visualize data or an intermediate user testing new charting techniques, this dataset offers a versatile playground.

    Feel free to:

    • Create EDA notebooks
    • Practice plotting techniques
    • Experiment with filtering, grouping, and aggregations

    🛠️ No missing values, no data cleaning needed: just download and start exploring!

    Hope you find this helpful. Looking forward to hearing from you all.

  16. Titanic: A Voyage into the Past

    • kaggle.com
    zip
    Updated Nov 7, 2023
    Cite
    Asher Mehfooz (2023). Titanic: A Voyage into the Past [Dataset]. https://www.kaggle.com/datasets/ashirzaki/titanic
    Explore at:
    Available download formats: zip (22564 bytes)
    Dataset updated
    Nov 7, 2023
    Authors
    Asher Mehfooz
    Description

    Dataset Overview

    The Titanic dataset is a widely used benchmark dataset for machine learning and data science tasks. It contains information about passengers who boarded the RMS Titanic in 1912, including their age, sex, social class, and whether they survived the sinking of the ship. The dataset is divided into two main parts:

    Train.csv: This file contains information about 891 passengers who were used to train machine learning models. It includes the following features:

    • PassengerId: A unique identifier for each passenger
    • Survived: Whether the passenger survived (1) or not (0)
    • Pclass: The passenger's social class (1 = Upper, 2 = Middle, 3 = Lower)
    • Name: The passenger's name
    • Sex: The passenger's sex (Male or Female)
    • Age: The passenger's age
    • Sibsp: The number of siblings or spouses aboard the ship
    • Parch: The number of parents or children aboard the ship
    • Ticket: The passenger's ticket number
    • Fare: The passenger's fare
    • Cabin: The passenger's cabin number
    • Embarked: The port where the passenger embarked (C = Cherbourg, Q = Queenstown, S = Southampton)

    Test.csv: This file contains information about 418 passengers who were not used to train machine learning models. It includes the same features as train.csv, but does not include the Survived label. The goal of machine learning models is to predict whether or not each passenger in the test.csv file survived.

    Data Preparation

    Before using the Titanic dataset for machine learning tasks, it is important to perform some data preparation steps. These steps may include:

    • Handling missing values: Some of the features in the dataset have missing values. These values can be imputed or removed, depending on the specific task.
    • Encoding categorical variables: Some of the features in the dataset are categorical variables, such as Pclass, Sex, and Embarked. These variables need to be encoded numerically before they can be used by machine learning algorithms.
    • Scaling numerical variables: Some of the features in the dataset are numerical variables, such as Age and Fare. These variables may need to be scaled to ensure that they are on the same scale.

    Data Visualization

    Data visualization can be a useful tool for exploring the Titanic dataset and gaining insights into the data. Some common data visualization techniques that can be used with the Titanic dataset include:

    • Histograms: Histograms can be used to visualize the distribution of numerical variables, such as Age and Fare.
    • Scatter plots: Scatter plots can be used to visualize the relationship between two numerical variables.
    • Box plots: Box plots can be used to visualize the distribution of a numerical variable across different categories, such as Pclass and Sex.

    Machine Learning Tasks

    The Titanic dataset can be used for a variety of machine learning tasks, including:

    • Classification: The most common task is to use the train.csv file to train a machine learning model to predict whether or not each passenger in the test.csv file survived.
    • Regression: The dataset can also be used to train a machine learning model to predict the fare of a passenger based on their other features.
    • Anomaly detection: The dataset can also be used to identify anomalies, such as passengers who are outliers in terms of their age, social class, or other features.
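
    The three preparation steps described for this dataset (imputing missing values, encoding a categorical variable, scaling a numerical variable) can be sketched as follows. This is an illustrative sketch under stated assumptions: the `prepare` helper and the three sample passengers are hypothetical, median imputation and min-max scaling are just one common choice for each step, and only the column names Age, Sex, Pclass, and Fare come from the description above.

```python
from statistics import median


def prepare(passengers):
    """Impute missing Age with the median, encode Sex as 0/1, min-max scale Fare."""
    known_ages = [p["Age"] for p in passengers if p["Age"] is not None]
    age_median = median(known_ages)

    fares = [p["Fare"] for p in passengers]
    fare_min, fare_max = min(fares), max(fares)
    fare_range = (fare_max - fare_min) or 1.0  # guard against identical fares

    prepared = []
    for p in passengers:
        prepared.append({
            # Handling missing values: fill gaps with the median age
            "Age": p["Age"] if p["Age"] is not None else age_median,
            # Encoding categorical variables: female -> 1, anything else -> 0
            "Sex": 1 if p["Sex"].lower() == "female" else 0,
            "Pclass": p["Pclass"],
            # Scaling numerical variables: map Fare onto the range [0, 1]
            "Fare": (p["Fare"] - fare_min) / fare_range,
        })
    return prepared


# Hypothetical sample rows, not taken from train.csv
sample = [
    {"Age": 22.0, "Sex": "male", "Pclass": 3, "Fare": 10.0},
    {"Age": None, "Sex": "female", "Pclass": 1, "Fare": 70.0},
    {"Age": 30.0, "Sex": "female", "Pclass": 2, "Fare": 40.0},
]
prepared = prepare(sample)
```

    In practice the imputation and scaling parameters would be computed on train.csv only and then reused on test.csv, so that no information from the test passengers leaks into the preparation.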

  17. Data Preprocessing EDA Microarray GE Data GSE5583

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite
    Dr. Nagendra (2025). Data Preprocessing EDA Microarray GE Data GSE5583 [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/data-preprocessing-eda-microarray-ge-data-gse5583
    Explore at:
    Available download formats: zip (3144708 bytes)
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset is based on GEO series GSE5583.

    The experiment compares gene expression profiles between wild‑type mouse embryonic stem cells (ES cells) and ES cells in which Histone deacetylase 1 (HDAC1) has been knocked out.

    The organism used is mouse (Mus musculus).

    Microarray technology was employed to measure transcript abundance across the genome, aiming to identify putative HDAC1 target genes.

    The dataset includes processed expression data (after normalization and log2 transformation), allowing for downstream exploratory data analysis (EDA) and differential gene expression (DGE) analysis.

    As part of EDA, sample‑wise distribution plots (e.g. boxplots) are provided to assess normalization across all arrays.

    The dataset also includes downstream visualizations and analysis results, such as boxplots, which help in evaluating the consistency and quality of the processed data.
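
    The normalization check that the sample-wise boxplots support can be sketched numerically: after log2 transformation, arrays that are well normalized typically show similar medians. The snippet below is only an illustration; the sample names, intensity values, and the pseudocount of 1 are assumptions and are not taken from GSE5583.

```python
import math
from statistics import median


def log2_transform(intensities, pseudocount=1.0):
    """Log2-transform raw intensities; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(x + pseudocount) for x in intensities]


def sample_medians(samples):
    """Return the median log2 intensity per array, i.e. the center line of each boxplot."""
    return {name: median(log2_transform(values)) for name, values in samples.items()}


# Hypothetical raw intensities for two arrays
samples = {
    "wild_type_1": [3.0, 15.0, 63.0],
    "hdac1_ko_1": [7.0, 15.0, 31.0],
}
medians = sample_medians(samples)
```

    Arrays whose medians sit far from the rest would stand out as offset boxes in the plot and may need renormalization before differential expression analysis.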

    Researchers can use this dataset to perform differential expression analysis between HDAC1 knockout vs wild‑type ES cells, investigate epigenetic regulation, or explore downstream effects of histone deacetylation loss.

    Additionally, the dataset can serve as a reference example for microarray data preprocessing, normalization, transformation (e.g. log2), and exploratory visualization workflows.

    The dataset is publicly available and sourced from a trusted repository (GEO), ensuring transparency and reproducibility of the experiment.

  18. Automated_Descriptive_Statistics_Pipeline R Studio

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite
    Dr. Nagendra (2025). Automated_Descriptive_Statistics_Pipeline R Studio [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/automated-descriptive-statistics-pipeline-r-studio
    Explore at:
    Available download formats: zip (21548 bytes)
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    • Automated parametric analysis workflow built using R Studio.
    • Demonstrates core statistical analysis methods on numerical datasets.
    • Includes step-by-step R scripts for performing t-tests, ANOVA, and summary statistics.
    • Provides visual outputs such as boxplots and distribution plots for better interpretation.
    • Designed for students, researchers, and data analysts learning statistical automation in R.
    • Useful for understanding reproducible research workflows in data analysis.
    • Dataset helps in teaching how to automate statistical pipelines using R programming.

  19. Walmart Data Set

    • kaggle.com
    zip
    Updated Jan 4, 2023
    Cite
    Matthew Garrett Carter (2023). Walmart Data Set [Dataset]. https://www.kaggle.com/datasets/matthewgarrettcarter/walmart-data-set
    Explore at:
    Available download formats: zip (272320 bytes)
    Dataset updated
    Jan 4, 2023
    Authors
    Matthew Garrett Carter
    Description

    Introduction

    The purpose of this project was to get added practice in learning and demonstrating R data-analysis skills. The data set was found on Kaggle and shows sales information for the years 2010 to 2012. The weekly sales fall into two categories, holiday and non-holiday weeks, represented by 1 and 0 in that column, respectively.

    The main question for this exercise was: were there any factors that affected weekly sales for the stores? The factors considered were temperature, fuel prices, and unemployment rates.

    The following packages are required for this project:

    install.packages("tidyverse")
    install.packages("dplyr")
    install.packages("tsibble")
    

    The following libraries are required:

    library("tidyverse")
    library(readr)
    library(dplyr)
    library(ggplot2)
    library(lubridate)  # date helpers
    library(tsibble)    # provides yearweek()
    

    Loading the data set into RStudio:

    Walmart <- read.csv("C:/Users/matth/OneDrive/Desktop/Case Study/Walmart.csv")
    

    Data Inspection

    Compared column names of each file to verify consistency.

    
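    colnames(Walmart)            # column names
    dim(Walmart)                 # number of rows and columns
    str(Walmart)                 # column types
    head(Walmart)                # first rows
    which(is.na(Walmart$Date))   # rows with a missing Date
    sum(is.na(Walmart))          # total count of NA values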
    

    There is NA data in the set.

    Turning Store and Holiday_Flag into factors:

    Walmart$Store<-as.factor(Walmart$Store)
    Walmart$Holiday_Flag<-as.factor(Walmart$Holiday_Flag)
    

    Splitting the date into year and year-week:

    Walmart$week<-yearweek(as.Date(Walmart$Date,tryFormats=c("%d-%m-%Y"))) # make sure to install "tsibble"
    Walmart$year<-format(as.Date(Walmart$Date,tryFormats=c("%d-%m-%Y")),"%Y")
    
    

    Filtered the Holiday_Flag column to include only holiday weeks:

    Walmart_Holiday<-
     filter(Walmart, Holiday_Flag==1)
    

    Filtered the Holiday_Flag column to include only non-holiday weeks:

    Walmart_Non_Holiday<-
     filter(Walmart, Holiday_Flag==0)
    

    Let's review all 45 stores' weekly sales and compare them, using the Walmart dataset:

    ggplot(Walmart, aes(x=Weekly_Sales, y=Store))+geom_boxplot()+ labs(title = 'Weekly Sales Across 45 Stores', 
                                      x='Weekly sales', y='Store')+theme_bw()
    

    Results

    The boxplot shows that Store 14 had the maximum weekly sales while Store 33 had the minimum.

    Let's verify the results with slice_max and slice_min:

    Walmart %>% slice_max(Weekly_Sales)
    
    Walmart %>% slice_min(Weekly_Sales) 
    

    It looks like the information was correct. Let's check the mean of the Weekly_Sales column:

    mean(Walmart$Weekly_Sales)
    

    The mean of the Weekly_Sales column for the Walmart dataset was 1046965.

    Let's check the MIN and MAX of weekly sales, but only for holiday weeks:

    ggplot(Walmart_Holiday, aes(x=Weekly_Sales, y=Store))+geom_boxplot()+ labs(title = 'Holiday Sales Across 45 Stores', 
                                      x='Weekly sales', y='Store')+theme_bw()
    

    Result

    Based on the boxplot, Store 4 had the highest weekly sales during a holiday week, while Stores 33 and 5 had some of the lowest holiday sales. Let's verify again with slice_max and slice_min:

    Walmart_Holiday %>% slice_max(Weekly_Sales)
    
    Walmart_Holiday %>% slice_min(Weekly_Sales)
    

    The results match what is shown on the boxplot. Let's find the mean:

    mean(Walmart_Holiday$Weekly_Sales)
    

    The mean was 1122888.

    Let's check the MIN and MAX of weekly sales, but only for non-holiday weeks:

    ggplot(Walmart_Non_Holiday, aes(x=Weekly_Sales, y=Store))+geom_boxplot()+ labs(title = 'Non Holiday Sales Across 45 Stores', x='Weekly sales', y='Store')+theme_bw()
    

    This matched the results for the full Walmart dataset, which includes both non-holiday and holiday weeks: Store 14 had the maximum sales and Store 33 had the minimum sales. Let's verify the results and find the mean:

    Walmart_Non_Holiday %>% slice_max(Weekly_Sales)
    
    Walmart_Non_Holiday %>% slice_min(Weekly_Sales)  
    
    mean(Walmart_Non_Holiday$Weekly_Sales)
    

    The results matched, and the mean for weekly sales was 1041256.

    Which Year had the most sales?

    ggplot(data = Walmart) + geom_point(mapping = aes(x=year, y=Weekly_Sales))
    

    According to the plot, 2010 had the most sales. Let's use a boxplot to see more.

    ggplot(Walmart, aes(x=year, y=Weekly_Sales))+geom_boxplot()+ labs(title = 'Weekly Sales for Years 2010 - 2012', 
                                         x='Year', y='Weekly Sales')
    

    2010 saw higher sales numbers and a higher median.

    Is there any difference between sales during non-holiday weeks and holiday weeks?

    Let's start with holiday weekly sales:

    ggplot(Walmart_Holiday, aes(x=year, y=Weekly_Sales))+geom_boxplot()+ labs(title = 'Holiday Weekly Sales for Years ...
    
  20. Dermatology Dataset (Multi-class classification)

    • kaggle.com
    zip
    Updated May 9, 2023
    Cite
    olcay_bolat (2023). Dermatology Dataset (Multi-class classification) [Dataset]. https://www.kaggle.com/olcaybolat1/dermatology-dataset-classification
    Explore at:
    Available download formats: zip (5257 bytes)
    Dataset updated
    May 9, 2023
    Authors
    olcay_bolat
    Description
    • The differential diagnosis of "erythemato-squamous" diseases is a real problem in dermatology. They all share the clinical features of erythema and scaling, with minimal differences. The disorders in this group are psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis, and pityriasis rubra pilaris. Usually, a biopsy is necessary for the diagnosis, but unfortunately, these diseases share many histopathological features as well.

    • Patients were first evaluated clinically with 12 features. Afterward, skin samples were taken for the evaluation of 22 histopathological features. The values of the histopathological features are determined by an analysis of the samples under a microscope.

    Feature Value Information

    In the dataset constructed for this domain, the family history feature has the value 1 if any of these diseases has been observed in the family, and 0 otherwise. The age feature simply represents the age of the patient.

    Every other feature, clinical and histopathological, was given a degree in the range of 0 to 3. Here, 0 indicates that the feature was not present, 3 indicates the largest amount possible, and 1 and 2 indicate the relative intermediate values.

    Exploration Ideas

    • Distribution of each attribute: Explore the distribution of each attribute (column) in the dataset. You can use histograms or boxplots to visualize the distribution of each attribute and look for any patterns or outliers.

    • Correlation analysis: Use correlation matrices to explore the relationship between the different attributes in the dataset. This can help identify which attributes are most closely related to each other and may be useful in predicting the class labels.

    • Missing values analysis: Investigate the missing values in the Age attribute, which are represented with '?' in the dataset. Determine the proportion of missing values and evaluate whether imputation is needed.

    • Class distribution: Explore the distribution of the class labels in the dataset. You can use bar plots to visualize the number of instances for each class, and determine whether the dataset is balanced or imbalanced.

    • Feature engineering: Consider creating new features that may be useful in predicting the class labels. For example, you could create a feature that combines the presence of specific clinical attributes or histopathological attributes.

    • Outlier detection: Explore the presence of any outliers in the dataset. Outliers can skew the distribution of the data and impact the performance of machine learning models. You can use boxplots or scatterplots to visualize the distribution of each attribute and identify any potential outliers.
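
    Two of the exploration ideas above, measuring the share of '?' markers in the Age attribute and flagging outliers the way a boxplot does, can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names and the sample ages are hypothetical, and the 1.5 × IQR fence is the conventional boxplot whisker rule rather than anything prescribed by the dataset.

```python
from statistics import quantiles


def missing_proportion(raw_values, marker="?"):
    """Fraction of entries equal to the missing-value marker."""
    return sum(1 for v in raw_values if v == marker) / len(raw_values)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual boxplot whisker rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical Age column as read from the file, with '?' marking missing entries
raw_ages = ["25", "30", "?", "35", "40", "45", "?", "50", "55", "120"]

share_missing = missing_proportion(raw_ages)
ages = [float(v) for v in raw_ages if v != "?"]
outliers = iqr_outliers(ages)
```

    With this sample, two of the ten entries are missing and the age of 120 falls above the upper fence, which is the point a boxplot would draw beyond its whisker.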
