License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data visualization is important for statistical analysis, as it helps convey information efficiently and sheds light on the hidden patterns behind data in a visual context. It is particularly helpful to display circular data in a two-dimensional space to accommodate its nonlinear support space and reveal the underlying circular structure, which is otherwise not obvious in one dimension. In this article, we first formally categorize circular plots into two types, either height- or area-proportional, and then describe a new general methodology that can be used to produce circular plots, particularly in the area-proportional manner, which in our opinion is the more appropriate choice. Formulas are given that are fairly simple yet effective for producing various circular plots, such as smooth density curves, histograms, rose diagrams, dot plots, and plots for multiclass data. Supplemental materials for this article are available online.
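The area-proportional choice the abstract advocates boils down to a square-root radius scaling: a rose-diagram sector of angular width w and radius r has area ½r²w, so making area (rather than height) track frequency means r must grow with the square root of the count. A short sketch of that idea (illustrative only, not the article's own formulas):

```python
import numpy as np

def rose_radii(counts, sector_width):
    """Sector radii for an area-proportional rose diagram.

    Sector area = 0.5 * r**2 * sector_width, so for area
    proportional to count, r must scale with sqrt(count).
    A height-proportional plot would instead use r ~ count,
    visually exaggerating large counts.
    """
    counts = np.asarray(counts, dtype=float)
    return np.sqrt(2.0 * counts / sector_width)

# 8 sectors of width 2*pi/8: doubling a count multiplies the
# radius by sqrt(2), keeping each sector *area* equal to its count.
counts = np.array([4, 8, 16, 8, 4, 2, 1, 2])
width = 2 * np.pi / 8
radii = rose_radii(counts, width)
```

With this scaling, `0.5 * radii**2 * width` reproduces `counts` exactly, which is the defining property of an area-proportional plot.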
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
Abstract: The dataset contains the data underlying the plots and histograms in the article "Collective photon emission patterns from two atoms in free space". The data show the photon statistics of a trapped two-ion crystal observed in the far field. Details of the measurement process and experimental setup can be found in the journal publication. The data for each individual plot are presented as a table on a separate sheet of an .xlsx spreadsheet.
License: MIT License, https://opensource.org/licenses/MIT
LLM Distribution Evaluation Dataset
This dataset contains 50,000 synthetic graphs with questions and answers about statistical distributions, designed to evaluate large language models' ability to analyze data visualizations.
Dataset Description
Dataset Summary
This dataset contains diverse statistical visualizations (bar charts, line plots, scatter plots, histograms, area charts, and step plots) with associated questions about:
Normality testing Distribution… See the full description on the dataset page: https://huggingface.co/datasets/robvanvolt/llm-distribution.
The Excel spreadsheet (with comma-separated value (CSV) files of the same names) contains the three tables and the raw data used to plot the histograms of Fig. 8, Fig. 11(a), and Fig. 11(b); here, the range is from 0 to 100 μm. Each tab corresponds to a figure number in the paper. The Feret diameter was obtained using image-analysis software (Fiji 2.3.0; Schindelin et al., 2012). The histogram bin width was calculated using Scott's formula (Scott, 1979).
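Scott's (1979) normal-reference rule sets the bin width from the sample standard deviation and sample size as h = 3.49 · s · n^(−1/3). A short sketch with illustrative values (not the paper's data):

```python
import numpy as np

def scott_bin_width(data):
    """Histogram bin width by Scott's (1979) normal-reference rule:
    h = 3.49 * s * n**(-1/3), where s is the sample standard
    deviation and n the sample size."""
    data = np.asarray(data, dtype=float)
    return 3.49 * data.std(ddof=1) * len(data) ** (-1.0 / 3.0)

# Made-up Feret diameters in micrometres on a 0-100 um range
rng = np.random.default_rng(1)
diameters = rng.normal(50.0, 15.0, size=500)

h = scott_bin_width(diameters)
n_bins = int(np.ceil((diameters.max() - diameters.min()) / h))
```

NumPy ships essentially the same estimator as `np.histogram_bin_edges(data, bins="scott")`.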
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
License: Apache License 2.0, https://www.apache.org/licenses/LICENSE-2.0
This synthetic dataset is designed specifically for practicing data visualization and exploratory data analysis (EDA) using popular Python libraries like Seaborn, Matplotlib, and Pandas.
Unlike most public datasets, this one includes a diverse mix of column types:
📅 Date columns (for time series and trend plots)
🔢 Numerical columns (for histograms, boxplots, scatter plots)
🏷️ Categorical columns (for bar charts, group analysis)
Whether you are a beginner learning how to visualize data or an intermediate user testing new charting techniques, this dataset offers a versatile playground.
Feel free to:
Create EDA notebooks
Practice plotting techniques
Experiment with filtering, grouping, and aggregations
🛠️ No missing values, no data cleaning needed: just download and start exploring!
Hope you find this helpful. Looking forward to hearing from you all.
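A minimal sketch of how such a mixed-type practice table can be generated (all column names and values below are made up):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 365

df = pd.DataFrame({
    # date column, for time series and trend plots
    "date": pd.date_range("2024-01-01", periods=n, freq="D"),
    # numerical columns, for histograms / boxplots / scatter plots
    "sales": np.round(rng.normal(1000, 150, n), 2),
    "visits": rng.poisson(300, n),
    # categorical column, for bar charts and group analysis
    "region": rng.choice(["North", "South", "East", "West"], n),
})

# Ready for Seaborn/Matplotlib, e.g. sns.histplot(df, x="sales")
# or df.plot(x="date", y="sales") for a trend line.
```

Because the table is generated, it has no missing values and needs no cleaning, matching the spirit of the dataset described above.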
Histogram of the average alignment accuracy over 10 runs for each viral genome shown in Table 1 and each aligner. Reads crossing splice-junction regions are shown in pink; reads not crossing splice-junction regions are shown in blue.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Figure S1. Illustration of indicator mineral map datasets.
Figure S2. Illustration of fault map datasets.
Figure S3. Fault system at CGF.
Figure S4. Fault system at BGF and DPGF.
Figure S5. Illustration of LST datasets.
Figure S6. Histograms and CDF plots of Two-Class Mineral Maps versus Fault Distance Maps.
Figure S7. Histograms and CDF plots of Two-Class Mineral Maps versus Fault Density Maps.
Figure S8. Histograms and CDF plots of Two-Class Temperature Maps versus fault datasets. The top two rows correspond to Fault Distance Maps, while the bottom two rows correspond to Fault Density Maps.
Figure S9. Histograms and CDF plots of Two-Class Mineral Maps versus the Multiclass Temperature Map.
Figure S10. The multiple comparisons of the ANOVA. The plots show the mean estimates (circles) and 95% confidence intervals (bars) for each group of SGP. Red symbols highlight groups with significant differences from the control group (blue). Grey symbols indicate groups with insignificant differences, whose confidence intervals overlap with those of the control group.
Excel spreadsheet containing, on separate sheets, the underlying numerical data used to generate the plots or histograms for Figs 1A–1D, 1E (left panel), 1E (right panel), 2A–2F, 3B, 3C, 5A, 5C, 6C, 6D, 7B, 7C, S1, S2A, S2B, and S13A–S13D.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This file contains the supplementary material for the publication
Protein secondary-structure description with a coarse-grained model by Gerald R. Kneller and K. Hinsen http://dx.doi.org/10.1107/S1399004715007191 Acta Cryst. (2015). D71, 1411-1422
Datasets in this file
1) ScrewFit and ScrewFrame parameters for ideal secondary-structure elements
Scripts: /code/import_ideal_structures /code/analyze_ideal_structures
1.1) The PDB files generated with Chimera
/data/ideal_structures/3-10.pdb /data/ideal_structures/alpha.pdb /data/ideal_structures/beta-antiparallel.pdb /data/ideal_structures/beta-parallel.pdb /data/ideal_structures/pi.pdb
1.2) The corresponding MOSAIC datasets
/data/ideal_structures/3-10 /data/ideal_structures/alpha /data/ideal_structures/beta-antiparallel /data/ideal_structures/beta-parallel /data/ideal_structures/pi
1.3) The ScrewFit parameters
/data/ideal_structures/screwfit/3-10 /data/ideal_structures/screwfit/alpha /data/ideal_structures/screwfit/beta-antiparallel /data/ideal_structures/screwfit/beta-parallel /data/ideal_structures/screwfit/pi
1.4) The ScrewFrame parameters
/data/ideal_structures/screwframe/3-10 /data/ideal_structures/screwframe/alpha /data/ideal_structures/screwframe/beta-antiparallel /data/ideal_structures/screwframe/beta-parallel /data/ideal_structures/screwframe/pi
2) Statistics for ScrewFit and ScrewFrame parameters computed for the ASTRAL SCOPe subset with less than 40% sequence identity.
Scripts: /code/astral_analysis /code/fit_rho_distributions /code/plot_histograms
2.1) The ASTRAL database (link to published ActivePaper)
/data/astral_2.04
2.2) The histograms for the ScrewFit and ScrewFrame parameters for the all-alpha and all-beta subsets
/data/histograms/astral_alpha/screwfit /data/histograms/astral_alpha/screwframe
/data/histograms/astral_beta/screwfit /data/histograms/astral_beta/screwframe
2.3) The Gaussians fitted to the peaks in the distributions for rho
/data/fitted_rho_distributions/screwfit /data/fitted_rho_distributions/screwframe
2.4) Plots
/documentation/delta.pdf /documentation/delta_q.pdf /documentation/delta_r.pdf /documentation/p.pdf /documentation/rho-detail.pdf /documentation/rho.pdf /documentation/sigma.pdf /documentation/tau.pdf
3) Comparison of secondary-structure identification between ScrewFrame and DSSP.
Scripts: /code/compare_secondary_structure_assignments /code/plot_histograms
3.1) The histograms of the lengths of secondary-structure elements
/data/histograms/secondary_structure/length-alpha-dssp /data/histograms/secondary_structure/length-alpha-screwframe /data/histograms/secondary_structure/length-beta-dssp /data/histograms/secondary_structure/length-beta-screwframe
3.2) The 2D histograms of the number of residues inside identified secondary-structure elements
/data/histograms/secondary_structure/n-alpha /data/histograms/secondary_structure/n-beta
3.3) The distribution of rho inside alpha helices
/data/histograms/secondary_structure/rho-alpha-dssp
3.4) Plots
/documentation/lengths-alpha.pdf /documentation/lengths-beta.pdf /documentation/n-alpha.pdf /documentation/n-beta.pdf /documentation/rho-alpha-dssp.pdf
4) Illustration for myoglobin and VDAC-1
Scripts: /code/import_myoglobin_vdac /code/analyze_myoglobin /code/analyze_vdac /code/perturbation_analysis
4.1) Imported structures in MOSAIC format: PDB code 1A6G for myoglobin PDB code 2K4T for VDAC-1
/data/myoglobin /data/VDAC-1
4.2) Plots showing rho and delta
/documentation/rho-myoglobin.pdf /documentation/delta-myoglobin.pdf
4.3) Tube models for visualization with Chimera
/documentation/myoglobin-tube.bld /documentation/VDAC-1-tube.bld
4.4) Sensitivity to perturbations in the coordinates
/documentation/rho-perturbed-myoglobin.pdf /documentation/delta-perturbed-VDAC-1.pdf /documentation/myoglobin-perturbation.pdf /documentation/VDAC-1-perturbation.pdf
5) Analysis of CA-only structures in the PDB
Scripts: /code/ca_analysis /code/import_calpha_structures /code/plot_histograms
5.1) Imported CA-only structures in MOSAIC format
/data/pdb_ca_only_structures
5.2) Histograms for ScrewFrame parameters
/data/histograms/ca_only_structures
5.3) Plots
/documentation/delta_ca.pdf /documentation/delta_q_ca.pdf /documentation/delta_r_ca.pdf /documentation/p_ca.pdf /documentation/rho_ca.pdf /documentation/sigma_ca.pdf /documentation/tau_ca.pdf
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Project Documentation: Cucumber Disease Detection
Introduction: As part of the "Cucumber Disease Detection" project, a machine learning model for the automatic detection of diseases in cucumber plants is being developed. This research is crucial because it tackles the issue of early disease identification in agriculture, which can increase crop yield and cut down on financial losses. To train and test the model, we use a dataset of images of cucumber plants.
Importance: Early disease diagnosis helps minimize crop losses, stop the spread of diseases, and better allocate resources in farming. Agriculture is a real-world application of this concept.
Goals and Objectives: Develop a machine learning model to classify cucumber plant images into healthy and diseased categories. Achieve a high level of accuracy in disease detection. Provide a tool for farmers to detect diseases early and take appropriate action.
Data Collection: Images were gathered from agricultural areas using cameras and smartphones.
Data Preprocessing: Data cleaning to remove irrelevant or corrupted images. Handling missing values, if any, in the dataset. Removing outliers that may negatively impact model training. Data augmentation techniques applied to increase dataset diversity.
Exploratory Data Analysis (EDA): The dataset was examined using visualizations such as scatter plots and histograms, and inspected for patterns, trends, and correlations. EDA made it easier to understand the distribution of images of healthy and diseased plants.
Methodology
Machine Learning Algorithms:
Convolutional Neural Networks (CNNs) were chosen for image classification due to their effectiveness in handling image data. Transfer learning using pre-trained models such as ResNet or MobileNet may be considered.
Train-Test Split:
The dataset was split into training and testing sets with a suitable ratio. Cross-validation may be used to assess model performance robustly.
Model Development
The CNN architecture consists of layers, units, and activation functions. Hyperparameters, including the learning rate, batch size, and optimizer, were chosen on the basis of experimentation. To avoid overfitting, regularization methods such as dropout and L2 regularization were used.
Model Training
During training, the model was fed the prepared dataset over a number of epochs, and the loss function was minimized using an optimization method. Early stopping and model checkpoints were used to ensure convergence.
Model Evaluation
Evaluation Metrics: Accuracy, precision, recall, F1-score, and the confusion matrix were used to assess model performance. Results were computed for both the training and test datasets.
Performance Discussion:
The model's performance was analyzed in the context of disease detection in cucumber plants. Strengths and weaknesses of the model were identified.
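All of the listed metrics derive from the confusion matrix. A small NumPy sketch of that relationship (not the project's actual evaluation code, which is not reproduced here):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from the binary
    confusion matrix (positive class = 1, e.g. 'diseased')."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1,
            "confusion_matrix": np.array([[tn, fp], [fn, tp]])}

# Toy labels: 2 TP, 2 TN, 1 FP, 1 FN
m = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

In practice, computing these on both the training and test splits, as described above, exposes overfitting when the training scores are far higher than the test scores.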
Results and Discussion
Key project findings include model performance and disease-detection precision; a comparison of the models employed, showing the benefits and drawbacks of each; and the challenges faced throughout the project together with the methods used to solve them.
Conclusion
The project's key learnings are recapped, its importance for early disease detection in agriculture is highlighted, and future enhancements and potential research directions are suggested.
References
Libraries: Pillow, Roboflow, YOLO, scikit-learn, Matplotlib
Dataset: https://data.mendeley.com/datasets/y6d3z6f8z9/1
Code Repository https://universe.roboflow.com/hakuna-matata/cdd-g8a6g
Rafiur Rahman Rafit EWU 2018-3-60-111
License: CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
dataset repo: https://github.com/dorianprill/dataset-bicycle-geometry
The data set contains more than 6400 observations of the following 30 variables:
columns = [
'URL',
'Brand',
'Model',
'Year',
'Category',
'Motorized',
'Frame Size',
'Frame Config',
'Wheel Size',
'Reach',
'Stack',
'STR',
'Front Center',
'Head Tube Angle',
'Seat Tube Angle Effective',
'Seat Tube Angle Real',
'Top Tube Length',
'Top Tube Length Horizontal',
'Head Tube Length',
'Seat Tube Length',
'Standover Height',
'Chainstay Length',
'Wheelbase',
'Bottom Bracket Offset',
'Bottom Bracket Height',
'Fork Installation Height',
'Fork Offset',
'Fork Trail',
'Suspension Travel (rear)',
'Suspension Travel (front)',
]
Multiple variants may be recorded for each model. Variants depend mostly on Frame Size, Frame Config, Wheel Size, Suspension Travel (rear), Suspension Travel (front).
Most of the columns are self-explanatory if you are into bikes. There may be many nulls in the numeric columns, since different manufacturers may use slightly different sets of values, and some values are normally only stated for a certain category of bikes.
Some of these values can be computed with simple geometry.
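For example, STR is simply Stack divided by Reach, and fork trail follows from the wheel radius, head tube angle, and fork offset. A sketch with made-up example geometry (the wheel radius here includes the tyre and is an assumption, not a value from the dataset):

```python
import math

def stack_to_reach(stack_mm, reach_mm):
    """STR is simply Stack divided by Reach."""
    return stack_mm / reach_mm

def fork_trail(wheel_radius_mm, head_angle_deg, fork_offset_mm):
    """Ground trail from wheel radius R, head tube angle HA
    (measured from horizontal) and fork offset (rake):
        trail = (R*cos(HA) - offset) / sin(HA)
    """
    ha = math.radians(head_angle_deg)
    return (wheel_radius_mm * math.cos(ha) - fork_offset_mm) / math.sin(ha)

# Hypothetical modern 29er trail bike: 368 mm wheel radius
# (with tyre), 65 deg head angle, 44 mm offset -> trail of ~123 mm.
str_ratio = stack_to_reach(630, 480)
trail = fork_trail(368, 65.0, 44)
```

The same formula with road-bike numbers (e.g. 339 mm radius, 73°, 45 mm offset) lands near the familiar ~57 mm of trail, which is a useful sanity check against the Fork Trail column.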
The URL column contains the URL of the page from which the data was extracted. The last number in the URL is the database ID of the bike.
The API currently has no category for electric bikes. It has a variable has_motor, but it is always false; this is probably a bug in the API, or the value is not recorded in the database (yet).
I have included it in the data set as Motorized for future-proofing but you can safely drop it for now.
License: MIT License, https://opensource.org/licenses/MIT
Motivation:
Phishing attacks are one of the most significant cyber threats in today’s digital era, tricking users into divulging sensitive information like passwords, credit card numbers, and personal details. This dataset aims to support research and development of machine learning models that can classify URLs as phishing or benign.
Applications:
- Building robust phishing detection systems.
- Enhancing security measures in email filtering and web browsing.
- Training cybersecurity practitioners in identifying malicious URLs.
The dataset contains diverse features extracted from URL structures, HTML content, and website metadata, enabling deep insights into phishing behavior patterns.
This dataset comprises two types of URLs:
1. Phishing URLs: Malicious URLs designed to deceive users.
2. Benign URLs: Legitimate URLs posing no harm to users.
Key Features:
- URL-based features: Domain, protocol type (HTTP/HTTPS), and IP-based links.
- Content-based features: Link density, iframe presence, external/internal links, and metadata.
- Certificate-based features: SSL/TLS details like validity period and organization.
- WHOIS data: Registration details like creation and expiration dates.
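Assuming nothing about the repository's actual extraction code, here is a stdlib-only sketch of how a few of the URL-based features above might be derived (the function and key names are hypothetical):

```python
import re
from urllib.parse import urlparse

def url_features(url):
    """A few URL-based features of the kind listed above.
    Names are illustrative, not the dataset's actual schema."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "domain": host,
        "uses_https": parsed.scheme == "https",
        # crude dotted-quad check; a production system would also
        # validate each octet's range
        "is_ip_based": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        "url_length": len(url),
        "num_subdomains": max(host.count(".") - 1, 0),
    }

f = url_features("http://192.168.0.1/login/verify-account")
```

IP-based hosts and plain HTTP, as in the example above, are classic phishing signals; content, certificate, and WHOIS features would be extracted by fetching the page and its metadata, which this sketch deliberately avoids.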
Statistics:
- Total Samples: 800 (400 phishing, 400 benign).
- Features: 22 including URL, domain, link density, and SSL attributes.
To ensure statistical reliability, a power analysis was conducted to determine the minimum sample size required for binary classification with 22 features. Using a medium effect size (0.15), alpha = 0.05, and power = 0.80, the analysis indicated a minimum sample size of ~325 per class. Our dataset exceeds this requirement with 400 examples per class, ensuring robust model training.
Insights from EDA:
- Distribution Plots: Histograms and density plots for numerical features like link density, URL length, and iframe counts.
- Bar Plots: Class distribution and protocol usage trends.
- Correlation Heatmap: Highlights relationships between numerical features to identify multicollinearity or strong patterns.
- Box Plots: For SSL certificate validity and URL lengths, comparing phishing versus benign URLs.
EDA visualizations are provided in the repository.
The repository contains the Python code used to extract features, conduct EDA, and build the dataset.
Phishing detection datasets must balance the need for security research with the risk of misuse. This dataset:
1. Protects User Privacy: No personally identifiable information is included.
2. Promotes Ethical Use: Intended solely for academic and research purposes.
3. Avoids Reinforcement of Bias: Balanced class distribution ensures fairness in training models.
Risks:
- Misuse of the dataset for creating more deceptive phishing attacks.
- Over-reliance on outdated features as phishing tactics evolve.
Researchers are encouraged to pair this dataset with continuous updates and contextual studies of real-world phishing.
This dataset is shared under the MIT License, allowing free use, modification, and distribution for academic and non-commercial purposes.
Figures containing a histogram of the frequency of effect sizes on AG and BG herbivores, and a funnel plot of effect size against sample size indicating the absence of publication bias.
Exploratory Data Analysis (EDA) on the Online Shoppers Purchasing Intention Dataset
Author: Shira Bash
Project Overview
This project performs Exploratory Data Analysis (EDA) on the Online Shoppers Purchasing Intention dataset. The goal is to understand which behavioral patterns influence the likelihood that a website visitor completes a purchase (Revenue = True). The analysis includes:
Data exploration & validation
Visualizations (histograms, scatter plots, box… See the full description on the dataset page: https://huggingface.co/datasets/shiraBASH/online-shoppers-eda.
License: GNU General Public License v3.0, https://www.gnu.org/licenses/gpl-3.0-standalone.html
Data for 2D Lagrangian particle tracking and evaluation of their hydrodynamic characteristics

## Abstract
This dataset contains Python code for the fluid-mechanic evaluation of Lagrangian particles with the CSRT ("Channel and Spatial Reliability Tracking") algorithm in the "OpenCV" library, written by Ryan Rautenbach in the framework of his Master's thesis.

## Workflow for Lagrangian particle tracking and evaluation via OpenCV
In the following, a brief introduction and guide based on the folders in the repository is laid out. More code-specific instructions can be found in the respective code files.

working_env_RMR.yml --> Contains the entire environment, including software versions (used here with the Spyder IDE and Conda), with which the datasets were evaluated.

01 --> Tracking always begins with the 01_milti[...] folder, in which the Python code with the OpenCV algorithm is located. For the tracking to work, certain directories are required: one in which the raw images are stored (separate from anything else) and one in which the results are saved (not the same directory as the raw data). After tracking is completed for all experiments and the results directories are adequately labelled and stored, any of the other code files can be used for the respective analyses. The order of the folders beyond 01 has no relevance to the order of evaluation, but following it can ease the understanding of the evaluated data.

02 --> Evaluation of the number of circulations and the respective circulation times in the experimental vat. (The code can be extended to calculate the circulation time based on the various planes that are artificially set.)

03 --> Code for calculating the number of contacts with the vat floor. The code requires some visual evaluation of the LP trajectories, as the plane/barrier for the contact evaluation has to be set manually.

04 --> Contains two codes that combine individual results into larger, more processable arrays within Python.

05 --> Contains the code to plot the trajectories of single Lagrangian-particle experiments based on the positional results and the velocity at each position, highlighting the trajectory over the experiment.

06 --> Codes to create 1D histograms of the probability density and velocity distributions over cumulative experiments.

07 --> Codes for plotting the 2D probability density distribution (2D histograms) of Lagrangian particles over cumulative experiments. The code provides the values for the 2D grid; plotting is conducted in Origin Lab or similar graphing tools. Graphing can also be done in Python, for which the seaborn (matplotlib) library is suggested.

08 --> Contains the code for the dimensionless evaluation of the results based on the respective Stokes-number approaches and weighted averages. 2D histograms are also vital to this evaluation; plotting is again conducted in Origin Lab, as the code only calculates the values.

09 --> Contains no Python code; instead it holds the Origin Lab files for graphing, plotting, and evaluating the results calculated via Python. The tables, histograms, and heat maps therein can be used as templates if necessary.
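The 2D probability density grids of steps 06-08 can be reproduced in outline with NumPy alone. The positions below are synthetic stand-ins for the tracker's output, not data from the repository:

```python
import numpy as np

# Synthetic particle positions standing in for the tracked
# Lagrangian-particle coordinates produced in step 01.
rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, 5000)
y = rng.normal(0.0, 0.5, 5000)

# density=True normalises the counts so the histogram integrates
# to 1 over the grid, i.e. a 2D probability density distribution.
H, xedges, yedges = np.histogram2d(x, y, bins=(40, 40), density=True)

# H holds the grid values that would be exported for plotting in
# Origin Lab, or rendered directly with seaborn.heatmap(H).
```

The integral of `H` over the bin areas is 1 by construction, which is the property that makes densities from experiments of different lengths comparable.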
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This study evaluated how interspecific variation in diameter distributions relates to growth-diameter and mortality-diameter curves and to population growth rates, using 25 y of demographic data from the 50-ha Barro Colorado Island plot. More specifically, this document presents the truncated Weibull fits to the diameter distributions, the corresponding truncated Weibull parameters, the parameters of the growth-diameter and mortality-diameter curves, and the population growth rates for the Barro Colorado Island species included in the above-mentioned study.
CITATION FOR SUPPORTING DATA: Lima, R.A.F., Muller-Landau, H.C., Prado, P.I. & Condit, R. 2016. How do size distributions relate to concurrently measured demographic rates? Evidence from over 150 tree species in Panama: Supporting data. http://dx.doi.org/10.5479/10088/28131
CITATION FOR ORIGINAL ARTICLE: Lima, R.A.F., Muller-Landau, H.C., Prado, P.I. & Condit, R. 2016. How do size distributions relate to concurrently measured demographic rates? Evidence from over 150 tree species in Panama. Journal of Tropical Ecology. doi: 10.1017/S0266467416000146.
FILES INCLUDED WITH SUPPORTING DATA:
Table S1. Parameters of the truncated Weibull fits to size distributions (beta and alpha), parameters of the growth-dbh and mortality-dbh curves, and population growth rates (lambda) for the 174 Barro Colorado Island species included in this study.
Figure S1. Diameter distributions of the species evaluated in this study and their truncated Weibull fits for the 2010 census of the Barro Colorado Island 50-ha plot. Because the truncated Weibull was fitted directly to the data and not to the histograms shown below, there may be some disparities between them. Each histogram shows density on the y-axis and diameter in mm on the x-axis. Species' full names are given in Table S1.
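Not the authors' code, but a sketch of the underlying idea: a left-truncated Weibull can be fitted by maximizing the likelihood of the Weibull density renormalized by the survival probability at the truncation point (here, the minimum measured diameter):

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
a = 10.0  # truncation point: minimum measured diameter (mm)

# Synthetic "diameters": draw from a full Weibull, keep stems >= a
full = stats.weibull_min.rvs(c=1.2, scale=80.0, size=20000,
                             random_state=rng)
dbh = full[full >= a]

def nll(params):
    """Negative log-likelihood of a left-truncated Weibull:
    density f(x) / S(a) for x >= a, with S the survival function."""
    c, scale = params
    if c <= 0 or scale <= 0:
        return np.inf
    logpdf = stats.weibull_min.logpdf(dbh, c, scale=scale)
    log_sf_a = stats.weibull_min.logsf(a, c, scale=scale)
    return -(logpdf - log_sf_a).sum()

res = optimize.minimize(nll, x0=[1.0, 50.0], method="Nelder-Mead")
c_hat, scale_hat = res.x  # should roughly recover c=1.2, scale=80
```

Fitting directly to the data in this way, rather than to binned histograms, is exactly why the document warns of disparities between the fitted curves and the histograms shown in Figure S1.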
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
Repository composition:
*** Dataset ***
Anonymous data pertaining to each participant is stored in a single folder, named S##.
Each of these folders contains the following:
1) Raw IMU data from the G-Walk sensor (3-axis acceleration, 3-axis gyroscope, 3-axis magnetometer) recorded during the Timed-Up and Go tests in .txt files. Specifically, the 3 experimental conditions are "Unbraced", "Conventional" and "3DPrinted". Each condition was recorded three times (01, 02, 03).
2) A .xlsx file named "TUG_Metrics" with the values of the TUG metrics for each condition.
3) A .xlsx file named "Segmentation_Times" with the start and end timepoints of the TUG phases for each condition.
*** Boxplot ***
This is a folder containing boxplots in .png files for each TUG metric, comparing the three experimental conditions.
*** Histogram ***
This is a folder containing histograms in .png files for each TUG metric, comparing the three experimental conditions.
*** QQ Plot ***
This is a folder containing qq-plots in .png files for each TUG metric, comparing the three experimental conditions.
License: CC0 1.0 Universal, https://creativecommons.org/publicdomain/zero/1.0/
Activity Title: "Chart the Story: Infographic Challenge" (This activity is prepared for students to practice data visualization)
Description: Each group chooses a theme (e.g., sports stats, movies, weather) and creates a multi-plot visual infographic using:
• Histograms
• Proper legends, color schemes
• Subplots for comparison
• Informative annotations and custom styles
Outcome: Poster or presentation gallery walk where peers rate clarity and visual storytelling.
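A possible starting point for the challenge, assuming Matplotlib (the theme and all numbers are invented):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; students can omit this
import matplotlib.pyplot as plt
import numpy as np

# Made-up sports stats: goals per match over a 380-match season
rng = np.random.default_rng(3)
goals_home = rng.poisson(1.5, 380)
goals_away = rng.poisson(1.1, 380)

# Subplots for comparison, with legends, colors and annotations
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, data, title, color in [
    (axes[0], goals_home, "Home goals", "tab:blue"),
    (axes[1], goals_away, "Away goals", "tab:orange"),
]:
    ax.hist(data, bins=range(0, 9), color=color,
            edgecolor="black", label=title)
    ax.set_title(title)
    ax.set_xlabel("Goals per match")
    ax.legend()
    ax.annotate(f"mean = {data.mean():.2f}", xy=(0.6, 0.9),
                xycoords="axes fraction")
axes[0].set_ylabel("Matches")
fig.suptitle("Chart the Story: goals per match (synthetic season)")
fig.tight_layout()
fig.savefig("infographic.png", dpi=150)
```

Groups would swap in their own theme's data and extend the grid of subplots; the shared y-axis and the annotated means are small touches that tend to score well on clarity in the gallery walk.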
Jupyter notebook files containing the Python scripts used for analyzing the interacting effects of water-chemistry features on zinc anode passivation. Includes code to evaluate the Master Dataset with histograms, correlation matrices, scatter plots, Dunn's tests, and logistic regression models.
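Not the notebooks' actual code, but a sketch of a correlation matrix computed on stand-in water-chemistry features (the column names and the relationship between them are invented for illustration):

```python
import numpy as np
import pandas as pd

# Stand-in features; the real Master Dataset is not reproduced here.
rng = np.random.default_rng(11)
n = 200
ph = rng.normal(7.5, 0.4, n)
chloride = 20 + 15 * (8.0 - ph) + rng.normal(0, 3, n)  # tied to pH
sulfate = rng.normal(40, 10, n)                        # independent

df = pd.DataFrame({"pH": ph,
                   "chloride_mg_L": chloride,
                   "sulfate_mg_L": sulfate})

# Spearman rank correlation: robust to monotone nonlinearity,
# in keeping with the nonparametric (Dunn's test) analyses above.
corr = df.corr(method="spearman")
```

A heatmap of `corr` (e.g. via seaborn) then makes interacting features visible at a glance, which is typically the first step before fitting the logistic regression models mentioned above.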