37 datasets found

Exploratory Data Analysis (EDA) Tools Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 2, 2025
Cite
Market Report Analytics (2025). Exploratory Data Analysis (EDA) Tools Report [Dataset]. https://www.marketreportanalytics.com/reports/exploratory-data-analysis-eda-tools-54369
Explore at:
Available download formats: doc, ppt, pdf
Dataset updated
Apr 2, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Exploratory Data Analysis (EDA) tools market is experiencing robust growth, driven by the increasing volume and complexity of data across industries. The rising need for data-driven decision-making, coupled with the expanding adoption of cloud-based analytics solutions, is fueling market expansion. While precise figures for market size and CAGR are not provided, a reasonable estimation, based on the prevalent growth in the broader analytics market and the crucial role of EDA in the data science workflow, would place the 2025 market size at approximately $3 billion, with a projected Compound Annual Growth Rate (CAGR) of 15% through 2033. This growth is segmented across various applications, with large enterprises leading the adoption due to their higher investment capacity and complex data needs. However, SMEs are witnessing rapid growth in EDA tool adoption, driven by the increasing availability of user-friendly and cost-effective solutions. Further segmentation by tool type reveals a strong preference for graphical EDA tools, which offer intuitive visualizations facilitating better data understanding and communication of findings. Geographic regions, such as North America and Europe, currently hold a significant market share, but the Asia-Pacific region shows promising potential for future growth owing to increasing digitalization and data generation. Key restraints to market growth include the need for specialized skills to effectively utilize these tools and the potential for data bias if not handled appropriately. The competitive landscape is dynamic, with both established players like IBM and emerging companies specializing in niche areas vying for market share. Established players benefit from brand recognition and comprehensive enterprise solutions, while specialized vendors provide innovative features and agile development cycles. 
Open-source options such as KNIME, R packages like Rattle, and Python packages like Pandas Profiling offer cost-effective alternatives, particularly attracting academic institutions and smaller businesses. The ongoing development of advanced analytics functionalities, such as automated machine learning integration within EDA platforms, will be a significant driver of future market growth. Further, the integration of EDA tools within broader data science platforms is streamlining the overall analytical workflow, contributing to increased adoption and reduced complexity. The market's evolution hinges on enhanced user experience, more robust automation features, and seamless integration with other data management and analytics tools.
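As a rough check on the figures quoted in this description, compound growth at a constant rate follows size_n = size_0 × (1 + r)^n. The sketch below applies that formula to the description's own estimates (about $3 billion in 2025, a 15% CAGR, and 8 years to 2033). The function name is illustrative, and the result is only as reliable as those estimates.

```python
def project_market_size(base_size: float, cagr: float, years: int) -> float:
    """Compound a starting value at a constant annual growth rate."""
    return base_size * (1.0 + cagr) ** years


# Figures taken from the description: ~$3 billion in 2025, 15% CAGR to 2033.
size_2033 = project_market_size(3.0, 0.15, 2033 - 2025)
print(f"Implied 2033 market size: ${size_2033:.2f} billion")
```

Under these assumptions the market would roughly triple over the period, to about $9.2 billion.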
Physical Properties of Lakes: Exploratory Data Analysis
search.dataone.org
hydroshare.org
Updated Apr 15, 2022
+ more versions
Cite
Gabriela Garcia; Kateri Salk (2022). Physical Properties of Lakes: Exploratory Data Analysis [Dataset]. https://search.dataone.org/view/sha256%3A82a3bd46ad259724cad21b7a344728253ea4e6d929f6134e946c379585f903f6
Explore at:
Dataset updated
Apr 15, 2022
Dataset provided by
Hydroshare
Authors
Gabriela Garcia; Kateri Salk
Time period covered
May 27, 1984 - Aug 17, 2016
Area covered
Description
Exploratory Data Analysis for the Physical Properties of Lakes

This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the first part of a two-part exercise focusing on the physical properties of lakes.

Introduction

Lakes are dynamic, nonuniform bodies of water in which the physical, biological, and chemical properties interact. Lakes also contain the majority of Earth's liquid surface fresh water. This lesson introduces exploratory data analysis using R statistical software in the context of the physical properties of lakes.

Learning Objectives

After successfully completing this exercise, you will be able to:

Apply exploratory data analytics skills to applied questions about physical properties of lakes

Communicate findings with peers through oral, visual, and written modes
Data from: The Often-Overlooked Power of Summary Statistics in Exploratory...
acs.figshare.com
xlsx
Updated Jun 8, 2023
Cite
Tahereh G. Avval; Behnam Moeini; Victoria Carver; Neal Fairley; Emily F. Smith; Jonas Baltrusaitis; Vincent Fernandez; Bonnie. J. Tyler; Neal Gallagher; Matthew R. Linford (2023). The Often-Overlooked Power of Summary Statistics in Exploratory Data Analysis: Comparison of Pattern Recognition Entropy (PRE) to Other Summary Statistics and Introduction of Divided Spectrum-PRE (DS-PRE) [Dataset]. http://doi.org/10.1021/acs.jcim.1c00244.s002
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.jcim.1c00244.s002
Dataset updated
Jun 8, 2023
Dataset provided by
ACS Publications
Authors
Tahereh G. Avval; Behnam Moeini; Victoria Carver; Neal Fairley; Emily F. Smith; Jonas Baltrusaitis; Vincent Fernandez; Bonnie. J. Tyler; Neal Gallagher; Matthew R. Linford
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
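To make the idea of a one-number summary per spectrum concrete, here is a minimal sketch of several of the statistics named above. The abstract does not define PRE or X4, so the entropy below is an assumption: plain Shannon entropy of a spectrum normalized to unit sum, which may differ from the paper's PRE. X4 is omitted. Intensities are assumed to be non-negative, and the two toy spectra are made up.

```python
import math
from statistics import mean, pstdev


def summary_statistics(spectrum):
    """Return several one-number summaries of a single spectrum."""
    total = sum(spectrum)
    # Normalize to unit sum; channels with zero intensity contribute nothing.
    probs = [x / total for x in spectrum if x > 0]
    # Shannon entropy in bits (assumed stand-in for an entropy-type statistic).
    entropy = sum(p * math.log2(1.0 / p) for p in probs)
    return {
        "mean": mean(spectrum),
        "std": pstdev(spectrum),
        "one_norm": sum(abs(x) for x in spectrum),
        "range": max(spectrum) - min(spectrum),
        "ssq": sum(x * x for x in spectrum),
        "entropy": entropy,
    }


flat = [1.0, 1.0, 1.0, 1.0]    # intensity spread evenly over four channels
peaked = [4.0, 0.0, 0.0, 0.0]  # all intensity concentrated in one channel

for name, spectrum in (("flat", flat), ("peaked", peaked)):
    print(name, summary_statistics(spectrum))
```

Both toy spectra share the same mean and 1-norm, yet their entropy, range, and sum of squares differ, which is the kind of contrast a suite of summary statistics is meant to expose.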
Data and R scripts for 'Reliability of geochemical analyses: Deja vu all...
data.mendeley.com
Updated Mar 12, 2019
Cite
Ola Anfin Eggen (2019). Data and R scripts for 'Reliability of geochemical analyses: Deja vu all over again' [Dataset]. http://doi.org/10.17632/pvw557y82p.1
Explore at:
Unique identifier
https://doi.org/10.17632/pvw557y82p.1
Dataset updated
Mar 12, 2019
Authors
Ola Anfin Eggen
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The zipped file contains the following:
- data (as csv, in the 'data' folder),
- R scripts (as Rmd, in the 'rro' folder),
- figures (as pdf, in the 'figs' folder), and
- presentation (as html, in the root folder).
Explore data formats and ingestion methods
kaggle.com
Updated Feb 12, 2021
Cite
Gabriel Preda (2021). Explore data formats and ingestion methods [Dataset]. https://www.kaggle.com/datasets/gpreda/iris-dataset/discussion?sort=undefined
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 12, 2021
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Gabriel Preda
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Why this Dataset

This dataset brings to you Iris Dataset in several data formats (see more details in the next sections).

You can use it to test the ingestion of data in all these formats using Python or R libraries. We also prepared Python Jupyter Notebook and R Markdown report that input all these formats:

Test Data Formats in Python

Test Data Formats in R

Iris Dataset

Iris Dataset was created by R. A. Fisher and donated by Michael Marshall.

Repository on UCI site: https://archive.ics.uci.edu/ml/datasets/iris

Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/

The file downloaded is iris.data and is formatted as a comma delimited file.

This small data collection was created to help you test your skills with ingesting various data formats.

Content

This file was processed to convert the data in the following formats:
* csv - comma separated values format
* tsv - tab separated values format
* parquet - parquet format
* feather - feather format
* parquet.gzip - compressed parquet format
* h5 - hdf5 format
* pickle - Python binary object file - pickle format
* xslx - Excel format
* npy - Numpy (Python library) binary format
* npz - Numpy (Python library) binary compressed format
* rds - Rds (R specific data format) binary format

Acknowledgements

I would like to acknowledge the work of the creator of the dataset - R. A. Fisher and of the donor - Michael Marshall.

Inspiration

Use these data formats to test your skills in ingesting data in various formats.
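A minimal sketch of the kind of ingestion test described above, limited to three of the listed formats (csv, tsv, pickle) that the Python standard library can handle. The file names and column names are hypothetical and the three rows are only a tiny sample in the shape of the Iris table. Formats such as parquet, feather, hdf5, npy, and rds need third-party libraries and are not shown.

```python
import csv
import pickle
import tempfile
from pathlib import Path

# Hypothetical column names and a tiny sample in the shape of the Iris table.
HEADER = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
ROWS = [
    ["5.1", "3.5", "1.4", "0.2", "Iris-setosa"],
    ["7.0", "3.2", "4.7", "1.4", "Iris-versicolor"],
    ["6.3", "3.3", "6.0", "2.5", "Iris-virginica"],
]


def write_delimited(path, delimiter):
    """Write the header and rows with the given field delimiter."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        writer.writerow(HEADER)
        writer.writerows(ROWS)


def read_delimited(path, delimiter):
    """Read a delimited file back as (header, rows)."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter=delimiter))
    return rows[0], rows[1:]


def round_trip(directory):
    """Write the table as csv, tsv and pickle, then read each copy back."""
    directory = Path(directory)
    results = {}
    for suffix, delimiter in (("csv", ","), ("tsv", "\t")):
        path = directory / f"iris.{suffix}"
        write_delimited(path, delimiter)
        results[suffix] = read_delimited(path, delimiter)
    pickle_path = directory / "iris.pickle"
    pickle_path.write_bytes(pickle.dumps((HEADER, ROWS)))
    results["pickle"] = pickle.loads(pickle_path.read_bytes())
    return results


with tempfile.TemporaryDirectory() as tmp:
    loaded = round_trip(tmp)
    for fmt, (header, rows) in loaded.items():
        # True when the copy read back matches what was written.
        print(fmt, header == HEADER and rows == ROWS)
```

Note that the text formats return every value as a string, so numeric columns still need converting after loading, whereas pickle preserves the original Python objects.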
Orange dataset table
figshare.com
xlsx
Updated Mar 4, 2022
Cite
Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.19146410.v1
Dataset updated
Mar 4, 2022
Dataset provided by
figshare
Authors
Rui Simões
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
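The decision tree settings above name gain ratio as the split criterion. The sketch below shows the textbook definition (information gain divided by split information); it is not Orange's internal implementation, and the function names and toy labels are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def gain_ratio(labels, branches):
    """Information gain of a split divided by its split information.

    `branches` is a list of label lists, one per child node.
    """
    n = len(labels)
    weighted_child_entropy = sum(len(b) / n * entropy(b) for b in branches)
    info_gain = entropy(labels) - weighted_child_entropy
    # Split information is the entropy of the branch sizes themselves.
    split_info = entropy([i for i, b in enumerate(branches) for _ in b])
    return info_gain / split_info if split_info > 0 else 0.0


# Toy node with two classes of four samples each (labels are illustrative).
labels = ["control"] * 4 + ["treated"] * 4
perfect = [["control"] * 4, ["treated"] * 4]
uninformative = [["control", "control", "treated", "treated"]] * 2

print(gain_ratio(labels, perfect))
print(gain_ratio(labels, uninformative))
```

A split that separates the classes completely scores the maximum here, while a split whose children mirror the parent's class mix scores zero. Dividing by split information penalizes splits that fragment the data into many small branches.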
ftmsRanalysis: An R package for exploratory data analysis and interactive...
plos.figshare.com
xlsx
Updated Jun 1, 2023
Cite
Lisa M. Bramer; Amanda M. White; Kelly G. Stratton; Allison M. Thompson; Daniel Claborne; Kirsten Hofmockel; Lee Ann McCue (2023). ftmsRanalysis: An R package for exploratory data analysis and interactive visualization of FT-MS data [Dataset]. http://doi.org/10.1371/journal.pcbi.1007654
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pcbi.1007654
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS Computational Biology
Authors
Lisa M. Bramer; Amanda M. White; Kelly G. Stratton; Allison M. Thompson; Daniel Claborne; Kirsten Hofmockel; Lee Ann McCue
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The high-resolution and mass accuracy of Fourier transform mass spectrometry (FT-MS) has made it an increasingly popular technique for discerning the composition of soil, plant and aquatic samples containing complex mixtures of proteins, carbohydrates, lipids, lignins, hydrocarbons, phytochemicals and other compounds. Thus, there is a growing demand for informatics tools to analyze FT-MS data that will aid investigators seeking to understand the availability of carbon compounds to biotic and abiotic oxidation and to compare fundamental chemical properties of complex samples across groups. We present ftmsRanalysis, an R package which provides an extensive collection of data formatting and processing, filtering, visualization, and sample and group comparison functionalities. The package provides a suite of plotting methods and enables expedient, flexible and interactive visualization of complex datasets through functions which link to a powerful and interactive visualization user interface, Trelliscope. Example analysis using FT-MS data from a soil microbiology study demonstrates the core functionality of the package and highlights the capabilities for producing interactive visualizations.
Reddit AskScience Flair Analysis Dataset
kaggle.com
Updated Feb 15, 2025
Cite
Sumit Mishra (2025). Reddit AskScience Flair Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/sumitm004/reddit-raskscience-flair-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 15, 2025
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Sumit Mishra
License
Open Data Commons Attribution License (ODC-By) v1.0, https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
Context

Reddit is a massive platform for news, content, and discussions, hosting millions of active users daily. Among its vast number of subreddits, we focus on the r/AskScience community, where users engage in science-related discussions and questions.

Content

This dataset is derived from the r/AskScience subreddit, collected between January 1, 2016, and May 20, 2022. It includes 612,668 datapoints across 22 columns, featuring diverse information such as the content of the questions, submission descriptions, associated flairs, NSFW/SFW status, year of submission, and more. The data was extracted using Python and Pushshift's API, followed by some cleaning with NumPy and pandas. Detailed column descriptions are available for clarity.

Mendeley Data

Ideas for Usage

Flair Prediction: Train models to predict post flairs (e.g., 'Science', 'Ask', 'Discussion') to automate content categorization for platforms like Reddit.

NSFW Classification: Classify posts as SFW or NSFW based on textual content, enabling content moderation tools for online forums.

Text Mining / NLP Tasks: Apply NLP techniques like Sentiment Analysis, Topic Modeling, and Text Classification to explore the content and themes of science-related discussions.

Community Engagement Analysis: Investigate which post types or flairs generate more engagement (e.g., upvotes or comments), offering insights into user interaction.

Trend Detection in Science Topics: Identify emerging science topics and analyze shifts in interest areas, which can help predict future trends in scientific discussions.
EDA Signal Dataset Collected During Startle Events While Walking With a...
zenodo.org
zip
Updated Jun 23, 2025
Cite
Rafael Villalba-Bravo (2025). EDA Signal Dataset Collected During Startle Events While Walking With a Smart Cane [Dataset]. http://doi.org/10.5281/zenodo.15715155
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15715155
Dataset updated
Jun 23, 2025
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Rafael Villalba-Bravo
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
EDA Signal Dataset Collected During Startle Events While Walking With a Smart Cane

This dataset accompanies the publication (currently under review):

Villalba-Bravo, R., Grande-Bueno, S., Trujillo-León, A., & Vidal-Verdú, F.
Analysis of EDA signal features under motion artifacts for non-personalized detection of startle events using a smart cane
IEEE SENSORS 2025, Vancouver, Canada.

Description

This dataset includes Electrodermal Activity (EDA) signals collected from seven participants during an experiment in which they walked on a treadmill at a constant speed of 1 km/h while using a smart cane. During the walking task, participants were exposed to auditory startle stimuli designed to elicit stress responses. The smart cane was equipped with a Galvanic Skin Response (GSR) sensor integrated into its handle to continuously record physiological signals in a natural walking context.

The data is organized by participant. All participants provided written informed consent both to take part in the experiment and to allow their anonymized data to be publicly shared for research purposes. Furthermore, the experiment was approved by the Ethical Committee of the Universidad de Málaga (reference 46-2024-H).

Folder Structure

Each folder corresponds to a participant session (e.g., S0/, S2/, etc.) and contains the following files:

S0/
├── S0_DataExperiment.mat
├── S0_audioEventVector.mat
└── S0_SA_Score.mat

...

S8/
├── S8_DataExperiment.mat
├── S8_audioEventVector.mat
└── S8_SA_Score.mat

In addition, the dataset includes a CSV file named caneFeatures_pre_post.csv, containing the extracted features from the GSR, tonic and phasic signals, allowing for the replication of the statistical analyses presented in the study.

File Descriptions

1. S*_DataExperiment.mat

Description: This file contains the EDA signals acquired at a 4 Hz sampling rate during the experiment, stored in MATLAB .mat format as a structured variable.

Format: MATLAB Struct (3 fields)

GSR: Contains the raw GSR signal along with associated time information: TimeStampDate (UTC date-time format) and TimeStampPosix (POSIX timestamp).

TONIC: Contains the tonic component of the EDA signal with the same timestamp fields.

PHASIC: Contains the phasic component of the EDA signal with the corresponding timestamps.

2. S*_audioEventVector.mat

Description: This file contains information about the timing of the auditory startle stimuli presented during the experiment. The data is stored as a MATLAB struct sampled at 32 Hz.

Format: MATLAB Struct (3 fields)

data: A binary step signal indicating the presence of auditory events (0 = no stimulus, 1 = stimulus being played).

TimeStampDate: A vector of timestamps in MATLAB datetime format, corresponding to each sample in the data field.

3. S*_SA_Score.mat

Description: This file contains the self-reported State Anxiety (STAI-State) scores provided by each participant before and after the experimental session. The data is stored as a MATLAB struct.

Format: MATLAB Struct (2 fields)

Training: Numeric score reported after the training session.

Experiment: Numeric score reported after the experimental session.
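Because the audio event vector is sampled at 32 Hz and the EDA signals at 4 Hz, relating a stimulus to the physiological trace requires an index conversion. The sketch below finds stimulus onsets in a binary step signal and maps them to EDA sample indices. It uses a synthetic event vector rather than the real .mat files, the function names are hypothetical, and it assumes both streams start at the same instant. With the real data, alignment should rely on the stored timestamps.

```python
def stimulus_onsets(event_vector):
    """Indices where the binary event signal steps from 0 to 1."""
    onsets = []
    previous = 0
    for index, value in enumerate(event_vector):
        if value == 1 and previous == 0:
            onsets.append(index)
        previous = value
    return onsets


def to_eda_index(event_index, event_rate_hz=32, eda_rate_hz=4):
    """Map an event-vector sample index to the EDA sample at or before it.

    Assumes both recordings begin at the same time (an assumption here).
    """
    return event_index * eda_rate_hz // event_rate_hz


# Synthetic 2-second event vector at 32 Hz: one stimulus on samples 40 to 47.
events = [0] * 64
for i in range(40, 48):
    events[i] = 1

for onset in stimulus_onsets(events):
    # Onset sample, onset time in seconds, matching 4 Hz EDA sample index.
    print(onset, onset / 32, to_eda_index(onset))
```

Integer division keeps the mapping on the EDA sample at or just before the onset, so a window taken after that index never starts later than the stimulus.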

Contact Information

For any questions or further information regarding this dataset, please contact fvidal@uma.es.
Cleaned Auto Dataset 1985
kaggle.com
Updated Oct 3, 2021
Cite
Faisal Moiz Hussain (2021). Cleaned Auto Dataset 1985 [Dataset]. https://www.kaggle.com/datasets/faisalmoizhussain/cleaned-auto-dataset-1985/discussion
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 3, 2021
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Faisal Moiz Hussain
Description
Context

Tailor-made data for applying machine learning models, on which newcomers can easily perform their EDA.

The data consists of all the features of the four-wheelers available in the market in 1985. The task is to predict the **price of the car** using Linear Regression, PCA, SVM-R, etc.
Data from: Penguins Go Parallel: A Grammar of Graphics Framework for...
tandf.figshare.com
txt
Updated Jun 2, 2023
Cite
Susan VanderPlas; Yawei Ge; Antony Unwin; Heike Hofmann (2023). Penguins Go Parallel: A Grammar of Graphics Framework for Generalized Parallel Coordinate Plots [Dataset]. http://doi.org/10.6084/m9.figshare.22467369.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.22467369.v2
Dataset updated
Jun 2, 2023
Dataset provided by
Taylor & Francis
Authors
Susan VanderPlas; Yawei Ge; Antony Unwin; Heike Hofmann
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Parallel Coordinate Plots (PCP) are a valuable tool for exploratory data analysis of high-dimensional numerical data. The use of PCPs is limited when working with categorical variables or a mix of categorical and continuous variables. In this article, we propose Generalized Parallel Coordinate Plots (GPCP) to extend the ability of PCPs from just numeric variables to dealing seamlessly with a mix of categorical and numeric variables in a single plot. In this process we find that existing solutions for categorical values only, such as hammock plots or parsets become edge cases in the new framework. By focusing on individual observations rather than a marginal frequency we gain additional flexibility. The resulting approach is implemented in the R package ggpcp. Supplementary materials for this article are available online.
Multidimensional Dataset for APA Investigations in Cancer Patients
zenodo.org
Updated Sep 6, 2024
Cite
Marco Cascella; Alfonso Maria Ponsiglione; Vittorio Santoriello; Ornella Piazza; Francesco Amato; Maria Romano (2024). Multidimensional Dataset for APA Investigations in Cancer Patients [Dataset]. http://doi.org/10.5281/zenodo.13711426
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.13711426
Dataset updated
Sep 6, 2024
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Marco Cascella; Alfonso Maria Ponsiglione; Vittorio Santoriello; Ornella Piazza; Francesco Amato; Maria Romano
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains data collected from patients suffering from cancer-related pain. The features extracted from clinical data (including typical cancer phenomena such as breakthrough pain) and the biosignal acquisitions contributed to the definition of a multidimensional dataset. This unique database can be useful for the characterization of the patient’s pain experience from a qualitative and quantitative perspective. We implemented measurable biosignals-related indicators of the individual’s pain response and of the overall Autonomic Nervous System (ANS) functioning. The most distinctive features extracted from EDA and ECG signals can be adopted to investigate the status and complex functioning of the ANS through the study of sympatho-vagal activations. Specifically, while EDA is mainly related to sympathetic activation, the Heart Rate Variability (HRV), which can be derived from ECG recordings, is strictly related to the interplay between sympathetic and parasympathetic functioning.

As for the EDA signal, two types of analyses have been performed: (i) the Trough-To-Peak analysis (TTP), or min-max analysis, aimed at measuring the difference between the Skin Conductance (SC) at the peak of a response and its previous minimum within pre-established time-windows; (ii) the Continuous Decomposition Analysis (CDA), aimed at performing a decomposition of SC data into continuous signals of tonic (basic level of conductance) and phasic (short-duration changes in the SC) activity. Before applying the TTP analysis or the CDA, the signal was filtered by means of a fifth-order Butterworth low-pass filter with a cutoff frequency of 1 Hz and downsampled to 10 Hz to reduce the computational burden of the analysis. The application of TTP and CDA allowed the detection and measurement of SC Responses (SCR), and the following parameters have been calculated for both TTP and CDA methodologies:

Total number of detected SCRs.

Maximum value of SCRs [measured in μS].

Minimum value of SCRs [measured in μS].

Arithmetic mean of the SCRs [measured in μS].

Maximum interval between SCRs [measured in ms].

Minimum interval between SCRs [measured in ms].

Arithmetic mean of the intervals between SCRs [measured in ms].
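The trough-to-peak idea and the SCR parameters listed above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it skips the Butterworth filtering and the pre-established time windows, the amplitude threshold is an arbitrary assumption, intervals are measured between consecutive peaks (the description does not say which points are used), and the skin conductance values are made up.

```python
from statistics import mean


def trough_to_peak(signal, min_amplitude=0.01):
    """Return (trough_index, peak_index, amplitude) for each detected rise."""
    responses = []
    trough = 0
    rising = False
    for i in range(1, len(signal)):
        if signal[i] > signal[i - 1]:
            if not rising:
                trough = i - 1  # the rise starts at the previous sample
                rising = True
        elif rising:
            # The rise just ended, so the previous sample is the peak.
            peak = i - 1
            amplitude = signal[peak] - signal[trough]
            if amplitude >= min_amplitude:
                responses.append((trough, peak, amplitude))
            rising = False
    return responses


def summarize(responses, sampling_rate_hz=10):
    """Count, amplitude and inter-response interval summaries."""
    amplitudes = [amp for _, _, amp in responses]
    peaks = [peak for _, peak, _ in responses]
    # Interval between consecutive response peaks, converted to milliseconds.
    intervals_ms = [
        1000.0 * (later - earlier) / sampling_rate_hz
        for earlier, later in zip(peaks, peaks[1:])
    ]
    return {
        "count": len(responses),
        "max_amplitude": max(amplitudes) if amplitudes else None,
        "min_amplitude": min(amplitudes) if amplitudes else None,
        "mean_amplitude": mean(amplitudes) if amplitudes else None,
        "max_interval_ms": max(intervals_ms) if intervals_ms else None,
        "min_interval_ms": min(intervals_ms) if intervals_ms else None,
        "mean_interval_ms": mean(intervals_ms) if intervals_ms else None,
    }


# Made-up skin conductance trace in microsiemens, sampled at 10 Hz.
sc = [2.0, 2.0, 2.1, 2.4, 2.3, 2.2, 2.2, 2.5, 2.9, 2.6]
print(summarize(trough_to_peak(sc)))
```

A rise that is still in progress when the trace ends is not counted, since its peak cannot be confirmed.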

Concerning the ECG, the RR series of interbeat intervals (i.e., the time between successive R waves of the QRS complex on the ECG waveform) has been computed to extract time-domain parameters of the HRV. The R peak detection was carried out by adopting the Pan–Tompkins algorithm for QRS detection and R peak identification. The corresponding RR series of interbeat intervals were derived as the difference between successive R peaks.

The ECG-derived RR time series was then filtered by means of a recursive procedure to remove the intervals differing most from the mean of the surrounding RR intervals. Then, both the Time-Domain Analysis (TDA) and Frequency-Domain Analysis (FDA) of the HRV have been carried out to extract the main features characterizing the variability of the heart rhythm. Time-domain parameters are obtained from statistical analysis of the intervals between heart beats and are used to describe how much variability in the heartbeats is present at various time scales.

The parameters computed through the TDA include the following:

Arithmetic mean of the RR time series [measured in ms].

The standard deviation of the RR time series [measured in ms].

Mean value of heart rate [measured in bpm].

Standard deviation of the heart rate [measured in bpm].

Root Mean Square of Successive Differences of RR intervals [measured in ms], which is sensitive to high-frequency heart period fluctuations in the respiratory frequency range and has been used as an index of vagal cardiac control.

Number of successive RR intervals whose difference is higher than 50 ms.

Percentage of successive RR intervals whose difference is higher than 50 ms.
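The time-domain parameters listed above can be computed directly from an RR series. The following is a minimal sketch using the conventional short names (SDNN, RMSSD, NN50, pNN50). It uses the sample standard deviation, which is a choice the description does not specify, and the RR values are made up for illustration.

```python
import math
from statistics import mean, stdev


def time_domain_hrv(rr_ms):
    """Time-domain HRV summaries from RR intervals given in milliseconds."""
    # Differences between successive RR intervals.
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    # Instantaneous heart rate in beats per minute for each interval.
    heart_rate = [60000.0 / rr for rr in rr_ms]
    nn50 = sum(1 for d in diffs if abs(d) > 50)
    return {
        "mean_rr_ms": mean(rr_ms),
        "sdnn_ms": stdev(rr_ms),
        "mean_hr_bpm": mean(heart_rate),
        "sd_hr_bpm": stdev(heart_rate),
        "rmssd_ms": math.sqrt(mean([d * d for d in diffs])),
        "nn50": nn50,
        "pnn50_percent": 100.0 * nn50 / len(diffs),
    }


# Made-up RR series in milliseconds.
rr = [800, 810, 790, 860, 850, 800]
print(time_domain_hrv(rr))
```

For this toy series the successive differences are 10, -20, 70, -10 and -50 ms, so only one of the five exceeds 50 ms in absolute value (a difference of exactly 50 ms does not count).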

Frequency-domain parameters reflect the distribution of spectral power across different frequency bands and are used to assess specific components of HRV (e.g., thermoregulation control loop, baroreflex control loop, and respiration control loop, which are regulated by both sympathetic and vagal nerves of the ANS).
The FDA parameters have been computed by adopting Welch's periodogram method, which is based on the Discrete Fourier Transform (DFT) and allows the expression of the RR series in the discrete frequency domain. Welch's method was chosen to deal with the non-stationarity of the RR series: it divides the signal into segments of constant length and applies the Fast Fourier Transform (FFT) to each segment individually. The periodogram is basically a way of estimating the power spectral density of a time series.

The FDA parameters include the following:

Peak value in the Very Low Frequency Band of the HRV power density spectrum [measured in Hz].

Peak value in the Low Frequency Band of the HRV power density spectrum [measured in Hz].

Peak value in the High Frequency Band of the HRV power density spectrum [measured in Hz].

Power in the Very Low Frequency Band of the HRV power density spectrum [measured in ms^2].

Power in the Low Frequency Band of the HRV power density spectrum [measured in ms^2].

Power in the High Frequency Band of the HRV power density spectrum [measured in ms^2].

Total Power of the HRV power density spectrum [measured in ms^2].

Percentage power in the Very Low Frequency Band of the HRV power density spectrum with respect to the total power.

Percentage power in the Low Frequency Band of the HRV power density spectrum with respect to the total power.

Percentage power in the High Frequency Band of the HRV power density spectrum with respect to the total power.

Normalized power in the Low Frequency Band of the HRV power density spectrum with respect to the sum of LF and HF power.

Normalized power in the High Frequency Band of the HRV power density spectrum with respect to the sum of LF and HF power.

Sympathovagal balance measured as the ratio between power in the LF band and power in the HF band.
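Once the band powers are known, the percentage, normalized, and ratio parameters in this list are simple arithmetic. The sketch below assumes that total power is the sum of the VLF, LF, and HF bands (the description does not state this), expresses normalized power as a percentage of LF plus HF, and uses the conventional LF/HF ratio for sympathovagal balance. The band powers are made up.

```python
def band_power_summary(vlf, lf, hf):
    """Derived frequency-domain HRV metrics from band powers in ms^2."""
    # Assumption: total power is the sum of the three bands given here.
    total = vlf + lf + hf
    return {
        "total_power_ms2": total,
        "vlf_percent": 100.0 * vlf / total,
        "lf_percent": 100.0 * lf / total,
        "hf_percent": 100.0 * hf / total,
        # Normalized units: share of each band within LF + HF only.
        "lf_nu": 100.0 * lf / (lf + hf),
        "hf_nu": 100.0 * hf / (lf + hf),
        "lf_hf_ratio": lf / hf,
    }


# Made-up band powers in ms^2.
summary = band_power_summary(vlf=500.0, lf=1000.0, hf=500.0)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Normalizing by LF plus HF removes the influence of the VLF band, so the two normalized values always sum to 100.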
Data from: Metal-Induced B–H Activation in Three-Component Reactions:...
figshare.com
txt
Updated Jun 3, 2023
Cite
Guifeng Liu; Hong Yan (2023). Metal-Induced B–H Activation in Three-Component Reactions: 16-Electron Complex CpCo(S2C2B10H10), Ethyl Diazoacetate, and Alkynes [Dataset]. http://doi.org/10.1021/om501016w.s002
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.1021/om501016w.s002
Dataset updated
Jun 3, 2023
Dataset provided by
ACS Publications
Authors
Guifeng Liu; Hong Yan
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The three-component reactions of the 16-electron half-sandwich complex CpCo(S2C2B10H10) (Cp = cyclopentadienyl) (1) with ethyl diazoacetate (EDA) and alkynes R1R2 (R1 = Ph, R2 = H; R1 = CO2Me, R2 = H; R1 = R2 = CO2Me; R1 = Fc, R2 = H) at ambient temperature lead to compounds CpCo(S2C2B10H9)(CH2CO2Et) (CHCO2Et)(R1R2) (2–5), CpCo(S2C2B10H9)(CH2CO2Et)(R2–R1–CHCO2Et) (6–9), CpCo(S2C2B10H9)(CH2CO2Et)(CH(Ph)CCHCO2Et) (10), and CpCo(S2C2B10H9)(CH2CO2Et)(CH(Fc)–CH–CCO2Et) (11). In 2–5, one alkyne is stereoselectively inserted into the Co–B bond, one EDA molecule is used to form a sulfide ylide, and the second EDA molecule is inserted into one Co–S bond to form a three-membered metallacyclic ring. At ambient temperature 2–5 undergo rearrangement to 6–9 through migratory insertion of the inserted EDA. Different from 2–5, in 10 phenylacetylene is inserted into the Co–B bond at the terminal carbon and the terminal carbon is coupled with one EDA to afford a six-membered metallacyclic ring with the CO coordination to metal. In 11, a stable Co–B bond is generated, and one EDA and one ethynylferrocene are inserted into the Co–S bond. Moreover, if weakly basic silica is present, 2–4 can lose an apex BH close to the two carbon atoms of o-carborane to give rise to CpCo(S2C2B9H9)(CH2CO2Et)2(R1R2) (12–14) accompanied by the coordination of the two sulfide ylide units to the metal center. The solid-state structures of 2–4, 6–12, and 14 were characterized by X-ray structural analysis.
VAERS Data as of 19th March 2021
kaggle.com
Updated Mar 29, 2021
Gayathri Nagarajan (2021). VAERS Data as of 19th March 2021 [Dataset]. https://www.kaggle.com/gayathrirprog/vaers-data-as-of-19th-march-2021/activity
Dataset updated
Mar 29, 2021
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Gayathri Nagarajan
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

Every story has a question that triggered it. Mine was: What vaccinations are being administered in the USA? What incidents do people report after the vaccine doses?

I look in awe at the EDA that folks do on Kaggle, and I have a long way to go. But I want to start small, and I have already started my journey. The folks who do wonderful EDA are my source of inspiration, and I learn by working through their notebooks in R. I am an R fan for now.

Content

The data here was downloaded on 29th March from the CDC Wonder site, which provides reports on VAERS.

Acknowledgements

My Google search on VAERS and my Kaggle search for VAERS led me to a wonderful notebook and dataset. Thanks to folks like Ayush Garg and jmreuter for helping folks like me learn more.

Inspiration

What vaccinations are being administered in the USA? What incidents do people report after the vaccine doses? Which vaccine has the most side effects across all age groups? Which vaccine has the most side effects in each state?
EDA binds EDAR
reactome.org
biopax2, biopax3 +5
Updated Sep 27, 2005
(2005). EDA binds EDAR [Dataset]. http://reactome.org/content/detail/R-RNO-5669012
Explore at:
Available download formats: pdf, biopax3, owl, sbgn, sbml, biopax2, docx
Dataset updated
Sep 27, 2005
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This event has been computationally inferred from an event that has been demonstrated in another species.
The inference is based on the homology mapping from PANTHER. Briefly, reactions for which all involved PhysicalEntities (in input, output and catalyst) have a mapped orthologue/paralogue (for complexes at least 75% of components must have a mapping) are inferred to the other species. High level events are also inferred for these events to allow for easier navigation.
More details and caveats of the event inference in Reactome. For details on PANTHER see also: http://www.pantherdb.org/about.jsp
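The inference rule described above can be illustrated with a short sketch. It is a simplified reading of the rule as stated in this description, not Reactome's actual pipeline: the data representation (a protein as a string ID, a complex as a tuple of protein IDs) and the function names are hypothetical.

```python
def entity_is_mapped(entity, mapped_proteins, complex_threshold=0.75):
    """Return True if an entity counts as mapped to the target species.

    entity: a protein ID (str), or a complex given as a tuple/list of protein IDs.
    mapped_proteins: set of protein IDs that have an orthologue/paralogue mapping.
    """
    if isinstance(entity, str):
        return entity in mapped_proteins
    if not entity:
        return False  # an empty complex cannot satisfy the rule
    hits = sum(1 for protein in entity if protein in mapped_proteins)
    # For complexes, at least 75% of the components must have a mapping.
    return hits / len(entity) >= complex_threshold


def reaction_is_inferable(inputs, outputs, catalysts, mapped_proteins):
    """A reaction is inferred only if every involved entity is mapped."""
    entities = [*inputs, *outputs, *catalysts]
    return all(entity_is_mapped(e, mapped_proteins) for e in entities)
```

For a binding event such as the one in this entry, the inputs would be the two proteins and the output their complex, so the reaction would be inferred only if both proteins have a mapping in the target species.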
Smartwatch Health Data (Uncleaned)
kaggle.com
Updated Feb 14, 2025
Share
Mohammed Arfath R (2025). Smartwatch Health Data (Uncleaned) [Dataset]. https://www.kaggle.com/datasets/mohammedarfathr/smartwatch-health-data-uncleaned/suggestions?status=pending
Dataset updated
Feb 14, 2025
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Mohammed Arfath R
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset simulates health-related outputs from a smartwatch, mimicking real-world issues in data collection, making it perfect for applying data preprocessing techniques such as handling missing values, outliers, duplicates, and inconsistencies.

Dataset Overview:
Total Rows: 10,000
Total Columns: 7
Use Case: Health monitoring using smartwatch sensor data
Data_Sheet_1_Mind the Queue: A Case Study in Visualizing Heterogeneous...
frontiersin.figshare.com
figshare.com
zip
Updated Jun 4, 2023
Catherine McVey; Fushing Hsieh; Diego Manriquez; Pablo Pinedo; Kristina Horback (2023). Data_Sheet_1_Mind the Queue: A Case Study in Visualizing Heterogeneous Behavioral Patterns in Livestock Sensor Data Using Unsupervised Machine Learning Techniques.ZIP [Dataset]. http://doi.org/10.3389/fvets.2020.00523.s001
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.3389/fvets.2020.00523.s001
Dataset updated
Jun 4, 2023
Dataset provided by
Frontiers
Authors
Catherine McVey; Fushing Hsieh; Diego Manriquez; Pablo Pinedo; Kristina Horback
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sensor technologies allow ethologists to continuously monitor the behaviors of large numbers of animals over extended periods of time. This creates new opportunities to study livestock behavior in commercial settings, but also new methodological challenges. Densely sampled behavioral data from large heterogeneous groups can contain a range of complex patterns and stochastic structures that may be difficult to visualize using conventional exploratory data analysis techniques. The goal of this research was to assess the efficacy of unsupervised machine learning tools in recovering complex behavioral patterns from such datasets to better inform subsequent statistical modeling. This methodological case study was carried out using records on milking order, or the sequence in which cows arrange themselves as they enter the milking parlor. Data was collected over a 6-month period from a closed group of 200 mixed-parity Holstein cattle on an organic dairy. Cows at the front and rear of the queue proved more consistent in their entry position than animals at the center of the queue, a systematic pattern of heterogeneity more clearly visualized using entropy estimates, a scale and distribution-free alternative to variance robust to outliers. Dimension reduction techniques were then used to visualize relationships between cows. No evidence of social cohesion was recovered, but Diffusion Map embeddings proved more adept than PCA at revealing the underlying linear geometry of this data. Median parlor entry positions from the pre- and post-pasture subperiods were highly correlated (R = 0.91), suggesting a surprising degree of temporal stationarity. Data Mechanics visualizations, however, revealed heterogeneous non-stationarity among subgroups of animals in the center of the group and herd-level temporal outliers. A repeated measures model recovered inconsistent evidence of a relationship between entry position and cow attributes.
Mutual conditional entropy tests, a permutation-based approach to assessing bivariate correlations robust to non-independence, confirmed a significant but non-linear association with peak milk yield, but revealed the age effect to be potentially confounded by health status. Finally, queueing records were related back to behaviors recorded via ear tag accelerometers using linear models and mutual conditional entropy tests. Both approaches recovered consistent evidence of differences in home pen behaviors across subsections of the queue.
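The idea of using entropy as a measure of how consistently a cow holds her place in the queue can be sketched as follows. This is only an illustration under stated assumptions, not the authors' estimator: it bins each cow's recorded entry positions into equal-width sections of the queue and computes the Shannon entropy of the resulting histogram. The function name, the choice of 10 bins, and the 1-based position convention are all hypothetical.

```python
import math
from collections import Counter


def position_entropy(positions, herd_size, n_bins=10):
    """Shannon entropy (bits) of one cow's binned milking-order positions.

    positions: entry positions across milkings, 1-based (1 = first in queue).
    herd_size: number of cows in the queue.
    n_bins:    number of equal-width queue sections (an assumed choice).
    """
    # Map each position to a queue section in 0..n_bins-1.
    bins = Counter(
        min(int((p - 1) * n_bins / herd_size), n_bins - 1) for p in positions
    )
    n = len(positions)
    # Low entropy: the cow almost always enters in the same section.
    # High entropy: her entry position is spread across the queue.
    return sum(-(c / n) * math.log2(c / n) for c in bins.values())
```

A cow that always enters among the first few animals yields an entropy of 0, while a cow observed equally often in each of the 10 sections yields log2(10), about 3.32 bits.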

Plotly Dashboard Healthcare

kaggle.com

Updated Jan 4, 2022

A SURESH (2022). Plotly Dashboard Healthcare [Dataset]. https://www.kaggle.com/datasets/sureshmecad/plotly-dashboard-healthcare/discussion

Dataset updated

Jan 4, 2022

Dataset provided by

Kaggle, http://kaggle.com/

Authors

A SURESH

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

Data Visualization

Content

a. Scatter plot

  i. The webapp should allow the user to select genes from datasets and plot 2D scatter plots between 2 variables (expression/copy_number/chronos) for any pair of genes.

  ii. The user should be able to filter and color data points using metadata information available in the file “metadata.csv”.

  iii. The visualization could be interactive: it would be great if the user could hover over the data points on the plot and get the relevant information (hint: visit https://plotly.com/r/, https://plotly.com/python).

  iv. Here is a quick reference for you. The scatter plot is between the chronos score for the TTBK2 gene and expression for the MORC2 gene, with coloring defined by the Gender/Sex column from the metadata file.

b. Boxplot/violin plot

  i. The user should be able to select a gene and a variable (expression / chronos / copy_number) and generate a boxplot to display its distribution across multiple categories as defined by a user-selected variable (a column from the metadata file).

  ii. Here is an example for your reference, where a violin plot of the CHRONOS score for gene CCL22 is plotted and grouped by ‘Lineage’.

Acknowledgements

We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?

Defence Data 2012
data.europa.eu
pdf
European Defence Agency, Defence Data 2012 [Dataset]. https://data.europa.eu/set/data/defence-data-2012
Explore at:
Available download formats: pdf
Dataset authored and provided by
European Defence Agency, https://eda.europa.eu/
License
http://data.europa.eu/eli/dec/2011/833/oj
Description
Defence Data is collected by the European Defence Agency (EDA) on an annual basis. The Ministries of Defence of the Agency’s 27 participating Member States (all EU Member States except Denmark) provide the data. EDA acts as the custodian of the data and publishes the aggregated figures in this booklet. 2012 data does not include Croatia which became the 27th EDA Member State on 1 July 2013.

The data are broken down, based on a list of indicators approved by the Agency’s Ministerial Steering Board. This list has four sections, represented in the headings of the booklet:

General: macro-economic data to show how defence budgets relate to GDP and overall government spending.

Reform: major categories of defence budget spending (personnel; investment, including R and T; operation and maintenance; and others) to show what defence budgets are spent on.

European collaboration: for defence equipment procurement and R and T to show to what extent the Agency’s pMS are investing together.

Deployability: military deployed in crisis management operations to show the ratio between deployments and total military personnel.

The definitions used for the gathering of the data and some general caveats are listed at the end of the brochure.
The results of logistic regression models for each CBM location to quantify...
plos.figshare.com
xls
Updated Jun 13, 2023
Eric Cramer; Maisa Ziadni; Kristen Hymel Scherrer; Sean Mackey; Ming-Chih Kao (2023). The results of logistic regression models for each CBM location to quantify the relationship between average pain intensity score and endorsement of each location. [Dataset]. http://doi.org/10.1371/journal.pcbi.1010496.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1010496.t002
Dataset updated
Jun 13, 2023
Dataset provided by
PLOS Computational Biology
Authors
Eric Cramer; Maisa Ziadni; Kristen Hymel Scherrer; Sean Mackey; Ming-Chih Kao
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Location codes that start with a “1” indicate the front of the body and codes that begin with a “2” indicate the back of the body.

