100+ datasets found

Exploratory Data Analysis (EDA) Tools Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 2, 2025
Cite
Market Report Analytics (2025). Exploratory Data Analysis (EDA) Tools Report [Dataset]. https://www.marketreportanalytics.com/reports/exploratory-data-analysis-eda-tools-54369
Available download formats: doc, ppt, pdf
Dataset updated
Apr 2, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Exploratory Data Analysis (EDA) tools market is experiencing robust growth, driven by the increasing volume and complexity of data across industries. The rising need for data-driven decision-making, coupled with the expanding adoption of cloud-based analytics solutions, is fueling market expansion. While precise figures for market size and CAGR are not provided, a reasonable estimation, based on the prevalent growth in the broader analytics market and the crucial role of EDA in the data science workflow, would place the 2025 market size at approximately $3 billion, with a projected Compound Annual Growth Rate (CAGR) of 15% through 2033. This growth is segmented across various applications, with large enterprises leading the adoption due to their higher investment capacity and complex data needs. However, SMEs are witnessing rapid growth in EDA tool adoption, driven by the increasing availability of user-friendly and cost-effective solutions. Further segmentation by tool type reveals a strong preference for graphical EDA tools, which offer intuitive visualizations facilitating better data understanding and communication of findings. Geographic regions, such as North America and Europe, currently hold a significant market share, but the Asia-Pacific region shows promising potential for future growth owing to increasing digitalization and data generation. Key restraints to market growth include the need for specialized skills to effectively utilize these tools and the potential for data bias if not handled appropriately. The competitive landscape is dynamic, with both established players like IBM and emerging companies specializing in niche areas vying for market share. Established players benefit from brand recognition and comprehensive enterprise solutions, while specialized vendors provide innovative features and agile development cycles. 
Open-source options such as KNIME, the R package Rattle, and the Python package Pandas Profiling offer cost-effective alternatives, particularly attracting academic institutions and smaller businesses. The ongoing development of advanced analytics functionalities, such as automated machine learning integration within EDA platforms, will be a significant driver of future market growth. Further, the integration of EDA tools within broader data science platforms is streamlining the overall analytical workflow, contributing to increased adoption and reduced complexity. The market's evolution hinges on enhanced user experience, more robust automation features, and seamless integration with other data management and analytics tools.
DataSheet1_Exploratory data analysis (EDA) machine learning approaches for...
frontiersin.figshare.com
docx
Updated May 31, 2023
Cite
Victoria Da Poian; Bethany Theiling; Lily Clough; Brett McKinney; Jonathan Major; Jingyi Chen; Sarah Hörst (2023). DataSheet1_Exploratory data analysis (EDA) machine learning approaches for ocean world analog mass spectrometry.docx [Dataset]. http://doi.org/10.3389/fspas.2023.1134141.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fspas.2023.1134141.s001
Dataset updated
May 31, 2023
Dataset provided by
Frontiers
Authors
Victoria Da Poian; Bethany Theiling; Lily Clough; Brett McKinney; Jonathan Major; Jingyi Chen; Sarah Hörst
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
Many upcoming and proposed missions to ocean worlds such as Europa, Enceladus, and Titan aim to evaluate their habitability and the existence of potential life on these moons. These missions will suffer from communication challenges and technology limitations. We review and investigate the applicability of data science and unsupervised machine learning (ML) techniques on isotope ratio mass spectrometry (IRMS) data from volatile laboratory analogs of Europa and Enceladus seawaters as a case study for development of new strategies for icy ocean world missions. Our driving science goal is to determine whether the mass spectra of volatile gases could contain information about the composition of the seawater and potential biosignatures. We implement data science and ML techniques to investigate what inherent information the spectra contain and determine whether a data science pipeline could be designed to quickly analyze data from future ocean worlds missions. In this study, we focus on the exploratory data analysis (EDA) step in the analytics pipeline. This is a crucial unsupervised learning step that allows us to understand the data in depth before subsequent steps such as predictive/supervised learning. EDA identifies and characterizes recurring patterns and significant correlation structure, and helps determine which variables are redundant and which contribute to significant variation in the lower dimensional space. In addition, EDA helps to identify irregularities such as outliers that might be due to poor data quality. We compared dimensionality reduction methods Uniform Manifold Approximation and Projection (UMAP) and Principal Component Analysis (PCA) for transforming our data from a high-dimensional space to a lower dimension, and we compared clustering algorithms for identifying data-driven groups (“clusters”) in the ocean worlds analog IRMS data and mapping these clusters to experimental conditions such as seawater composition and CO2 concentration.
Such data analysis and characterization efforts are the first steps toward the longer-term science autonomy goal where similar automated ML tools could be used onboard a spacecraft to prioritize data transmissions for bandwidth-limited outer Solar System missions.
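As a rough illustration of the dimensionality-reduction step mentioned above, the following sketch extracts the first principal component of a small table using power iteration on the covariance matrix. It is not the authors' pipeline: the function name, the toy rows, and the choice of power iteration are all assumptions made for this example, and UMAP and the clustering step are not shown.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (unit direction, scores) of the leading principal component.

    Uses plain power iteration on the sample covariance matrix.
    Assumes the data has non-zero variance and that the all-ones start
    vector is not orthogonal to the leading component.
    """
    n, d = len(rows), len(rows[0])
    # Centre each feature on its mean.
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # deterministic starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centred row onto the component to get 1-D scores.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical toy "spectra": feature 2 is twice feature 1, feature 3 is constant.
toy_rows = [[float(t), 2.0 * t, 0.5] for t in range(5)]
direction, scores = first_principal_component(toy_rows)
print(direction, scores)
```

In this toy table all of the variance lies along the direction (1, 2, 0), so the constant third feature receives zero weight, which is the sense in which PCA flags redundant variables.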
Exploratory Data Analysis (EDA) Tools Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 2, 2025
Cite
Market Report Analytics (2025). Exploratory Data Analysis (EDA) Tools Report [Dataset]. https://www.marketreportanalytics.com/reports/exploratory-data-analysis-eda-tools-54164
Available download formats: ppt, pdf, doc
Dataset updated
Apr 2, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Exploratory Data Analysis (EDA) tools market is experiencing robust growth, driven by the increasing volume and complexity of data across various industries. The market, estimated at $1.5 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $5 billion by 2033. This expansion is fueled by several key factors. Firstly, the rising adoption of big data analytics and business intelligence initiatives across large enterprises and SMEs is creating a significant demand for efficient EDA tools. Secondly, the growing need for faster, more insightful data analysis to support better decision-making is driving the preference for user-friendly graphical EDA tools over traditional non-graphical methods. Furthermore, advancements in artificial intelligence and machine learning are seamlessly integrating into EDA tools, enhancing their capabilities and broadening their appeal. The market segmentation reveals a significant portion held by large enterprises, reflecting their greater resources and data handling needs. However, the SME segment is rapidly gaining traction, driven by the increasing affordability and accessibility of cloud-based EDA solutions. Geographically, North America currently dominates the market, but regions like Asia-Pacific are exhibiting high growth potential due to increasing digitalization and technological advancements. Despite this positive outlook, certain restraints remain. The high initial investment cost associated with implementing advanced EDA solutions can be a barrier for some SMEs. Additionally, the need for skilled professionals to effectively utilize these tools can create a challenge for organizations. However, the ongoing development of user-friendly interfaces and the availability of training resources are actively mitigating these limitations. 
The competitive landscape is characterized by a mix of established players like IBM and emerging innovative companies offering specialized solutions. Continuous innovation in areas like automated data preparation and advanced visualization techniques will further shape the future of the EDA tools market, ensuring its sustained growth trajectory.
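The growth figures quoted in this description can be checked with the standard compound-growth formula, size = base × (1 + CAGR)^years. The sketch below is only an arithmetic check under the assumption that growth compounds annually over the eight years from 2025 to 2033; the function names are illustrative.

```python
def project_market_size(base, cagr, years):
    """Compound a base value forward at a constant annual growth rate."""
    return base * (1 + cagr) ** years

def implied_cagr(start, end, years):
    """Constant annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1

years = 2033 - 2025                      # eight compounding periods (assumption)
size_2033 = project_market_size(1.5, 0.15, years)   # USD billions
rate_for_5bn = implied_cagr(1.5, 5.0, years)

print(f"Projected 2033 size at 15% CAGR: ${size_2033:.2f}B")
print(f"CAGR needed to reach $5B: {rate_for_5bn:.1%}")
```

Under these assumptions a 15% CAGR gives roughly $4.6 billion in 2033, which is consistent with the description's "approximately $5 billion" only as a loose rounding; reaching $5 billion exactly would need a rate slightly above 16%.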
Global Exploratory Data Analysis (EDA) Tools Market Revenue Forecasts...
statsndata.org
excel, pdf
Updated Jun 2025
Cite
Stats N Data (2025). Global Exploratory Data Analysis (EDA) Tools Market Revenue Forecasts 2025-2032 [Dataset]. https://www.statsndata.org/report/exploratory-data-analysis-eda-tools-market-313301
Available download formats: excel, pdf
Dataset updated
Jun 2025
Dataset authored and provided by
Stats N Data
License
https://www.statsndata.org/how-to-order
Area covered
Global
Description
Exploratory Data Analysis (EDA) Tools play a pivotal role in the modern data-driven landscape, transforming raw data into actionable insights. As businesses increasingly recognize the value of data in informing decisions, the market for EDA tools has witnessed substantial growth, driven by the rapid expansion of dat
house prices data exploration
kaggle.com
Updated Sep 13, 2024
Cite
yvonne gatwiri (2024). house prices data exploration [Dataset]. https://www.kaggle.com/datasets/yvonnegatwiri/house-prices-data-exploration/code
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 13, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
yvonne gatwiri
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by yvonne gatwiri

Released under Apache 2.0

Orange dataset table
figshare.com
xlsx
Updated Mar 4, 2022
Cite
Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.19146410.v1
Dataset updated
Mar 4, 2022
Dataset provided by
figshare
Authors
Rui Simões
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control; 6.25, 12.5, 25 and 50 µM for 6-OHDA; and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced and contains no missing values, and the data were standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
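To make the stratified cross-validation step concrete, here is a minimal sketch of how 36 samples in 9 balanced classes could be split so that every fold keeps the class proportions. The description does not state the number of folds or how Orange assigns samples, so the 4-fold setting, the class label strings, and the round-robin assignment below are assumptions for illustration only.

```python
from collections import defaultdict

def stratified_folds(labels, n_folds):
    """Assign sample indices to folds so each class is spread evenly.

    Simple round-robin within each class; no shuffling is done here.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds

# Hypothetical label strings for the 9 classes named in the description.
class_names = (
    ["Control"]
    + [f"6-OHDA {c} uM" for c in (6.25, 12.5, 25, 50)]
    + [f"rotenone {c} uM" for c in (0.03, 0.06, 0.125, 0.25)]
)
labels = [name for name in class_names for _ in range(4)]  # 4 samples per class

folds = stratified_folds(labels, n_folds=4)  # fold count is an assumption
for i, fold in enumerate(folds):
    print(i, len(fold), sorted({labels(idx) if False else labels[idx] for idx in fold})[:2])
```

With 4 samples per class and 4 folds, each fold holds exactly one sample from every class, so each held-out fold is itself balanced.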
Data: Anscombe's quintet
kaggle.com
Updated Apr 17, 2025
Cite
Carl McBride Ellis (2025). Data: Anscombe's quintet [Dataset]. https://www.kaggle.com/carlmcbrideellis/data-anscombes-quartet/tasks
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 17, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Carl McBride Ellis
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This file is the data set from the famous publication Francis J. Anscombe "*Graphs in Statistical Analysis*", The American Statistician 27 pp. 17-21 (1973) (doi: 10.1080/00031305.1973.10478966). It consists of four data sets of 11 points each. Note the peculiarity that the same 'x' values are used for the first three data sets, and I have followed this exactly as in the original publication (originally done to save space), i.e. the first column (x123) serves as the 'x' for the next three 'y' columns; y1, y2 and y3.

In the dataset Anscombe_quintet_data.csv there is a new column (y5) as an example of Simpson's paradox (C. McBride Ellis "*Anscombe dataset No. 5: Simpson's paradox*", Zenodo doi: 10.5281/zenodo.15209087 (2025)).
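The point of Anscombe's data is that very different scatter plots share almost identical summary statistics. The sketch below computes the mean of y and the Pearson correlation for the three series that share the x123 column. The values are hard-coded as they are commonly reproduced from the 1973 paper rather than read from this dataset's CSV, so they should be checked against the file before being relied on; the helper names are illustrative.

```python
import math

# Shared x column and the three y columns (transcribed from the 1973 paper;
# verify against the CSV before use).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

def mean(values):
    return sum(values) / len(values)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

summary = {
    name: (mean(ys), pearson_r(x123, ys))
    for name, ys in (("y1", y1), ("y2", y2), ("y3", y3))
}
for name, (m, r) in summary.items():
    print(f"{name}: mean={m:.2f}  r={r:.3f}")
```

All three series come out with a mean near 7.5 and a correlation near 0.816, which is why plotting the data, and not only summarising it, matters in EDA.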
Credit EDA Case Study Data
kaggle.com
Updated Jan 11, 2022
Cite
ADITYA MISHRA (2022). Credit EDA Case Study Data [Dataset]. https://www.kaggle.com/datasets/adityamishra0708/credit-eda-case-study-data
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 11, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
ADITYA MISHRA
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by ADITYA MISHRA

Released under CC0: Public Domain

EDA Tools Market Report
datainsightsmarket.com
doc, pdf, ppt
Updated Dec 14, 2024
Cite
Data Insights Market (2024). EDA Tools Market Report [Dataset]. https://www.datainsightsmarket.com/reports/eda-tools-market-11076
Available download formats: pdf, doc, ppt
Dataset updated
Dec 14, 2024
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The size of the EDA Tools Market was valued at USD XXX Million in 2023 and is projected to reach USD XXX Million by 2032, with an expected CAGR of 8.46% during the forecast period. EDA tools are a suite of software applications for electronic system design and analysis, usually applied to the design of integrated circuits and printed circuit boards. These tools speed up several steps in the design process, from concept to final physical implementation. EDA tools play a crucial role in the semiconductor industry: engineers rely on them to design very complex chips with billions of transistors. They help with circuit design, simulation, verification, and layout. For instance, simulation tools allow engineers to predict the behavior of a circuit before it is produced, saving time and resources. Verification tools check the correctness of the design, and physical design tools optimize the layout of the circuit on the chip. The increasing complexity of electronic systems, the demand for faster and more efficient designs, and the advent of emerging technologies such as 5G and AI drive the EDA market. As semiconductor technology advances, EDA tools will stay at the vanguard of innovation and accelerate the development of cutting-edge electronic products.

Recent developments include:
July 2022 - Cadence Design Systems, Inc. announced that its acquisition of Future Facilities has been finalized. The inclusion of Future Facilities' technologies and experience bolsters Cadence's approach to intelligent system design and expands its capabilities in computational fluid dynamics (CFD) and multiphysics system analysis. Future Facilities' electronics cooling analysis and energy performance optimization solutions for data center design and operation, which use physics-based 3D digital twins, help leading technology companies make sound business decisions about data center design, operations, and lifecycle management and lessen their carbon footprint.
April 2022 - The Silicon Integration Initiative (Si2) Technology Interoperability Trajectory Advisory Council (TITAN), a thought leadership forum dedicated to accelerating ecosystem collaboration with technology interoperability for silicon-to-system success, welcomed Keysight Technologies, Inc. as a new member. Keysight's vertical market expertise in providing software-centric solutions that target radio frequency and microwave applications offers an essential perspective to TITAN as Si2 expands into systems.
May 2021 - Siemens Digital Industries Software acquired Fractal Technologies, a provider of production signoff-quality IP validation solutions based in the U.S. and the Netherlands. With this acquisition, Siemens' electronic design automation (EDA) customers can more quickly and easily validate internal and external IP and the libraries used in their integrated circuit (IC) designs, improving overall quality and speeding time-to-market. Siemens plans to add Fractal's technology to the Xcelerator portfolio as part of its suite of EDA IC verification offerings.
May 2021 - Keysight Technologies Inc. acquired Quantum Benchmark, a leader in error diagnostics, error suppression, and performance validation software for quantum computing. Quantum Benchmark provides software solutions for improving and validating quantum computing hardware capabilities by identifying and overcoming the unique error challenges required for high-impact quantum computing.

Key drivers for this market are: Booming Automotive, IoT, and AI Sectors; Upcoming Trend of EDA Toolsets Equipped with Machine Learning Capabilities.
Potential restraints include: Moore's Law about to be Proven Faulty.
Notable trends are: IC Physical Design and Verification Segment to Grow Significantly.
Data from: The Often-Overlooked Power of Summary Statistics in Exploratory...
acs.figshare.com
xlsx
Updated Jun 8, 2023
Cite
Tahereh G. Avval; Behnam Moeini; Victoria Carver; Neal Fairley; Emily F. Smith; Jonas Baltrusaitis; Vincent Fernandez; Bonnie. J. Tyler; Neal Gallagher; Matthew R. Linford (2023). The Often-Overlooked Power of Summary Statistics in Exploratory Data Analysis: Comparison of Pattern Recognition Entropy (PRE) to Other Summary Statistics and Introduction of Divided Spectrum-PRE (DS-PRE) [Dataset]. http://doi.org/10.1021/acs.jcim.1c00244.s002
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.jcim.1c00244.s002
Dataset updated
Jun 8, 2023
Dataset provided by
ACS Publications
Authors
Tahereh G. Avval; Behnam Moeini; Victoria Carver; Neal Fairley; Emily F. Smith; Jonas Baltrusaitis; Vincent Fernandez; Bonnie. J. Tyler; Neal Gallagher; Matthew R. Linford
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
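A short sketch may help show what computing these per-spectrum summary statistics looks like. The abstract does not define PRE, so the version below assumes it is the Shannon entropy (in bits) of a spectrum normalised to unit sum; X4 is left out because it is not defined here either. The two toy spectra and all function names are invented for illustration.

```python
import math
import statistics

def pattern_recognition_entropy(spectrum):
    """Shannon entropy (bits) of a non-negative spectrum normalised to sum 1.

    Assumed definition of PRE; zero-intensity channels contribute nothing.
    """
    total = sum(spectrum)
    probabilities = [value / total for value in spectrum if value > 0]
    return sum(-p * math.log2(p) for p in probabilities)

def summarize(spectrum):
    """Collect the simple summary statistics named in the abstract."""
    return {
        "PRE": pattern_recognition_entropy(spectrum),
        "mean": statistics.mean(spectrum),
        "STD": statistics.stdev(spectrum),        # sample standard deviation
        "1-norm": sum(abs(v) for v in spectrum),
        "range": max(spectrum) - min(spectrum),
        "SSQ": sum(v * v for v in spectrum),
    }

# Hypothetical spectra with the same total intensity but different shapes.
flat_spectrum = [1.0, 1.0, 1.0, 1.0]
spiked_spectrum = [4.0, 0.0, 0.0, 0.0]

flat_stats = summarize(flat_spectrum)
spiked_stats = summarize(spiked_spectrum)
print(flat_stats)
print(spiked_stats)
```

The two toy spectra have the same mean and 1-norm, yet the flat one has an entropy of 2 bits and the single-peak one an entropy of 0, which illustrates how an entropy-style statistic can separate shapes that simpler statistics cannot.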
Data from: FactExplorer: Fact Embedding-Based Exploratory Data Analysis for...
tandf.figshare.com
pdf
Updated Jun 23, 2025
Cite
Qi Jiang; Guodao Sun; Yue Dong; Lvhan Pan; Baofeng Chang; Li Jiang; Haoran Liang; Ronghua Liang (2025). FactExplorer: Fact Embedding-Based Exploratory Data Analysis for Tabular Data [Dataset]. http://doi.org/10.6084/m9.figshare.28399639.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.28399639.v1
Dataset updated
Jun 23, 2025
Dataset provided by
Taylor & Francis
Authors
Qi Jiang; Guodao Sun; Yue Dong; Lvhan Pan; Baofeng Chang; Li Jiang; Haoran Liang; Ronghua Liang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Although exploratory data analysis (EDA) is a powerful approach for uncovering insights from unfamiliar datasets, existing EDA tools face challenges in assisting users to assess the progress of exploration and synthesize coherent insights from isolated findings. To address these challenges, we present FactExplorer, a novel fact-based EDA system that shifts the analysis focus from raw data to data facts. FactExplorer employs a hybrid logical-visual representation, providing users with a comprehensive overview of all potential facts at the outset of their exploration. Moreover, FactExplorer introduces fact-mining techniques, including topic-based drill-down and transition path search capabilities. These features facilitate in-depth analysis of facts and enhance the understanding of interconnections between specific facts. Finally, we present a usage scenario and conduct a user study to assess the effectiveness of FactExplorer. The results indicate that FactExplorer facilitates the understanding of isolated findings and enables users to steer a thorough and effective EDA.
Data from: Drastic changes before the 2011 Tohoku earthquake, revealed by...
figshare.com
zip
Updated Feb 4, 2023
Cite
Tomokazu Konishi (2023). Drastic changes before the 2011 Tohoku earthquake, revealed by exploratory data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.22010279.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.22010279.v1
Dataset updated
Feb 4, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Tomokazu Konishi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Tohoku Region
Description
Predicting earthquakes is of the utmost importance, especially to those countries of high risk, and although much effort has been made, it has yet to be realised. Nevertheless, there is a paucity of statistical approaches in seismic studies to the extent that an old theory is believed without verification. Seismic records of time and magnitude in Japan were analysed by exploratory data analysis (EDA). EDA is a parametric statistical approach based on the characteristics of data and is suitable for data-driven investigations. The distribution style of each dataset was determined, and the important parameters were found. This enabled us to identify and evaluate the anomalies in the data. Before the huge 2011 Tohoku earthquake, swarm earthquakes occurred before the main earthquake at improbable frequencies. The frequency and magnitude of all earthquakes increased. Both changes made larger earthquakes more likely to occur: even an M9 earthquake was expected every two years. From these simple measurements, the EDA succeeded in extracting useful information. Detecting and evaluating anomalies using this approach for every set of data would lead to a more accurate prediction of earthquakes.
Chicago Air Quality Analysis
kaggle.com
Updated May 21, 2022
Cite
Asjad K (2022). Chicago Air Quality Analysis [Dataset]. https://www.kaggle.com/datasets/asjad99/chicago-air-pollution
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 21, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Asjad K
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Chicago
Description
Background:

Looking at Chicago's gleaming skyline today, it's surprising to remember that not so long ago many of those buildings were black with soot from coal-fired furnaces and factories all over the city. Take a look back at old photos or films, though, and that skyline isn't so pristine.

During the Industrial Age belching smokestacks were looked at as a good thing – this meant the city that works was working! Eventually, though, we learned you can have too much of a good thing. Some days, pollution turned day into night, ruining clothing, blackening buildings, sickening Chicagoans and even stopping airplanes from taking off. Today, we can see a similar situation in countries like India, Iran, Pakistan and China where coal is still widely used.

The Chicago Tribune led the crusade against Chicago’s dirty air. The newspaper began reporting on the condition of the city's air as early as the 1870s. In one report, the author Rudyard Kipling is quoted as saying simply, "the air is dirt" after a visit to Chicago.

In 1959, Chicago established the Department of Air Pollution Control to investigate and regulate emission sources. Subsequent regulations, including the federal Clean Air Act of 1970, and more recent city and state legislation have helped further mitigate city-wide emissions. Today, Chicago air pollution levels are a small fraction of their historical levels.

Standards:

The US Environmental Protection Agency (EPA) defines “moderate” air quality as air potentially unhealthy to sensitive groups including children, the elderly, and people with pre-existing cardiovascular or respiratory health conditions.

AQI ratings are calculated by weighting 6 key criteria pollutants for their risk to health. The pollutant with the highest individual AQI becomes the ‘main pollutant’ and dictates the overall air quality index. Fine particulate matter (PM2.5) and ozone represent two of the most common ‘main pollutants’ responsible for a city’s AQI due to the weight the formula ascribes to them for their potential harm and prevalence at high levels.

PM2.5 pollution is fine particle pollution with a range of chemical compositions that measures 2.5 microns in diameter or less. The US EPA recommends that annual PM2.5 exposure not exceed 12 μg/m3. The World Health Organization (WHO), meanwhile, employs a more stringent standard, recommending that exposure remain below 10 μg/m3 annually.
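The two rules described above (the overall AQI is set by the pollutant with the highest individual index, and annual PM2.5 is compared with the 12 and 10 μg/m3 figures) can be sketched in a few lines. Every number in the inputs below is made up for illustration and is not taken from the dataset; the pollutant subset, the monthly readings, and the use of a plain average of monthly values as the "annual" figure are all simplifying assumptions.

```python
# Annual PM2.5 guideline values as quoted in the description (ug/m3).
EPA_ANNUAL_PM25 = 12.0
WHO_ANNUAL_PM25 = 10.0

def main_pollutant(sub_indices):
    """Return (pollutant, AQI) for the pollutant with the highest individual AQI."""
    name = max(sub_indices, key=sub_indices.get)
    return name, sub_indices[name]

# Hypothetical individual AQI values for one day (illustrative subset of pollutants).
daily_sub_indices = {"PM2.5": 62, "O3": 48, "NO2": 21, "CO": 9}
pollutant, overall_aqi = main_pollutant(daily_sub_indices)

# Hypothetical monthly mean PM2.5 concentrations (ug/m3) for one year.
monthly_pm25 = [9.5, 10.2, 11.0, 10.8, 11.5, 12.4, 13.1, 12.0, 10.6, 9.9, 10.1, 10.9]
annual_mean = sum(monthly_pm25) / len(monthly_pm25)

within_epa = annual_mean <= EPA_ANNUAL_PM25   # "not exceed 12"
within_who = annual_mean < WHO_ANNUAL_PM25    # "remain below 10"

print(f"Main pollutant: {pollutant} (AQI {overall_aqi})")
print(f"Annual mean PM2.5: {annual_mean:.1f} ug/m3, EPA ok: {within_epa}, WHO ok: {within_who}")
```

With these invented readings the year would satisfy the EPA figure but not the stricter WHO one, which is the kind of comparison an EDA pass over the real Chicago data could report.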

Learn more: https://www.iqair.com/usa/illinois/chicago

In this dataset we explore the pollution levels and learn EDA techniques in the process.
sohamphanseiitb/BIG_Data_5MSEC: BIG Data Analysis of NASA's 5 Millennium...
zenodo.org
bin, pdf
Updated Jul 15, 2024
Cite
Soham Phanse (2024). sohamphanseiitb/BIG_Data_5MSEC: BIG Data Analysis of NASA's 5 Millennium Solar Eclipse Database [Dataset]. http://doi.org/10.5281/zenodo.7409106
Available download formats: bin, pdf
Unique identifier
https://doi.org/10.5281/zenodo.7409106
Dataset updated
Jul 15, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Soham Phanse
Description
Solar eclipses are a topic of interest among astronomers, astrologers and the general public as well. There were and will be about 11898 eclipses in the 5 millennia from 2000 BC to 3000 AD. Data visualization and regression techniques offer a deep insight into how various parameters of a solar eclipse are related to each other. Physical models can be verified and can be updated based on the insights gained from the analysis.
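For a sense of scale, the eclipse count above can be turned into an average rate. This is only the arithmetic implied by the two figures in the description (11898 eclipses over 5 millennia); the variable names are illustrative.

```python
TOTAL_ECLIPSES = 11898        # count quoted in the description
SPAN_YEARS = 5 * 1000         # 5 millennia, 2000 BC to 3000 AD

eclipses_per_year = TOTAL_ECLIPSES / SPAN_YEARS
eclipses_per_century = eclipses_per_year * 100

print(f"{eclipses_per_year:.2f} per year, {eclipses_per_century:.0f} per century")
```

That works out to a little under 2.4 solar eclipses per year on average.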

The study covers the major aspects of data analysis including data cleaning, pre-processing, EDA, distribution fitting, regression and machine learning based data analytics. We provide a cleaned and usable database ready for EDA and statistical analysis.
PIA Customer Feedback Dataset
opendatabay.com
Updated Jul 6, 2025
Cite
Datasimple (2025). PIA Customer Feedback Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/1a069a47-d689-40dd-af73-4410a79ebbb4
Dataset updated
Jul 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Data Science and Analytics
Description
This dataset provides customer reviews for PIA Experience, gathered through web scraping from airlinequality.com. It is specifically designed for data science and analytics applications, offering valuable insights into customer sentiment and feedback. The data is suitable for various analytical tasks, including modelling, predictive analysis, feature engineering, and exploratory data analysis (EDA). Users should note that the data requires an initial cleaning phase due to the presence of null values.

Columns

reviews: Contains individual customer feedback entries pertaining to their experience with PIA. This column features approximately 160 distinct review entries.

Distribution

The dataset is provided as a CSV file. While the 'reviews' column contains 160 unique values, the exact total number of rows or records in the dataset is not explicitly detailed. It is structured in a tabular format, making it straightforward for data processing.

Usage

This dataset is ideally suited for a variety of applications, including:
* Modelling
* Predictive analysis
* Feature engineering
* Exploratory Data Analysis (EDA)
* Natural Language Processing (NLP) tasks, such as sentiment analysis or topic modelling.

Coverage

The dataset's focus is primarily on customer reviews from the Asia region. It was listed on 17 June 2025, and the content relates specifically to the experiences of customers using PIA.

License

CC0

Who Can Use It

This dataset is beneficial for a range of users, including:
* Data scientists looking to develop predictive models or perform advanced feature engineering.
* Data analysts interested in conducting exploratory data analysis to uncover trends and patterns.
* Researchers studying customer satisfaction, service quality, or airline industry performance.
* Developers working on natural language processing solutions, particularly those focused on text analytics from customer feedback.

Dataset Name Suggestions

PIA Customer Feedback

PIA Experience Reviews

Airline Customer Sentiment - PIA

PIA Passenger Reviews

PIA Service Review Data

Attributes

Original Data Source: PIA Customer Reviews
Exploratory Data Analysis | EDA - use case
kaggle.com
Updated Feb 11, 2022
Mohinur Abdurahimova (2022). Exploratory Data Analysis | EDA - use case [Dataset]. https://www.kaggle.com/mohinurabdurahimova/exploratory-data-analysis-eda-use-case/code
Dataset updated
Feb 11, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mohinur Abdurahimova
Description
Dataset

This dataset was created by Mohinur Abdurahimova

Released under Data files © Original Authors

Contents
Data Science Career Opportunities (USA)
opendatabay.com
Updated Jul 3, 2025
Datasimple (2025). Data Science Career Opportunities (USA) [Dataset]. https://www.opendatabay.com/data/ai-ml/6d1c5965-8fb2-4749-a8bd-f1c40861b401
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Data Science and Analytics, United States
Description
This dataset provides valuable insights into the US data science job market, containing detailed job listings scraped from the Indeed web portal on 20th November 2022. It is ideal for those seeking to understand job trends, analyse salary expectations, or develop skills in data analysis, machine learning, and natural language processing. The dataset's purpose is to offer a snapshot of available positions across various data science roles, including data scientists, machine learning engineers, and business analysts. It serves as a rich resource for exploratory data analysis, feature engineering, and predictive modelling tasks.

Columns

Title: The job title of the listed position.

Company: The hiring company posting the job.

Location: The geographic location of the job within the US.

Rating: The rating associated with the job or company.

Date: Indicates how long the job had been posted prior to 20th November 2022.

Salary: The salary information provided in US Dollars ($). Please note that many entries in this column may be missing as salary details are often not disclosed in job listings.

Description: A brief summary description of the job.

Links: The direct link to the original job posting on the Indeed platform.

Descriptions: The full-length description of the job, encompassing all details found in the complete job posting.

Distribution

This dataset is provided as a single data file, typically in CSV format. It comprises 1200 rows (records) and 9 distinct columns. The file name is data_science_jobs_indeed_us.csv.

Usage

This dataset is perfectly suited for a variety of analytical tasks and applications:
* Data Cleaning and Preparation: Practise handling missing values, especially in the 'Salary' column.
* Exploratory Data Analysis (EDA): Discover trends in job titles, company types, and locations.
* Feature Engineering: Extract new features from the 'Descriptions' column, such as required skills, education levels, or experience.
* Classification and Clustering: Develop models for salary prediction, or perform skill clustering analysis to guide curriculum development.
* Text Processing and Natural Language Processing (NLP): Analyse job descriptions to identify common skill demands or industry buzzwords.
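
The data-cleaning task above usually starts by measuring how much of the 'Salary' column is actually missing. Below is a minimal sketch using only the Python standard library. The sample rows, company names and salary format are invented for illustration and are not taken from the dataset; only the column names Title, Company and Salary come from the listing above.

```python
import csv
import io

# Hypothetical rows using three of the nine column names listed above.
SAMPLE_CSV = """Title,Company,Salary
Data Scientist,Acme,"$120,000 a year"
ML Engineer,Globex,
Business Analyst,Initech,
Data Engineer,Umbrella,"$95,000 a year"
"""


def missing_share(rows, column):
    """Return the fraction of rows whose value in `column` is empty or blank."""
    rows = list(rows)
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if not (row.get(column) or "").strip())
    return missing / len(rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
salary_missing = missing_share(rows, "Salary")
print(f"{salary_missing:.0%} of rows have no salary")
```

To run the same check on the real file, the in-memory sample could be swapped for `open("data_science_jobs_indeed_us.csv", newline="", encoding="utf-8")`, assuming the file is in the working directory and uses that encoding.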

Coverage

The dataset's geographic scope is limited to job postings within the United States. All data was collected on 20th November 2022, with the 'Date' column providing information on how long each job had been active before this date. The dataset covers a wide range of data science positions, including roles such as data scientist, machine learning engineer, data engineer, business analyst, and data science manager. It is important to note the presence of many missing entries in the 'Salary' column, reflecting common data availability challenges in job listings.

License

CC0

Who Can Use It

This dataset is an excellent resource for:
* Aspiring Data Scientists and Machine Learning Engineers: To sharpen their data cleaning, EDA, and model deployment skills.
* Educators and Curriculum Developers: To inform and guide the development of relevant data science and analytics courses based on real-world job market demands.
* Job Seekers: To understand the current landscape of data science roles, required skills, and potential salary ranges.
* Researchers and Analysts: To glean insights into labour market trends in the data science domain.
* Human Resources Professionals: To benchmark job roles, skill requirements, and compensation within the industry.

Dataset Name Suggestions

Indeed US Data Science Job Insights

US Data Science Job Market Analysis

Data Professional Job Postings (Indeed USA)

Data Science Career Opportunities (USA)

Attributes

Original Data Source: Data Science Job Postings (Indeed USA)
‘US Health Insurance Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 15, 2021
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘US Health Insurance Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-us-health-insurance-dataset-8b56/068994aa/?iid=012-655&v=presentation
Dataset updated
Nov 15, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘US Health Insurance Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/teertha/ushealthinsurancedataset on 12 November 2021.

--- Dataset description provided by original source is as follows ---

Context

The venerable insurance industry is no stranger to data driven decision making. Yet in today's rapidly transforming digital landscape, Insurance is struggling to adapt and benefit from new technologies compared to other industries, even within the BFSI sphere (compared to the Banking sector for example.) Extremely complex underwriting rule-sets that are radically different in different product lines, many non-KYC environments with a lack of centralized customer information base, complex relationship with consumers in traditional risk underwriting where sometimes customer centricity runs reverse to business profit, inertia of regulatory compliance - are some of the unique challenges faced by Insurance Business.

Despite this, emergent technologies like AI and blockchain have brought radical change to insurance, and data analytics sits at the core of this transformation. We can identify four key factors behind the emergence of analytics as a crucial part of InsurTech:

Big Data: The explosion of unstructured data in the form of images, videos, text, emails, and social media.

AI: The recent advances in Machine Learning and Deep Learning that enable businesses to gain insight, do predictive analytics, and build cost- and time-efficient innovative solutions.

Real-time Processing: The ability to process information in real time through various data feeds (e.g., social media, news).

Increased Computing Power: A complex ecosystem of new analytics vendors and solutions that enable carriers to combine data sources, external insights, and advanced modeling techniques in order to glean insights that were not possible before.

This dataset can be helpful in a simple yet illuminating study in understanding the risk underwriting in Health Insurance, the interplay of various attributes of the insured and see how they affect the insurance premium.

Content

This dataset contains 1338 rows of insured data, where the Insurance charges are given against the following attributes of the insured: Age, Sex, BMI, Number of Children, Smoker and Region. There are no missing or undefined values in the dataset.

Inspiration

This relatively simple dataset should be an excellent starting point for EDA, Statistical Analysis and Hypothesis testing and training Linear Regression models for predicting Insurance Premium Charges.

Proposed Tasks:
- Exploratory Data Analytics
- Statistical hypothesis testing
- Statistical Modeling
- Linear Regression

--- Original source retains full ownership of the source dataset ---
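
The linear regression task proposed in the description above can be sketched with a single predictor. This is a minimal illustration, assuming charges are regressed on age alone. The numbers are invented and are not taken from the dataset, and `fit_line` is a hypothetical helper written out by hand rather than a library call.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Invented values for illustration; not taken from the dataset.
ages = [20, 30, 40, 50, 60]
charges = [2500.0, 5000.0, 7500.0, 10000.0, 12500.0]

intercept, slope = fit_line(ages, charges)
print(f"charges ~ {intercept:.1f} + {slope:.1f} * age")
```

A fuller model would add the other attributes named in the description (Sex, BMI, Number of Children, Smoker, Region), which requires encoding the categorical ones first.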
EDA augmentation parameters.
plos.figshare.com
xls
Updated Sep 26, 2024
Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda (2024). EDA augmentation parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0310707.t009
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0310707.t009
Dataset updated
Sep 26, 2024
Dataset provided by
PLOS ONE
Authors
Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Over the last ten years, social media has become a crucial data source for businesses and researchers, providing a space where people can express their opinions and emotions. To analyze this data and classify emotions and their polarity in texts, natural language processing (NLP) techniques such as emotion analysis (EA) and sentiment analysis (SA) are employed. However, the effectiveness of these tasks using machine learning (ML) and deep learning (DL) methods depends on large labeled datasets, which are scarce in languages like Spanish. To address this challenge, researchers use data augmentation (DA) techniques to artificially expand small datasets. This study aims to investigate whether DA techniques can improve classification results using ML and DL algorithms for sentiment and emotion analysis of Spanish texts. Various text manipulation techniques were applied, including transformations, paraphrasing (back-translation), and text generation using generative adversarial networks, to small datasets such as song lyrics, social media comments, headlines from national newspapers in Chile, and survey responses from higher education students. The findings show that the Convolutional Neural Network (CNN) classifier achieved the most significant improvement, with an 18% increase using the Generative Adversarial Networks for Sentiment Text (SentiGan) on the Aggressiveness (Seriousness) dataset. Additionally, the same classifier model showed an 11% improvement using the Easy Data Augmentation (EDA) on the Gender-Based Violence dataset. The performance of the Bidirectional Encoder Representations from Transformers (BETO) also improved by 10% on the back-translation augmented version of the October 18 dataset, and by 4% on the EDA augmented version of the Teaching survey dataset. These results suggest that data augmentation techniques enhance performance by transforming text and adapting it to the specific characteristics of the dataset. 
Through experimentation with various augmentation techniques, this research provides valuable insights into the analysis of subjectivity in Spanish texts and offers guidance for selecting algorithms and techniques based on dataset features.
Industrial Production Statistical Analysis Software Report
datainsightsmarket.com
doc, pdf, ppt
Updated May 7, 2025
Data Insights Market (2025). Industrial Production Statistical Analysis Software Report [Dataset]. https://www.datainsightsmarket.com/reports/industrial-production-statistical-analysis-software-504068
Available download formats: doc, pdf, ppt
Dataset updated
May 7, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global market for Industrial Production Statistical Analysis Software is experiencing robust growth, projected at a Compound Annual Growth Rate (CAGR) of 5.2% from 2025 to 2033. In 2025, the market size reached $3748 million. This expansion is fueled by several key factors. Firstly, the increasing adoption of Industry 4.0 and digital transformation initiatives across manufacturing sectors is driving demand for sophisticated data analytics solutions. Businesses are increasingly reliant on data-driven decision-making to optimize production processes, improve efficiency, and enhance product quality. Secondly, the growing complexity of industrial processes necessitates advanced software capable of handling large datasets and providing actionable insights. This includes real-time monitoring, predictive maintenance, and quality control applications. The software’s ability to identify patterns and anomalies crucial to preventing production bottlenecks and maximizing output contributes significantly to its appeal. Finally, stringent regulatory compliance requirements and a growing focus on sustainability are further pushing adoption. Companies need robust data analysis tools to comply with environmental standards and track their carbon footprint. Segmentation reveals a diverse market landscape. The application segment is dominated by architecture, mechanical engineering, and the automotive industry, each leveraging the software for unique purposes such as design optimization, simulation, and performance analysis. Within types, 3D modeling and analysis software are gaining traction due to their ability to represent complex geometries and improve design accuracy. The geographical distribution shows a strong presence in North America and Europe, driven by technological advancements and robust manufacturing industries in these regions. 
However, the Asia-Pacific region is expected to witness significant growth in the coming years, fuelled by rapid industrialization and rising technological adoption in countries like China and India. Leading players such as Autodesk, Siemens EDA, and Dassault Systèmes are actively shaping the market through technological innovation and strategic partnerships. The forecast period, 2025-2033, promises continued market growth driven by these factors and the wider adoption of advanced data analytics in industrial production.
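
The two figures quoted in the description above (a 2025 market size of $3748 million and a 5.2% CAGR through 2033) imply an end-of-period value that the text does not state. The sketch below works it out under two assumptions: 2025 is the base year, and the rate compounds annually over the eight years to 2033. The function name is hypothetical.

```python
def project(value, rate, years):
    """Compound `value` at annual growth `rate` for `years` years."""
    return value * (1 + rate) ** years


base_2025 = 3748.0  # USD million, as quoted in the description
cagr = 0.052        # 5.2% per year, as quoted in the description

projected_2033 = project(base_2025, cagr, 2033 - 2025)
print(f"Implied 2033 market size: ${projected_2033:,.0f} million")
```

Under those assumptions the market grows by roughly half over the forecast period, to about $5.6 billion.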

Market Report Analytics (2025). Exploratory Data Analysis (EDA) Tools Report [Dataset]. https://www.marketreportanalytics.com/reports/exploratory-data-analysis-eda-tools-54369

Exploratory Data Analysis (EDA) Tools Report

Available download formats: doc, ppt, pdf

Dataset updated

Apr 2, 2025

Dataset authored and provided by

Market Report Analytics

License

https://www.marketreportanalytics.com/privacy-policy

Time period covered

2025 - 2033

Area covered

Global

Variables measured

Market Size

Description

The Exploratory Data Analysis (EDA) tools market is experiencing robust growth, driven by the increasing volume and complexity of data across industries. The rising need for data-driven decision-making, coupled with the expanding adoption of cloud-based analytics solutions, is fueling market expansion. While precise figures for market size and CAGR are not provided, a reasonable estimation, based on the prevalent growth in the broader analytics market and the crucial role of EDA in the data science workflow, would place the 2025 market size at approximately $3 billion, with a projected Compound Annual Growth Rate (CAGR) of 15% through 2033. This growth is segmented across various applications, with large enterprises leading the adoption due to their higher investment capacity and complex data needs. However, SMEs are witnessing rapid growth in EDA tool adoption, driven by the increasing availability of user-friendly and cost-effective solutions. Further segmentation by tool type reveals a strong preference for graphical EDA tools, which offer intuitive visualizations facilitating better data understanding and communication of findings. Geographic regions, such as North America and Europe, currently hold a significant market share, but the Asia-Pacific region shows promising potential for future growth owing to increasing digitalization and data generation. Key restraints to market growth include the need for specialized skills to effectively utilize these tools and the potential for data bias if not handled appropriately. The competitive landscape is dynamic, with both established players like IBM and emerging companies specializing in niche areas vying for market share. Established players benefit from brand recognition and comprehensive enterprise solutions, while specialized vendors provide innovative features and agile development cycles. 
Open-source options like KNIME, the R package Rattle, and the Python package Pandas Profiling offer cost-effective alternatives, particularly attracting academic institutions and smaller businesses. The ongoing development of advanced analytics functionalities, such as automated machine learning integration within EDA platforms, will be a significant driver of future market growth. Further, the integration of EDA tools within broader data science platforms is streamlining the overall analytical workflow, contributing to increased adoption and reduced complexity. The market's evolution hinges on enhanced user experience, more robust automation features, and seamless integration with other data management and analytics tools.