There's a story behind every dataset and here's your opportunity to share yours.
What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.
We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.
Your data will be in front of the world's largest data science community. What questions do you want to see answered?
Analyze the closing price of all the stocks. Analyze the total volume of stock traded each day. Analyze the daily price change in each stock. Analyze the monthly mean of the closing price. Analyze whether the stock prices of these tech companies are correlated. Analyze the daily return of each stock and how the returns are correlated. Perform a value-at-risk analysis for the tech companies.
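A minimal pandas sketch of these analyses, assuming a hypothetical CSV of daily closing prices with one column per ticker (the file name and column layout are illustrative, not part of the dataset description):

```python
import pandas as pd

# Hypothetical CSV of daily closing prices, one column per ticker.
prices = pd.read_csv("tech_stock_closes.csv", index_col="Date", parse_dates=True)

daily_change = prices.diff()                      # daily price change
monthly_mean_close = prices.resample("M").mean()  # monthly mean of the close
daily_returns = prices.pct_change().dropna()      # daily percentage returns

price_corr = prices.corr()           # are the closing prices correlated?
return_corr = daily_returns.corr()   # are the daily returns correlated?

# Historical (empirical) value at risk at the 95% level: the daily loss that
# is exceeded only 5% of the time.
var_95 = daily_returns.quantile(0.05)
print(var_95)
```

The same pattern extends to the daily traded volume by reading a volume column instead of the close.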
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset contains clinical and diagnostic features related to Breast Cancer, designed for comprehensive Exploratory Data Analysis (EDA) and subsequent predictive modeling.
It is derived from digitized images of Fine Needle Aspirates (FNA) of breast masses.
The dataset features quantitative measurements, typically calculated from the characteristics of cell nuclei, including:
- Radius
- Texture
- Perimeter
- Area
- Smoothness
- Compactness
- Concavity
- Concave Points
- Symmetry
- Fractal Dimension
These features are provided as mean, standard error, and "worst" (largest) values.
The primary goal of this resource is to support the validation of EDA techniques necessary for clinical data science:
- Data quality assessment (missing values, inconsistencies)
- Feature assessment (distributions, correlations)
- Visualization for diagnostic modeling
The primary target variable is the binary classification of the tissue sample: Malignant vs. Benign.
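This description matches the Wisconsin Diagnostic Breast Cancer data bundled with scikit-learn, so a quick EDA pass can be sketched against that copy (the exact source is an assumption):

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load the Wisconsin Diagnostic Breast Cancer data (mean / error / worst
# versions of the ten nucleus features, plus a malignant/benign target).
data = load_breast_cancer(as_frame=True)
df = data.frame

# Data quality assessment: missing values and obvious range inconsistencies.
print(df.isna().sum().sum(), "missing values")
print(df.describe().T[["min", "max"]])

# Feature assessment: correlations of the "mean" features.
mean_cols = [c for c in df.columns if c.startswith("mean ")]
print(df[mean_cols].corr().round(2))

# Class balance of the binary target (0 = malignant, 1 = benign in sklearn).
print(df["target"].value_counts())
```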
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
In Chapter 8 of the thesis, six sonification models are presented as examples for the framework of Model-Based Sonification developed in Chapter 7. Sonification models determine the rendering of the sonification and the possible interactions. The "model in mind" helps the user to interpret the sound with respect to the data.
Data Sonograms use spherical expanding shock waves to excite linear oscillators which are represented by point masses in model space.
File:
- Iris dataset: started in plot (a) at S0 (https://pub.uni-bielefeld.de/download/2920448/2920454), (b) at S1, (c) at S2
- 10d noisy circle dataset: started in plot (c) at S0 (mean) (https://pub.uni-bielefeld.de/download/2920448/2920451), (d) at S1 (edge)
- 10d Gaussian: plot (d), started at S0
- 3 clusters: Example 1
- 3 clusters, invisible columns used as output variables: Example 2 (https://pub.uni-bielefeld.de/download/2920448/2920450)

Description: Data Sonogram sound examples for synthetic datasets and the Iris dataset

Duration: about 5 s
This sonification model explores features of a data distribution by computing the trajectories of test particles which are injected into model space and move according to Newton's laws of motion in a potential given by the dataset.
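A minimal numerical sketch of the particle-trajectory idea just described, assuming a potential built from Gaussian wells centred on the data points; the kernel width, step size, and 2-D toy data are illustrative and not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 2))          # toy dataset defining the potential
sigma = 0.5                               # kernel width (assumed)

def force(x):
    # Attractive force from a potential made of Gaussian wells at each data point.
    diff = data - x
    w = np.exp(-np.sum(diff**2, axis=1) / (2 * sigma**2))
    return (w[:, None] * diff).sum(axis=0) / sigma**2

# Integrate Newton's equations for one injected test particle.
x = np.array([2.0, 2.0])                  # injection point
v = np.zeros(2)
dt = 0.01
trajectory = []
for _ in range(2000):
    v += dt * force(x)
    x += dt * v
    trajectory.append(x.copy())

trajectory = np.asarray(trajectory)       # e.g. map kinetic energy to sound
```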
The McMC Sonification Model defines an exploratory process in the domain of a given density p such that the acoustic representation summarizes features of p, in particular its modes.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
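As a rough illustration of applying such row-wise summary statistics to a data matrix, here is a NumPy sketch; the PRE formula used (Shannon entropy of each spectrum rescaled to sum to one) is an assumption about its exact definition, and X4 and the factor-based comparisons are omitted:

```python
import numpy as np

def summary_statistics(X):
    """Row-wise summary statistics for a data matrix X (samples x channels)."""
    # PRE sketched as Shannon entropy of each spectrum normalized to sum to one
    # (an assumption about its definition); the others follow usual definitions.
    p = np.abs(X) / np.abs(X).sum(axis=1, keepdims=True)
    pre = -np.sum(p * np.log(np.where(p > 0, p, 1.0)), axis=1)
    return {
        "PRE": pre,
        "mean": X.mean(axis=1),
        "STD": X.std(axis=1),
        "1-norm": np.abs(X).sum(axis=1),
        "range": X.max(axis=1) - X.min(axis=1),
        "SSQ": np.sum(X**2, axis=1),
    }

# Example: two noisy spectral groups; each statistic yields one value per sample.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (20, 200)), rng.normal(0.5, 1.0, (20, 200))])
stats = summary_statistics(X)
```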
Terms of use: https://paper.erudition.co.in/terms
Question paper solutions for the chapter "Exploratory Data Analytics and Descriptive Statistics" of Data Analytics Skills for Managers, 5th Semester, Bachelor in Business Administration, 2020-2021.
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This is a cleaned version of a Netflix movies dataset originally used for exploratory data analysis (EDA). The dataset contains information such as:
Missing values have been handled with appropriate methods (mean or median imputation, or an "unknown" placeholder), and new features such as rating_level and popular have been added for deeper analysis.
The dataset is ready for:
- EDA
- Data visualization
- Machine learning tasks
- Dashboard building
Used in the accompanying notebook
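A minimal pandas sketch of the cleaning and feature-engineering steps described above; the column names, thresholds, and file name are illustrative assumptions, not taken from the dataset itself:

```python
import pandas as pd

# Hypothetical column names; the actual cleaned file may differ.
df = pd.read_csv("netflix_movies.csv")

# Missing-value handling: mean/median for numeric fields, "unknown" for text.
df["rating"] = df["rating"].fillna(df["rating"].mean())
df["duration"] = df["duration"].fillna(df["duration"].median())
df["country"] = df["country"].fillna("unknown")

# Engineered features (bin edges and popularity cutoff are illustrative).
df["rating_level"] = pd.cut(df["rating"],
                            bins=[0, 5, 7, 10],
                            labels=["low", "medium", "high"])
df["popular"] = (df["vote_count"] > df["vote_count"].median()).astype(int)
```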
This is a dataset of online orders placed at a retail business. Each row represents one transaction within an order. The task is to dig into this dataset and surface insights the retail business can use to make strategic decisions. The dataset has 6,000 rows.
Invoice No: The unique number assigned to this particular row/transaction
StockCode: The code of the item purchased
Description: The description of the item purchased
Quantity: The quantity of the item purchased
InvoiceDate: The date on which the item was purchased
UnitPrice: The price at which the item was purchased
CustomerID: The ID of the customer who made this transaction
Country: The country in which this transaction took place
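A short pandas sketch of a first exploratory pass over these columns; the file name and the space-free column spellings are assumptions about the raw export:

```python
import pandas as pd

# Columns assumed to match the listing above (e.g. "InvoiceNo", "InvoiceDate").
orders = pd.read_csv("online_retail_orders.csv", parse_dates=["InvoiceDate"])

orders["Revenue"] = orders["Quantity"] * orders["UnitPrice"]

revenue_by_country = orders.groupby("Country")["Revenue"].sum().sort_values(ascending=False)
top_items = orders.groupby("Description")["Quantity"].sum().nlargest(10)
orders_per_day = orders.set_index("InvoiceDate").resample("D")["InvoiceNo"].nunique()

print(revenue_by_country.head())
print(top_items)
```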
CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)
read-tv
The main paper is about read-tv, open-source software for longitudinal data visualization. We uploaded sample use-case surgical flow disruption data to highlight read-tv's capabilities. We scrubbed the data of protected health information and uploaded it as a single CSV file. The original data are described below.
Data source
Surgical workflow disruptions, defined as “deviations from the natural progression of an operation thereby potentially compromising the efficiency or safety of care”, provide a window on the systems of work through which it is possible to analyze mismatches between the work demands and the ability of the people to deliver the work. They have been shown to be sensitive to different intraoperative technologies, surgical errors, surgical experience, room layout, checklist implementation and the effectiveness of the supporting team. The significance of flow disruptions lies in their ability to provide a hitherto unavailable perspective on the quality and efficiency of the system. This allows for a systematic, quantitative and replicable assessment of risks in surgical systems, evaluation of interventions to address them, and assessment of the role that technology plays in exacerbation or mitigation.
In 2014, Drs Catchpole and Anger were awarded NIBIB R03 EB017447 to investigate flow disruptions in robotic surgery, which has resulted in the detailed, multi-level analysis of over 4,000 flow disruptions. Direct observation of 89 RAS (robotic-assisted surgery) cases found a mean of 9.62 flow disruptions per hour, which varies across different surgical phases and is predominantly caused by coordination, communication, equipment, and training problems.
Methods
This section does not describe the methods of read-tv software development, which can be found in the associated manuscript from JAMIA Open (JAMIO-2020-0121.R1). It describes the methods involved in the surgical workflow disruption data collection. A curated, PHI-free (protected health information) version of this dataset was used as a use case for this manuscript.
Observer training
Trained human factors researchers conducted each observation following the completion of observer training. The researchers were two full-time research assistants based in the department of surgery at site 3 who visited the other two sites to collect data. Human Factors experts guided and trained each observer in the identification and standardized collection of FDs. The observers were also trained in the basic components of robotic surgery in order to be able to tangibly isolate and describe such disruptive events.
Comprehensive observer training was ensured with both classroom and floor training. Observers were required to review relevant literature, understand general practice guidelines for observing in the OR (e.g., where to stand, what to avoid, who to speak to), and conduct practice observations. The practice observations were broken down into three phases, all performed under the direct supervision of an experienced observer. During phase one, the trainees oriented themselves to the real-time events of both the OR and the general steps in RAS. The trainee was also introduced to the OR staff and any other involved key personnel. During phase two, the trainer and trainee observed three RAS procedures together to practice collecting FDs and become familiar with the data collection tool. Phase three was dedicated to determining inter-rater reliability by having the trainer and trainee simultaneously, yet independently, conduct observations for at least three full RAS procedures. Observers were considered fully trained if, after three full case observations, intra-class correlation coefficients (based on number of observed disruptions per phase) were greater than 0.80, indicating good reliability.
Data collection
Following the completion of training, observers individually conducted observations in the OR. All relevant RAS cases were pre-identified on a monthly basis by scanning the surgical schedule and recording a list of procedures. All procedures observed were conducted with the Da Vinci Xi surgical robot, with the exception of one procedure at Site 2, which was performed with the Si robot. Observers attended those cases that fit within their allotted work hours and schedule. Observers used Microsoft Surface Pro tablets configured with a customized data collection tool developed using Microsoft Excel to collect data. The data collection tool divided procedures into five phases, as opposed to the four phases previously used in similar research, to more clearly distinguish between task demands throughout the procedure. Phases consisted of phase 1 - patient in the room to insufflation, phase 2 - insufflation to surgeon on console (including docking), phase 3 - surgeon on console to surgeon off console, phase 4 - surgeon off console to patient closure, and phase 5 - patient closure to patient leaves the operating room. During each procedure, FDs were recorded into the appropriate phase, and a narrative, timestamp, and classification (based on a robot-specific FD taxonomy) were also recorded.
Each FD was categorized into one of ten categories: communication, coordination, environment, equipment, external factors, other, patient factors, surgical task considerations, training, or unsure. The categorization system is modeled after previous studies, as well as the examples provided for each FD category.
Once in the OR, observers remained as unobtrusive as possible. They stood at an appropriate vantage point in the room without getting in the way of team members. Once an appropriate time presented itself, observers introduced themselves to the circulating nurse and informed them of the reason for their presence. Observers did not directly engage in conversations with operating room staff; however, if a staff member approached them with questions or comments, they would respond.
Data Reduction and PHI (Protected Health Information) Removal
This dataset uses 41 of the aforementioned surgeries. All columns have been removed except disruption type, a numeric timestamp for the number of minutes into the day, and surgical phase. In addition, each surgical case had its initial disruption set to 12 noon (720 minutes).
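A small pandas sketch of that timestamp normalization, assuming hypothetical column names for the case identifier and the minutes-into-day timestamp:

```python
import pandas as pd

# Hypothetical column names; the released CSV may use different labels.
fd = pd.read_csv("flow_disruptions.csv")   # columns: case_id, minutes, type, phase

# Shift each case so that its first disruption occurs at 720 minutes (12 noon),
# preserving the relative timing of all later disruptions within the case.
first = fd.groupby("case_id")["minutes"].transform("min")
fd["minutes"] = fd["minutes"] - first + 720
```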
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
ABSTRACT. The aim of this study was to employ the principal component technique on physiological data and environmental thermohygrometric variables correlated with the detection of clinical and subclinical mastitis in dairy cattle. A total of 24 lactating Girolando cows with different clinical conditions were selected (healthy, and with clinical or subclinical mastitis). The following physiological variables were recorded: udder surface temperature, ST (°C); eyeball temperature, ET (°C); rectum temperature, RT (°C); respiratory frequency, RF (mov. min-1). Thermohygrometric variables included air temperature, AirT (°C), and relative humidity, RU (%). ST was determined by means of thermal images, with four images per animal, on these quarters: front left side (FL), front right side (FR), rear right side (RR) and rear left side (RL), totaling 96 images. Exploratory data analysis was run with the multivariate statistical technique of principal components, comprising nine variables: ST on the FL, FR, RL and RR quarters; ET; RT; RF; AirT and RU. The representative quarters of the animals with clinical and subclinical mastitis showed udder temperatures 8.55 and 2.46 °C higher than those of healthy animals, respectively. The ETs of the animals with subclinical and clinical mastitis were, respectively, 7.9 and 8.0% higher than those of healthy animals. Rectum temperatures were 2.9% (subclinical mastitis) and 5.5% (clinical mastitis) higher compared to those of healthy animals. Respiratory frequencies were 40.3% (subclinical mastitis) and 61.6% (clinical mastitis) higher compared to those of healthy animals. The first component explained 91% of the total variance for the variables analyzed. The principal component technique allowed verifying the variables correlated with the animals' clinical condition and the degree of dependence between the study variables.
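A minimal scikit-learn sketch of the principal component step described in the abstract; the file name and the column names for the nine variables are hypothetical:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical column names for the nine variables mentioned in the abstract.
cols = ["ST_FL", "ST_FR", "ST_RL", "ST_RR", "ET", "RT", "RF", "AirT", "RU"]
df = pd.read_csv("mastitis_measurements.csv")

X = StandardScaler().fit_transform(df[cols])
pca = PCA(n_components=2).fit(X)

print(pca.explained_variance_ratio_)          # share of variance per component
loadings = pd.DataFrame(pca.components_.T, index=cols, columns=["PC1", "PC2"])
print(loadings)                               # which variables drive each component
```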
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Big data, with N × P dimension where N is extremely large, has created new challenges for data analysis, particularly in the realm of creating meaningful clusters of data. Clustering techniques, such as K-means or hierarchical clustering, are popular methods for performing exploratory analysis on large datasets. Unfortunately, these methods are not always possible to apply to big data due to memory or time constraints generated by calculations of order P·N(N−1)/2. To circumvent this problem, typically the clustering technique is applied to a random sample drawn from the dataset; however, a weakness is that the structure of the dataset, particularly at the edges, is not necessarily maintained. We propose a new solution through the concept of “data nuggets”, which reduces a large dataset into a small collection of nuggets of data, each containing a center, weight, and scale parameter. The data nuggets are then input into algorithms that compute methods such as principal components analysis and clustering in a more computationally efficient manner. We show the consistency of the data-nugget-based covariance estimator and apply the methodology of data nuggets to perform exploratory analysis of a flow cytometry dataset containing over one million observations using PCA and K-means clustering for weighted observations. Supplementary materials for this article are available online.
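A rough sketch of the two-stage idea, assuming nuggets are formed here by an ordinary k-means pass (the paper's own nugget construction differs) and then fed, with their weights, into a weighted k-means:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200_000, 10))            # stand-in for a large N x P dataset

# Stage 1: reduce the data to a few hundred "nuggets": a center, a weight
# (number of points absorbed), and a scale (within-nugget spread).
reducer = KMeans(n_clusters=500, n_init=1, random_state=0).fit(X)
centers = reducer.cluster_centers_
weights = np.bincount(reducer.labels_, minlength=500)
scales = np.array([X[reducer.labels_ == k].std() for k in range(500)])

# Stage 2: run the exploratory clustering on the nuggets, weighting each
# center by the number of original observations it represents.
clusterer = KMeans(n_clusters=5, random_state=0)
clusterer.fit(centers, sample_weight=weights)
```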
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. This study used secondary analysis of data from several different sources to examine the impact of increased oil development on domestic violence, dating violence, sexual assault, and stalking (DVDVSAS) in the Bakken region of Montana and North Dakota. Distributed here is the code used for the secondary analysis; the data are not available through other public means. Please refer to the User Guide distributed with this study for a list of instructions on how to obtain all other data used in this study. This collection contains a secondary analysis of the Uniform Crime Reports (UCR). UCR data serve as periodic nationwide assessments of reported crimes not available elsewhere in the criminal justice system. Each year, participating law enforcement agencies contribute reports to the FBI either directly or through their state reporting programs. Distributed here are the codes used to create the datasets and perform the secondary analysis. Please refer to the User Guide, distributed with this study, for more information. This collection also contains a secondary analysis of the National Incident Based Reporting System (NIBRS), a component of the Uniform Crime Reporting Program (UCR) and an incident-based reporting system for crimes known to the police. For each crime incident coming to the attention of law enforcement, a variety of data were collected about the incident. These data included the nature and types of specific offenses in the incident, characteristics of the victim(s) and offender(s), types and value of property stolen and recovered, and characteristics of persons arrested in connection with a crime incident. NIBRS collects data on each single incident and arrest within 22 offense categories, made up of 46 specific crimes called Group A offenses. In addition, there are 11 Group B offense categories for which only arrest data were reported. NIBRS data on different aspects of crime incidents, such as offenses, victims, offenders, and arrestees, can be examined as different units of analysis. Distributed here are the codes used to create the datasets and perform the secondary analysis. Please refer to the User Guide, distributed with this study, for more information. The collection includes 17 SPSS syntax files. Qualitative data collected for this study are not available as part of the data collection at this time.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Vitamin D insufficiency appears to be prevalent in SLE patients. Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLEDAI score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients. The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, which includes calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student’s t-test, are also employed. In the machine learning part of the analysis, this study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-Fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values. The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and MAE of 2.68 (CI: 1.83,3.52). Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients. The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. Therefore, it underscores the need for further large-scale studies to corroborate this hypothesis.
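A condensed scikit-learn sketch of the modelling pipeline described above (linear regression and a random forest tuned with 3-fold cross-validation, plus a bootstrapped confidence interval for the error metric); the column names, hyperparameter grid, and file name are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical column names based on the predictors mentioned in the abstract.
df = pd.read_csv("sle_vitamin_d.csv")
X = df[["Hb", "CRP", "ESR", "age", "SLEDAI"]]
y = df["vitamin_d"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

lr = LinearRegression().fit(X_tr, y_tr)
lr_rmse = np.sqrt(mean_squared_error(y_te, lr.predict(X_te)))

# 3-fold cross-validation to tune the random forest on a small dataset.
rf = GridSearchCV(RandomForestRegressor(random_state=0),
                  {"n_estimators": [100, 300], "max_depth": [2, 4, None]},
                  cv=3).fit(X_tr, y_tr)

# Bootstrapped confidence interval for the random forest test RMSE.
rng = np.random.default_rng(0)
rmses = []
for _ in range(1000):
    idx = rng.integers(0, len(y_te), len(y_te))
    pred = rf.predict(X_te.iloc[idx])
    rmses.append(np.sqrt(mean_squared_error(y_te.iloc[idx], pred)))
print("LR RMSE:", lr_rmse)
print("RF RMSE 95% CI:", np.percentile(rmses, [2.5, 97.5]))
```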
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset contains the data used for the publication entitled "Exploratory investigation of historical decorative laminates by means of vibrational spectroscopic techniques".
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
ABSTRACT The purpose of this technical note is to verify the stationarity of flows in the Iguaçu River Basin, considering 14 fluviometric stations. For this purpose, three sets of annual flow series were studied: mean flows, maximum flows and 7-day minimum flows. Initially, an exploratory analysis of the data was performed, based on establishing two candidate change points in the flow characteristics and applying parametric and nonparametric statistical tests for equality of mean and variance. Finally, composite tests were used, considering the basin divided into Upper and Lower Iguaçu. The exploratory analysis suggested an apparent change in the behavior of the flows, partly in the 1970s and partly in the 1980s. The series were therefore divided in two different ways: at the midpoint and at the change point suggested by the exploratory analysis. Among the statistical tests, the Mann-Whitney test was chosen because it does not depend on the underlying distribution of the flow series and because it is strongly recommended by other authors. It was concluded that the change in flows occurred over a relatively short interval of time and can be treated as a step (jump) non-stationarity. In general, the most recent change occurred downstream of the Uniao da Vitória fluviometric station. A change in the behavior of the mean and maximum flows was evident; however, this phenomenon was not observed in the evaluation of the minimum flows.
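A small SciPy sketch of the split-and-test procedure described in the note; the annual mean-flow series and the change year are synthetic and purely illustrative:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic annual mean flows with an illustrative step change in 1975,
# standing in for the 1970s/1980s breaks discussed in the note.
rng = np.random.default_rng(0)
years = np.arange(1940, 2010)
flows = rng.normal(300, 40, len(years)) + np.where(years < 1975, 0.0, 60.0)

before = flows[years < 1975]
after = flows[years >= 1975]

# Nonparametric test of whether the two sub-series share the same distribution.
stat, p = mannwhitneyu(before, after, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")
```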
Prior research has shown that during development, there is increased segregation between, and increased integration within, prototypical resting-state functional brain networks. Functional networks are typically defined by static functional connectivity over extended periods of rest. However, little is known about how time-varying properties of functional networks change with age. Likewise, a comparison of standard approaches to functional connectivity may provide a nuanced view of how network integration and segregation are reflected across the lifespan. Therefore, this exploratory study evaluated common approaches to static and dynamic functional network connectivity in a publicly available dataset of subjects ranging from 8 to 75 years of age. Analyses evaluated relationships between age and static resting-state functional connectivity, variability (standard deviation) of connectivity, and mean dwell time of functional network states defined by recurring patterns of whole-brain connectivity. Results showed that older age was associated with decreased static connectivity between nodes of different canonical networks, particularly between the visual system and nodes in other networks. Age was not significantly related to variability of connectivity. Mean dwell time of a network state reflecting high connectivity between visual regions decreased with age, but older age was also associated with increased mean dwell time of a network state reflecting high connectivity within and between canonical sensorimotor and visual networks. Results support a model of increased network segregation over the lifespan and also highlight potential pathways of top-down regulation among networks.
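A tiny NumPy sketch of the mean-dwell-time measure used here: given a sequence of network-state labels over time (one per sliding window), average the lengths of consecutive runs of each state. The state sequence below is synthetic:

```python
import numpy as np

def mean_dwell_time(states):
    """Mean number of consecutive windows spent in each state before switching."""
    states = np.asarray(states)
    change = np.flatnonzero(np.diff(states) != 0) + 1
    runs = np.split(states, change)                 # consecutive runs of one state
    dwell = {}
    for run in runs:
        dwell.setdefault(int(run[0]), []).append(len(run))
    return {state: float(np.mean(lengths)) for state, lengths in dwell.items()}

# Synthetic state sequence for one subject (e.g. from clustering windowed connectivity).
states = [0, 0, 0, 1, 1, 2, 2, 2, 2, 0, 0, 1]
print(mean_dwell_time(states))   # {0: 2.5, 1: 1.5, 2: 4.0}
```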
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
The UKCP18 exploratory extended time-mean sea level projections are provided as a spatially continuous dataset around the UK coastline for the period 2007-2300. These exploratory projections have been devised to be used seamlessly with the UKCP18 21st Century projections and provide very similar values for the period up to 2100. Users should be aware that post-2100 projections have a far greater degree of uncertainty than the 21st Century projections and should therefore be treated as illustrative of the potential future changes. Note that we cannot rule out substantially larger sea level rise in the coming centuries than is represented in the projections presented here. The data consist of annual time series of the projected change in the time-mean coastal water level relative to the average value for the period 1981-2000. Projections are available for the RCP2.6, RCP4.5 and RCP8.5 climate change scenarios (Meinshausen et al, 2011). As with the 21st Century projections, nine percentiles are provided to characterise the projection uncertainty, based on underlying modelling uncertainty. However, users should view these uncertainties with a much lower degree of confidence for the period post-2100.
This dataset was updated in March 2023 to correct a minor processing error in the earlier version of the UKCP18 site-specific sea level projections relating to the adjustment applied to convert from the IPCC AR5 baseline of 1986-2005 to the baseline period of 1981-2000. The update results in about a 1 cm increase compared to the original data release for all UKCP18 site-specific sea level projections at all timescales. Further details can be found in the accompanying technical note.
Please note this dataset supersedes previous versions on the Climate Data Portal. It has been uploaded following an update to the dataset in March 2023. This means sea level rise is approximately 1 cm higher (larger) compared to the original data release (i.e. the previous version available on this portal) for all UKCP18 site-specific sea level projections at all timescales. For more details please refer to the technical note.

What does the data show?
The exploratory extended time-mean sea-level projections to 2300 show the amount of sea-level change (in cm) for each coastal location (grid box) around the British Isles for several emission scenarios. Sea-level rise is the primary mechanism by which we expect coastal flood risk to change in the UK in the future. The amount of sea-level rise depends on the location around the British Isles and increases with higher emission scenarios. Here, we provide the relative time-mean sea-level projections to 2300, i.e. the local sea-level change experienced at a particular location compared to the 1981-2000 average, produced as part of UKCP18. For each grid box the time-mean sea-level change projections are provided for the end of each decade (e.g. 2010, 2020, 2030, etc.) for three emission scenarios known as Representative Concentration Pathways (RCPs) and for three percentiles.

The emission scenarios are:
- RCP2.6
- RCP4.5
- RCP8.5

The percentiles are:
- 5th percentile
- 50th percentile
- 95th percentile

Important limitations of the data
We cannot rule out substantial additional sea-level rise associated with ice sheet instability processes that are not represented in the UKCP18 projections, as discussed in the recent IPCC Sixth Assessment Report (AR6). These exploratory projections show sea levels continue to increase beyond 2100 even with large reductions in greenhouse gas emissions. It should be noted that these projections have a greater degree of uncertainty than the 21st Century projections and should therefore be treated as illustrative of the potential future changes. They are designed to be used alongside the 21st Century projections for those interested in exploring post-2100 changes.

What are the naming conventions and how do I explore the data?
The data is supplied so that each row corresponds to the combination of an RCP emissions scenario and a percentile value, e.g. 'RCP45_50' is the RCP4.5 scenario and the 50th percentile. This can be viewed and filtered by the field 'RCP and Percentile'. The columns (fields) correspond to the end of each decade and are named by the sea level anomaly at year x, e.g. '2050 seaLevelAnom' is the sea level anomaly at 2050 compared to the 1981-2000 average. Please note that the styling and filtering options are independent of each other, and the attribute you wish to style the data by can be set differently to the one you filter by. Please ensure that you have selected the RCP/percentile and decade you want to both filter and style the data by. Select the cell you are interested in to view all values. To understand how to explore the data please refer to the New Users ESRI Storymap.

What are the emission scenarios?
The 21st Century time-mean sea level projections were produced using some of the future emission scenarios used in the IPCC Fifth Assessment Report (AR5). These are RCP2.6, RCP4.5 and RCP8.5, which are based on the concentration of greenhouse gases and aerosols in the atmosphere. RCP2.6 is an aggressive mitigation pathway, where greenhouse gas emissions are strongly reduced.
RCP4.5 is an intermediate 'stabilisation' pathway, where greenhouse gas emissions are reduced by varying levels. RCP8.5 is a high emission pathway, where greenhouse gas emissions continue to grow unmitigated. Further information is available in the Understanding Climate Data ESRI Storymap and the RCP Guidance on the UKCP18 website.

What are the percentiles?
The UKCP18 sea-level projections are based on a large Monte Carlo simulation that represents 450,000 possible outcomes in terms of global mean sea-level change. The Monte Carlo simulation is designed to sample the uncertainties across the different components of sea-level rise, and the amount of warming we see for a given emissions scenario across CMIP5 climate models. The percentiles are used to characterise the uncertainty in the Monte Carlo projections based on the statistical distribution of the 450,000 individual simulation members. For example, the 50th percentile represents the central estimate (median) amongst the model projections, whilst the 95th percentile value means 95% of the model distribution is below that value and, similarly, the 5th percentile value means 5% of the model distribution is below that value. The range between the 5th and 95th percentiles represents the projection range amongst models and corresponds to the IPCC AR5 "likely range". It should be noted that there may be a greater than 10% chance that the real-world sea level rise lies outside this range.

Data source
This data is an extract of a larger dataset (every year and more percentiles) which is available on CEDA at https://catalogue.ceda.ac.uk/uuid/a077f4058cda4cd4b37ccfbdf1a6bd29. Data has been extracted from the v20221219 version (downloaded 17/04/2023) of three files:
- seaLevelAnom_marine-sim_rcp26_ann_2007-2300.nc
- seaLevelAnom_marine-sim_rcp45_ann_2007-2300.nc
- seaLevelAnom_marine-sim_rcp85_ann_2007-2300.nc

Useful links to find out more
For a comprehensive description of the underpinning science, evaluation and results, see the UKCP18 Marine Projections Report (Palmer et al, 2018). For a discussion of ice sheet instability processes in the latest IPCC assessment report, see Fox-Kemper et al (2021). Technical note for the update to the underpinning data: https://www.metoffice.gov.uk/binaries/content/assets/metofficegovuk/pdf/research/ukcp/ukcp_tech_note_sea_level_mar23.pdf. Further information is in the Met Office Climate Data Portal Understanding Climate Data ESRI Storymap.
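A short pandas sketch of working with a tabular extract of this data, following the naming conventions described above; the file name is hypothetical, and the field names follow the text:

```python
import pandas as pd

# Hypothetical export of the portal layer; fields follow the naming convention
# described above ('RCP and Percentile', '2050 seaLevelAnom', ...).
sea = pd.read_csv("ukcp18_extended_sea_level.csv")

# Central estimate (50th percentile) under RCP4.5.
rcp45_median = sea[sea["RCP and Percentile"] == "RCP45_50"]

# Sea level anomaly (cm, relative to 1981-2000) at the end of selected decades.
print(rcp45_median[["2050 seaLevelAnom", "2100 seaLevelAnom", "2300 seaLevelAnom"]])
```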
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
When creating choropleth maps, mapmakers often bin (i.e. group, classify) quantitative data values into groups to help show that certain areas fall within a similar range of values. For instance, a mapmaker may divide counties into groups of high, middle, and low life expectancy (measured in years). It is well known that different binning methods (e.g. natural breaks, quantiles) yield different groupings, meaning the same data can be presented differently depending on how it is divided into bins. To help guide a wide variety of users, we present a new, open-source, web-based, geospatial visualization tool, Exploropleth, that lets users interact with a catalog of established data binning methods, and subsequently compare, customize, and export custom maps. This tool advances the state of the art by providing multiple binning methods in one view and supporting administrative unit reclassification on-the-fly. We interviewed 16 cartographers and geographic information systems (GIS) experts from 13 government organizations, non-government organizations (NGOs), and federal agencies who identified opportunities to integrate Exploropleth into their existing mapmaking workflow, and found that the tool has the potential to educate students as well as mapmakers with varying levels of experience. Exploropleth is open-source and publicly available at https://exploropleth.github.io.
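A small pandas sketch contrasting two of the binning methods mentioned above (equal-interval and quantile bins); the county-level life expectancy values are illustrative:

```python
import pandas as pd

# Illustrative county-level life expectancy values (years).
life_exp = pd.Series([71.2, 73.5, 74.1, 75.0, 76.3, 77.8, 78.2, 79.9, 81.4, 83.0])

# Equal-interval bins: same width, possibly very uneven group sizes.
equal_interval = pd.cut(life_exp, bins=3, labels=["low", "middle", "high"])

# Quantile bins: roughly equal group sizes, data-dependent break points.
quantiles = pd.qcut(life_exp, q=3, labels=["low", "middle", "high"])

print(pd.DataFrame({"value": life_exp,
                    "equal_interval": equal_interval,
                    "quantile": quantiles}))
```

The same data can therefore land in different classes depending on the binning method, which is exactly the comparison Exploropleth is designed to make visible.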
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Mean and standard deviation of SI by BMI group and maternal age.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Mean, standard deviation and percentiles of SI and HR within 2 hours after birth.