CC0 1.0 (Public Domain Dedication) https://creativecommons.org/publicdomain/zero/1.0/
Walmart Inc. is a multinational retail corporation that operates a chain of hypermarkets, discount department stores, and grocery stores. It is one of the world's largest companies by revenue and a key player in the retail sector. Walmart's stock is actively traded on major stock exchanges, making it an interesting subject for financial analysis.
This dataset contains historical stock price data for Walmart, sourced directly from Yahoo Finance using the yfinance Python API. The data covers daily stock prices and includes multiple key financial indicators.
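As a minimal illustration of how such a dataset can be reproduced with the yfinance API (the ticker symbol WMT and the date range below are assumptions, not part of the dataset description):

```python
import yfinance as yf

# Pull daily OHLCV history for Walmart; ticker and date range are illustrative choices.
wmt = yf.download("WMT", start="2015-01-01", end="2024-12-31", interval="1d")
print(wmt.tail())

# Simple EDA-style derived series: daily returns and 30-day rolling volatility.
close = wmt["Close"].squeeze()          # squeeze() in case the columns are multi-indexed
daily_return = close.pct_change()
rolling_vol_30d = daily_return.rolling(30).std()
print(rolling_vol_30d.dropna().tail())
```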
This notebook performs an extensive EDA to uncover insights into Walmart's stock price trends, volatility, and overall behavior in the stock market. The following analysis steps are included:
This dataset and analysis can be useful for:
- 📡 Stock Market Analysis – Evaluating Walmart’s stock price trends and volatility.
- 🏦 Investment Research – Assisting traders and investors in making informed decisions.
- 🎓 Educational Purposes – Teaching data science and financial analysis using real-world stock data.
- 📊 Algorithmic Trading – Developing trading strategies based on historical stock price trends.
📥 Download the dataset and explore Walmart’s stock performance today! 🚀
The experiments were carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50 Hz. The experiments were video-recorded to label the data manually. The obtained dataset was randomly partitioned into two sets, where 70% of the volunteers were selected for generating the training data and 30% for the test data.
The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low-frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.
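The following sketch illustrates the preprocessing described above (gravity/body separation with a low-pass Butterworth filter at 0.3 Hz, and 2.56 s windows of 128 samples with 50% overlap); it is a minimal reconstruction, not the authors' original code:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 50.0          # sampling rate (Hz)
WIN = 128          # 2.56 s * 50 Hz
STEP = WIN // 2    # 50% overlap

def split_gravity(acc, cutoff_hz=0.3, order=3):
    """acc: (n_samples, 3) total acceleration -> (gravity, body) components."""
    b, a = butter(order, cutoff_hz / (FS / 2), btype="low")
    gravity = filtfilt(b, a, acc, axis=0)
    return gravity, acc - gravity

def sliding_windows(signal, win=WIN, step=STEP):
    """Yield fixed-width windows of shape (win, 3)."""
    for start in range(0, len(signal) - win + 1, step):
        yield signal[start:start + win]

# Example with synthetic data standing in for a real recording
acc = np.random.randn(1000, 3)
gravity, body = split_gravity(acc)
windows = list(sliding_windows(body))
print(len(windows), windows[0].shape)
```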
The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ. These time-domain signals (prefix 't' to denote time) were captured at a constant rate of 50 Hz. Then they were filtered using a median filter and a 3rd order low pass Butterworth filter with a corner frequency of 20 Hz to remove noise. Similarly, the acceleration signal was then separated into the body and gravity acceleration signals (tBodyAcc-XYZ and tGravityAcc-XYZ) using another low pass Butterworth filter with a corner frequency of 0.3 Hz.
Subsequently, the body linear acceleration and angular velocity were derived in time to obtain Jerk signals (tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ). Also, the magnitude of these three-dimensional signals was calculated using the Euclidean norm (tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag).
Finally a Fast Fourier Transform (FFT) was applied to some of these signals producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag. (Note the 'f' to indicate frequency domain signals).
These signals were used to estimate variables of the feature vector for each pattern: '-XYZ' is used to denote 3-axial signals in the X, Y, and Z directions.
tBodyAcc-XYZ tGravityAcc-XYZ tBodyAccJerk-XYZ tBodyGyro-XYZ tBodyGyroJerk-XYZ tBodyAccMag tGravityAccMag tBodyAccJerkMag tBodyGyroMag tBodyGyroJerkMag fBodyAcc-XYZ fBodyAccJerk-XYZ fBodyGyro-XYZ fBodyAccMag fBodyAccJerkMag fBodyGyroMag fBodyGyroJerkMag
The set of variables that were estimated from these signals are:
mean(): Mean value
std(): Standard deviation
mad(): Median absolute deviation
max(): Largest value in array
min(): Smallest value in array
sma(): Signal magnitude area
energy(): Energy measure. Sum of the squares divided by the number of values.
iqr(): Interquartile range
entropy(): Signal entropy
arCoeff(): Autoregression coefficients with Burg order equal to 4
correlation(): Correlation coefficient between two signals
maxInds(): Index of the frequency component with the largest magnitude
meanFreq(): Weighted average of the frequency components to obtain a mean frequency
skewness(): Skewness of the frequency domain signal
kurtosis(): Kurtosis of the frequency domain signal
bandsEnergy(): Energy of a frequency interval within the 64 bins of the FFT of each window
angle(): Angle between two vectors
Additional vectors are obtained by averaging the signals in a signal window sample. These are used on the angle() variable:
gravityMean tBodyAccMean tBodyAccJerkMean tBodyGyroMean tBodyGyroJerkMean
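As an illustration of how several of the variables listed above can be computed for a single 128-sample window (this is a sketch, not the original feature-generation code):

```python
import numpy as np
from scipy.stats import iqr

def window_features(w):
    """w: (128, 3) array for one window of, e.g., tBodyAcc-XYZ."""
    feats = {}
    feats["mean"] = w.mean(axis=0)                      # mean()-X/Y/Z
    feats["std"] = w.std(axis=0)                        # std()-X/Y/Z
    feats["mad"] = np.median(np.abs(w - np.median(w, axis=0)), axis=0)
    feats["max"], feats["min"] = w.max(axis=0), w.min(axis=0)
    feats["sma"] = np.abs(w).sum() / len(w)             # signal magnitude area
    feats["energy"] = (w ** 2).sum(axis=0) / len(w)     # sum of squares / number of values
    feats["iqr"] = iqr(w, axis=0)
    feats["corr_xy"] = np.corrcoef(w[:, 0], w[:, 1])[0, 1]   # correlation()-X,Y
    return feats
```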
This data set consists of the following columns:
1 tBodyAcc-mean()-X 2 tBodyAcc-mean()-Y 3 tBodyAcc-mean()-Z 4 tBodyAcc-std()-X 5 tBodyAcc-std()-Y 6 tBodyAcc-std()-Z 7 tBodyAcc-mad()-X 8 tBodyAcc-mad()-Y 9 tBodyAcc-mad()-Z 10 tBodyAcc-max()-X 11 tBodyAcc-max()-Y 12 tBodyAcc-max()-Z 13 tBodyAcc-min()-X 14 tBodyAcc-min()-Y 15 tBodyAcc-min()-Z 16 tBodyAcc-sma() 17 tBodyAcc-energy()-X 18 tBodyAcc-energy()-Y 19 tBodyAcc-energy()-Z 20 tBodyAcc-iqr()-X 21 tBodyAcc-iqr()-Y 22 tBodyAcc-iqr()-Z 23 tBodyAcc-entropy()-X 24 tBodyAcc-entropy()-Y 25 tBodyAcc-entropy()-Z 26 tBodyAcc-arCoeff()-X,1 27 tBodyAcc-arCoeff()-X,2 28 tBodyAcc-arCoeff()-X,3 29 tBodyAcc-arCoeff()-X,4 30 tBodyAcc-arCoeff()-Y,1 31 tBodyAcc-arCoeff()-Y,2 32 tBodyAcc-arCoeff()-Y,3 33 tBodyAcc-arCoeff()-Y,4 34 tBodyAcc-arCoeff()-Z,1 35 tBodyAcc-arCoeff()-Z,2 36 tBodyAcc-arCoeff()-Z,3 37 tBodyAcc-arCoeff()-Z,4 38 tBodyAcc-correlation()-X,Y 39 tBodyAcc-correlation()-X,Z 40 tBodyAcc-correlation()-Y,Z 41 tGravityAcc-mean()-X 42 tGravit...
ABSTRACT. Objective: To analyze computed tomography scans of the paranasal sinuses of a series of patients with coronavirus disease 2019, and to correlate the findings with the disease. Methods: Computed tomography scans of 95 adult patients who underwent a polymerase chain reaction test for severe acute respiratory syndrome coronavirus 2 were analyzed. Clinical data were obtained from patients’ records and telephone calls. Paranasal sinus opacification was graded and compared according to severe acute respiratory syndrome coronavirus 2 positivity. Results: Of the patients, 28 (29.5%) tested positive for severe acute respiratory syndrome coronavirus 2 (median age 52 [range 26-95] years) and 67 were negative (median age 50 [range 18-95] years). Mucosal thickening was present in 97.4% of maxillary sinuses, 80% of anterior ethmoid air cells, 75.3% of posterior ethmoid air cells, 74.7% of frontal sinuses, and 66.3% of sphenoid sinuses. Minimal or mild mucosal thickening (score 1) and normally aerated sinuses (score 0) corresponded to 71.4% and 21.3% of all paranasal sinuses, respectively. The mean score of each paranasal sinus among severe acute respiratory syndrome coronavirus 2 positive and negative patients was 0.85±0.27 and 0.87±0.38, respectively (p=0.74). The median paranasal sinus opacification score among severe acute respiratory syndrome coronavirus 2 positive patients was 9 (interquartile range 8-10) compared to 9 (interquartile range 5-10) in negative patients (p=0.89). There was no difference in mean score adjusted for age and sex. Nasal congestion was more frequent in severe acute respiratory syndrome coronavirus 2 positive than negative patients (p=0.05). Conclusion: Severe acute respiratory syndrome coronavirus 2 infection was associated with patient recall of nasal congestion, but showed no correlation with opacification of the paranasal sinuses.
The U.S. Climate Reference Network (USCRN) was designed to monitor the climate of the United States using research-quality instrumentation located within representative pristine environments. This Standardized Soil Moisture (SSM) and Soil Moisture Climatology (SMC) product set is derived using the soil moisture observations from the USCRN. The hourly soil moisture anomaly (SMANOM) is derived by subtracting the MEDIAN from the soil moisture volumetric water content (SMVWC) and dividing the difference by the interquartile range (IQR = 75th percentile - 25th percentile) for that hour: SMANOM = (SMVWC - MEDIAN) / IQR. The soil moisture percentile (SMPERC) is derived by taking all the values that were used to create the empirical cumulative distribution function (ECDF) that yielded the hourly MEDIAN, adding the current observation to the set, recalculating the ECDF, and determining the percentile value of the current observation. Finally, the soil temperature for the individual layers is provided for the dataset user's convenience. The SMC files contain the MEAN, MEDIAN, IQR, and decimal fraction of available data that are valid for each hour of the year at the 5, 10, 20, 50, and 100 cm depth soil layers as well as for a top soil layer (TOP) and a column soil layer (COLUMN). The TOP layer consists of an average of the 5 and 10 cm depths, while the COLUMN layer includes all available depths at a location, either two layers or five layers depending on soil depth. The SSM files contain the mean VWC, SMANOM, SMPERC, and TEMPERATURE for each of the depth layers described above. File names are structured as CRNSSM0101-STATIONNAME.csv and CRNSMC0101-STATIONNAME.csv, where SSM stands for Standardized Soil Moisture and SMC for Soil Moisture Climatology. The first two digits of the trailing integer indicate the major version and the second two digits the minor version of the product.
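A minimal sketch of the two standardizations defined above; variable names are illustrative, and the climatology inputs (MEDIAN, IQR, and the historical ECDF values) are assumed to come from the SMC files:

```python
import numpy as np

def soil_moisture_anomaly(smvwc, median, iqr):
    """SMANOM = (SMVWC - MEDIAN) / IQR, as defined for this product."""
    return (smvwc - median) / iqr

def soil_moisture_percentile(current_obs, historical_values):
    """Add the current observation to the historical set, rebuild the empirical CDF,
    and return the percentile (0-100) of that observation."""
    values = np.append(np.asarray(historical_values, dtype=float), current_obs)
    return 100.0 * (values <= current_obs).mean()
```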
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Tuberculosis (TB) remains a significant public health challenge, particularly among vulnerable populations like children. This is especially true in Sub-Saharan Africa, where the burden of TB in children is substantial. Zambia ranks 21st among the top 30 high-TB-endemic countries globally. While studies have explored TB in adults in Zambia, the prevalence and associated factors in children are not well documented. This study aimed to determine the prevalence of, and the sociodemographic and clinical factors associated with, active TB disease in hospitalized children under the age of 15 years at Livingstone University Teaching Hospital (LUTH), the largest referral center in Zambia’s Southern Province.
Methods: This retrospective cross-sectional study of 700 pediatric patients under 15 years old utilized programmatic data from the Pediatrics Department at LUTH. A systematic sampling method was used to select participants from medical records. Data on demographics, medical conditions, anthropometric measurements, and blood tests were collected. Data analysis included descriptive statistics, chi-square tests, and multivariable logistic regression to identify factors associated with TB.
Results: The median age was 24 months (interquartile range (IQR): 11, 60) and the majority were male (56.7%, n = 397/700). Most participants were from urban areas (59.9%, n = 419/700), and 9.2% (n = 62/675) were living with HIV. Malnutrition and comorbidities were present in a significant portion of the participants (19.0% and 25.1%, respectively). The prevalence of active TB was 9.4% (n = 66/700) among hospitalized children. Persons living with HIV (adjusted odds ratio (AOR): 6.30; 95% confidence interval (CI): 2.85, 13.89; p < 0.001) and those who were malnourished (AOR: 10.38; 95% CI: 4.78, 22.55; p < 0.001) had a significantly higher likelihood of developing active TB disease.
Conclusion: This study revealed a prevalence of 9.4% of active TB among hospitalized children under 15 years at LUTH. HIV status and malnutrition emerged as significant factors associated with active TB disease. These findings emphasize the need for pediatric TB control strategies that prioritize addressing associated factors to effectively reduce the burden of tuberculosis in Zambian children.
Our target was to predict gender, age, and emotion from audio. We found labeled audio datasets from Mozilla and RAVDESS. Using the R programming language, 20 statistical features were extracted and, after adding the labels, these datasets were formed. Audio files were collected from "Mozilla Common Voice" and the “Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)”.
Each dataset contains 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes a dataset’s dispersion relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle number in the sorted (ascending or descending) list of frequencies.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. About 25 percent of the values lie below Q1 and about 75 percent lie above it.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the middle value between the median and the highest value of the data set.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, or between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure that determines how much the tails of a distribution differ from the tails of a normal distribution. It is effectively a measure of the outliers present in the distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized signal’s spectral power.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is usually measured in decibels and quantifies how tone-like a sound is, as opposed to noise-like.
11) mode - The mode frequency is the most frequently observed value in the data set.
12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the spectrum’s center of mass is located.
13) meanfun - The average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maximum fundamental frequency measured across the acoustic signal.
16) meandom - The average of the dominant frequency measured across the acoustic signal.
17) mindom - The minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The range of the dominant frequency measured across the acoustic signal.
20) modindx - The modulation index, which calculates the degree of frequency modulation, expressed numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
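The original feature extraction was done in R; as a rough Python analogue only, a few of the spectral statistics above (meanfreq, sd, median, Q25, Q75, IQR) could be computed like this:

```python
import numpy as np
import librosa

def spectral_stats(path):
    """Approximate frequency-domain summary statistics for one audio file."""
    y, sr = librosa.load(path, sr=None, mono=True)
    spec = np.abs(np.fft.rfft(y))                       # magnitude spectrum
    freqs_khz = np.fft.rfftfreq(len(y), d=1.0 / sr) / 1000.0
    p = spec / spec.sum()                               # normalized spectral distribution
    meanfreq = float((freqs_khz * p).sum())
    sd = float(np.sqrt(((freqs_khz - meanfreq) ** 2 * p).sum()))
    cdf = np.cumsum(p)
    q25, median, q75 = (float(freqs_khz[np.searchsorted(cdf, q)]) for q in (0.25, 0.5, 0.75))
    return {"meanfreq": meanfreq, "sd": sd, "median": median,
            "Q25": q25, "Q75": q75, "IQR": q75 - q25}
```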
Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en
Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AL refers to the axial length, CCT to the central corneal thickness, ACD to the external phakic anterior chamber depth measured from the corneal front apex to the front apex of the crystalline lens, LT to the central thickness of the crystalline lens, R1 and R2 to the corneal radii of curvature for the flat and steep meridians, Rmean to the average of R1 and R2, PIOL to the refractive power of the intraocular lens implant, and SEQ to the spherical equivalent power achieved 5 to 12 weeks after cataract surgery.
CC0 1.0 https://spdx.org/licenses/CC0-1.0.html
The recent surge in electric vehicles (EVs), driven by a collective push to enhance global environmental sustainability, has underscored the significance of exploring EV charging prediction. To catalyze further research in this domain, we introduce UrbanEV—an open dataset showcasing EV charging space availability and electricity consumption in a pioneering city for vehicle electrification, namely Shenzhen, China. UrbanEV offers a rich repository of charging data (i.e., charging occupancy, duration, volume, and price) captured at hourly intervals across an extensive six-month span for over 20,000 individual charging stations. Beyond these core attributes, the dataset also encompasses diverse influencing factors like weather conditions and spatial proximity. These factors are thoroughly analyzed qualitatively and quantitatively to reveal their correlations and causal impacts on charging behaviors. Furthermore, comprehensive experiments have been conducted to showcase the predictive capabilities of various models, including statistical, deep learning, and transformer-based approaches, using the UrbanEV dataset. This dataset is poised to propel advancements in EV charging prediction and management, positioning itself as a benchmark resource within this burgeoning field.
Methods
To build a comprehensive and reliable benchmark dataset, we conduct a series of rigorous processes from data collection to dataset evaluation. The overall workflow sequentially includes data acquisition, data processing, statistical analysis, and prediction assessment. Detailed descriptions follow.
Study area and data acquisition
Shenzhen, a pioneering city in global vehicle electrification, has been selected for this study with the objective of offering valuable insights into electric vehicle (EV) development that can serve as a reference for other urban centers. This study encompasses the entire expanse of Shenzhen, where data on public EV charging stations distributed around the city have been meticulously gathered. Specifically, EV charging data was automatically collected from a mobile platform used by EV drivers to locate public charging stations. Through this platform, users could access real-time information on each charging pile, including its availability (e.g., busy or idle), charging price, and geographic coordinates. Accordingly, we recorded the charging-related data at five-minute intervals from September 1, 2022, to February 28, 2023. This data collection process was fully digital and did not require manual readings. Furthermore, to delve into the correlation between EV charging patterns and environmental elements, weather data for Shenzhen city were acquired from two meteorological observatories situated in the airport and central regions, respectively. These meteorological data are publicly available on the Shenzhen Government Data Open Platform. Thirdly, point of interest (POI) data was extracted through the Application Programming Interface Platform of AMap.com, along with three primary types: food and beverage services, business and residential, and lifestyle services. Lastly, the spatial and static data were organized based on the traffic zones delineated by the sixth Residential Travel Survey of Shenzhen. The collected data contains detailed spatiotemporal information that can be analyzed to provide valuable insights about urban EV charging patterns and their correlations with meteorological conditions.
Processing raw information into well-structured data
To streamline the utilization of the UrbanEV dataset, we harmonize heterogeneous data from various sources into well-structured data with aligned temporal and spatial resolutions. This process can be segmented into two parts: the reorganization of EV charging data and the preparation of other influential factors.
EV charging data
The raw charging data, obtained from publicly available EV charging services, pertains to charging stations and predominantly comprises string-type records at a 5-minute interval. To transform this raw data into a structured time series tailored for prediction tasks, we implement the following three key measures:
Initial Extraction. From the string-type records, we extract vital information for each charging pile, such as availability (designated as "busy" or "idle"), rated power, and the corresponding charging and service fees applicable during the observed time periods. First, a charging pile is categorized as "active charging" if its states at two consecutive timestamps are both "busy". Consequently, the occupancy within a charging station can be defined as the count of in-use charging piles, while the charging duration is calculated as the product of the count of in-use piles and the time between the two timestamps (in our case, 5 minutes). Moreover, the charging volume in a station can correspondingly be estimated by multiplying the duration by the piles' rated power. Finally, the average electricity price and service price are calculated for each station in alignment with the same temporal resolution as the three charging variables.
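As an illustration only (not the authors' released code), the derivation of station-level occupancy, duration, and volume from 5-minute pile-level status records might look like the following pandas sketch; all column names (station_id, pile_id, ts, status, rated_power_kw) are assumptions:

```python
import pandas as pd

INTERVAL_H = 5 / 60  # 5-minute sampling interval, expressed in hours

def station_metrics(piles: pd.DataFrame) -> pd.DataFrame:
    df = piles.sort_values(["pile_id", "ts"]).copy()
    # A pile counts as actively charging when it is "busy" at two consecutive timestamps.
    prev_busy = df.groupby("pile_id")["status"].shift().eq("busy")
    df["active"] = df["status"].eq("busy") & prev_busy
    # Duration = in-use piles x 5 minutes; volume = duration x rated power.
    df["duration_h"] = df["active"] * INTERVAL_H
    df["volume_kwh"] = df["active"] * df["rated_power_kw"] * INTERVAL_H
    return (df.groupby(["station_id", "ts"])
              .agg(occupancy=("active", "sum"),
                   duration_h=("duration_h", "sum"),
                   volume_kwh=("volume_kwh", "sum"))
              .reset_index())
```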
Error Detection and Imputation. Ensuring data quality is paramount when utilizing charging data for decision-making, advanced analytics, and machine-learning applications. It is crucial to address concerns around data cleanliness, as the presence of inaccuracies and inconsistencies, often referred to as dirty data, can significantly compromise the reliability and validity of any subsequent analysis or modeling efforts. To improve the data quality of our charging data, several errors are identified, particularly negative values for charging fees and inconsistencies between the counts of occupied, idle, and total charging piles. We remove the records containing these anomalies and treat them as missing data. In addition, a two-step imputation process was implemented to address missing values. First, forward filling replaced missing values using data from preceding timestamps. Then, backward filling was applied to fill gaps at the start of each time series. Moreover, a certain number of outliers were identified in the dataset, which could significantly impact prediction performance. To address this, the interquartile range (IQR) method was used to detect outliers for metrics including the charging volume (v), charging duration (d), and the rate of active charging piles at the charging station (o). To retain more original data and minimize the impact of outlier correction on the overall data distribution, we set the coefficient to 4 instead of the default 1.5. Finally, each outlier was replaced by the mean of its adjacent valid values. This preprocessing pipeline transformed the raw data into a structured and analyzable dataset.
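A minimal sketch of this cleaning step, assuming one time-indexed pandas Series per station and metric; the coefficient of 4 and the forward/backward filling follow the description above, while everything else is illustrative:

```python
import pandas as pd

def clean_series(s: pd.Series, k: float = 4.0) -> pd.Series:
    # Step 1: two-step imputation (forward fill, then backward fill for leading gaps).
    s = s.ffill().bfill()
    # Step 2: IQR rule with a widened coefficient k (4 instead of the default 1.5).
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    outlier = (s < q1 - k * iqr) | (s > q3 + k * iqr)
    # Step 3: replace each outlier by the mean of its neighbouring valid values
    # (edge outliers with only one valid neighbour would remain NaN in this sketch).
    s = s.mask(outlier)
    return s.fillna((s.ffill() + s.bfill()) / 2)
```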
Aggregation and Filtration. Building upon the station-level charging data that has been extracted and cleansed, we further organize the data into a region-level dataset with an hourly interval, providing a new perspective for EV charging behavior analysis. This is achieved by two major processes: aggregation and filtration. First, we aggregate all the charging data from both temporal and spatial views: a. Temporally, we standardize all time-series data to a common time resolution of one hour, as it serves as the least common denominator among the various resolutions. This aims to establish a unified temporal resolution for all time-series data, including pricing schemes, weather records, and charging data, thereby creating a well-structured dataset. Aggregation rules specify that the five-minute charging volume (v) and duration (d) are summed within each interval (i.e., one hour), whereas the occupancy (o), electricity price (pe), and service price (ps) are assigned specific values at certain hours for each charging pile. This distinction arises from the inherent nature of these data types: volume v and duration d are cumulative, while o, pe, and ps are instantaneous variables. Compared to using the mean or median values within each interval, selecting the instantaneous values of o, pe, and ps as representatives preserves the original data patterns more effectively and minimizes the influence of human interpretation. b. Spatially, stations are aggregated based on the traffic zones delineated by the sixth Residential Travel Survey of Shenzhen. After aggregation, our aggregated dataset comprises 331 regions (also called traffic zones) with 4344 timestamps. Second, variance tests and zero-value filtering functions were employed to filter out traffic zones with zero or no change in charging data. Specifically, traffic zones whose charging series contain only zero values or show no variation over the study period are excluded from the final dataset.
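A sketch of the temporal aggregation rule described in (a), assuming a station-level frame like the one produced in the extraction step; column names are illustrative:

```python
import pandas as pd

def to_hourly(station_5min: pd.DataFrame) -> pd.DataFrame:
    """Within each hour, sum cumulative variables and sample instantaneous ones."""
    df = station_5min.set_index("ts").sort_index()   # assumes 'ts' is a datetime column
    return pd.DataFrame({
        "v":  df["volume_kwh"].resample("1h").sum(),        # cumulative -> sum
        "d":  df["duration_h"].resample("1h").sum(),         # cumulative -> sum
        "o":  df["occupancy"].resample("1h").first(),        # instantaneous -> sampled value
        "pe": df["electricity_price"].resample("1h").first(),
        "ps": df["service_price"].resample("1h").first(),
    })
```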
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Tracks the daily sea ice extent for the Arctic Circle and Antarctica using the NSIDC's Sea Ice Index dataset, as well as pre-calculating several useful measures: historical inter-quartile range across the year, the previous lowest year and the previous year.
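A hedged sketch of how such a day-of-year climatology can be pre-calculated with pandas; the column names below are assumptions, not the dataset's actual schema:

```python
import pandas as pd

def doy_iqr(daily_extent: pd.DataFrame) -> pd.DataFrame:
    """daily_extent: columns ['date', 'extent'] of daily sea ice extent values."""
    df = daily_extent.copy()
    df["doy"] = pd.to_datetime(df["date"]).dt.dayofyear
    # Quartiles of extent across all historical years, per day of year.
    q = df.groupby("doy")["extent"].quantile([0.25, 0.75]).unstack()
    q.columns = ["q1", "q3"]
    q["iqr"] = q["q3"] - q["q1"]
    return q
```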
This dataset provides geospatial location data and scripts used to analyze the relationship between MODIS-derived NDVI and solar and sensor angles in a pinyon-juniper ecosystem in Grand Canyon National Park. The data are provided in support of the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States". The data and scripts allow users to replicate, test, or further explore results. The file GrcaScpnModisCellCenters.csv contains locations (latitude-longitude) of all the 250-m MODIS (MOD09GQ) cell centers associated with the Grand Canyon pinyon-juniper ecosystem that the Southern Colorado Plateau Network (SCPN) is monitoring through its land surface phenology and integrated upland monitoring programs. The file SolarSensorAngles.csv contains MODIS angle measurements for the pixel at the phenocam location plus a random 100-point subset of pixels within the GRCA-PJ ecosystem. The script files (folder: 'Code') consist of 1) a Google Earth Engine (GEE) script used to download MODIS data through the GEE javascript interface, and 2) a script used to calculate derived variables and to test relationships between solar and sensor angles and NDVI using the statistical software package 'R'. The file Fig_8_NdviSolarSensor.JPG shows NDVI dependence on solar and sensor geometry demonstrated for both a single pixel/year and for multiple pixels over time. (Left) MODIS NDVI versus solar-to-sensor angle for the Grand Canyon phenocam location in 2018, the year for which there is corresponding phenocam data. (Right) Modeled r-squared values by year for 100 randomly selected MODIS pixels in the SCPN-monitored Grand Canyon pinyon-juniper ecosystem. The model for forward-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle. The model for back-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle + sensor zenith angle. Boxplots show interquartile ranges; whiskers extend to 10th and 90th percentiles. The horizontal line marking the average median value for forward-scatter r-squared (0.835) is nearly indistinguishable from the back-scatter line (0.833). The dataset folder also includes supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study (e.g., the folders .Rproj.user and packrat, and the files .RData and PhenocamPR.Rproj). The empty folder GEE_DataAngles is included so that the user can save the data files from the Google Earth Engine scripts to this location, where they can then be incorporated into the R-processing scripts without needing to change folder names. To successfully use the packrat information to replicate the exact processing steps that were used, the user should refer to the packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may use the descriptive documentation, the phenopix package documentation, and the description/references provided in the associated journal article to process the data and achieve the same results using newer packages or other software programs.
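The published scripts are written for GEE and R; purely as an illustration, the two models quoted above could be fit in Python roughly as follows (all column names are assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fit_ndvi_models(df: pd.DataFrame):
    """df columns (assumed): ndvi, solar_to_sensor_angle, sensor_zenith, scatter ('forward'/'back')."""
    df = df.assign(log_ndvi=np.log(df["ndvi"]))
    # Forward-scatter model: log(NDVI) ~ solar-to-sensor angle
    fwd = smf.ols("log_ndvi ~ solar_to_sensor_angle",
                  data=df[df["scatter"] == "forward"]).fit()
    # Back-scatter model: log(NDVI) ~ solar-to-sensor angle + sensor zenith angle
    back = smf.ols("log_ndvi ~ solar_to_sensor_angle + sensor_zenith",
                   data=df[df["scatter"] == "back"]).fit()
    return fwd.rsquared, back.rsquared
```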
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A live version of the data record, which will be kept up-to-date with new estimates, can be downloaded from the Humanitarian Data Exchange: https://data.humdata.org/dataset/covid-19-mobility-italy.
If you find the data helpful or you use the data for your research, please cite our work:
Pepe, E., Bajardi, P., Gauvin, L., Privitera, F., Lake, B., Cattuto, C., & Tizzoni, M. (2020). COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown. Scientific Data 7, 230 (2020).
The data record is structured into 4 comma-separated value (CSV) files, as follows:
id_provinces_IT.csv. Table of the administrative codes of the 107 Italian provinces. The fields of the table are:
COD_PROV is an integer field that is used to identify a province in all other data records;
SIGLA is a two-letters code that identifies the province according to the ISO_3166-2 standard (https://en.wikipedia.org/wiki/ISO_3166-2:IT);
DEN_PCM is the full name of the province.
OD_Matrix_daily_flows_norm_full_2020_01_18_2020_04_17.csv. The file contains the daily fraction of users moving between Italian provinces. Each line corresponds to an entry (i, j) of the origin-destination matrix. The fields of the table are:
p1: COD_PROV of origin,
p2: COD_PROV of destination,
day: in the format yyyy-mm-dd.
median_q1_q3_rog_2020_01_18_2020_04_17.csv. The file contains the median and interquartile range (IQR) of users’ radius of gyration in a province by week. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
week: median value of the radius of gyration on week week, with week in the format dd/mm-DD/MM where dd/mm and DD/MM are the first and the last day of the week, respectively.
week Q1: first quartile (Q1) of the distribution of the radius of gyration on week week,
week Q3: third quartile (Q3) of the distribution of the radius of gyration on week week,
average_network_degree_2020_01_18_2020_04_17.csv. The file contains daily time-series of the average degree 〈k〉 of the proximity network. Each entry of the table is a value of 〈k〉 on a given day. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
day in the format yyyy-mm-dd.
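A minimal pandas loading sketch for the four files described above (file names as listed; the COD_PROV join key is given above, everything else here is purely illustrative):

```python
import pandas as pd

provinces  = pd.read_csv("id_provinces_IT.csv")
od_flows   = pd.read_csv("OD_Matrix_daily_flows_norm_full_2020_01_18_2020_04_17.csv")
rog        = pd.read_csv("median_q1_q3_rog_2020_01_18_2020_04_17.csv")
avg_degree = pd.read_csv("average_network_degree_2020_01_18_2020_04_17.csv")

# Attach province names to the daily origin-destination flows via COD_PROV.
od_named = od_flows.merge(provinces, left_on="p1", right_on="COD_PROV", how="left")
print(od_named.head())
```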
ESRI shapefiles of the Italian provinces updated to the most recent definition are available from the website of the Italian National Office of Statistics (ISTAT): https://www.istat.it/it/archivio/222527.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description:
This is the dataset of an environment-independent in-baggage object identification system leveraging low-cost WiFi. The dataset contains the extracted CSI features from 14 representative in-baggage objects of 4 different materials. The experiments were conducted in 3 different office environments of different sizes. We hope this dataset will help researchers reproduce prior work on in-baggage object identification through WiFi sensing.
Dataset Format:
.mat files
Section 1: Device Configuration:
Transmitter: Aaronia HyperLOG 7060 directional antenna with a Dell Inspiron 3910 desktop for control.
Receiver: Hawking HD9DP orthogonal antennas with a Dell Inspiron 3910 desktop for control.
NIC: Atheros QCA9590. The configuration and installation guide of CSI tool can be found at https://wands.sg/research/wifi/AtherosCSI/
WiFi Packet Rate: 1000 pkts/s
Section 2: Data Format
We provide the CSI features through .mat files. The details are shown in the following:
14 different objects made of 4 different materials are included, across 3 different environments and 3 different days.
Each object is tested for 60 seconds, repeated 3 times.
The dataset file name is presented as "Object_Number". The details are:
Object: The object we involved in the experiment (e.g., book, laptop)
Number: The number of repeats.
Section 3: Experimental Setups
There are 3 different office experiment setups for our data collection. The detailed setups are shown in the paper. For the objects, we involve 14 types of objects made of 4 different materials.
Environments:
3 different environments are involved, including 3 office environments with the size of 15 ft × 13 ft, 16 ft × 12 ft, 28 ft × 23 ft, respectively.
For each room environment, data is collected on different days and with different furniture settings (i.e., 2 desks and 2 chairs are moved at least 3 ft.).
Representative objects:
Data is collected using 14 representative objects of 4 different materials including fiber: book, magazine, newspaper; metal: thermal cup, laptop; cotton/polyester: cotton T-shirts (×2), cotton T-shirts (×4), hoodie, polyester T-shirts, polyester pants; water: 1L bottle with 1L water, 1L bottle with 500ml water, 500ml bottle with 500ml water.
Section 4: Data Description
For our data organization, we separate the data files into different folders based on different days and different environments. Under these folders, data are further distributed in terms of different objects and repeat times. All the files are .mat files, which can be directly read for further applications.
Features of CSI amplitude: We calculate 7 different types of statistical features (mean, variance, median, skewness, kurtosis, interquartile range, and range), as well as a polarization feature, from the CSI amplitude. In particular, we calculate the features for all 56 subcarriers, which have different operating frequencies and responses to the target object.
Features of CSI phase: For the CSI phase, the same features as for the CSI amplitude are extracted and stored in the dataset.
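An illustrative sketch (not the released processing code) of the per-subcarrier statistics described above, computed from a CSI amplitude matrix of shape (packets × 56 subcarriers); the released .mat files can be read with scipy.io.loadmat, with variable names depending on the release:

```python
import numpy as np
from scipy.stats import skew, kurtosis, iqr

def csi_features(amplitude: np.ndarray) -> np.ndarray:
    """amplitude: (n_packets, 56). Returns a (7, 56) matrix of statistical features."""
    return np.vstack([
        amplitude.mean(axis=0),
        amplitude.var(axis=0),
        np.median(amplitude, axis=0),
        skew(amplitude, axis=0),
        kurtosis(amplitude, axis=0),
        iqr(amplitude, axis=0),
        amplitude.max(axis=0) - amplitude.min(axis=0),   # range
    ])
```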
Section 6: Citations
If your work is related to our work, please cite our papers as follows.
https://ieeexplore.ieee.org/document/9637801
Shi, Cong, Tianming Zhao, Yucheng Xie, Tianfang Zhang, Yan Wang, Xiaonan Guo, and Yingying Chen. "Environment-independent in-baggage object identification using wifi signals." In 2021 IEEE 18th International Conference on Mobile Ad Hoc and Smart Systems (MASS), pp. 71-79. IEEE, 2021.
Feedback of power during running is a promising tool for training and determining pacing strategies. However, current power estimation methods show low validity and are not customized for running on different slopes. To address this issue, we developed three machine-learning models to estimate peak horizontal power for level, uphill, and downhill running using gait spatiotemporal parameters and accelerometer and gyroscope signals extracted from foot-worn IMUs. The prediction was compared to reference horizontal power obtained during running on a treadmill with an embedded force plate. For each model, we trained an elastic net and a neural network and validated it with a dataset of 34 active adults across a range of speeds and slopes. For uphill and level running, the concentric phase of the gait cycle was considered, and the neural network model led to the lowest error (median ± interquartile range) of 1.7% ± 12.5% and 3.2% ± 13.4%, respectively. The eccentric phase was considered relevant for downhill running, wherein the elastic net model provided the lowest error of 1.8% ± 14.1%. Results showed a similar performance across a range of different speed/slope running conditions. The findings highlighted the potential of using interpretable biomechanical features in machine learning models for estimating horizontal power. The simplicity of the models makes them suitable for implementation on embedded systems with limited processing and energy storage capacity. The proposed method meets the requirements for applications needing accurate near real-time feedback and complements existing gait analysis algorithms based on foot-worn IMUs.
This table contains a source catalog based on 90-cm (324-MHz) Very Large Array (VLA) imaging of the COSMOS field, comprising a circular area of 3.14 square degrees centered on 10h 00m 28.6s, 02° 12' 21" (J2000.0 RA and Dec). The image from the merger of 3 nights of observations using all 27 VLA antennas had an effective total integration time of ~12 hours, an 8.0 arcsecond x 6.0 arcsecond angular resolution, and an average rms of 0.5 mJy beam^-1. The extracted catalog contains 182 sources (down to 5.5 sigma), 30 of which are multi-component sources. Using Monte Carlo artificial source simulations, the authors derive the completeness of the catalog, and show that their 90-cm source counts agree very well with those from previous studies. In their paper, the authors use X-ray, NUV-NIR and radio COSMOS data to investigate the population mix of this 90-cm radio sample, and find that the sample is dominated by active galactic nuclei. The average 90-20 cm spectral index (defined through S_nu ∝ nu^alpha, where S_nu is the flux density at frequency nu and alpha is the spectral index) of the 90-cm selected sources is -0.70, with an interquartile range from -0.90 to -0.53. Only a few ultra-steep-spectrum sources are present in this sample, consistent with results in the literature for similar fields. These data do not show clear steepening of the spectral index with redshift. Nevertheless, this sample suggests that sources with spectral indices steeper than -1 all lie at z >~ 1, in agreement with the idea that ultra-steep-spectrum radio sources may trace intermediate-redshift galaxies (z >~ 1). Using both the signal and rms maps (see Figs. 1 and 2 in the reference paper) as input data, the authors ran the AIPS task SAD to obtain a catalog of candidate components above a given local signal-to-noise ratio (S/N) threshold. The task SAD was run four times with search S/N levels of 10, 8, 6 and 5, using the resulting residual image each time. They recovered all the radio components with a local S/N > 5.00. Subsequently, all the selected components were visually inspected, in order to check their reliability, especially for the components near strong side-lobes. After a careful analysis, an S/N threshold of 5.50 was adopted as the best compromise between a deep and a reliable catalog. The procedure yielded a total of 246 components with a local S/N > 5.50. More than one component identified in the 90-cm map sometimes belongs to a single radio source (e.g., large radio galaxies consist of multiple components). Using the 90-cm COSMOS radio map, the authors combined the various components into single sources based on visual inspection. The final catalog (contained in this HEASARC table) lists 182 radio sources, 30 of which have been classified as multiple, i.e. they are better described by more than a single component. Moreover, in order to ensure a more precise classification, all sources identified as multi-component sources have been also double-checked using the 20-cm radio map. The authors found that all the 26 multiple 90-cm radio sources within the 20-cm map have 20-cm counterpart sources already classified as multiple. The authors have made use of the VLA-COSMOS Large and Deep Projects over 2 square degrees, reaching down to an rms of ~15 µJy beam^-1 at 1.4 GHz and 1.5 arcsec resolution (Schinnerer et al. 2007, ApJS, 172, 46: the VLACOSMOS table in the HEASARC database). The 90-cm COSMOS radio catalog has, however, been extracted from a larger region of 3.14 square degrees (see Fig. 
1 and Section 3.1 of the reference paper). This implies that a certain number of 90-cm sources (48) lie outside the area of the 20-cm COSMOS map used to select the radio catalog. Thus, to identify the 20-cm counterparts of the 90-cm radio sources, the authors used the joint VLA-COSMOS catalog (Schinnerer et al. 2010, ApJS, 188, 384: the VLACOSMJSC table in the HEASARC database) for the 134 sources within the 20-cm VLA-COSMOS area and the VLA-FIRST survey (White et al. 1997, ApJ, 475, 479: the FIRST table in the HEASARC database) for the remaining 48 sources. The 90-cm sources were cross-matched with the 20-cm VLA-COSMOS sources using a search radius of 2.5 arcseconds, while the cross-match with the VLA-FIRST sources has been done using a search radius of 4 arcseconds in order to take into account the larger synthesized beam of the VLA-FIRST survey of ~5 arcseconds. Finally, all the 90 cm - 20 cm associations were visually inspected in order to ensure also the association of the multiple 90-cm radio sources for which the value of the search radius used during the cross-match could be too restrictive. In summary, out of the total of 182 sources in the 90-cm catalog, 168 have counterparts at 20 cm. This table was created by the HEASARC in October 2014 based on an electronic version of Table 1 from the reference paper which was obtained from the COSMOS web site at IRSA, specifically the file vla-cosmos_327_sources_published_version.tbl at http://irsa.ipac.caltech.edu/data/COSMOS/tables/vla/. This is a service provided by NASA HEASARC.
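As a small worked example of the spectral-index convention used above (S_nu ∝ nu^alpha between 324 MHz and 1.4 GHz); the flux densities below are illustrative values, not catalog entries:

```python
import numpy as np

def spectral_index(s_90cm_mjy, s_20cm_mjy, nu_90cm_mhz=324.0, nu_20cm_mhz=1400.0):
    """alpha = log(S1/S2) / log(nu1/nu2) for two flux-density measurements."""
    return np.log(s_90cm_mjy / s_20cm_mjy) / np.log(nu_90cm_mhz / nu_20cm_mhz)

print(spectral_index(10.0, 3.6))   # approximately -0.70, the sample's average index
```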
The influence of anesthesia type and duration on the occurrence of pulmonary embolism (PE) after surgery remains controversial. This study investigates the association of anesthesia type and duration with postoperative PE. A retrospective cohort of adult patients undergoing surgery from May 2020 to August 2024 at large-scale general hospitals was analyzed. Multivariable logistic regression models were employed to adjust for potential confounders, and sensitivity analyses (using overlap weighting and the array approach) were performed to validate the findings. A total of 178,052 patients were included in the analysis, of whom 91 developed PE after surgery. The median duration of general anesthesia (GA) was 1.72 h, with an interquartile range (IQR) of 1.17–2.52 h. The median duration of regional anesthesia was 1.54 h, with an IQR of 1.20–2.03 h. Anesthesia type and the duration of regional anesthesia were not associated with PE occurrence (adjusted odds ratio [aOR] [95% confidence interval, CI], 1.148 [0.671–2.098], p = 0.631), (aOR [95% CI], 1.117 [0.498–1.557], p = 0.738). The rates of PE consistently increased with GA prolongation (aOR [95% CI], 1.308 [1.176–1.432], p < 0.001). Compared with GA durations < 3 h, prolonged anesthesia was significantly associated with increased PE incidence (aOR [95% CI], 4.398 [2.585–7.565], p < 0.001). These findings were also confirmed by sensitivity analyses. Our study demonstrates that prolonged GA, particularly > 3 h, significantly increases the risk of PE.
Background: Orthodontic systematic reviews (SRs) use different methods to pool the individual studies in a meta-analysis when indicated. However, the number of studies included in orthodontic meta-analyses is relatively small. This study aimed to evaluate the direction of estimate changes of orthodontic meta-analyses (MAs) using different between-study variance methods, considering the level of heterogeneity, when few trials were pooled.
Methods: Search and study selection: Systematic reviews (SRs) published over the last three years, from the 1st of January 2020 to the 31st of December 2022, in six main orthodontic journals, with at least one MA pooling five or fewer primary studies, were identified. Data collection and analysis: Data were extracted from each eligible MA, which was replicated in a random-effects model using the DerSimonian and Laird (DL), Paule–Mandel (PM), restricted maximum-likelihood (REML), and Hartung-Knapp-Sidik-Jonkman (HKSJ) methods. The results were reported using the median and interquartile range (IQR) for continuous data and frequencies for categorical data, and analyzed using non-parametric tests. The Boruta algorithm was used to assess the significant predictors of a significant change in the confidence interval between the different methods compared to the DL method, which was only feasible using the HKSJ method.
Results: 146 MAs were included, most applying the random-effects model (n = 111; 76%) and pooling continuous data using the mean difference (n = 121; 83%). The median number of studies was three (range 2, 4), and the overall statistical heterogeneity (I2) ranged from 0 to 99% with a median of 68%. Close to 60% of the significant findings became non-significant when HKSJ was applied compared to the DL method and heterogeneity was present (I2 > 0%). On the other hand, 30.43% of the non-significant meta-analyses using the DL method became significant when HKSJ was used and heterogeneity was absent (I2 = 0%).
Conclusion: Orthodontic MAs with few studies can produce different results based on the between-study variance method and the statistical heterogeneity level. Compared to DL, the HKSJ method is overconservative when I2 is greater than 0% and may result in false-positive findings when heterogeneity is absent.
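To see why the two approaches can disagree, here is a hedged sketch (not tied to the reviewed MAs) that pools k study estimates with DerSimonian-Laird and contrasts the standard DL confidence interval with the HKSJ-adjusted one:

```python
import numpy as np
from scipy.stats import norm, t

def dl_vs_hksj(y, v, alpha=0.05):
    """y: study effect estimates; v: their within-study variances."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    k = len(y)
    w = 1.0 / v
    mu_fixed = (w * y).sum() / w.sum()
    q = (w * (y - mu_fixed) ** 2).sum()
    # DerSimonian-Laird between-study variance estimate
    tau2 = max(0.0, (q - (k - 1)) / (w.sum() - (w ** 2).sum() / w.sum()))
    w_re = 1.0 / (v + tau2)
    mu = (w_re * y).sum() / w_re.sum()
    # DL interval: normal quantile, variance = 1 / sum of random-effects weights
    se_dl = np.sqrt(1.0 / w_re.sum())
    ci_dl = mu + np.array([-1, 1]) * norm.ppf(1 - alpha / 2) * se_dl
    # HKSJ interval: weighted residual variance and a t(k-1) quantile
    se_hk = np.sqrt((w_re * (y - mu) ** 2).sum() / ((k - 1) * w_re.sum()))
    ci_hk = mu + np.array([-1, 1]) * t.ppf(1 - alpha / 2, k - 1) * se_hk
    return mu, ci_dl, ci_hk

# Example with three small trials (illustrative numbers only)
print(dl_vs_hksj([0.30, 0.55, 0.10], [0.04, 0.06, 0.05]))
```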
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Characteristics of the included medications.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Median Pearson correlation coefficient (PCC) and interquartile range (IQR) for all 3 datasets from Studies 1, 2, and 3. Each PCC is separately calculated for the sagittal and coronal plane, and for the original smoothed line markings of SPL and ISL and the smoothed line markings after applying a Procrustes transformation to the ISL.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The accurate calculation of reaction free energies (ΔrG°) for diboronic acids and carbohydrates is challenging due to reactant flexibility and strong solute–solvent interactions. In this study, these challenges are addressed with a semiautomatic workflow based on quantum chemistry methods to calculate conformational free energies, generate microsolvated solute structural ensembles, and compute ΔrG°. Workflow parameters were optimized for accuracy and precision while controlling computational costs. We assessed the accuracy by studying three reactions of diboronic acids with glucose and galactose, finding that the conformational entropy contributes significantly (by 3–5 kcal/mol at room temperature). Explicit solvent molecules improve the computed ΔrG° accuracy by about 4 kcal/mol compared to experimental data, though using 13 or more water molecules reduced precision and increased computational overhead. After fine-tuning, the workflow demonstrated remarkable accuracy, with an absolute error of about 2 kcal/mol compared to experimental ΔrG° and an average interquartile range of 2.4 kcal/mol. These results highlight the workflow’s potential for designing and screening tweezer-like ligands with tailored selectivity for various carbohydrates.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: This study investigated the effect of residual stenosis after carotid artery stenting (CAS) on periprocedural and long-term outcomes.
Methods: Patients treated with CAS for symptomatic or asymptomatic carotid arterial stenosis were consecutively enrolled. Residual stenosis was estimated from post-procedure angiography findings. The effects of residual stenosis on the 30-day periprocedural outcome and on times to restenosis and clinical outcome were analyzed using logistic regression models and Wei-Lin-Weissfeld models, respectively.
Results: A total of 412 patients (age, 64.7 ± 17.0 years; male, 82.0%) were enrolled. The median baseline stenosis was 80% (interquartile range [IQR], 70–90%), which improved to 10% (0–30%) for residual stenosis. Residual stenosis was significantly associated with the periprocedural outcome (adjusted odds ratio, 0.983; 95% confidence interval [CI], 0.965–0.999; P = 0.01) after adjustment for baseline stenosis, age, hypertension, symptomaticity, and statin use. Over the 5-year observation period, residual stenosis did not increase the global hazard for restenosis and clinical outcome (adjusted hazard ratio, 1.011; 95% CI, 0.997–1.025). In the event-specific model, residual stenosis increased the hazard for restenosis (adjusted hazard ratio, 1.041; 1.012–1.072) but not for clinical outcome (adjusted hazard ratio, 1.011; 0.997–1.025).
Conclusions: Residual stenosis after carotid artery stenting may be useful to predict periprocedural outcome and restenosis.