21 datasets found

Meta data and supporting documentation
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
We include a description of the data sets in the meta-data as well as sample code and results from a simulated data set.

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

It can be accessed through the following means: The R code is available online here: https://github.com/warrenjl/SpGPCW.

Format:

Abstract
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.
File format: R workspace file.

Metadata (including data dictionary)
• y: Vector of binary responses (1: preterm birth, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate).

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
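The weekly median/IQR standardization described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' R code: the function name `standardize_by_week`, the toy exposure values, and the "inclusive" quartile method are all assumptions, since the description does not say which quantile definition was used.

```python
import statistics


def standardize_by_week(exposures):
    """Standardize each week (column) as (value - median) / IQR.

    `exposures` is a list of rows, one per individual, each holding one
    value per exposure week. The quartile method is an assumption.
    """
    n_weeks = len(exposures[0])
    centers_and_scales = []
    for week in range(n_weeks):
        column = [row[week] for row in exposures]
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        centers_and_scales.append((statistics.median(column), q3 - q1))
    return [
        [(row[w] - centers_and_scales[w][0]) / centers_and_scales[w][1]
         for w in range(n_weeks)]
        for row in exposures
    ]


# Hypothetical raw exposures: 5 individuals, 2 weeks.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
z = standardize_by_week(raw)
```

Because each week is centered and scaled on its own, both toy columns map to the same standardized values even though their raw scales differ by a factor of ten. A week with zero IQR would need separate handling, which this sketch omits.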
Simulation Data Set
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.

Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description
“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Optional Information (complete as necessary)
Required R packages:
• For running “CWVS_LMC.txt”:
  • msm: Sampling from the truncated normal distribution
  • mnormt: Sampling from the multivariate normal distribution
  • BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
  • plotrix: Plotting the posterior means and credible intervals

Instructions for Use
Reproducibility (Mandatory)
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.

Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:

Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.

Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
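The data dictionary implies some shape relationships between the workspace objects, and a quick consistency check can catch a badly structured input before any model code runs. The sketch below is hypothetical and in Python rather than R. It assumes that z has one row per individual and one column per exposure period, and that alpha_true has length m; neither is stated explicitly in the description. The helper name `check_workspace` and the toy values are invented for illustration.

```python
def check_workspace(ws):
    """Return a list of inconsistencies among y, x, z, n, m, p, alpha_true."""
    problems = []
    n, m, p = ws["n"], ws["m"], ws["p"]
    if len(ws["y"]) != n:
        problems.append("y does not have n entries")
    if any(value not in (0, 1) for value in ws["y"]):
        problems.append("y is not binary")
    if len(ws["x"]) != n or any(len(row) != p for row in ws["x"]):
        problems.append("x is not an n-by-p matrix")
    # Assumption: exposures are stored as n rows by m time periods.
    if len(ws["z"]) != n or any(len(row) != m for row in ws["z"]):
        problems.append("z is not an n-by-m matrix")
    # Assumption: one true effect per exposure time period.
    if len(ws["alpha_true"]) != m:
        problems.append("alpha_true does not have m entries")
    return problems


# Toy stand-in for the workspace contents (values are made up).
toy = {
    "n": 3, "m": 2, "p": 2,
    "y": [1, 0, 0],
    "x": [[1.0, 0.2], [1.0, -0.4], [1.0, 0.9]],
    "z": [[0.1, -0.3], [0.0, 0.5], [-1.2, 0.7]],
    "alpha_true": [0.0, 0.4],
}
issues = check_workspace(toy)
```

An empty list means the objects agree with each other under the stated assumptions; it says nothing about whether the values themselves are sensible.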
Numpy , pandas and matplot lib practice
kaggle.com
zip
Updated Jul 16, 2023
Cite
pratham saraf (2023). Numpy , pandas and matplot lib practice [Dataset]. https://www.kaggle.com/datasets/prathamsaraf1389/numpy-pandas-and-matplot-lib-practise/suggestions
Explore at:
zip (385020 bytes)
Dataset updated
Jul 16, 2023
Authors
pratham saraf
License
https://cdla.io/permissive-1-0/
Description
The dataset has been created specifically for practicing Python, NumPy, Pandas, and Matplotlib. It is designed to provide a hands-on learning experience in data manipulation, analysis, and visualization using these libraries.

Specifics of the Dataset:

The dataset consists of 5000 rows and 20 columns, representing various features with different data types and distributions. The features include numerical variables with continuous and discrete distributions, categorical variables with multiple categories, binary variables, and ordinal variables. Each feature has been generated using different probability distributions and parameters to introduce variations and simulate real-world data scenarios. The dataset is synthetic and does not represent any real-world data. It has been created solely for educational purposes.

One of the defining characteristics of this dataset is the intentional incorporation of various real-world data challenges:

- Certain columns are randomly selected to be populated with NaN values, effectively simulating the common challenge of missing data.
- The proportion of these missing values in each column varies randomly between 1% and 70%.
- Statistical noise has been introduced in the dataset. For numerical values in some features, this noise follows a distribution with mean 0 and standard deviation 0.1.
- Categorical noise is introduced in some features, with their categories randomly altered in about 1% of the rows.
- Outliers have also been embedded in the dataset, in line with the Interquartile Range (IQR) rule.

Context of the Dataset:

The dataset aims to provide a comprehensive playground for practicing Python, NumPy, Pandas, and Matplotlib. It allows learners to explore data manipulation techniques, perform statistical analysis, and create visualizations using the provided features. By working with this dataset, learners can gain hands-on experience in data cleaning, preprocessing, feature engineering, and visualization.

Sources of the Dataset:

The dataset has been generated programmatically using Python's random number generation functions and probability distributions. No external sources or real-world data have been used in creating this dataset.
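Two of the challenges listed above, missing values and IQR-rule outliers, can be explored with a short sketch. It is written in plain Python so that it runs without any dependencies. The function names, the toy column, and the 1.5 × IQR fence multiplier are assumptions: the description names the IQR rule but does not give the multiplier or the column names.

```python
import math
import statistics


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(values):
    """Share of entries in a column that are missing."""
    return sum(1 for v in values if is_missing(v)) / len(values)


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is the usual convention."""
    present = [v for v in values if not is_missing(v)]
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


# Hypothetical column with two missing entries and one planted outlier.
column = [10, 12, 11, None, 13, 12, 11, 10, float("nan"), 14, 13, 100]
share_missing = missing_fraction(column)
outliers = iqr_outliers(column)
```

Here the quartiles of the ten present values are 11 and 13, so the fences sit at 8 and 16 and only the planted value 100 is flagged.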
Time Series Data of Carbon Monoxide Concentrations
kaggle.com
Updated Aug 10, 2024
Cite
REDNAM MANIKANTA SAI NEERAJ (2024). Time Series Data of Carbon Monoxide Concentrations [Dataset]. https://www.kaggle.com/datasets/manikantasai18/time-series-data-of-carbon-monoxide-concentrations
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Aug 10, 2024
Dataset provided by
Kaggle
Authors
REDNAM MANIKANTA SAI NEERAJ
Description
The dataset provides the median, 25th percentile, and 75th percentile of carbon monoxide (CO) concentrations in Delhi, measured in moles per square meter and vertically integrated over a 9-day mean period. This data offers insights into the distribution and variability of CO levels over time.

The data, collected from July 10, 2018, to August 10, 2024, is sourced from the Tropomi Explorer.

CO is a harmful gas that can significantly impact human health. High levels of CO can lead to respiratory issues, cardiovascular problems, and even be life-threatening in extreme cases. Forecasting CO levels helps in predicting and managing air quality to protect public health.

CO is often emitted from combustion processes, such as those in vehicles and industrial activities. Forecasting CO levels can help in monitoring the impact of these sources and evaluating the effectiveness of emission control measures.

Accurate CO forecasts can assist in urban planning and pollution control strategies, especially in densely populated areas where air quality issues are more pronounced.

Columns and Data Description:

system:time_start: This column represents the date when the CO measurements were taken.

p25: This likely represents the 25th percentile value of CO levels for the given date, providing insight into the lower range of the distribution.

Median: The median CO level for the given date, which is the middle value of the dataset and represents a typical value.

IQR: The Interquartile Range, which measures the spread of the middle 50% of the data. It’s calculated as the difference between the 75th percentile (p75) and the 25th percentile (p25) values.
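The IQR column follows directly from the two percentiles, as this small sketch shows. The column names are taken from the description above; the date string and the numeric values are made up for illustration and are not from the dataset.

```python
import math


def with_iqr(row):
    """Return a copy of the row with IQR = p75 - p25 added."""
    out = dict(row)
    out["IQR"] = row["p75"] - row["p25"]
    return out


# Hypothetical record; the values are placeholders, not real measurements.
sample = {"system:time_start": "2018-07-10", "p25": 0.030, "Median": 0.034, "p75": 0.040}
result = with_iqr(sample)
```

The same subtraction can be inverted to recover p75 from a file that stores only p25 and IQR.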
VLA-COSMOS Survey 324-MHz Continuum Source Catalog - Dataset - NASA Open...
data.nasa.gov
Updated Sep 10, 2025
Cite
nasa.gov (2025). VLA-COSMOS Survey 324-MHz Continuum Source Catalog - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/vla-cosmos-survey-324-mhz-continuum-source-catalog
Explore at:
Dataset updated
Sep 10, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
This table contains a source catalog based on 90-cm (324-MHz) Very Large Array (VLA) imaging of the COSMOS field, comprising a circular area of 3.14 square degrees centered on 10^h 00^m 28.6^s, 02^o 12' 21" (J2000.0 RA and Dec). The image from the merger of 3 nights of observations using all 27 VLA antennas had an effective total integration time of ~ 12 hours, an 8.0 arcsecond x 6.0 arcsecond angular resolution, and an average rms of 0.5 mJy beam^-1. The extracted catalog contains 182 sources (down to 5.5 sigma), 30 of which are multi-component sources. Using Monte Carlo artificial source simulations, the authors derive the completeness of the catalog, and show that their 90-cm source counts agree very well with those from previous studies. In their paper, the authors use X-ray, NUV-NIR and radio COSMOS data to investigate the population mix of this 90-cm radio sample, and find that the sample is dominated by active galactic nuclei. The average 90-20 cm spectral index (S_nu~ nu^alpha, where S_nu is the flux density at frequency nu and alpha the spectral index) of the 90-cm selected sources is -0.70, with an interquartile range from -0.90 to -0.53. Only a few ultra-steep-spectrum sources are present in this sample, consistent with results in the literature for similar fields. These data do not show clear steepening of the spectral index with redshift. Nevertheless, this sample suggests that sources with spectral indices steeper than -1 all lie at z >~ 1, in agreement with the idea that ultra-steep-spectrum radio sources may trace intermediate-redshift galaxies (z >~ 1). Using both the signal and rms maps (see Figs. 1 and 2 in the reference paper) as input data, the authors ran the AIPS task SAD to obtain a catalog of candidate components above a given local signal-to-noise ratio (S/N) threshold. The task SAD was run four times with search S/N levels of 10, 8, 6 and 5, using the resulting residual image each time. 
They recovered all the radio components with a local S/N > 5.00. Subsequently, all the selected components were visually inspected, in order to check their reliability, especially for the components near strong side-lobes. After a careful analysis, a S/N threshold of 5.50 was adopted as the best compromise between a deep and a reliable catalog. The procedure yielded a total of 246 components with a local S/N > 5.50. More than one component identified in the 90-cm map sometimes belongs to a single radio source (e.g., large radio galaxies consist of multiple components). Using the 90-cm COSMOS radio map, the authors combined the various components into single sources based on visual inspection. The final catalog (contained in this HEASARC table) lists 182 radio sources, 30 of which have been classified as multiple, i.e., they are better described by more than a single component. Moreover, in order to ensure a more precise classification, all sources identified as multi-component sources have also been double-checked using the 20-cm radio map. The authors found that all the 26 multiple 90-cm radio sources within the 20-cm map have 20-cm counterpart sources already classified as multiple. The authors have made use of the VLA-COSMOS Large and Deep Projects over 2 square degrees, reaching down to an rms of ~15 µJy beam^-1 at 1.4 GHz and 1.5 arcsec resolution (Schinnerer et al. 2007, ApJS, 172, 46: the VLACOSMOS table in the HEASARC database). The 90-cm COSMOS radio catalog has, however, been extracted from a larger region of 3.14 square degrees (see Fig. 1 and Section 3.1 of the reference paper). This implies that a certain number of 90-cm sources (48) lie outside the area of the 20-cm COSMOS map used to select the radio catalog. Thus, to identify the 20-cm counterparts of the 90-cm radio sources, the authors used the joint VLA-COSMOS catalog (Schinnerer et al.
2010, ApJS, 188, 384: the VLACOSMJSC table in the HEASARC database) for the 134 sources within the 20-cm VLA-COSMOS area and the VLA-FIRST survey (White et al. 1997, ApJ, 475, 479: the FIRST table in the HEASARC database) for the remaining 48 sources. The 90-cm sources were cross-matched with the 20-cm VLA-COSMOS sources using a search radius of 2.5 arcseconds, while the cross-match with the VLA-FIRST sources has been done using a search radius of 4 arcseconds in order to take into account the larger synthesized beam of the VLA-FIRST survey of ~5 arcseconds. Finally, all the 90 cm - 20 cm associations were visually inspected in order to ensure also the association of the multiple 90-cm radio sources for which the value of the search radius used during the cross-match could be too restrictive. In summary, out of the total of 182 sources in the 90-cm catalog, 168 have counterparts at 20 cm. This table was created by the HEASARC in October 2014 based on an electronic version of Table 1 from the reference paper which was obtained from the COSMOS web site at IRSA, specifically the file vla-cosmos_327_sources_published_version.tbl at http://irsa.ipac.caltech.edu/data/COSMOS/tables/vla/. This is a service provided by NASA HEASARC.
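The spectral index quoted in the description follows from the power law S_nu ~ nu^alpha: for flux densities measured at two frequencies, alpha = log(S_1 / S_2) / log(nu_1 / nu_2). The sketch below applies that relation to the 324-MHz and 1.4-GHz bands named above. The function name and the flux values are illustrative assumptions and are not taken from the catalog.

```python
import math


def spectral_index(s1, nu1, s2, nu2):
    """Two-point spectral index alpha, assuming S_nu is proportional to nu**alpha."""
    return math.log(s1 / s2) / math.log(nu1 / nu2)


# Hypothetical source: 10 mJy at 324 MHz, with a 1.4 GHz flux density
# constructed so that the true index is -0.70.
s_324 = 10.0
s_1400 = s_324 * (1400.0 / 324.0) ** -0.70
alpha = spectral_index(s_324, 324.0, s_1400, 1400.0)
```

Flux densities only need to share a unit, and likewise the frequencies, because the formula depends on ratios alone.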
Data release for solar-sensor angle analysis subset associated with the...
catalog.data.gov
data.usgs.gov
Updated Nov 27, 2025
Cite
U.S. Geological Survey (2025). Data release for solar-sensor angle analysis subset associated with the journal article "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States" [Dataset]. https://catalog.data.gov/dataset/data-release-for-solar-sensor-angle-analysis-subset-associated-with-the-journal-article-so
Explore at:
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Western United States, United States
Description
This dataset provides geospatial location data and scripts used to analyze the relationship between MODIS-derived NDVI and solar and sensor angles in a pinyon-juniper ecosystem in Grand Canyon National Park. The data are provided in support of the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States". The data and scripts allow users to replicate, test, or further explore results. The file GrcaScpnModisCellCenters.csv contains locations (latitude-longitude) of all the 250-m MODIS (MOD09GQ) cell centers associated with the Grand Canyon pinyon-juniper ecosystem that the Southern Colorado Plateau Network (SCPN) is monitoring through its land surface phenology and integrated upland monitoring programs. The file SolarSensorAngles.csv contains MODIS angle measurements for the pixel at the phenocam location plus a random 100 point subset of pixels within the GRCA-PJ ecosystem. The script files (folder: 'Code') consist of 1) a Google Earth Engine (GEE) script used to download MODIS data through the GEE javascript interface, and 2) a script used to calculate derived variables and to test relationships between solar and sensor angles and NDVI using the statistical software package 'R'. The file Fig_8_NdviSolarSensor.JPG shows NDVI dependence on solar and sensor geometry demonstrated for both a single pixel/year and for multiple pixels over time. (Left) MODIS NDVI versus solar-to-sensor angle for the Grand Canyon phenocam location in 2018, the year for which there is corresponding phenocam data. (Right) Modeled r-squared values by year for 100 randomly selected MODIS pixels in the SCPN-monitored Grand Canyon pinyon-juniper ecosystem. The model for forward-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle. The model for back-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle + sensor zenith angle. 
Boxplots show interquartile ranges; whiskers extend to 10th and 90th percentiles. The horizontal line marking the average median value for forward-scatter r-squared (0.835) is nearly indistinguishable from the back-scatter line (0.833). The dataset folder also includes supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study (e.g., the folders .Rproj.user and packrat, and the files .RData and PhenocamPR.Rproj). The empty folder GEE_DataAngles is included so that the user can save the data files from the Google Earth Engine scripts to this location, where they can then be incorporated into the R processing scripts without needing to change folder names. To successfully use the packrat information to replicate the exact processing steps that were used, the user should refer to packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may also use the descriptive documentation, the phenopix package documentation, and the description/references provided in the associated journal article to process the data to achieve the same results using newer packages or other software programs.
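The forward-scatter model named above, log(NDVI) ~ solar-to-sensor angle, is an ordinary least-squares fit on log-transformed NDVI, and the reported r-squared values come from fits of that kind. The sketch below shows one way to compute the slope, intercept, and r-squared in plain Python. It is not the release's R script, and the function name and the synthetic angle/NDVI values are assumptions for illustration.

```python
import math


def fit_log_linear(angles, ndvi):
    """Least-squares fit of log(ndvi) = intercept + slope * angle.

    Returns (slope, intercept, r_squared).
    """
    y = [math.log(v) for v in ndvi]
    n = len(angles)
    mean_x = sum(angles) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in angles)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(angles, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * x)) ** 2 for x, yi in zip(angles, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic, noise-free example: NDVI decays exponentially with angle.
angles = [20.0, 40.0, 60.0, 80.0, 100.0]
ndvi = [math.exp(-1.0 - 0.005 * a) for a in angles]
slope, intercept, r_squared = fit_log_linear(angles, ndvi)
```

The back-scatter model adds sensor zenith angle as a second predictor, which would require a multiple-regression solver rather than this single-predictor formula.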
Dataset of WiFi-based Environment-independent In-baggage Object...
zenodo.org
zip
Updated Feb 26, 2023
Cite
Cong Shi; Tianming Zhao; Yucheng Xie; Tianfang Zhang; Yan Wang; Xiaonan Guo; Yingying Chen (2023). Dataset of WiFi-based Environment-independent In-baggage Object Identification System [Dataset]. http://doi.org/10.5281/zenodo.7631168
Explore at:
zip
Unique identifier
https://doi.org/10.5281/zenodo.7631168
Dataset updated
Feb 26, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Cong Shi; Tianming Zhao; Yucheng Xie; Tianfang Zhang; Yan Wang; Xiaonan Guo; Yingying Chen
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description:

This is the dataset of an environment-independent in-baggage object identification system that leverages low-cost WiFi. The dataset contains the CSI features extracted from 14 representative in-baggage objects of 4 different materials. The experiments are conducted in 3 office environments of different sizes. We hope this dataset will help researchers reproduce the earlier work on in-baggage object identification through WiFi sensing.

Dataset Format:

.mat files

Section 1: Device Configuration:

Transmitter: Aaronia HyperLOG 7060 directional antenna with a Dell Inspiron 3910 desktop for control.

Receiver: Hawking HD9DP orthogonal antennas with a Dell Inspiron 3910 desktop for control.

NIC: Atheros QCA9590. The configuration and installation guide of CSI tool can be found at https://wands.sg/research/wifi/AtherosCSI/

WiFi Packet Rate: 1000 pkts/s

Section 2: Data Format

We provide the CSI features through .mat files. The details are shown in the following:

14 different objects made of 4 different materials are measured in 3 different environments on 3 different days.

Each object is tested for 60 seconds, and each test is repeated 3 times.

The dataset file name is presented as "Object_Number", where:

Object: The object used in the experiment (e.g., book, laptop)

Number: The repeat number.

Section 3: Experimental Setups

There are 3 different office setups for our data collection. The detailed setups are shown in the paper. For the objects, we use 14 types of objects made of 4 different materials.

Environments:

Three office environments are used, with sizes of 15 ft × 13 ft, 16 ft × 12 ft, and 28 ft × 23 ft, respectively.

For each room environment, data is collected on different days and with different furniture settings (i.e., 2 desks and 2 chairs are moved at least 3 ft).

Representative objects:

Data is collected using 14 representative objects of 4 different materials including fiber: book, magazine, newspaper; metal: thermal cup, laptop; cotton/polyester: cotton T-shirts (×2), cotton T-shirts (×4), hoodie, polyester T-shirts, polyester pants; water: 1L bottle with 1L water, 1L bottle with 500ml water, 500ml bottle with 500ml water.

Section 4: Data Description

For our data organization, we separate the data files into different folders based on different days and different environments. Under these folders, data are further distributed in terms of different objects and repeat times. All the files are .mat files, which can be directly read for further applications.

Features of CSI amplitude: We calculate 7 different types of statistical features (mean, variance, median, skewness, kurtosis, interquartile range, and range) and a polarization feature from the CSI amplitude. In particular, we calculate the features for all 56 subcarriers, which have different operating frequencies and responses to the target object.

Features of CSI phase: For the CSI phase, the same features as for the CSI amplitude are extracted and stored in the dataset.
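The seven statistical features listed above can be sketched for a single subcarrier as follows. This is an illustration in plain Python, not the authors' extraction code. It assumes population (biased) moment estimators for variance, skewness, and kurtosis and the "inclusive" quartile method, none of which the description specifies, and it leaves out the polarization feature, whose definition is given in the paper rather than here. The function name and sample values are made up.

```python
import statistics


def csi_features(samples):
    """Seven summary statistics for one subcarrier's amplitude (or phase) series."""
    n = len(samples)
    mean = statistics.fmean(samples)
    variance = statistics.pvariance(samples, mu=mean)  # population variance (assumption)
    centered = [s - mean for s in samples]
    third_moment = sum(c ** 3 for c in centered) / n
    fourth_moment = sum(c ** 4 for c in centered) / n
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return {
        "mean": mean,
        "variance": variance,
        "median": statistics.median(samples),
        "skewness": third_moment / variance ** 1.5,
        "kurtosis": fourth_moment / variance ** 2,  # not excess kurtosis
        "iqr": q3 - q1,
        "range": max(samples) - min(samples),
    }


# Hypothetical amplitude samples for one subcarrier.
features = csi_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

Applying the function to each of the 56 subcarriers would yield one feature vector per subcarrier. A constant series has zero variance and would need a guard before the skewness and kurtosis divisions.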

Section 6: Citations

If your work is related to our work, please cite our papers as follows.

https://ieeexplore.ieee.org/document/9637801

Shi, Cong, Tianming Zhao, Yucheng Xie, Tianfang Zhang, Yan Wang, Xiaonan Guo, and Yingying Chen. "Environment-independent in-baggage object identification using wifi signals." In 2021 IEEE 18th International Conference on Mobile Ad Hoc and Smart Systems (MASS), pp. 71-79. IEEE, 2021.
Walmart Stocks Data 2025
kaggle.com
zip
Updated Feb 23, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mehar Shan Ali (2025). Walmart Stocks Data 2025 [Dataset]. https://www.kaggle.com/meharshanali/walmart-stocks-data-2025
Explore at:
zip (467062 bytes)
Dataset updated
Feb 23, 2025
Authors
Mehar Shan Ali
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
📊 Walmart Stock Price Dataset & Exploratory Data Analysis (EDA)

🏢 About Walmart

Walmart Inc. is a multinational retail corporation that operates a chain of hypermarkets, discount department stores, and grocery stores. It is one of the world's largest companies by revenue and a key player in the retail sector. Walmart's stock is actively traded on major stock exchanges, making it an interesting subject for financial analysis.

📌 Dataset Overview

This dataset contains historical stock price data for Walmart, sourced directly from Yahoo Finance using the yfinance Python API. The data covers daily stock prices and includes multiple key financial indicators.

📊 Features Included in the Dataset

Date 📅 – The trading day recorded.

Open Price 🟢 – Price at market open.

High Price 🔼 – Highest price of the day.

Low Price 🔽 – Lowest price of the day.

Close Price 🔴 – Price at market close.

Adjusted Close Price 📉 – Closing price adjusted for splits & dividends.

Trading Volume 📈 – Total shares traded.

Dividends 💰 – Cash payments to shareholders.

Stock Splits 🔄 – Records stock split events.

🔍 Exploratory Data Analysis (EDA) Steps

This notebook performs an extensive EDA to uncover insights into Walmart's stock price trends, volatility, and overall behavior in the stock market. The following analysis steps are included:

1️⃣ Data Preprocessing & Cleaning

Load data using Pandas

Handle missing values (if any)

Check data types and format them properly

Convert date column into a datetime format

2️⃣ Descriptive Statistics & Summary

Calculate key statistical measures like mean, median, standard deviation, and interquartile range (IQR)

Identify stock price trends over time

Check data distribution and skewness

3️⃣ Data Visualizations

📉 Line Plot – Analyze trends in closing prices over time.

📦 Box Plot – Detect potential outliers in stock prices.

📊 Histogram – Understand the distribution of closing prices.

📈 Moving Averages – Use short-term and long-term moving averages to observe stock trends.

🔥 Correlation Heatmap – Find relationships between stock market indicators.

4️⃣ Time Series Analysis

Identify trends and seasonality in the stock price data.

Calculate daily, weekly, and monthly returns.

Use rolling windows to analyze moving averages and volatility.
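The return and rolling-window steps listed above can be sketched as follows. This is a minimal example in plain Python rather than the notebook's own code. The helper names, the 3-day window, and the closing prices are illustrative assumptions, not values from the dataset.

```python
import statistics


def simple_returns(closes):
    """Period-over-period simple returns: (today - yesterday) / yesterday."""
    return [(b - a) / a for a, b in zip(closes, closes[1:])]


def rolling(values, window, fn):
    """Apply fn to each full trailing window of the given length."""
    return [fn(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


# Hypothetical closing prices for five consecutive trading days.
closes = [100.0, 102.0, 101.0, 105.0, 107.0]
daily = simple_returns(closes)
moving_average = rolling(closes, 3, statistics.fmean)   # 3-day moving average
volatility = rolling(daily, 3, statistics.stdev)        # 3-day rolling std of returns
```

Weekly or monthly returns follow the same pattern after resampling the closing prices to the last trading day of each week or month.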

5️⃣ Insights & Conclusions

How volatile is Walmart’s stock over the given period?

Does the stock exhibit strong uptrends or downtrends?

Are there any strong correlations between features?

What insights can be drawn for investors and traders?

🚀 Use Cases & Applications

This dataset and analysis can be useful for:

📡 Stock Market Analysis – Evaluating Walmart’s stock price trends and volatility.

🏦 Investment Research – Assisting traders and investors in making informed decisions.

🎓 Educational Purposes – Teaching data science and financial analysis using real-world stock data.

📊 Algorithmic Trading – Developing trading strategies based on historical stock price trends.

📥 Download the dataset and explore Walmart’s stock performance today! 🚀
Gender, Age, and Emotion Detection from Voice
kaggle.com
zip
Updated May 29, 2021
Rohit Zaman (2021). Gender, Age, and Emotion Detection from Voice [Dataset]. https://www.kaggle.com/rohitzaman/gender-age-and-emotion-detection-from-voice
Explore at:
zip (967820 bytes)
Dataset updated
May 29, 2021
Authors
Rohit Zaman
Description
Context

Our goal was to predict gender, age, and emotion from audio. We found labeled audio datasets in "Mozilla Common Voice" and the “Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)”, and collected the audio files from these two sources. Using the R programming language, we extracted 20 statistical features from the audio and then added the labels to form these datasets.

Content

The datasets contain 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes the dispersion of the data relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle value in the sorted list of values.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. This means that about 25 percent of the values are below Q1, and about 75 percent are above Q1.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the central point between the median and the highest value of the distribution.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure that determines how much the tails of a distribution differ from the tails of a normal distribution. It reflects the presence of outliers in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized signal’s spectral power.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is usually measured in decibels and provides a way to quantify how tone-like a sound is, as opposed to being noise-like.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the spectrum’s center of mass is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of the dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of the dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which expresses the degree of frequency modulation numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
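To make the first few features concrete, here is a rough NumPy sketch that treats the magnitude spectrum of a signal as a distribution over frequency and summarizes it. It is not the authors' R pipeline and will not reproduce their exact values; the function name `spectral_summary`, the use of a single FFT over the whole signal, and the synthetic test tone are all assumptions.

```python
import numpy as np


def spectral_summary(signal, sample_rate):
    """Summarize a magnitude spectrum as a distribution over frequency (kHz)."""
    signal = np.asarray(signal, dtype=float)
    amplitude = np.abs(np.fft.rfft(signal))
    freqs_khz = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate) / 1000.0
    weights = amplitude / amplitude.sum()  # normalize so the weights sum to 1
    cumulative = np.cumsum(weights)

    def quantile(p):
        # First frequency at which the cumulative weight reaches p.
        idx = min(np.searchsorted(cumulative, p), freqs_khz.size - 1)
        return float(freqs_khz[idx])

    meanfreq = float(np.sum(freqs_khz * weights))
    sd = float(np.sqrt(np.sum(weights * (freqs_khz - meanfreq) ** 2)))
    q25, median, q75 = quantile(0.25), quantile(0.50), quantile(0.75)
    return {"meanfreq": meanfreq, "sd": sd, "median": median,
            "Q25": q25, "Q75": q75, "IQR": q75 - q25}


# One second of a synthetic 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
features = spectral_summary(tone, sample_rate)
```

For a pure tone all spectral weight sits at one frequency, so the mean and median frequency coincide with the tone and the spread measures collapse toward zero. Real speech spreads its energy over many frequencies and gives non-trivial values.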

Acknowledgements

Gender and age audio data source: https://commonvoice.mozilla.org/en
Emotion audio data source: https://smartlaboratory.org/ravdess/
Groundwater productivity in Africa
ihp-wins.unesco.org
data.amerigeoss.org
json, tiff
Updated Aug 24, 2024
Intergovernmental Hydrological Programme (2024). Groundwater productivity in Africa [Dataset]. https://ihp-wins.unesco.org/dataset/groundwater-productivity-in-africa
Explore at:
json, tiff
Dataset updated
Aug 24, 2024
Dataset provided by
Intergovernmental Hydrological Programme
Area covered
Africa
Description
This 5 km resolution grid indicates what borehole yields (in l/s) can reasonably be expected in different hydrogeological units. The ranges indicate the approximate interquartile range of the yield of boreholes that have been sited and drilled using appropriate techniques. Groundwater productivity is given in liters per second. A detailed description of the methodology and a full list of the data sources used to develop the layer can be found in the peer-reviewed paper available here: http://iopscience.iop.org/article/10.1088/1748-9326/7/2/024009/pdf The raster and a high-resolution PDF file are available for download on the website of the British Geological Survey (BGS): http://www.bgs.ac.uk/research/groundwater/international/africanGroundwater/mapsDownload.html
Results of regression analyses of hydroclimatic risk factors for human...
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 2, 2023
Karina Cucchi; Runyou Liu; Philip A. Collender; Qu Cheng; Charles Li; Christopher M. Hoover; Howard H. Chang; Song Liang; Changhong Yang; Justin V. Remais (2023). Results of regression analyses of hydroclimatic risk factors for human leptospirosis incidence at the yearly timescale and county resolution. [Dataset]. http://doi.org/10.1371/journal.pntd.0007968.t003
Explore at:
xls
Unique identifier
https://doi.org/10.1371/journal.pntd.0007968.t003
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Karina Cucchi; Runyou Liu; Philip A. Collender; Qu Cheng; Charles Li; Christopher M. Hoover; Howard H. Chang; Song Liang; Changhong Yang; Justin V. Remais
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Incidence rate ratios (IRR) and 95% confidence intervals (CIs) were estimated using quasi-Poisson regression and the robust sandwich estimator for variance [42,43] and correspond to an increase in exposure equivalent to the exposure’s interquartile range within the dataset (0.67 mm precipitation (P); 1.19 mm soil moisture (θ)). Reference values for each of the exposure variables are presented in Table 2. Bolded values correspond to associations that are statistically significant at the 95% confidence level. Each row corresponds to one model fit. Information supporting variable selection can be found in S1 Text. Results for regressions including other hydroclimatic predictors are presented in S1 Table.
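As a small worked illustration of what "an IRR per interquartile-range increase" means: in a count regression with a log link, a coefficient expressed per unit of exposure is converted to an IRR for a larger increase by scaling it before exponentiating. The coefficient below is hypothetical and is not a result from the study; only the 0.67 mm IQR comes from the description, and the function name is an assumption.

```python
import math


def irr_for_increase(beta_per_unit: float, increase: float) -> float:
    """IRR for an exposure increase of `increase` units under a log-link count model."""
    return math.exp(beta_per_unit * increase)


PRECIP_IQR_MM = 0.67  # IQR of precipitation quoted in the description
beta = 0.30           # hypothetical log rate ratio per 1 mm, NOT a study result

irr_iqr = irr_for_increase(beta, PRECIP_IQR_MM)
print(f"IRR per IQR increase: {irr_iqr:.3f}")
```

Reporting effects per IQR puts exposures measured on different scales, such as precipitation and soil moisture, on a comparable footing.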
Dataset: Proportional recovery in mice with cortical stroke
ldh.stroke-koeln.imise.uni-leipzig.de
Updated Nov 4, 2024
Markus Aswendt (2024). Dataset: Proportional recovery in mice with cortical stroke [Dataset]. http://doi.org/10.12751/g-node.gjf2hv
Explore at:
Unique identifier
https://doi.org/10.12751/g-node.gjf2hv
Dataset updated
Nov 4, 2024
Authors
Markus Aswendt
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Post-Stroke Recovery Data Repository

This repository contains various resources related to the study on post-stroke recovery in a mouse model, focusing on the application of the Proportional Recovery Rule (PRR).

Repository Structure

code/: Contains all the code used for the analysis in this study. Detailed information is available in the README within the code folder.

input/: This folder contains all datasets used in the publication.

output/: This directory includes the final results generated for each dataset. Detailed information for each dataset's output can be found in their respective subfolders.

docs/: Additional documentation related to this project, including extra resources in the form of a README file within this folder.

Methodology Overview

Introduction

The Fugl-Meyer upper extremity score is a widely used assessment tool in clinical settings to evaluate motor function in stroke patients. With a maximum score of 66, higher values indicate better motor performance, while lower values signify greater deficits.

The Proportional Recovery Rule (PRR) suggests that the magnitude of recovery from nonsevere upper limb motor impairment after stroke is approximately 0.7 times the initial impairment. This rule, proposed in 2008, has been applied to various motor and nonmotor impairments, leading to inconsistencies in its formulation and application across studies.

Translating PRR to Deficit Score

In this study, we translated the Fugl-Meyer upper extremity score into a deficit score suitable for use in a mouse model. The PRR posits that the change in impairment can be predicted as 0.7 times the initial impairment, plus an error term. We adapted this rule by fitting a linear regression model without an intercept to relate the initial impairment to the change in impairment.

Data Analysis

Initial Impairment Calculation:

Initial impairment (d-score) is calculated as the difference between the deficit score at day 3 post-stroke and the baseline deficit score.

Change Observed and Predicted:

Change observed: Initial impairment minus deficit score on day 28.

Change predicted: 0.7 times the initial impairment plus an error term.

Cluster Analysis:

Data were plotted with initial impairment on the x-axis and change observed on the y-axis.

A linear fit was applied to generate two lines: one based on the proportional recovery rule and one from the data fit.

Subjects were clustered based on their proximity to these lines, iterating the process until convergence.

Outlier Removal:

Outliers were identified and removed based on the interquartile range rule both initially and during each iteration of the clustering process.
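The analysis steps above can be sketched in Python as follows. This is a simplified illustration and not the code in the repository's code/ folder: the deficit scores are invented, the 1.5 multiplier in the IQR rule is the conventional choice rather than one stated here, and only a single cluster-assignment step is shown instead of iterating to convergence.

```python
import numpy as np

PRR_SLOPE = 0.7  # proportionality constant quoted in the text


def fit_slope_through_origin(initial, change):
    """Least-squares slope for change = slope * initial (no intercept)."""
    initial = np.asarray(initial, dtype=float)
    change = np.asarray(change, dtype=float)
    return float(np.sum(initial * change) / np.sum(initial ** 2))


def iqr_inliers(values, k=1.5):
    """Boolean mask of values within k IQRs of the quartiles (k=1.5 is assumed)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)


# Hypothetical deficit scores for eight animals.
baseline = np.zeros(8)
day3 = np.array([10.0, 8.0, 6.0, 12.0, 4.0, 9.0, 7.0, 11.0])
day28 = np.array([3.2, 2.2, 1.9, 3.5, 1.3, 8.5, 2.0, 3.4])

initial_impairment = day3 - baseline
change_observed = initial_impairment - day28  # as defined in the text
change_predicted = PRR_SLOPE * initial_impairment

# Drop subjects whose deviation from the PRR prediction is an IQR outlier.
keep = iqr_inliers(change_observed - change_predicted)
data_slope = fit_slope_through_origin(initial_impairment[keep], change_observed[keep])

# One assignment step: is each retained subject closer to the PRR line or the fitted line?
x, y = initial_impairment[keep], change_observed[keep]
follows_prr = np.abs(y - PRR_SLOPE * x) <= np.abs(y - data_slope * x)
```

In this toy data the sixth animal barely recovers, so its residual falls outside the IQR fences and it is excluded before the slope is fitted.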

Results

Cluster Characteristics:

The final clustering resulted in 65 subjects following the PRR, with a fixed slope of 0.7 and an intercept of -0.42.

The other cluster contained 21 subjects with a distinct recovery pattern, characterized by a slope of 0.84.

Statistical Analysis:

The slope of the overall linear fit was found to be 0.93.

Approximately 75.58% of the subjects adhered to the PRR, indicating the potential relevance of the PRR in the mouse model.

Additional Information

This structured dataset was created with reference to the following publication:

DOI:10.1038/s41597-023-02242-8

If you have any questions or require further assistance, please do not hesitate to reach out to us. Contact us via email at markus.aswendtATuk-koeln.de or aref.kalantari-sarcheshmehATuk-koeln.de.
Italy: Mobility COVID-19
kaggle.com
Updated Mar 26, 2021
Mr. Rahman (2021). Italy: Mobility COVID-19 [Dataset]. https://www.kaggle.com/motiurse/italy-mobility-covid19/code
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Mar 26, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mr. Rahman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Italy
Description
A live version of the data record, which will be kept up-to-date with new estimates, can be downloaded from the Humanitarian Data Exchange: https://data.humdata.org/dataset/covid-19-mobility-italy.

If you find the data helpful or you use the data for your research, please cite our work:

Pepe, E., Bajardi, P., Gauvin, L., Privitera, F., Lake, B., Cattuto, C., & Tizzoni, M. (2020). COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown. Scientific Data 7, 230 (2020).

The data record is structured into 4 comma-separated value (CSV) files, as follows:

id_provinces_IT.csv. Table of the administrative codes of the 107 Italian provinces. The fields of the table are:

COD_PROV is an integer field that is used to identify a province in all other data records;

SIGLA is a two-letter code that identifies the province according to the ISO_3166-2 standard (https://en.wikipedia.org/wiki/ISO_3166-2:IT);

DEN_PCM is the full name of the province.

OD_Matrix_daily_flows_norm_full_2020_01_18_2020_04_17.csv. The file contains the daily fraction of users moving between Italian provinces. Each line corresponds to an entry (i, j) of the matrix. The fields of the table are:

p1: COD_PROV of origin,

p2: COD_PROV of destination,

day: in the format yyyy-mm-dd.

median_q1_q3_rog_2020_01_18_2020_04_17.csv. The file contains the median and interquartile range (IQR) of users’ radius of gyration in a province by week. The fields of the table are:

COD_PROV of the province;

SIGLA of the province;

DEN_PCM of the province;

week: median value of the radius of gyration in the given week, with week in the format dd/mm-DD/MM, where dd/mm and DD/MM are the first and the last day of the week, respectively;

week Q1: first quartile (Q1) of the distribution of the radius of gyration in the given week;

week Q3: third quartile (Q3) of the distribution of the radius of gyration in the given week.

average_network_degree_2020_01_18_2020_04_17.csv. The file contains daily time-series of the average degree 〈k〉 of the proximity network. Each entry of the table is a value of 〈k〉 on a given day. The fields of the table are:

COD_PROV of the province;

SIGLA of the province;

DEN_PCM of the province;

day in the format yyyy-mm-dd.

ESRI shapefiles of the Italian provinces updated to the most recent definition are available from the website of the Italian National Office of Statistics (ISTAT): https://www.istat.it/it/archivio/222527.
CBP Water Quality Monitoring Subset (1984-2018), CB7 4
s.cnmilf.com
gimi9.com
Updated Sep 27, 2025
Penn State (Point of Contact) (2025). CBP Water Quality Monitoring Subset (1984-2018), CB7 4 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/cbp-water-quality-monitoring-subset-1984-2018-cb7-4
Explore at:
Dataset updated
Sep 27, 2025
Dataset provided by
Penn State (Point of Contact)
Description
This product was developed as part of the project supported by the grant from and the National Oceanic and Atmospheric Administration’s Ocean Acidification Program under award NA18OAR0170430 to the Virginia Institute of Marine Science. The data product consists of water quality data for 98 tidal stations for 1984–2018. The source data used to generate this product were downloaded from the Chesapeake Bay Program’s (CBP) data hub. Out of the total of 255 monitoring stations in the Tidal Monitoring Program, we selected 98 with a long monitoring record (30 years or longer). The following variables were downloaded from the data hub at the native temporal and vertical resolution (between one and four cruises per month and approximately 10 depth levels sampled between 0 and 37 m) for 1984–2018: water temperature (T), salinity (S), pH, total alkalinity (TA), dissolved oxygen (DO), and chlorophyll (Chl). All pH data prior to 1998 were removed because of data quality concerns (Herrmann et al., 2020). Briefly, we found a dramatic difference in long-term trends between stations measured by institutions in the state of Virginia and stations measured by the state of Maryland, particularly from late spring to early fall. The boundary between the station groups runs east–west within the mesohaline portion of the bay, where the Potomac River estuary intersects the mainstem bay. The boundary separates strong negative linear trends to the south (Virginia stations) from neutral and weakly positive linear trends to the north (Maryland stations). For all variables, data entries marked with CBP’s “Problem” and “Qualifier” flags were removed.
Additionally, all variables were scanned for extreme outliers: for each variable, data from all stations, depths, and times were combined into a single composite sample for which the 75th and 25th percentiles (i.e., the upper and lower quartiles) and the interquartile range (the difference between the upper and lower quartiles) were calculated. Extreme outliers were defined as the values falling outside of a certain number (censoring criterion) of interquartile ranges from the upper and lower quartiles.
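The outlier screen described above can be sketched as follows. The description does not state the censoring criterion that was used, so the default of 3 interquartile ranges here is purely an assumption, as are the function name and the example pH values.

```python
import numpy as np


def flag_extreme_outliers(values, censoring_criterion=3.0):
    """Flag values lying more than `censoring_criterion` IQRs outside the quartiles.

    `values` is the composite sample for one variable (all stations, depths, times).
    The default criterion of 3.0 is an assumed value, not taken from the data product.
    """
    values = np.asarray(values, dtype=float)
    q25, q75 = np.nanpercentile(values, [25, 75])  # ignore missing entries
    iqr = q75 - q25
    lower = q25 - censoring_criterion * iqr
    upper = q75 + censoring_criterion * iqr
    return (values < lower) | (values > upper)


# Invented pH readings with one implausible entry.
ph = np.array([7.9, 8.0, 8.1, 8.2, 8.0, 14.0])
is_outlier = flag_extreme_outliers(ph)
ph_clean = ph[~is_outlier]
```

Because the quartiles are computed on the pooled sample, a value that is unusual for one station but ordinary across the whole bay would not be flagged by this screen.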
Descriptive statistics of variables (Occ. = Occurrences, Medn. = Median, IQR...
plos.figshare.com
xls
Updated Jun 14, 2023
Anna-Katharina Jung; Stefan Stieglitz; Tobias Kissmer; Milad Mirbabaie; Tobias Kroll (2023). Descriptive statistics of variables (Occ. = Occurrences, Medn. = Median, IQR = Interquartile Range). [Dataset]. http://doi.org/10.1371/journal.pone.0266743.t005
Explore at:
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0266743.t005
Dataset updated
Jun 14, 2023
Dataset provided by
PLOS ONE
Authors
Anna-Katharina Jung; Stefan Stieglitz; Tobias Kissmer; Milad Mirbabaie; Tobias Kroll
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Descriptive statistics of variables (Occ. = Occurrences, Medn. = Median, IQR = Interquartile Range).
Guardian’s response to postprocedural questionnaires.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ha Ni Lee; Woori Bae; Joong Wan Park; Jae Yun Jung; Soyun Hwang; Do Kyun Kim; Young Ho Kwak (2023). Guardian’s response to postprocedural questionnaires. [Dataset]. http://doi.org/10.1371/journal.pone.0256489.t003
Explore at:
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0256489.t003
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Ha Ni Lee; Woori Bae; Joong Wan Park; Jae Yun Jung; Soyun Hwang; Do Kyun Kim; Young Ho Kwak
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Guardian’s response to postprocedural questionnaires.
Data from: Utility index and vision related quality of life in patients...
datasetcatalog.nlm.nih.gov
data.niaid.nih.gov
Updated Jun 22, 2024
Harzheim, Erno; da Silva Etges, Ana Paula Beck; Cabral, Felipe Cezar; Carvalho, Fabiana; de Campos Moreira, Taís; Zanotto, Bruna Stella; Gonçalves, Marcelo Rodrigues; Polanczyk, Carisi Anne; da Silva, Rodolfo Souza; de Araujo, Aline Lutz; Ruschel, Karen Brasil; Umpierre, Roberto Nunes (2024). Utility index and vision related quality of life in patients awaiting specialist eye care [Dataset]. http://doi.org/10.5061/dryad.h44j0zpv3
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.h44j0zpv3
Dataset updated
Jun 22, 2024
Authors
Harzheim, Erno; da Silva Etges, Ana Paula Beck; Cabral, Felipe Cezar; Carvalho, Fabiana; de Campos Moreira, Taís; Zanotto, Bruna Stella; Gonçalves, Marcelo Rodrigues; Polanczyk, Carisi Anne; da Silva, Rodolfo Souza; de Araujo, Aline Lutz; Ruschel, Karen Brasil; Umpierre, Roberto Nunes
Description
Objectives: This study aimed to ascertain utility and vision-related quality of life in patients awaiting access to specialist eye care. A secondary aim was to evaluate the association of utility indices with demographic profile and waiting time. Methods: Consecutive patients who had been waiting for ophthalmology care answered the 25-item National Eye Institute Visual Function Questionnaire (NEI VFQ-25). The questionnaire was administered when patients arrived at the clinics for their first visit. We derived a utility index (VFQ-UI) from the patients’ responses, then calculated the correlation between this index and waiting time and compared utility across demographic subgroups stratified by age, sex, and care setting. Results: 536 individuals participated in the study (mean age 52.9±16.6 years; 370 women, 69%). The median utility index was 0.85 (interquartile range [IQR] 0.70–0.92; minimum 0.40, maximum 0.97). The mean VFQ-25 score was 70.88±14.59. Utility correlated weakly and nonsignificantly with waiting time (-0.05, P = 0.24). It did not vary across age groups (P = 0.85) or care settings (P = 0.77). Utility was significantly lower for women (0.84, IQR 0.70–0.92) than men (0.87, IQR 0.73–0.93, P = 0.03), but the magnitude of this difference was small (Cohen’s d = 0.13). Conclusion: Patients awaiting access to ophthalmology care had a utility index of 0.85 on a scale of 0 to 1. This measurement was not previously reported in the literature. Utility measures can provide insight into patients’ perspectives, support health economic analyses, and inform health policies.
Dataset related to article "Safety of metformin continuation in diabetic...
data.niaid.nih.gov
Updated Dec 1, 2023
Chiarito, Mauro; Sanz-Sanchez, Jorge; Piccolo, Raffaele; Condello, Francesco; Liccardo, Gaetano; Maurina, Matteo; Avvedimento, Marisa; Regazzoli, Damiano; Pagnotta, Paolo; Garcia-Garcia, Hector; Mehran, Roxana; Federici, Massimo; Condorelli, Gianluigi; Diez Gil, Jose Luis; Reimers, Bernhard; Ferrante, Giuseppe; Stefanini, Giulio (2023). Dataset related to article "Safety of metformin continuation in diabetic patients undergoing invasive coronary angiography: the NO-STOP single arm trial" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10245525
Explore at:
Dataset updated
Dec 1, 2023
Dataset provided by
University of Naples Federico II
Icahn School of Medicine at Mount Sinai
Hospital Universitario y Politécnico La Fe
Inselspital Universitätsspital Bern
IRCCS Humanitas Research Hospital
Humanitas University
MedStar Washington Hospital Center
University of Rome Tor Vergata
Authors
Chiarito, Mauro; Sanz-Sanchez, Jorge; Piccolo, Raffaele; Condello, Francesco; Liccardo, Gaetano; Maurina, Matteo; Avvedimento, Marisa; Regazzoli, Damiano; Pagnotta, Paolo; Garcia-Garcia, Hector; Mehran, Roxana; Federici, Massimo; Condorelli, Gianluigi; Diez Gil, Jose Luis; Reimers, Bernhard; Ferrante, Giuseppe; Stefanini, Giulio
Description
This record contains raw data related to the article "Safety of metformin continuation in diabetic patients undergoing invasive coronary angiography: the NO-STOP single arm trial". Abstract Background: Despite paucity of data, it is common practice to discontinue metformin before invasive coronary angiography due to an alleged risk of Metformin-Associated Lactic Acidosis (M-ALA). We aimed at assessing the safety of metformin continuation in diabetic patients undergoing coronary angiography in terms of significant increase in lactate levels. Methods: In this open-label, prospective, multicentre, single-arm trial, all diabetic patients undergoing coronary angiography with or without percutaneous coronary intervention at 3 European centers were screened for enrolment. The primary endpoint was the increase in lactate levels from preprocedural levels at 72 h after the procedure. Secondary endpoints included contrast-associated acute kidney injury (CA-AKI), M-ALA, and all-cause mortality. Results: 142 diabetic patients on metformin therapy were included. Median preprocedural lactate level was 1.8 mmol/l [interquartile range (IQR) 1.3-2.3]. Lactate levels at 72 h after coronary angiography were 1.7 mmol/l (IQR 1.3-2.3), with no significant differences as compared to preprocedural levels (p = 0.91; median difference = 0; IQR -0.5 to 0.4 mmol/l). One patient had 72-h levels ≥ 5 mmol/l (5.3 mmol/l), but no cases of M-ALA were reported. CA-AKI occurred in 9 patients (6.1%) and median serum creatinine and estimated glomerular filtration rate remained similar throughout the periprocedural period. At a median follow-up of 90 days (43-150), no patients required hemodialysis and 2 patients died due to non-cardiac causes. Conclusions: In diabetic patients undergoing invasive coronary angiography, metformin continuation throughout the periprocedural period does not increase lactate levels and was not associated with any decline in renal function.
Comparison of health-related quality of life between patients with different...
datasetcatalog.nlm.nih.gov
scielo.figshare.com
Updated Jun 6, 2022
Calvo-Lobo, César; Tovaruela-Carrión, Natalia; Álvarez-Ruíz, Verónica; Melero-González, Gemma; Bengoa-Vallejo, Ricardo Becerro-de; López-López, Daniel; Losa-Iglesias, Marta Elena (2022). Comparison of health-related quality of life between patients with different metatarsalgia types and matched healthy controls: a cross-sectional analysis [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000413428
Explore at:
Dataset updated
Jun 6, 2022
Authors
Calvo-Lobo, César; Tovaruela-Carrión, Natalia; Álvarez-Ruíz, Verónica; Melero-González, Gemma; Bengoa-Vallejo, Ricardo Becerro-de; López-López, Daniel; Losa-Iglesias, Marta Elena
Description
ABSTRACT BACKGROUND: Metatarsalgia can be considered to be a common complaint in clinical practice. The aim of this study was to compare quality of life (QoL) between participants with different metatarsalgia types and matched-paired healthy controls. DESIGN AND SETTING: A cross-sectional analysis on a sample of 124 participants of median age ± interquartile range of 55 ± 22 years was carried out in the University Clinic of Podiatric Medicine and Surgery, Ferrol, Spain. They presented primary (n = 31), secondary (n = 31) or iatrogenic (n = 31) metatarsalgia, or were matched-paired healthy controls (n = 31). METHODS: Self-reported domain scores were obtained using the Foot Health Status Questionnaire (FHSQ) and were compared between the participants with metatarsalgia and between these and the healthy controls. RESULTS: Statistically significant differences were shown in all FHSQ domains (P ≤ 0.001). Post-hoc analyses showed statistically significant differences (P < 0.05) between the metatarsalgia types in relation to the matched healthy control group, such that the participants with metatarsalgia presented impaired foot-specific and general health-related QoL (lower FHSQ scores). CONCLUSION: This study demonstrated that presence of metatarsalgia had a negative impact on foot health-related QoL. Foot-specific health and general health were poorer among patients with metatarsalgia, especially among those with secondary and iatrogenic metatarsalgia, in comparison with matched healthy controls.
Date.
plos.figshare.com
xlsx
Updated Aug 6, 2025
Chengkai Yang; Qian Guo; Yang Cheng; Fengjing Liu; Hui Zhang; Huaxiang Wang (2025). Date. [Dataset]. http://doi.org/10.1371/journal.pone.0329636.s001
Explore at:
xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0329636.s001
Dataset updated
Aug 6, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Chengkai Yang; Qian Guo; Yang Cheng; Fengjing Liu; Hui Zhang; Huaxiang Wang
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Flurbiprofen, as a widely used nonsteroidal anti-inflammatory drug (NSAID), is commonly employed to relieve mild to moderate pain and inflammation. Understanding its adverse reactions in real-world usage is of significant importance. Methods: Reports of all adverse drug events (ADEs) related to flurbiprofen were extracted from the FAERS database, covering the period from Q1 2004 to Q3 2024. These reports were standardized and analyzed using various signal quantification techniques, including Reporting Odds Ratios (ROR), Proportional Reporting Ratios (PRR), Bayesian Confidence Propagation Neural Network (BCPNN), and Multi-item Gamma Poisson Shrinkage (MGPS). Finally, the association between flurbiprofen and ADEs as well as clinical medical events was assessed. Results: A total of 275 cases from the target population were identified in the FAERS database, with 788 instances of adverse events (AEs) occurring across 46 organ systems. We identified not only some common adverse reactions listed in the drug’s package insert, such as acute kidney injury, nausea and vomiting, and facial edema, but also significant signals that were not mentioned in the package insert, including Dysphonia, Drug abuse, and Pancreatitis acute. The median time to onset of flurbiprofen-related AEs was 1 day (interquartile range [IQR] 0–5 days), with most AEs occurring within the first month of flurbiprofen use. Conclusion: This study confirmed some common adverse reactions listed in the flurbiprofen drug package insert and identified significant unexpected adverse reactions. These findings can assist clinicians in conducting more comprehensive clinical monitoring when using the drug, thereby ensuring patient safety during treatment.

U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation

Meta data and supporting documentation

Explore at:

Dataset updated

Nov 12, 2020

Dataset provided by

United States Environmental Protection Agency (http://www.epa.gov/)

Description

We include a description of the data sets in the meta-data as well as sample code and results from a simulated data set.

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

It can be accessed through the following means: The R code is available online here: https://github.com/warrenjl/SpGPCW.

Format:

Abstract

The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability

Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description

Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

File format: R workspace file.

Metadata (including data dictionary)

• y: Vector of binary responses (1: preterm birth, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
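The weekly standardization described above (subtract that week's median exposure, then divide by that week's IQR) can be illustrated with a minimal Python sketch. This is not the authors' R code. The function name standardize_by_week, the n-by-m list-of-lists layout, and the "inclusive" quartile convention are assumptions made for illustration, and the published analysis may compute the IQR with a different quantile rule.

```python
from statistics import median, quantiles


def standardize_by_week(exposures):
    """Standardize an n-by-m exposure table column by column.

    exposures: list of n rows (individuals), each a list of m weekly
    average exposures. Hypothetical layout, chosen to mirror the
    description of z in the data dictionary.
    Returns a new n-by-m list where each week (column) has had its
    median subtracted and has been divided by its IQR.
    """
    n = len(exposures)
    m = len(exposures[0])
    standardized = [[0.0] * m for _ in range(n)]

    for week in range(m):
        column = [row[week] for row in exposures]
        week_median = median(column)
        # Quartiles under the "inclusive" convention (an assumption;
        # other quantile definitions give slightly different IQRs).
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        if iqr == 0:
            raise ValueError(f"week {week} has zero IQR; cannot standardize")
        for i in range(n):
            standardized[i][week] = (exposures[i][week] - week_median) / iqr

    return standardized
```

Calling standardize_by_week on a raw table returns values on a common, unitless scale for every week. As the description notes, the provided z matrix is already on this scale, and the original medians and IQRs are withheld, so the raw exposures cannot be recovered from it.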

