These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.
This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.
It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.
Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript.
We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description
“CWVS_LMC.txt”: This code is delivered to the user as a .txt file containing R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the code in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered as a .txt file containing R statistical software code. Once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Required R packages:
• For running “CWVS_LMC.txt”:
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
• plotrix: Plotting the posterior means and credible intervals
Instructions for Use
Reproducibility
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”
Format: Below is the replication procedure for the attached dataset for the portion of the analyses using a simulated dataset:
Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005–2008.
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.
Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1–30, (2019).
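For readers who want to mirror this preprocessing on their own raw exposure data, the week-wise standardization described above (subtract each week's median, divide by each week's IQR) can be sketched in plain Python. This is an illustrative reimplementation, not the authors' R code, and the quantile rule below is one common linear-interpolation convention:

```python
from statistics import median


def _quantile(sorted_vals, q):
    # Linear interpolation between closest ranks (matches R's default type-7 rule).
    idx = (len(sorted_vals) - 1) * q
    lo = int(idx)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (idx - lo)


def standardize_exposures(z):
    """Standardize a matrix of weekly exposures (rows = individuals,
    columns = weeks) by subtracting each week's median and dividing
    by that week's interquartile range (IQR)."""
    n_weeks = len(z[0])
    out = [row[:] for row in z]
    for w in range(n_weeks):
        col = sorted(row[w] for row in z)
        med = median(col)
        iqr = _quantile(col, 0.75) - _quantile(col, 0.25)
        for i in range(len(z)):
            out[i][w] = (z[i][w] - med) / iqr
    return out
```

Note that the released z matrix is already standardized, so a function like this is only needed when preparing raw exposures of the same shape.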
We include a description of the data sets in the metadata, as well as sample code and results from a simulated data set. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The R code is available online here: https://github.com/warrenjl/SpGPCW.
Format: Abstract
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005–2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained.
While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.
Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.
File format: R workspace file.
Metadata (including data dictionary)
• y: Vector of binary responses (1: preterm birth, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1–30, (2019).
https://creativecommons.org/publicdomain/zero/1.0/
Walmart Inc. is a multinational retail corporation that operates a chain of hypermarkets, discount department stores, and grocery stores. It is one of the world's largest companies by revenue and a key player in the retail sector. Walmart's stock is actively traded on major stock exchanges, making it an interesting subject for financial analysis.
This dataset contains historical stock price data for Walmart, sourced directly from Yahoo Finance using the yfinance Python API. The data covers daily stock prices and includes multiple key financial indicators.
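As a small illustration of the kind of computation this dataset supports, the sketch below derives daily log returns and an annualized volatility estimate from a closing-price series. The helper names are our own; with the yfinance package, closes would be fetched roughly as indicated in the comment:

```python
import math

# Illustrative only: with yfinance one would fetch closing prices roughly via
#   import yfinance as yf
#   prices = yf.download("WMT")["Close"].tolist()
# Here we operate on any plain list of closing prices.


def daily_log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def annualized_volatility(prices, trading_days=252):
    """Sample standard deviation of daily log returns, scaled by sqrt(252)."""
    r = daily_log_returns(prices)
    mean = sum(r) / len(r)
    var = sum((x - mean) ** 2 for x in r) / (len(r) - 1)
    return math.sqrt(var) * math.sqrt(trading_days)
```

A flat price series yields zero volatility and more erratic series score higher, which is the basic quantity behind the volatility analyses mentioned above.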
This notebook performs an extensive EDA to uncover insights into Walmart's stock price trends, volatility, and overall behavior in the stock market. The following analysis steps are included:
This dataset and analysis can be useful for: - 📡 Stock Market Analysis – Evaluating Walmart’s stock price trends and volatility. - 🏦 Investment Research – Assisting traders and investors in making informed decisions. - 🎓 Educational Purposes – Teaching data science and financial analysis using real-world stock data. - 📊 Algorithmic Trading – Developing trading strategies based on historical stock price trends.
📥 Download the dataset and explore Walmart’s stock performance today! 🚀
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Tuberculosis (TB) remains a significant public health challenge, particularly among vulnerable populations like children. This is especially true in Sub-Saharan Africa, where the burden of TB in children is substantial. Zambia ranks 21st among the top 30 high-TB-endemic countries globally. While studies have explored TB in adults in Zambia, the prevalence and associated factors in children are not well documented. This study aimed to determine the prevalence of, and the sociodemographic and clinical factors associated with, active TB disease in hospitalized children under the age of 15 years at Livingstone University Teaching Hospital (LUTH), the largest referral center in Zambia’s Southern Province.
Methods
This retrospective cross-sectional study of 700 pediatric patients under 15 years old utilized programmatic data from the Pediatrics Department at LUTH. A systematic sampling method was used to select participants from medical records. Data on demographics, medical conditions, anthropometric measurements, and blood tests were collected. Data analysis included descriptive statistics, chi-square tests, and multivariable logistic regression to identify factors associated with TB.
Results
The median age was 24 months (interquartile range (IQR): 11, 60) and the majority were male (56.7%, n = 397/700). Most participants were from urban areas (59.9%, n = 419/700), and 9.2% (n = 62/675) were living with HIV. Malnutrition and comorbidities were present in a significant portion of the participants (19.0% and 25.1%, respectively). The prevalence of active TB was 9.4% (n = 66/700) among hospitalized children.
Persons living with HIV (adjusted odds ratio (AOR): 6.30; 95% confidence interval (CI): 2.85, 13.89; p < 0.001) and those who were malnourished (AOR: 10.38; 95% CI: 4.78, 22.55; p < 0.001) had a significantly higher likelihood of developing active TB disease.
Conclusion
This study revealed a prevalence of active TB of 9.4% among hospitalized children under 15 years at LUTH. HIV status and malnutrition emerged as significant factors associated with active TB disease. These findings emphasize the need for pediatric TB control strategies that prioritize addressing associated factors to effectively reduce the burden of tuberculosis in Zambian children.
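The adjusted odds ratios above come from multivariable logistic regression; as an illustration of the underlying quantity only, a crude (unadjusted) odds ratio with a Woolf-type 95% confidence interval for a 2x2 exposure-outcome table can be computed as follows (our own sketch, not the study's analysis code):

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-normal) 95% CI for a 2x2 table:
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    This is only the unadjusted calculation; the AORs in the text
    additionally condition on the other model covariates."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

Because the adjusted estimates condition on the other covariates in the model, they will generally differ from this crude calculation.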
The dataset provides the median, 25th percentile, and 75th percentile of carbon monoxide (CO) concentrations in Delhi, measured in moles per square meter and vertically integrated over a 9-day mean period. This data offers insights into the distribution and variability of CO levels over time.
The data, collected from July 10, 2018, to August 10, 2024, is sourced from the Tropomi Explorer.
CO is a harmful gas that can significantly impact human health. High levels of CO can lead to respiratory issues, cardiovascular problems, and even be life-threatening in extreme cases. Forecasting CO levels helps in predicting and managing air quality to protect public health.
CO is often emitted from combustion processes, such as those in vehicles and industrial activities. Forecasting CO levels can help in monitoring the impact of these sources and evaluating the effectiveness of emission control measures.
Accurate CO forecasts can assist in urban planning and pollution control strategies, especially in densely populated areas where air quality issues are more pronounced.
Columns and Data Description:
• system:time_start: The date when the CO measurements were taken.
• p25: Likely the 25th percentile value of CO levels for the given date, providing insight into the lower range of the distribution.
• median: The median CO level for the given date, which is the middle value of the dataset and represents a typical value.
• IQR: The Interquartile Range, which measures the spread of the middle 50% of the data. It is calculated as the difference between the 75th percentile (p75) and the 25th percentile (p25) values.
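Since the table ships p25 and IQR but not p75 itself, the 75th percentile can be recovered from the identity IQR = p75 - p25. A minimal sketch (the field names here simply mirror the column names above; the p75 key is our own addition):

```python
def recover_p75(rows):
    """Add a p75 field to each record, using IQR = p75 - p25.

    `rows` is a list of dicts keyed like the columns above
    (p25, median, IQR)."""
    return [dict(r, p75=r["p25"] + r["IQR"]) for r in rows]
```

This returns new dicts and leaves the original records untouched, which is convenient when the raw table should stay read-only.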
https://spdx.org/licenses/CC0-1.0.html
The recent surge in electric vehicles (EVs), driven by a collective push to enhance global environmental sustainability, has underscored the significance of exploring EV charging prediction. To catalyze further research in this domain, we introduce UrbanEV, an open dataset showcasing EV charging space availability and electricity consumption in a pioneering city for vehicle electrification, namely Shenzhen, China. UrbanEV offers a rich repository of charging data (i.e., charging occupancy, duration, volume, and price) captured at hourly intervals across an extensive six-month span for over 20,000 individual charging stations. Beyond these core attributes, the dataset also encompasses diverse influencing factors such as weather conditions and spatial proximity. These factors are thoroughly analyzed qualitatively and quantitatively to reveal their correlations and causal impacts on charging behaviors. Furthermore, comprehensive experiments have been conducted to showcase the predictive capabilities of various models, including statistical, deep learning, and transformer-based approaches, using the UrbanEV dataset. This dataset is poised to propel advancements in EV charging prediction and management, positioning itself as a benchmark resource within this burgeoning field.
Methods
To build a comprehensive and reliable benchmark dataset, we conduct a series of rigorous processes from data collection to dataset evaluation. The overall workflow sequentially includes data acquisition, data processing, statistical analysis, and prediction assessment. Detailed descriptions follow.
Study area and data acquisition
Shenzhen, a pioneering city in global vehicle electrification, has been selected for this study with the objective of offering valuable insights into electric vehicle (EV) development that can serve as a reference for other urban centers. This study encompasses the entire expanse of Shenzhen, where data on public EV charging stations distributed around the city have been meticulously gathered. Specifically, EV charging data was automatically collected from a mobile platform used by EV drivers to locate public charging stations. Through this platform, users could access real-time information on each charging pile, including its availability (e.g., busy or idle), charging price, and geographic coordinates. Accordingly, we recorded the charging-related data at five-minute intervals from September 1, 2022, to February 28, 2023. This data collection process was fully digital and did not require manual readings. Furthermore, to delve into the correlation between EV charging patterns and environmental elements, weather data for Shenzhen city were acquired from two meteorological observatories situated in the airport and central regions, respectively. These meteorological data are publicly available on the Shenzhen Government Data Open Platform. Thirdly, point of interest (POI) data was extracted through the Application Programming Interface Platform of AMap.com, along with three primary types: food and beverage services, business and residential, and lifestyle services. Lastly, the spatial and static data were organized based on the traffic zones delineated by the sixth Residential Travel Survey of Shenzhen. The collected data contains detailed spatiotemporal information that can be analyzed to provide valuable insights about urban EV charging patterns and their correlations with meteorological conditions.
Processing raw information into well-structured data
To streamline the utilization of the UrbanEV dataset, we harmonize heterogeneous data from various sources into well-structured data with aligned temporal and spatial resolutions. This process can be segmented into two parts: the reorganization of EV charging data and the preparation of other influential factors.
EV charging data
The raw charging data, obtained from publicly available EV charging services, pertains to charging stations and predominantly comprises string-type records at a 5-minute interval.
To transform this raw data into a structured time series tailored for prediction tasks, we implement the following three key measures:
Initial Extraction. From the string-type records, we extract vital information for each charging pile, such as availability (designated as "busy" or "idle"), rated power, and the corresponding charging and service fees applicable during the observed time periods. First, a charging pile is categorized as "active charging" if its states at two consecutive timestamps are both "busy". Consequently, the occupancy within a charging station can be defined as the count of in-use charging piles, while the charging duration is calculated as the product of the count of in-use piles and the time between the two timestamps (in our case, 5 minutes). Moreover, the charging volume in a station can correspondingly be estimated by multiplying the duration by the piles' rated power. Finally, the average electricity price and service price are calculated for each station in alignment with the same temporal resolution as the three charging variables.
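The derivation above can be sketched for a single station and one pair of consecutive 5-minute snapshots. This is a simplified illustration that assumes all piles share one rated power, whereas the actual pipeline works per pile:

```python
def station_charging_stats(states_t0, states_t1, rated_power_kw, interval_min=5):
    """Sketch of the occupancy/duration/volume derivation described above.

    A pile counts as 'active charging' only if it is 'busy' at both of two
    consecutive timestamps; duration = active piles * interval; volume =
    duration * rated power (a single shared rated power is assumed here)."""
    active = sum(1 for a, b in zip(states_t0, states_t1)
                 if a == "busy" and b == "busy")
    duration_min = active * interval_min
    volume_kwh = rated_power_kw * duration_min / 60.0
    return active, duration_min, volume_kwh
```

Summing these per-interval quantities over longer windows then yields station-level totals.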
Error Detection and Imputation. Ensuring data quality is paramount when utilizing charging data for decision-making, advanced analytics, and machine-learning applications. It is crucial to address concerns around data cleanliness, as the presence of inaccuracies and inconsistencies, often referred to as dirty data, can significantly compromise the reliability and validity of any subsequent analysis or modeling efforts. To improve the quality of our charging data, several errors were identified, particularly negative values for charging fees and inconsistencies among the counts of occupied, idle, and total charging piles. We removed the records containing these anomalies and treated them as missing data. In addition, a two-step imputation process was implemented to address missing values: first, forward filling replaced missing values using data from preceding timestamps; then, backward filling was applied to fill gaps at the start of each time series. Moreover, a certain number of outliers were identified in the dataset, which could significantly impact prediction performance. To address this, the interquartile range (IQR) method was used to detect outliers for metrics including charging volume (v), charging duration (d), and the rate of active charging piles at the charging station (o). To retain more original data and minimize the impact of outlier correction on the overall data distribution, we set the coefficient to 4 instead of the default 1.5. Finally, each outlier was replaced by the mean of its adjacent valid values. This preprocessing pipeline transformed the raw data into a structured and analyzable dataset.
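A minimal sketch of the two-step imputation and the widened IQR fences just described (an illustration under our own conventions; the authors' implementation may differ in details such as the quantile rule):

```python
def forward_backward_fill(series):
    """Two-step imputation as described: forward fill from preceding
    timestamps, then backward fill to cover gaps at the start
    (None marks a missing value)."""
    out = series[:]
    for i in range(1, len(out)):
        if out[i] is None:
            out[i] = out[i - 1]
    for i in range(len(out) - 2, -1, -1):
        if out[i] is None:
            out[i] = out[i + 1]
    return out


def iqr_bounds(values, k=4.0):
    """Outlier fences with the widened coefficient (4 instead of 1.5)."""
    s = sorted(values)

    def q(p):
        # Linear interpolation between closest ranks.
        idx = (len(s) - 1) * p
        lo = int(idx)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (idx - lo)

    q1, q3 = q(0.25), q(0.75)
    return q1 - k * (q3 - q1), q3 + k * (q3 - q1)
```

Values falling outside the returned fences would then be replaced by the mean of their adjacent valid neighbors, as the text describes.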
Aggregation and Filtration. Building upon the station-level charging data that has been extracted and cleansed, we further organize the data into a region-level dataset with an hourly interval, providing a new perspective for EV charging behavior analysis. This is achieved by two major processes: aggregation and filtration. First, we aggregate all the charging data from both temporal and spatial views:
a. Temporally, we standardize all time-series data to a common time resolution of one hour, as it serves as the least common denominator among the various resolutions. This establishes a unified temporal resolution for all time-series data, including pricing schemes, weather records, and charging data, thereby creating a well-structured dataset. Aggregation rules specify that the five-minute charging volume (v) and duration (d) are summed within each interval (i.e., one hour), whereas the occupancy (o), electricity price (pe), and service price (ps) are assigned the values observed at specific hours for each charging pile. This distinction arises from the inherent nature of these data types: volume v and duration d are cumulative, while o, pe, and ps are instantaneous variables. Compared to using the mean or median values within each interval, selecting the instantaneous values of o, pe, and ps as representatives preserves the original data patterns more effectively and minimizes the influence of human interpretation.
b. Spatially, stations are aggregated based on the traffic zones delineated by the sixth Residential Travel Survey of Shenzhen. After aggregation, our dataset comprises 331 regions (also called traffic zones) with 4,344 timestamps.
Second, variance tests and zero-value filtering functions were employed to filter out traffic zones with zero or no change in charging data. Specifically, it means that
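The temporal rule in step a (sum the cumulative variables, take an hourly snapshot of the instantaneous ones) can be sketched as follows; the record layout and field names are our own:

```python
def hourly_aggregate(records):
    """Hourly aggregation rule sketched from the text: within each hour,
    the five-minute charging volume and duration are summed (cumulative
    variables), while occupancy takes the value observed at the hour mark
    (standing in for all instantaneous variables: o, pe, ps).

    `records` is a list of (minute_of_day, volume, duration, occupancy)
    tuples at 5-minute spacing."""
    hours = {}
    for minute, v, d, o in records:
        h = minute // 60
        agg = hours.setdefault(h, {"volume": 0.0, "duration": 0.0, "occupancy": None})
        agg["volume"] += v
        agg["duration"] += d
        if minute % 60 == 0:  # snapshot at the top of the hour
            agg["occupancy"] = o
    return hours
```

Taking the snapshot rather than an hourly mean mirrors the text's rationale: it preserves the original pattern of the instantaneous variables without extra interpretation.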
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Rheumatic and musculoskeletal disorders (RMDs) are associated with cardiovascular diseases (CVDs), with hypertension being the most common. We aimed to determine the prevalence of high blood pressure (HBP), awareness, treatment, and blood pressure control among patients with RMDs seen in a rheumatology clinic in Uganda.
Methods
We conducted a cross-sectional study at the Rheumatology Clinic of Mulago National Referral Hospital (MNRH), Kampala, Uganda. Socio-demographic and clinical characteristics, as well as anthropometric data, were collected. Multivariable logistic regression was performed using STATA 16 to determine factors associated with HBP in patients with RMDs.
Results
A total of 100 participants were enrolled. Of these, the majority were female (84%, n = 84), with a mean age of 52.1 (standard deviation: 13.8) years and a median body mass index of 28 kg/m2 (interquartile range (IQR): 24.8–32.9 kg/m2). The prevalence of HBP was 61% (n = 61, 95% CI: 51.5–70.5), with the majority (77%, n = 47, 95% CI: 66.5–87.6) being aware they had hypertension (HTN). The prevalence of HTN was 47% (n = 47, 95% CI: 37.2–56.8), and none had it under control. Factors independently associated with HBP were age 46–55 years (adjusted prevalence ratio (aPR): 2.5, 95% confidence interval (CI): 1.06–5.95), age 56–65 years (aPR: 2.6, 95% CI: 1.09–6.15), age >65 years (aPR: 2.5, 95% CI: 1.02–6.00), obesity (aPR: 3.7, 95% CI: 1.79–7.52), and overweight (aPR: 2.7, 95% CI: 1.29–5.77).
Conclusion
There was a high burden of HBP among people with RMDs in Uganda, with poor blood pressure control, associated with high BMI and increasing age. There is a need for further assessment of the RMD-specific drivers of HBP and meticulous follow-up of patients with RMDs.
Introduction
The aim of this study was to determine patterns of physical activity in pet dogs using real-world data at a population scale, aided by the use of accelerometers and electronic health records (EHRs).
Methods
A directed acyclic graph (DAG) was created to capture background knowledge and causal assumptions related to dog activity, and this was used to identify relevant data sources, which included activity data from commercially available accelerometers and health and patient metadata from the EHRs. Linear mixed models (LMMs) were fitted to the number of active minutes following log-transformation, with the fixed effects tested based on the variables of interest and the adjustment sets indicated by the DAG.
Results
Activity was recorded on 8,726,606 days for 28,562 dogs with 136,876 associated EHRs, with the median number of activity records per dog being 162 (interquartile range (IQR): 60–390). The average recorded activity of 51 minutes per day was much lower than previous estimates of physical activity, and there was wide variation in activity levels, from less than 10 to over 600 minutes per day. Physical activity decreased with age, an effect that was dependent on breed size, whereby the decline in activity with age grew as breed size increased. Activity increased with breed size and owner age independently. Activity also varied independently with sex, location, climate, season, and day of the week: males were more active than females, and dogs were more active in rural areas, in hot-dry or marine climates, in spring, and on weekends.
Conclusion
Accelerometer-derived activity data gathered from pet dogs living in North America were used to determine associations with both dog and environmental characteristics. Knowledge of these associations could be used to inform daily exercise and caloric requirements for dogs, and how they should be adapted according to individual circumstances.
a: Excluding costs for international consultants (see Table 5); b: Estimates only used in the analysis from the societal perspective. CI: Confidence intervals; IQR: Interquartile range; ZMO: Zonal medical officer.
This table contains a source catalog based on 90-cm (324-MHz) Very Large Array (VLA) imaging of the COSMOS field, comprising a circular area of 3.14 square degrees centered on 10h 00m 28.6s, +02° 12' 21" (J2000.0 RA and Dec). The image from the merger of 3 nights of observations using all 27 VLA antennas had an effective total integration time of ~12 hours, an 8.0 arcsecond x 6.0 arcsecond angular resolution, and an average rms of 0.5 mJy beam^-1. The extracted catalog contains 182 sources (down to 5.5 sigma), 30 of which are multi-component sources. Using Monte Carlo artificial source simulations, the authors derive the completeness of the catalog, and show that their 90-cm source counts agree very well with those from previous studies. In their paper, the authors use X-ray, NUV-NIR and radio COSMOS data to investigate the population mix of this 90-cm radio sample, and find that the sample is dominated by active galactic nuclei. The average 90-20 cm spectral index (defined by S_nu ~ nu^alpha, where S_nu is the flux density at frequency nu and alpha the spectral index) of the 90-cm selected sources is -0.70, with an interquartile range from -0.90 to -0.53. Only a few ultra-steep-spectrum sources are present in this sample, consistent with results in the literature for similar fields. These data do not show clear steepening of the spectral index with redshift. Nevertheless, this sample suggests that sources with spectral indices steeper than -1 all lie at z >~ 1, in agreement with the idea that ultra-steep-spectrum radio sources may trace intermediate-redshift galaxies (z >~ 1). Using both the signal and rms maps (see Figs. 1 and 2 in the reference paper) as input data, the authors ran the AIPS task SAD to obtain a catalog of candidate components above a given local signal-to-noise ratio (S/N) threshold. The task SAD was run four times with search S/N levels of 10, 8, 6 and 5, using the resulting residual image each time.
They recovered all the radio components with a local S/N > 5.00. Subsequently, all the selected components were visually inspected in order to check their reliability, especially for components near strong side-lobes. After careful analysis, a S/N threshold of 5.50 was adopted as the best compromise between a deep and a reliable catalog. The procedure yielded a total of 246 components with a local S/N > 5.50. More than one component identified in the 90-cm map sometimes belongs to a single radio source (e.g., large radio galaxies consist of multiple components). Using the 90-cm COSMOS radio map, the authors combined the various components into single sources based on visual inspection. The final catalog (contained in this HEASARC table) lists 182 radio sources, 30 of which have been classified as multiple, i.e., they are better described by more than a single component. Moreover, in order to ensure a more precise classification, all sources identified as multi-component sources have also been double-checked using the 20-cm radio map. The authors found that all 26 multiple 90-cm radio sources within the 20-cm map have 20-cm counterpart sources already classified as multiple. The authors have made use of the VLA-COSMOS Large and Deep Projects over 2 square degrees, reaching down to an rms of ~15 µJy beam^-1 at 1.4 GHz and 1.5 arcsec resolution (Schinnerer et al. 2007, ApJS, 172, 46: the VLACOSMOS table in the HEASARC database). The 90-cm COSMOS radio catalog has, however, been extracted from a larger region of 3.14 square degrees (see Fig. 1 and Section 3.1 of the reference paper). This implies that a certain number of 90-cm sources (48) lie outside the area of the 20-cm COSMOS map used to select the radio catalog. Thus, to identify the 20-cm counterparts of the 90-cm radio sources, the authors used the joint VLA-COSMOS catalog (Schinnerer et al.
2010, ApJS, 188, 384: the VLACOSMJSC table in the HEASARC database) for the 134 sources within the 20-cm VLA-COSMOS area and the VLA-FIRST survey (White et al. 1997, ApJ, 475, 479: the FIRST table in the HEASARC database) for the remaining 48 sources. The 90-cm sources were cross-matched with the 20-cm VLA-COSMOS sources using a search radius of 2.5 arcseconds, while the cross-match with the VLA-FIRST sources was done using a search radius of 4 arcseconds in order to take into account the larger synthesized beam of the VLA-FIRST survey of ~5 arcseconds. Finally, all the 90 cm - 20 cm associations were visually inspected in order to also ensure the association of the multiple 90-cm radio sources, for which the value of the search radius used during the cross-match could be too restrictive. In summary, out of the total of 182 sources in the 90-cm catalog, 168 have counterparts at 20 cm. This table was created by the HEASARC in October 2014 based on an electronic version of Table 1 from the reference paper, which was obtained from the COSMOS web site at IRSA, specifically the file vla-cosmos_327_sources_published_version.tbl at http://irsa.ipac.caltech.edu/data/COSMOS/tables/vla/. This is a service provided by NASA HEASARC.
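The spectral-index convention quoted above (S_nu ∝ nu^alpha) implies a simple two-point estimate between 324 MHz (90 cm) and 1.4 GHz (20 cm). A minimal Python sketch with an illustrative flux-density pair (the values are not taken from the catalog):

```python
import math

# Two-point spectral index under the convention S_nu ∝ nu^alpha:
# alpha = log(S1 / S2) / log(nu1 / nu2)
def spectral_index(s_90cm, s_20cm, nu_90cm=324.0, nu_20cm=1400.0):
    """Flux densities in the same units; frequencies in MHz."""
    return math.log(s_90cm / s_20cm) / math.log(nu_90cm / nu_20cm)

# Illustrative source: 10 mJy at 90 cm and 3.6 mJy at 20 cm yields a
# spectral index close to the sample average of -0.70 quoted above.
alpha = spectral_index(10.0, 3.6)
print(round(alpha, 2))
```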
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A live version of the data record, which will be kept up-to-date with new estimates, can be downloaded from the Humanitarian Data Exchange: https://data.humdata.org/dataset/covid-19-mobility-italy.
If you find the data helpful or you use the data for your research, please cite our work:
Pepe, E., Bajardi, P., Gauvin, L., Privitera, F., Lake, B., Cattuto, C., & Tizzoni, M. (2020). COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown. Scientific Data 7, 230 (2020).
The data record is structured into 4 comma-separated value (CSV) files, as follows:
id_provinces_IT.csv. Table of the administrative codes of the 107 Italian provinces. The fields of the table are:
COD_PROV is an integer field that is used to identify a province in all other data records;
SIGLA is a two-letter code that identifies the province according to the ISO_3166-2 standard (https://en.wikipedia.org/wiki/ISO_3166-2:IT);
DEN_PCM is the full name of the province.
OD_Matrix_daily_flows_norm_full_2020_01_18_2020_04_17.csv. The file contains the daily fraction of users moving between Italian provinces. Each line corresponds to an entry (i, j) of the origin-destination matrix. The fields of the table are:
p1: COD_PROV of origin,
p2: COD_PROV of destination,
day: in the format yyyy-mm-dd.
median_q1_q3_rog_2020_01_18_2020_04_17.csv. The file contains the median and interquartile range (IQR) of users' radius of gyration in a province by week. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
week: median value of the radius of gyration in week week, with week in the format dd/mm-DD/MM, where dd/mm and DD/MM are the first and the last day of the week, respectively;
week Q1: first quartile (Q1) of the distribution of the radius of gyration in week week;
week Q3: third quartile (Q3) of the distribution of the radius of gyration in week week.
average_network_degree_2020_01_18_2020_04_17.csv. The file contains daily time-series of the average degree 〈k〉 of the proximity network. Each entry of the table is a value of 〈k〉 on a given day. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
day: in the format yyyy-mm-dd.
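As a quick illustration, the tables above can be loaded and joined on the province code. This is a minimal pandas sketch using small synthetic stand-ins for the files (the real file names are as listed above; the exact column parsing is an assumption):

```python
import io
import pandas as pd

# Synthetic stand-ins for id_provinces_IT.csv and the OD matrix file,
# mirroring the field names documented above.
provinces_csv = io.StringIO(
    "COD_PROV,SIGLA,DEN_PCM\n1,TO,Torino\n58,RM,Roma\n"
)
od_csv = io.StringIO(
    "p1,p2,day\n1,58,2020-01-18\n58,1,2020-01-18\n"
)

provinces = pd.read_csv(provinces_csv)
od = pd.read_csv(od_csv, parse_dates=["day"])

# Attach the origin-province name by joining COD_PROV against p1.
origin = provinces.rename(columns={"COD_PROV": "p1", "DEN_PCM": "origin_name"})
od = od.merge(origin[["p1", "origin_name"]], on="p1")
print(od[["p1", "p2", "day", "origin_name"]])
```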
ESRI shapefiles of the Italian provinces updated to the most recent definition are available from the website of the Italian National Office of Statistics (ISTAT): https://www.istat.it/it/archivio/222527.
The experiments were carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50 Hz. The experiments were video-recorded to label the data manually. The obtained dataset was randomly partitioned into two sets, where 70% of the volunteers were selected for generating the training data and 30% for the test data.
The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low-frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.
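The fixed-width sliding-window segmentation described above (2.56 s windows at 50 Hz, i.e. 128 samples per window with 50% overlap) can be sketched as follows; the signal here is synthetic:

```python
import numpy as np

def sliding_windows(signal, width=128, overlap=0.5):
    """Split a 1-D signal into fixed-width windows with the given overlap."""
    step = int(width * (1 - overlap))  # 64 samples for 50% overlap
    n_windows = (len(signal) - width) // step + 1
    return np.stack([signal[i * step : i * step + width]
                     for i in range(n_windows)])

# 10 seconds of a synthetic signal sampled at 50 Hz (500 samples).
signal = np.sin(np.linspace(0, 20 * np.pi, 500))
windows = sliding_windows(signal)
print(windows.shape)  # one row per 2.56 s window
```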
The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ. These time-domain signals (prefix 't' to denote time) were captured at a constant rate of 50 Hz. Then they were filtered using a median filter and a 3rd order low pass Butterworth filter with a corner frequency of 20 Hz to remove noise. Similarly, the acceleration signal was then separated into the body and gravity acceleration signals (tBodyAcc-XYZ and tGravityAcc-XYZ) using another low pass Butterworth filter with a corner frequency of 0.3 Hz.
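A minimal scipy sketch of the gravity/body separation step (3rd-order low-pass Butterworth with a 0.3 Hz corner frequency at 50 Hz sampling, applied zero-phase; the original pipeline's exact filter implementation may differ):

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 50.0  # sampling rate in Hz

# 3rd-order low-pass Butterworth, 0.3 Hz corner frequency (normalized
# to the Nyquist frequency fs/2 as scipy expects).
b, a = butter(N=3, Wn=0.3 / (fs / 2), btype="low")

# Synthetic total acceleration: constant gravity plus 2 Hz body motion.
t = np.arange(0, 10, 1 / fs)
total_acc = 9.8 + 0.5 * np.sin(2 * np.pi * 2 * t)

gravity = filtfilt(b, a, total_acc)   # low-frequency (gravity) component
body = total_acc - gravity            # body-acceleration residual
print(round(float(gravity.mean()), 1))
```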
Subsequently, the body linear acceleration and angular velocity were derived in time to obtain Jerk signals (tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ). Also, the magnitudes of these three-dimensional signals were calculated using the Euclidean norm (tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag).
Finally, a Fast Fourier Transform (FFT) was applied to some of these signals, producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag. (Note the 'f' to indicate frequency-domain signals.)
These signals were used to estimate variables of the feature vector for each pattern: '-XYZ' is used to denote 3-axial signals in the X, Y, and Z directions.
tBodyAcc-XYZ tGravityAcc-XYZ tBodyAccJerk-XYZ tBodyGyro-XYZ tBodyGyroJerk-XYZ tBodyAccMag tGravityAccMag tBodyAccJerkMag tBodyGyroMag tBodyGyroJerkMag fBodyAcc-XYZ fBodyAccJerk-XYZ fBodyGyro-XYZ fBodyAccMag fBodyAccJerkMag fBodyGyroMag fBodyGyroJerkMag
The set of variables that were estimated from these signals are:
mean(): Mean value
std(): Standard deviation
mad(): Median absolute deviation
max(): Largest value in array
min(): Smallest value in array
sma(): Signal magnitude area
energy(): Energy measure. Sum of the squares divided by the number of values.
iqr(): Interquartile range
entropy(): Signal entropy
arCoeff(): Autoregression coefficients with Burg order equal to 4
correlation(): Correlation coefficient between two signals
maxInds(): Index of the frequency component with the largest magnitude
meanFreq(): Weighted average of the frequency components to obtain a mean frequency
skewness(): Skewness of the frequency domain signal
kurtosis(): Kurtosis of the frequency domain signal
bandsEnergy(): Energy of a frequency interval within the 64 bins of the FFT of each window
angle(): Angle between two vectors
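A numpy sketch of how a few of these per-window variables can be computed (illustrative only; the dataset's exact definitions follow the list above, e.g. energy() is the sum of squares divided by the number of values):

```python
import numpy as np

def window_features(window):
    """Compute a subset of the per-window variables listed above."""
    q75, q25 = np.percentile(window, [75, 25])
    return {
        "mean": np.mean(window),
        "std": np.std(window),
        "mad": np.median(np.abs(window - np.median(window))),
        "max": np.max(window),
        "min": np.min(window),
        "energy": np.sum(window ** 2) / len(window),  # sum of squares / N
        "iqr": q75 - q25,
    }

feats = window_features(np.array([1.0, 2.0, 3.0, 4.0]))
print(feats["mean"], feats["iqr"])
```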
Additional vectors are obtained by averaging the signals in a signal window sample. These are used on the angle() variable:
gravityMean tBodyAccMean tBodyAccJerkMean tBodyGyroMean tBodyGyroJerkMean
This data set consists of the following columns:
1 tBodyAcc-mean()-X 2 tBodyAcc-mean()-Y 3 tBodyAcc-mean()-Z 4 tBodyAcc-std()-X 5 tBodyAcc-std()-Y 6 tBodyAcc-std()-Z 7 tBodyAcc-mad()-X 8 tBodyAcc-mad()-Y 9 tBodyAcc-mad()-Z 10 tBodyAcc-max()-X 11 tBodyAcc-max()-Y 12 tBodyAcc-max()-Z 13 tBodyAcc-min()-X 14 tBodyAcc-min()-Y 15 tBodyAcc-min()-Z 16 tBodyAcc-sma() 17 tBodyAcc-energy()-X 18 tBodyAcc-energy()-Y 19 tBodyAcc-energy()-Z 20 tBodyAcc-iqr()-X 21 tBodyAcc-iqr()-Y 22 tBodyAcc-iqr()-Z 23 tBodyAcc-entropy()-X 24 tBodyAcc-entropy()-Y 25 tBodyAcc-entropy()-Z 26 tBodyAcc-arCoeff()-X,1 27 tBodyAcc-arCoeff()-X,2 28 tBodyAcc-arCoeff()-X,3 29 tBodyAcc-arCoeff()-X,4 30 tBodyAcc-arCoeff()-Y,1 31 tBodyAcc-arCoeff()-Y,2 32 tBodyAcc-arCoeff()-Y,3 33 tBodyAcc-arCoeff()-Y,4 34 tBodyAcc-arCoeff()-Z,1 35 tBodyAcc-arCoeff()-Z,2 36 tBodyAcc-arCoeff()-Z,3 37 tBodyAcc-arCoeff()-Z,4 38 tBodyAcc-correlation()-X,Y 39 tBodyAcc-correlation()-X,Z 40 tBodyAcc-correlation()-Y,Z 41 tGravityAcc-mean()-X 42 tGravit...
Our target was to predict gender, age and emotion from audio. We found labeled audio datasets on Mozilla and RAVDESS. Using the R programming language, 20 statistical features were extracted, and after adding the labels these datasets were formed. Audio files were collected from "Mozilla Common Voice" and the "Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)".
The datasets contain 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes a dataset's dispersion relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle number in the sorted (ascending or descending) list of frequencies.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. This means that about 25 percent of the values lie below Q1, and about 75 percent lie above it.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the central point between the median and the highest value of the data set.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, or between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure that determines how much the tails of a distribution differ from the tails of a normal distribution. It is effectively a measure of the outliers present in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized spectral power of the signal.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is usually measured in decibels and offers a way to quantify how tone-like a sound is, as opposed to noise-like.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used in digital signal processing to describe a spectrum. It indicates where the spectrum's center of mass is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of the dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of the dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which quantifies the degree of frequency modulation, expressed numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
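Several of the spectral statistics above (meanfreq, median, Q25, Q75, IQR) are quantiles of the power distribution over frequency. A numpy sketch on a toy magnitude spectrum (the helper below is illustrative; the original extraction was done in R):

```python
import numpy as np

def spectral_stats(freqs_khz, magnitudes):
    """Quantiles of the power distribution over frequency bins."""
    w = magnitudes / magnitudes.sum()        # normalize to a distribution
    cdf = np.cumsum(w)
    q25, median, q75 = (float(freqs_khz[np.searchsorted(cdf, q)])
                        for q in (0.25, 0.5, 0.75))
    return {
        "meanfreq": float(np.sum(freqs_khz * w)),  # weighted mean frequency
        "median": median, "Q25": q25, "Q75": q75, "IQR": q75 - q25,
    }

freqs = np.linspace(0.0, 8.0, 9)                     # bin centers in kHz
mags = np.array([1, 2, 3, 3, 2, 1, 1, 1, 0], float)  # toy magnitude spectrum
stats = spectral_stats(freqs, mags)
print(stats)
```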
Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description:
This is the dataset of an environment-independent in-baggage object identification system leveraging low-cost WiFi. The dataset contains the extracted CSI features from 14 representative in-baggage objects of 4 different materials. The experiments were conducted in 3 different office environments of different sizes. We hope this dataset will help researchers reproduce prior work on in-baggage object identification through WiFi sensing.
Dataset Format:
.mat files
Section 1: Device Configuration:
Transmitter: Aaronia HyperLOG 7060 directional antenna with a Dell Inspiron 3910 desktop for control.
Receiver: Hawking HD9DP orthogonal antennas with a Dell Inspiron 3910 desktop for control.
NIC: Atheros QCA9590. The configuration and installation guide of CSI tool can be found at https://wands.sg/research/wifi/AtherosCSI/
WiFi Packet Rate: 1000 pkts/s
Section 2: Data Format
We provide the CSI features through .mat files. The details are shown in the following:
14 different objects made of 4 different materials are included, across 3 different environments and 3 different days.
Each object is tested for 60 seconds, repeated 3 times.
The dataset file name is presented as "Object_Number". The detailed information is:
Object: The object we involved in the experiment (e.g., book, laptop)
Number: The number of repeats.
Section 3: Experimental Setups
There are 3 different office experiment setups for our data collection. The detailed setups are shown in the paper. For the objects, we involve 14 types of objects made of 4 different materials.
Environments:
3 different environments are involved, including 3 office environments with the size of 15 ft × 13 ft, 16 ft × 12 ft, 28 ft × 23 ft, respectively.
For each room environment, data is collected on different days and with different furniture settings (i.e., 2 desks and 2 chairs are moved at least 3 ft).
Representative objects:
Data is collected using 14 representative objects of 4 different materials including fiber: book, magazine, newspaper; metal: thermal cup, laptop; cotton/polyester: cotton T-shirts (×2), cotton T-shirts (×4), hoodie, polyester T-shirts, polyester pants; water: 1L bottle with 1L water, 1L bottle with 500ml water, 500ml bottle with 500ml water.
Section 4: Data Description
For our data organization, we separate the data files into different folders based on different days and different environments. Under these folders, data are further distributed in terms of different objects and repeat times. All the files are .mat files, which can be directly read for further applications.
Features of CSI amplitude: We calculate 7 different types of statistical features, including mean, variance, median, skewness, kurtosis, interquartile range and range, and polarization feature from CSI amplitude. Particularly, we calculate the features for all 56 subcarriers with different operating frequencies and responses to the target object.
Features of CSI phase: For the features of CSI phase, the same features with CSI amplitude are extracted and stored in the dataset.
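The 7 per-subcarrier statistics described above can be sketched with numpy on synthetic CSI amplitudes (illustrative only; the dataset's polarization feature is not reproduced here):

```python
import numpy as np

def csi_features(amplitude):
    """amplitude: CSI amplitude array of shape (n_packets, n_subcarriers).
    Returns the 7 statistical features per subcarrier listed above."""
    m = amplitude.mean(axis=0)
    s = amplitude.std(axis=0)
    q75, q25 = np.percentile(amplitude, [75, 25], axis=0)
    skewness = ((amplitude - m) ** 3).mean(axis=0) / s ** 3
    kurt = ((amplitude - m) ** 4).mean(axis=0) / s ** 4 - 3.0  # excess kurtosis
    return np.stack([
        m,                                              # mean
        amplitude.var(axis=0),                          # variance
        np.median(amplitude, axis=0),                   # median
        skewness,                                       # skewness
        kurt,                                           # kurtosis
        q75 - q25,                                      # interquartile range
        amplitude.max(axis=0) - amplitude.min(axis=0),  # range
    ])

rng = np.random.default_rng(0)
amp = np.abs(rng.normal(size=(1000, 56)))  # 1 s of packets at 1000 pkts/s
features = csi_features(amp)
print(features.shape)  # 7 features for each of the 56 subcarriers
```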
Section 5: Citations
If your work is related to our work, please cite our papers as follows.
https://ieeexplore.ieee.org/document/9637801
Shi, Cong, Tianming Zhao, Yucheng Xie, Tianfang Zhang, Yan Wang, Xiaonan Guo, and Yingying Chen. "Environment-independent in-baggage object identification using wifi signals." In 2021 IEEE 18th International Conference on Mobile Ad Hoc and Smart Systems (MASS), pp. 71-79. IEEE, 2021.
IQR: interquartile range (25th percentile-75th percentile).
In brief, the genotype at MS-associated loci was used to assign each individual to the categories of risk identified using the simulated population from table 1. Furthermore, a weighted genetic risk score (wGRS) was calculated by multiplying the number of risk alleles by the weight of each SNP and then taking the sum across all associations (see Methods).
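The wGRS computation described above (sum over loci of risk-allele count times SNP weight) can be sketched as follows; the weights and genotype are illustrative, not taken from the study:

```python
import numpy as np

# Hypothetical per-SNP weights and risk-allele counts (0, 1 or 2 copies).
weights = np.array([0.30, 0.10, 0.25])     # one weight per associated SNP
risk_allele_counts = np.array([2, 1, 0])   # one individual's genotype

# wGRS: multiply allele counts by weights, then sum across associations.
wgrs = float(np.sum(risk_allele_counts * weights))
print(wgrs)
```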
n = number of observations for each task; see Methods. †As measured by MET values. Abbreviations: IQR, interquartile range.
*Baseline is the beginning of the poor-adherence period, which was defined as 2 consecutive self-reports of missed doses of at least 1 per week over at least a 12-week period. √See Methods section for definition. ¶Patients suppressed on the current regimen at baseline and throughout the study period. IQR = interquartile range, SD = standard deviation, BMI = body mass index, IDU = injecting drug use. 3TC = lamivudine, ABC = abacavir, AZT = zidovudine, d4T = stavudine, DDI = didanosine, FTC = emtricitabine, TDF = tenofovir.
Objective: The aim of this study is to determine the residual C-peptide level and to explore the clinical significance of preserved C-peptide secretion in glycemic control in Chinese individuals with type 1 diabetes (T1D).
Research design and methods: A total of 534 participants with T1D were enrolled and divided into two groups, a low-C-peptide group (fasting C-peptide ≤10 pmol/L) and a preserved-C-peptide group (fasting C-peptide >10 pmol/L), and clinical factors were compared between the two groups. In 174 participants who were followed, factors associated with C-peptide loss were also identified by Cox regression. In addition, glucose metrics derived from intermittently scanned continuous glucose monitoring were compared between individuals with low C-peptide and those with preserved C-peptide in 178 participants.
Results: The lack of preserved C-peptide was associated with longer diabetes duration, glutamic acid decarboxylase autoantibody, and higher daily insulin doses, after adjustment {OR, 1.10 [interquartile range (IQR), 1.06-1.14]; OR, 0.46 (IQR, 0.27-0.77); OR, 1.04 (IQR, 1.02-1.06)}. In the longitudinal analysis, the percentages of individuals with preserved C-peptide were 71.4%, 56.8%, 71.7%, 62.5%, and 22.2% over 5 years of follow-up. Preserved C-peptide was also associated with higher time in range after adjustment for diabetes duration [62.4 (IQR, 47.3-76.6) vs. 50.3 (IQR, 36.2-63.0) %, adjusted P = 0.003].
Conclusions: Our results indicate that a high proportion of Chinese patients with T1D had preserved C-peptide secretion. Meanwhile, residual C-peptide was associated with favorable glycemic control, suggesting the importance of research on adjunctive therapy to maintain β-cell function in T1D.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Study objective: To investigate the performance of a rapid RT-PCR assay to detect influenza A/B at emergency department admission.
Methods: This single-center prospective study recruited adult patients attending the emergency department for influenza-like illness. Triage nurses performed nasopharyngeal swab samples and ran rapid RT-PCR assays using a dedicated device (cobas Liat, Roche Diagnostics, Meylan, France) located at triage. The same swab sample was also analyzed in the department of virology using conventional RT-PCR techniques. Patients were included 24 hours a day, 7 days a week. The primary outcome was the diagnostic accuracy of the rapid RT-PCR assay performed at triage.
Results: A total of 187 patients were included over 11 days in January 2018. Median age was 70 years (interquartile range 44 to 84) and 95 (51%) were male. Nine (5%) assays had to be repeated due to failure of the first assay. The sensitivity of the rapid RT-PCR assay performed at triage was 0.98 (95% confidence interval (CI): 0.91-1.00) and the specificity was 0.99 (95% CI: 0.94-1.00). A total of 92 (49%) assays were performed at night-time or during the weekend. The median time from patient entry to rapid RT-PCR assay results was 46 [interquartile range 36-55] minutes.
Conclusion: Rapid RT-PCR assay performed by nurses at triage to detect influenza A/B is feasible and highly accurate.
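The diagnostic-accuracy measures reported in this entry can be illustrated with a small sketch; the 2×2 counts below are hypothetical, chosen only to reproduce figures of the same magnitude, not the study's data:

```python
# Hypothetical confusion-matrix counts, rapid test vs. reference RT-PCR.
tp, fn = 88, 2    # reference-positive patients: detected / missed
tn, fp = 96, 1    # reference-negative patients: correctly negative / false alarm

sensitivity = tp / (tp + fn)   # true-positive rate
specificity = tn / (tn + fp)   # true-negative rate
print(round(sensitivity, 2), round(specificity, 2))
```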
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; "Simulated_Dataset.RData".
Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code Abstract
We provide R statistical software code ("CWVS_LMC.txt") to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript.
We also provide R code ("Results_Summary.txt") to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description
"CWVS_LMC.txt": This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the "Simulated_Dataset.RData" workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
"Results_Summary.txt": This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the "CWVS_LMC.txt" code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Optional Information
Required R packages:
• For running "CWVS_LMC.txt":
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
• For running "Results_Summary.txt":
• plotrix: Plotting the posterior means and credible intervals
Instructions for Use
Reproducibility (Mandatory)
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the simulated datasets generated under setting E4 of the presented simulation study.
How to use the information:
• Load the "Simulated_Dataset.RData" workspace
• Run the code contained in "CWVS_LMC.txt"
• Once the "CWVS_LMC.txt" code is complete, run "Results_Summary.txt"
Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:
Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008.
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.
Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis. This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
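The weekly median/IQR standardization described in this entry can be sketched as follows (synthetic exposure matrix; in the provided dataset z is already standardized and the weekly medians/IQRs are withheld):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic exposure matrix z: one row per pregnancy, one column per week.
z = rng.gamma(shape=2.0, scale=3.0, size=(100, 4))

# Standardize each week: subtract the weekly median, divide by the weekly IQR.
med = np.median(z, axis=0)
q75, q25 = np.percentile(z, [75, 25], axis=0)
z_std = (z - med) / (q75 - q25)

# After standardization, each column has median 0 and IQR 1.
print(np.round(np.median(z_std, axis=0), 6))
```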