These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.
This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.
It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.
Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript.
We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description
“CWVS_LMC.txt”: This code is delivered to the user as a .txt file containing R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the code in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered as a .txt file containing R statistical software code. Once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Required R packages:
• For running “CWVS_LMC.txt”:
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
• plotrix: Plotting the posterior means and credible intervals
Instructions for Use
Reproducibility
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”
Format: Below is the replication procedure for the attached dataset for the portion of the analyses using a simulated dataset:
Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005–2008.
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.
Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1–30, (2019).
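For readers who want to mirror this preprocessing on their own raw exposure data, the week-wise standardization described above (subtract each week's median, divide by each week's IQR) can be sketched in plain Python. This is an illustrative reimplementation, not the authors' R code, and the quantile rule below is one common linear-interpolation convention:

```python
from statistics import median


def _quantile(sorted_vals, q):
    # Linear interpolation between closest ranks (matches R's default type-7 rule).
    idx = (len(sorted_vals) - 1) * q
    lo = int(idx)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (idx - lo)


def standardize_exposures(z):
    """Standardize a matrix of weekly exposures (rows = individuals,
    columns = weeks) by subtracting each week's median and dividing
    by that week's interquartile range (IQR)."""
    n_weeks = len(z[0])
    out = [row[:] for row in z]
    for w in range(n_weeks):
        col = sorted(row[w] for row in z)
        med = median(col)
        iqr = _quantile(col, 0.75) - _quantile(col, 0.25)
        for i in range(len(z)):
            out[i][w] = (z[i][w] - med) / iqr
    return out
```

Note that the released z matrix is already standardized, so a function like this is only needed when preparing raw exposures of the same shape.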
We include a description of the data sets in the metadata, as well as sample code and results from a simulated data set. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The R code is available online here: https://github.com/warrenjl/SpGPCW.
Format: Abstract
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005–2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained.
While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.
Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.
File format: R workspace file.
Metadata (including data dictionary)
• y: Vector of binary responses (1: preterm birth, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1–30, (2019).
https://creativecommons.org/publicdomain/zero/1.0/
Walmart Inc. is a multinational retail corporation that operates a chain of hypermarkets, discount department stores, and grocery stores. It is one of the world's largest companies by revenue and a key player in the retail sector. Walmart's stock is actively traded on major stock exchanges, making it an interesting subject for financial analysis.
This dataset contains historical stock price data for Walmart, sourced directly from Yahoo Finance using the yfinance Python API. The data covers daily stock prices and includes multiple key financial indicators.
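As a small illustration of the kind of computation this dataset supports, the sketch below derives daily log returns and an annualized volatility estimate from a closing-price series. The helper names are our own; with the yfinance package, closes would be fetched roughly as indicated in the comment:

```python
import math

# Illustrative only: with yfinance one would fetch closing prices roughly via
#   import yfinance as yf
#   prices = yf.download("WMT")["Close"].tolist()
# Here we operate on any plain list of closing prices.


def daily_log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def annualized_volatility(prices, trading_days=252):
    """Sample standard deviation of daily log returns, scaled by sqrt(252)."""
    r = daily_log_returns(prices)
    mean = sum(r) / len(r)
    var = sum((x - mean) ** 2 for x in r) / (len(r) - 1)
    return math.sqrt(var) * math.sqrt(trading_days)
```

A flat price series yields zero volatility and more erratic series score higher, which is the basic quantity behind the volatility analyses mentioned above.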
This notebook performs an extensive EDA to uncover insights into Walmart's stock price trends, volatility, and overall behavior in the stock market. The following analysis steps are included:
This dataset and analysis can be useful for: - 📡 Stock Market Analysis – Evaluating Walmart’s stock price trends and volatility. - 🏦 Investment Research – Assisting traders and investors in making informed decisions. - 🎓 Educational Purposes – Teaching data science and financial analysis using real-world stock data. - 📊 Algorithmic Trading – Developing trading strategies based on historical stock price trends.
📥 Download the dataset and explore Walmart’s stock performance today! 🚀
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Tuberculosis (TB) remains a significant public health challenge, particularly among vulnerable populations like children. This is especially true in Sub-Saharan Africa, where the burden of TB in children is substantial. Zambia ranks 21st among the top 30 high-TB-endemic countries globally. While studies have explored TB in adults in Zambia, the prevalence and associated factors in children are not well documented. This study aimed to determine the prevalence of, and the sociodemographic and clinical factors associated with, active TB disease in hospitalized children under the age of 15 years at Livingstone University Teaching Hospital (LUTH), the largest referral center in Zambia’s Southern Province.
Methods
This retrospective cross-sectional study of 700 pediatric patients under 15 years old utilized programmatic data from the Pediatrics Department at LUTH. A systematic sampling method was used to select participants from medical records. Data on demographics, medical conditions, anthropometric measurements, and blood tests were collected. Data analysis included descriptive statistics, chi-square tests, and multivariable logistic regression to identify factors associated with TB.
Results
The median age was 24 months (interquartile range (IQR): 11, 60) and the majority were male (56.7%, n = 397/700). Most participants were from urban areas (59.9%, n = 419/700), and 9.2% (n = 62/675) were living with HIV. Malnutrition and comorbidities were present in a significant portion of the participants (19.0% and 25.1%, respectively). The prevalence of active TB was 9.4% (n = 66/700) among hospitalized children.
Persons living with HIV (adjusted odds ratio (AOR): 6.30; 95% confidence interval (CI): 2.85, 13.89; p < 0.001) and those who were malnourished (AOR: 10.38; 95% CI: 4.78, 22.55; p < 0.001) had a significantly higher likelihood of developing active TB disease.
Conclusion
This study revealed a prevalence of active TB of 9.4% among hospitalized children under 15 years at LUTH. HIV status and malnutrition emerged as significant factors associated with active TB disease. These findings emphasize the need for pediatric TB control strategies that prioritize addressing associated factors to effectively reduce the burden of tuberculosis in Zambian children.
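The adjusted odds ratios above come from multivariable logistic regression; as an illustration of the underlying quantity only, a crude (unadjusted) odds ratio with a Woolf-type 95% confidence interval for a 2x2 exposure-outcome table can be computed as follows (our own sketch, not the study's analysis code):

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-normal) 95% CI for a 2x2 table:
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    This is only the unadjusted calculation; the AORs in the text
    additionally condition on the other model covariates."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

Because the adjusted estimates condition on the other covariates in the model, they will generally differ from this crude calculation.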
The dataset provides the median, 25th percentile, and 75th percentile of carbon monoxide (CO) concentrations in Delhi, measured in moles per square meter and vertically integrated over a 9-day mean period. This data offers insights into the distribution and variability of CO levels over time.
The data, collected from July 10, 2018, to August 10, 2024, is sourced from the Tropomi Explorer.
CO is a harmful gas that can significantly impact human health. High levels of CO can lead to respiratory issues, cardiovascular problems, and even be life-threatening in extreme cases. Forecasting CO levels helps in predicting and managing air quality to protect public health.
CO is often emitted from combustion processes, such as those in vehicles and industrial activities. Forecasting CO levels can help in monitoring the impact of these sources and evaluating the effectiveness of emission control measures.
Accurate CO forecasts can assist in urban planning and pollution control strategies, especially in densely populated areas where air quality issues are more pronounced.
Columns and Data Description:
• system:time_start: The date when the CO measurements were taken.
• p25: Likely the 25th percentile value of CO levels for the given date, providing insight into the lower range of the distribution.
• median: The median CO level for the given date, which is the middle value of the dataset and represents a typical value.
• IQR: The Interquartile Range, which measures the spread of the middle 50% of the data. It is calculated as the difference between the 75th percentile (p75) and the 25th percentile (p25) values.
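Since the table ships p25 and IQR but not p75 itself, the 75th percentile can be recovered from the identity IQR = p75 - p25. A minimal sketch (the field names here simply mirror the column names above; the p75 key is our own addition):

```python
def recover_p75(rows):
    """Add a p75 field to each record, using IQR = p75 - p25.

    `rows` is a list of dicts keyed like the columns above
    (p25, median, IQR)."""
    return [dict(r, p75=r["p25"] + r["IQR"]) for r in rows]
```

This returns new dicts and leaves the original records untouched, which is convenient when the raw table should stay read-only.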
https://spdx.org/licenses/CC0-1.0.html
The recent surge in electric vehicles (EVs), driven by a collective push to enhance global environmental sustainability, has underscored the significance of exploring EV charging prediction. To catalyze further research in this domain, we introduce UrbanEV, an open dataset showcasing EV charging space availability and electricity consumption in a pioneering city for vehicle electrification, namely Shenzhen, China. UrbanEV offers a rich repository of charging data (i.e., charging occupancy, duration, volume, and price) captured at hourly intervals across an extensive six-month span for over 20,000 individual charging stations. Beyond these core attributes, the dataset also encompasses diverse influencing factors such as weather conditions and spatial proximity. These factors are thoroughly analyzed qualitatively and quantitatively to reveal their correlations and causal impacts on charging behaviors. Furthermore, comprehensive experiments have been conducted to showcase the predictive capabilities of various models, including statistical, deep learning, and transformer-based approaches, using the UrbanEV dataset. This dataset is poised to propel advancements in EV charging prediction and management, positioning itself as a benchmark resource within this burgeoning field.
Methods
To build a comprehensive and reliable benchmark dataset, we conduct a series of rigorous processes from data collection to dataset evaluation. The overall workflow sequentially includes data acquisition, data processing, statistical analysis, and prediction assessment. Detailed descriptions follow.
Study area and data acquisition
Shenzhen, a pioneering city in global vehicle electrification, has been selected for this study with the objective of offering valuable insights into electric vehicle (EV) development that can serve as a reference for other urban centers. This study encompasses the entire expanse of Shenzhen, where data on public EV charging stations distributed around the city have been meticulously gathered. Specifically, EV charging data was automatically collected from a mobile platform used by EV drivers to locate public charging stations. Through this platform, users could access real-time information on each charging pile, including its availability (e.g., busy or idle), charging price, and geographic coordinates. Accordingly, we recorded the charging-related data at five-minute intervals from September 1, 2022, to February 28, 2023. This data collection process was fully digital and did not require manual readings. Furthermore, to delve into the correlation between EV charging patterns and environmental elements, weather data for Shenzhen city were acquired from two meteorological observatories situated in the airport and central regions, respectively. These meteorological data are publicly available on the Shenzhen Government Data Open Platform. Thirdly, point of interest (POI) data was extracted through the Application Programming Interface Platform of AMap.com, along with three primary types: food and beverage services, business and residential, and lifestyle services. Lastly, the spatial and static data were organized based on the traffic zones delineated by the sixth Residential Travel Survey of Shenzhen. The collected data contains detailed spatiotemporal information that can be analyzed to provide valuable insights about urban EV charging patterns and their correlations with meteorological conditions.
Processing raw information into well-structured data
To streamline the utilization of the UrbanEV dataset, we harmonize heterogeneous data from various sources into well-structured data with aligned temporal and spatial resolutions. This process can be segmented into two parts: the reorganization of EV charging data and the preparation of other influential factors.
EV charging data
The raw charging data, obtained from publicly available EV charging services, pertains to charging stations and predominantly comprises string-type records at a 5-minute interval.
To transform this raw data into a structured time series tailored for prediction tasks, we implement the following three key measures:
Initial Extraction. From the string-type records, we extract vital information for each charging pile, such as availability (designated as "busy" or "idle"), rated power, and the corresponding charging and service fees applicable during the observed time periods. First, a charging pile is categorized as "active charging" if its states at two consecutive timestamps are both "busy". Consequently, the occupancy within a charging station can be defined as the count of in-use charging piles, while the charging duration is calculated as the product of the count of in-use piles and the time between the two timestamps (in our case, 5 minutes). Moreover, the charging volume in a station can correspondingly be estimated by multiplying the duration by the piles' rated power. Finally, the average electricity price and service price are calculated for each station in alignment with the same temporal resolution as the three charging variables.
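The derivation above can be sketched for a single station and one pair of consecutive 5-minute snapshots. This is a simplified illustration that assumes all piles share one rated power, whereas the actual pipeline works per pile:

```python
def station_charging_stats(states_t0, states_t1, rated_power_kw, interval_min=5):
    """Sketch of the occupancy/duration/volume derivation described above.

    A pile counts as 'active charging' only if it is 'busy' at both of two
    consecutive timestamps; duration = active piles * interval; volume =
    duration * rated power (a single shared rated power is assumed here)."""
    active = sum(1 for a, b in zip(states_t0, states_t1)
                 if a == "busy" and b == "busy")
    duration_min = active * interval_min
    volume_kwh = rated_power_kw * duration_min / 60.0
    return active, duration_min, volume_kwh
```

Summing these per-interval quantities over longer windows then yields station-level totals.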
Error Detection and Imputation. Ensuring data quality is paramount when utilizing charging data for decision-making, advanced analytics, and machine-learning applications. It is crucial to address concerns around data cleanliness, as the presence of inaccuracies and inconsistencies, often referred to as dirty data, can significantly compromise the reliability and validity of any subsequent analysis or modeling efforts. To improve the quality of our charging data, several errors were identified, particularly negative values for charging fees and inconsistencies among the counts of occupied, idle, and total charging piles. We removed the records containing these anomalies and treated them as missing data. In addition, a two-step imputation process was implemented to address missing values: first, forward filling replaced missing values using data from preceding timestamps; then, backward filling was applied to fill gaps at the start of each time series. Moreover, a certain number of outliers were identified in the dataset, which could significantly impact prediction performance. To address this, the interquartile range (IQR) method was used to detect outliers for metrics including charging volume (v), charging duration (d), and the rate of active charging piles at the charging station (o). To retain more original data and minimize the impact of outlier correction on the overall data distribution, we set the coefficient to 4 instead of the default 1.5. Finally, each outlier was replaced by the mean of its adjacent valid values. This preprocessing pipeline transformed the raw data into a structured and analyzable dataset.
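A minimal sketch of the two-step imputation and the widened IQR fences just described (an illustration under our own conventions; the authors' implementation may differ in details such as the quantile rule):

```python
def forward_backward_fill(series):
    """Two-step imputation as described: forward fill from preceding
    timestamps, then backward fill to cover gaps at the start
    (None marks a missing value)."""
    out = series[:]
    for i in range(1, len(out)):
        if out[i] is None:
            out[i] = out[i - 1]
    for i in range(len(out) - 2, -1, -1):
        if out[i] is None:
            out[i] = out[i + 1]
    return out


def iqr_bounds(values, k=4.0):
    """Outlier fences with the widened coefficient (4 instead of 1.5)."""
    s = sorted(values)

    def q(p):
        # Linear interpolation between closest ranks.
        idx = (len(s) - 1) * p
        lo = int(idx)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (idx - lo)

    q1, q3 = q(0.25), q(0.75)
    return q1 - k * (q3 - q1), q3 + k * (q3 - q1)
```

Values falling outside the returned fences would then be replaced by the mean of their adjacent valid neighbors, as the text describes.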
Aggregation and Filtration. Building upon the station-level charging data that has been extracted and cleansed, we further organize the data into a region-level dataset with an hourly interval, providing a new perspective for EV charging behavior analysis. This is achieved by two major processes: aggregation and filtration. First, we aggregate all the charging data from both temporal and spatial views:
a. Temporally, we standardize all time-series data to a common time resolution of one hour, as it serves as the least common denominator among the various resolutions. This establishes a unified temporal resolution for all time-series data, including pricing schemes, weather records, and charging data, thereby creating a well-structured dataset. Aggregation rules specify that the five-minute charging volume (v) and duration (d) are summed within each interval (i.e., one hour), whereas the occupancy (o), electricity price (pe), and service price (ps) are assigned the values observed at specific hours for each charging pile. This distinction arises from the inherent nature of these data types: volume v and duration d are cumulative, while o, pe, and ps are instantaneous variables. Compared to using the mean or median values within each interval, selecting the instantaneous values of o, pe, and ps as representatives preserves the original data patterns more effectively and minimizes the influence of human interpretation.
b. Spatially, stations are aggregated based on the traffic zones delineated by the sixth Residential Travel Survey of Shenzhen. After aggregation, our dataset comprises 331 regions (also called traffic zones) with 4,344 timestamps.
Second, variance tests and zero-value filtering functions were employed to filter out traffic zones with zero or no change in charging data. Specifically, it means that
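The temporal rule in step a (sum the cumulative variables, take an hourly snapshot of the instantaneous ones) can be sketched as follows; the record layout and field names are our own:

```python
def hourly_aggregate(records):
    """Hourly aggregation rule sketched from the text: within each hour,
    the five-minute charging volume and duration are summed (cumulative
    variables), while occupancy takes the value observed at the hour mark
    (standing in for all instantaneous variables: o, pe, ps).

    `records` is a list of (minute_of_day, volume, duration, occupancy)
    tuples at 5-minute spacing."""
    hours = {}
    for minute, v, d, o in records:
        h = minute // 60
        agg = hours.setdefault(h, {"volume": 0.0, "duration": 0.0, "occupancy": None})
        agg["volume"] += v
        agg["duration"] += d
        if minute % 60 == 0:  # snapshot at the top of the hour
            agg["occupancy"] = o
    return hours
```

Taking the snapshot rather than an hourly mean mirrors the text's rationale: it preserves the original pattern of the instantaneous variables without extra interpretation.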
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Rheumatic and musculoskeletal disorders (RMDs) are associated with cardiovascular diseases (CVDs), with hypertension being the most common. We aimed to determine the prevalence of high blood pressure (HBP), awareness, treatment, and blood pressure control among patients with RMDs seen in a rheumatology clinic in Uganda.
Methods
We conducted a cross-sectional study at the Rheumatology Clinic of Mulago National Referral Hospital (MNRH), Kampala, Uganda. Socio-demographic and clinical characteristics, as well as anthropometric data, were collected. Multivariable logistic regression was performed using STATA 16 to determine factors associated with HBP in patients with RMDs.
Results
A total of 100 participants were enrolled. Of these, the majority were female (84%, n = 84), with a mean age of 52.1 (standard deviation: 13.8) years and a median body mass index of 28 kg/m2 (interquartile range (IQR): 24.8–32.9 kg/m2). The prevalence of HBP was 61% (n = 61, 95% CI: 51.5–70.5), with the majority (77%, n = 47, 95% CI: 66.5–87.6) being aware they had hypertension (HTN). The prevalence of HTN was 47% (n = 47, 95% CI: 37.2–56.8), and none had it under control. Factors independently associated with HBP were age 46–55 years (adjusted prevalence ratio (aPR): 2.5, 95% confidence interval (CI): 1.06–5.95), age 56–65 years (aPR: 2.6, 95% CI: 1.09–6.15), age >65 years (aPR: 2.5, 95% CI: 1.02–6.00), obesity (aPR: 3.7, 95% CI: 1.79–7.52), and overweight (aPR: 2.7, 95% CI: 1.29–5.77).
Conclusion
There was a high burden of HBP among people with RMDs in Uganda, with poor blood pressure control, associated with high BMI and increasing age. There is a need for further assessment of the RMD-specific drivers of HBP and meticulous follow-up of patients with RMDs.
Introduction
The aim of this study was to determine patterns of physical activity in pet dogs using real-world data at a population scale, aided by the use of accelerometers and electronic health records (EHRs).
Methods
A directed acyclic graph (DAG) was created to capture background knowledge and causal assumptions related to dog activity, and this was used to identify relevant data sources, which included activity data from commercially available accelerometers and health and patient metadata from the EHRs. Linear mixed models (LMMs) were fitted to the number of active minutes following log-transformation, with the fixed effects tested based on the variables of interest and the adjustment sets indicated by the DAG.
Results
Activity was recorded on 8,726,606 days for 28,562 dogs with 136,876 associated EHRs, with the median number of activity records per dog being 162 (interquartile range (IQR): 60–390). The average recorded activity of 51 minutes per day was much lower than previous estimates of physical activity, and there was wide variation in activity levels, from less than 10 to over 600 minutes per day. Physical activity decreased with age, an effect that was dependent on breed size, whereby the decline in activity with age grew as breed size increased. Activity increased with breed size and owner age independently. Activity also varied independently with sex, location, climate, season, and day of the week: males were more active than females, and dogs were more active in rural areas, in hot-dry or marine climates, in spring, and on weekends.
Conclusion
Accelerometer-derived activity data gathered from pet dogs living in North America were used to determine associations with both dog and environmental characteristics. Knowledge of these associations could be used to inform daily exercise and caloric requirements for dogs, and how they should be adapted according to individual circumstances.
a: Excluding costs for international consultants (see Table 5); b: Estimates only used in the analysis from the societal perspective. CI: Confidence intervals; IQR: Interquartile range; ZMO: Zonal medical officer.
This table contains a source catalog based on 90-cm (324-MHz) Very Large Array (VLA) imaging of the COSMOS field, comprising a circular area of 3.14 square degrees centered on 10h 00m 28.6s, +02° 12' 21" (J2000.0 RA and Dec). The image from the merger of 3 nights of observations using all 27 VLA antennas had an effective total integration time of ~12 hours, an 8.0 arcsecond x 6.0 arcsecond angular resolution, and an average rms of 0.5 mJy beam^-1. The extracted catalog contains 182 sources (down to 5.5 sigma), 30 of which are multi-component sources. Using Monte Carlo artificial source simulations, the authors derive the completeness of the catalog, and show that their 90-cm source counts agree very well with those from previous studies. In their paper, the authors use X-ray, NUV-NIR and radio COSMOS data to investigate the population mix of this 90-cm radio sample, and find that the sample is dominated by active galactic nuclei. The average 90-20 cm spectral index (defined by S_nu ~ nu^alpha, where S_nu is the flux density at frequency nu and alpha the spectral index) of the 90-cm selected sources is -0.70, with an interquartile range from -0.90 to -0.53. Only a few ultra-steep-spectrum sources are present in this sample, consistent with results in the literature for similar fields. These data do not show clear steepening of the spectral index with redshift. Nevertheless, this sample suggests that sources with spectral indices steeper than -1 all lie at z >~ 1, in agreement with the idea that ultra-steep-spectrum radio sources may trace intermediate-redshift galaxies (z >~ 1). Using both the signal and rms maps (see Figs. 1 and 2 in the reference paper) as input data, the authors ran the AIPS task SAD to obtain a catalog of candidate components above a given local signal-to-noise ratio (S/N) threshold. The task SAD was run four times with search S/N levels of 10, 8, 6 and 5, using the resulting residual image each time.
They recovered all the radio components with a local S/N > 5.00. Subsequently, all the selected components were visually inspected in order to check their reliability, especially for components near strong side-lobes. After careful analysis, a S/N threshold of 5.50 was adopted as the best compromise between a deep and a reliable catalog. The procedure yielded a total of 246 components with a local S/N > 5.50. More than one component identified in the 90-cm map sometimes belongs to a single radio source (e.g., large radio galaxies consist of multiple components). Using the 90-cm COSMOS radio map, the authors combined the various components into single sources based on visual inspection. The final catalog (contained in this HEASARC table) lists 182 radio sources, 30 of which have been classified as multiple, i.e., they are better described by more than a single component. Moreover, in order to ensure a more precise classification, all sources identified as multi-component sources have also been double-checked using the 20-cm radio map. The authors found that all 26 multiple 90-cm radio sources within the 20-cm map have 20-cm counterpart sources already classified as multiple. The authors have made use of the VLA-COSMOS Large and Deep Projects over 2 square degrees, reaching down to an rms of ~15 µJy beam^-1 at 1.4 GHz and 1.5 arcsec resolution (Schinnerer et al. 2007, ApJS, 172, 46: the VLACOSMOS table in the HEASARC database). The 90-cm COSMOS radio catalog has, however, been extracted from a larger region of 3.14 square degrees (see Fig. 1 and Section 3.1 of the reference paper). This implies that a certain number of 90-cm sources (48) lie outside the area of the 20-cm COSMOS map used to select the radio catalog. Thus, to identify the 20-cm counterparts of the 90-cm radio sources, the authors used the joint VLA-COSMOS catalog (Schinnerer et al.
2010, ApJS, 188, 384: the VLACOSMJSC table in the HEASARC database) for the 134 sources within the 20-cm VLA-COSMOS area and the VLA-FIRST survey (White et al. 1997, ApJ, 475, 479: the FIRST table in the HEASARC database) for the remaining 48 sources. The 90-cm sources were cross-matched with the 20-cm VLA-COSMOS sources using a search radius of 2.5 arcseconds, while the cross-match with the VLA-FIRST sources was done using a search radius of 4 arcseconds in order to take into account the larger synthesized beam of the VLA-FIRST survey of ~5 arcseconds. Finally, all the 90 cm - 20 cm associations were visually inspected in order to also ensure the association of the multiple 90-cm radio sources, for which the value of the search radius used during the cross-match could be too restrictive. In summary, out of the total of 182 sources in the 90-cm catalog, 168 have counterparts at 20 cm. This table was created by the HEASARC in October 2014 based on an electronic version of Table 1 from the reference paper, which was obtained from the COSMOS web site at IRSA, specifically the file vla-cosmos_327_sources_published_version.tbl at http://irsa.ipac.caltech.edu/data/COSMOS/tables/vla/. This is a service provided by NASA HEASARC.
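The spectral-index convention quoted above (S_nu ∝ nu^alpha) implies a simple two-point estimate between 324 MHz (90 cm) and 1.4 GHz (20 cm). A minimal Python sketch with an illustrative flux-density pair (the values are not taken from the catalog):

```python
import math

# Two-point spectral index under the convention S_nu ∝ nu^alpha:
# alpha = log(S1 / S2) / log(nu1 / nu2)
def spectral_index(s_90cm, s_20cm, nu_90cm=324.0, nu_20cm=1400.0):
    """Flux densities in the same units; frequencies in MHz."""
    return math.log(s_90cm / s_20cm) / math.log(nu_90cm / nu_20cm)

# Illustrative source: 10 mJy at 90 cm and 3.6 mJy at 20 cm yields a
# spectral index close to the sample average of -0.70 quoted above.
alpha = spectral_index(10.0, 3.6)
print(round(alpha, 2))
```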
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A live version of the data record, which will be kept up-to-date with new estimates, can be downloaded from the Humanitarian Data Exchange: https://data.humdata.org/dataset/covid-19-mobility-italy.
If you find the data helpful or you use the data for your research, please cite our work:
Pepe, E., Bajardi, P., Gauvin, L., Privitera, F., Lake, B., Cattuto, C., & Tizzoni, M. (2020). COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown. Scientific Data 7, 230 (2020).
The data record is structured into 4 comma-separated value (CSV) files, as follows:
id_provinces_IT.csv. Table of the administrative codes of the 107 Italian provinces. The fields of the table are:
COD_PROV is an integer field that is used to identify a province in all other data records;
SIGLA is a two-letter code that identifies the province according to the ISO_3166-2 standard (https://en.wikipedia.org/wiki/ISO_3166-2:IT);
DEN_PCM is the full name of the province.
OD_Matrix_daily_flows_norm_full_2020_01_18_2020_04_17.csv. The file contains the daily fraction of users moving between Italian provinces. Each line corresponds to an entry (i, j) of the origin-destination matrix. The fields of the table are:
p1: COD_PROV of origin,
p2: COD_PROV of destination,
day: in the format yyyy-mm-dd.
median_q1_q3_rog_2020_01_18_2020_04_17.csv. The file contains the median and interquartile range (IQR) of users' radius of gyration in a province by week. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
week: median value of the radius of gyration in week week, with week in the format dd/mm-DD/MM, where dd/mm and DD/MM are the first and the last day of the week, respectively;
week Q1: first quartile (Q1) of the distribution of the radius of gyration in week week;
week Q3: third quartile (Q3) of the distribution of the radius of gyration in week week.
average_network_degree_2020_01_18_2020_04_17.csv. The file contains daily time-series of the average degree 〈k〉 of the proximity network. Each entry of the table is a value of 〈k〉 on a given day. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
day: in the format yyyy-mm-dd.
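As a quick illustration, the tables above can be loaded and joined on the province code. This is a minimal pandas sketch using small synthetic stand-ins for the files (the real file names are as listed above; the exact column parsing is an assumption):

```python
import io
import pandas as pd

# Synthetic stand-ins for id_provinces_IT.csv and the OD matrix file,
# mirroring the field names documented above.
provinces_csv = io.StringIO(
    "COD_PROV,SIGLA,DEN_PCM\n1,TO,Torino\n58,RM,Roma\n"
)
od_csv = io.StringIO(
    "p1,p2,day\n1,58,2020-01-18\n58,1,2020-01-18\n"
)

provinces = pd.read_csv(provinces_csv)
od = pd.read_csv(od_csv, parse_dates=["day"])

# Attach the origin-province name by joining COD_PROV against p1.
origin = provinces.rename(columns={"COD_PROV": "p1", "DEN_PCM": "origin_name"})
od = od.merge(origin[["p1", "origin_name"]], on="p1")
print(od[["p1", "p2", "day", "origin_name"]])
```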
ESRI shapefiles of the Italian provinces updated to the most recent definition are available from the website of the Italian National Office of Statistics (ISTAT): https://www.istat.it/it/archivio/222527.
The experiments were carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50 Hz. The experiments were video-recorded to label the data manually. The obtained dataset was randomly partitioned into two sets, where 70% of the volunteers were selected for generating the training data and 30% for the test data.
The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low-frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.
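The fixed-width sliding-window segmentation described above (2.56 s windows at 50 Hz, i.e. 128 samples per window with 50% overlap) can be sketched as follows; the signal here is synthetic:

```python
import numpy as np

def sliding_windows(signal, width=128, overlap=0.5):
    """Split a 1-D signal into fixed-width windows with the given overlap."""
    step = int(width * (1 - overlap))  # 64 samples for 50% overlap
    n_windows = (len(signal) - width) // step + 1
    return np.stack([signal[i * step : i * step + width]
                     for i in range(n_windows)])

# 10 seconds of a synthetic signal sampled at 50 Hz (500 samples).
signal = np.sin(np.linspace(0, 20 * np.pi, 500))
windows = sliding_windows(signal)
print(windows.shape)  # one row per 2.56 s window
```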
The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ. These time-domain signals (prefix 't' to denote time) were captured at a constant rate of 50 Hz. Then they were filtered using a median filter and a 3rd order low pass Butterworth filter with a corner frequency of 20 Hz to remove noise. Similarly, the acceleration signal was then separated into the body and gravity acceleration signals (tBodyAcc-XYZ and tGravityAcc-XYZ) using another low pass Butterworth filter with a corner frequency of 0.3 Hz.
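A minimal scipy sketch of the gravity/body separation step (3rd-order low-pass Butterworth with a 0.3 Hz corner frequency at 50 Hz sampling, applied zero-phase; the original pipeline's exact filter implementation may differ):

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 50.0  # sampling rate in Hz

# 3rd-order low-pass Butterworth, 0.3 Hz corner frequency (normalized
# to the Nyquist frequency fs/2 as scipy expects).
b, a = butter(N=3, Wn=0.3 / (fs / 2), btype="low")

# Synthetic total acceleration: constant gravity plus 2 Hz body motion.
t = np.arange(0, 10, 1 / fs)
total_acc = 9.8 + 0.5 * np.sin(2 * np.pi * 2 * t)

gravity = filtfilt(b, a, total_acc)   # low-frequency (gravity) component
body = total_acc - gravity            # body-acceleration residual
print(round(float(gravity.mean()), 1))
```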
Subsequently, the body linear acceleration and angular velocity were derived in time to obtain Jerk signals (tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ). Also, the magnitudes of these three-dimensional signals were calculated using the Euclidean norm (tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag).
Finally, a Fast Fourier Transform (FFT) was applied to some of these signals, producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag. (Note the 'f' to indicate frequency-domain signals.)
These signals were used to estimate variables of the feature vector for each pattern: '-XYZ' is used to denote 3-axial signals in the X, Y, and Z directions.
tBodyAcc-XYZ tGravityAcc-XYZ tBodyAccJerk-XYZ tBodyGyro-XYZ tBodyGyroJerk-XYZ tBodyAccMag tGravityAccMag tBodyAccJerkMag tBodyGyroMag tBodyGyroJerkMag fBodyAcc-XYZ fBodyAccJerk-XYZ fBodyGyro-XYZ fBodyAccMag fBodyAccJerkMag fBodyGyroMag fBodyGyroJerkMag
The set of variables that were estimated from these signals are:
mean(): Mean value
std(): Standard deviation
mad(): Median absolute deviation
max(): Largest value in array
min(): Smallest value in array
sma(): Signal magnitude area
energy(): Energy measure. Sum of the squares divided by the number of values.
iqr(): Interquartile range
entropy(): Signal entropy
arCoeff(): Autoregression coefficients with Burg order equal to 4
correlation(): Correlation coefficient between two signals
maxInds(): Index of the frequency component with the largest magnitude
meanFreq(): Weighted average of the frequency components to obtain a mean frequency
skewness(): Skewness of the frequency domain signal
kurtosis(): Kurtosis of the frequency domain signal
bandsEnergy(): Energy of a frequency interval within the 64 bins of the FFT of each window
angle(): Angle between two vectors
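A numpy sketch of how a few of these per-window variables can be computed (illustrative only; the dataset's exact definitions follow the list above, e.g. energy() is the sum of squares divided by the number of values):

```python
import numpy as np

def window_features(window):
    """Compute a subset of the per-window variables listed above."""
    q75, q25 = np.percentile(window, [75, 25])
    return {
        "mean": np.mean(window),
        "std": np.std(window),
        "mad": np.median(np.abs(window - np.median(window))),
        "max": np.max(window),
        "min": np.min(window),
        "energy": np.sum(window ** 2) / len(window),  # sum of squares / N
        "iqr": q75 - q25,
    }

feats = window_features(np.array([1.0, 2.0, 3.0, 4.0]))
print(feats["mean"], feats["iqr"])
```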
Additional vectors are obtained by averaging the signals in a signal window sample. These are used on the angle() variable:
gravityMean tBodyAccMean tBodyAccJerkMean tBodyGyroMean tBodyGyroJerkMean
This data set consists of the following columns:
1 tBodyAcc-mean()-X 2 tBodyAcc-mean()-Y 3 tBodyAcc-mean()-Z 4 tBodyAcc-std()-X 5 tBodyAcc-std()-Y 6 tBodyAcc-std()-Z 7 tBodyAcc-mad()-X 8 tBodyAcc-mad()-Y 9 tBodyAcc-mad()-Z 10 tBodyAcc-max()-X 11 tBodyAcc-max()-Y 12 tBodyAcc-max()-Z 13 tBodyAcc-min()-X 14 tBodyAcc-min()-Y 15 tBodyAcc-min()-Z 16 tBodyAcc-sma() 17 tBodyAcc-energy()-X 18 tBodyAcc-energy()-Y 19 tBodyAcc-energy()-Z 20 tBodyAcc-iqr()-X 21 tBodyAcc-iqr()-Y 22 tBodyAcc-iqr()-Z 23 tBodyAcc-entropy()-X 24 tBodyAcc-entropy()-Y 25 tBodyAcc-entropy()-Z 26 tBodyAcc-arCoeff()-X,1 27 tBodyAcc-arCoeff()-X,2 28 tBodyAcc-arCoeff()-X,3 29 tBodyAcc-arCoeff()-X,4 30 tBodyAcc-arCoeff()-Y,1 31 tBodyAcc-arCoeff()-Y,2 32 tBodyAcc-arCoeff()-Y,3 33 tBodyAcc-arCoeff()-Y,4 34 tBodyAcc-arCoeff()-Z,1 35 tBodyAcc-arCoeff()-Z,2 36 tBodyAcc-arCoeff()-Z,3 37 tBodyAcc-arCoeff()-Z,4 38 tBodyAcc-correlation()-X,Y 39 tBodyAcc-correlation()-X,Z 40 tBodyAcc-correlation()-Y,Z 41 tGravityAcc-mean()-X 42 tGravit...
Our target was to predict gender, age and emotion from audio. We found labeled audio datasets on Mozilla and RAVDESS. Using the R programming language, 20 statistical features were extracted, and after adding the labels these datasets were formed. Audio files were collected from "Mozilla Common Voice" and the "Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)".
The datasets contain 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes a dataset's dispersion relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle number in the sorted (ascending or descending) list of frequencies.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. This means that about 25 percent of the values lie below Q1, and about 75 percent lie above it.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the central point between the median and the highest value of the data set.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, or between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure that determines how much the tails of a distribution differ from the tails of a normal distribution. It is effectively a measure of the outliers present in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized spectral power of the signal.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is usually measured in decibels and offers a way to quantify how tone-like a sound is, as opposed to noise-like.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used in digital signal processing to describe a spectrum. It indicates where the spectrum's center of mass is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of the dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of the dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which quantifies the degree of frequency modulation, expressed numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
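Several of the spectral statistics above (meanfreq, median, Q25, Q75, IQR) are quantiles of the power distribution over frequency. A numpy sketch on a toy magnitude spectrum (the helper below is illustrative; the original extraction was done in R):

```python
import numpy as np

def spectral_stats(freqs_khz, magnitudes):
    """Quantiles of the power distribution over frequency bins."""
    w = magnitudes / magnitudes.sum()        # normalize to a distribution
    cdf = np.cumsum(w)
    q25, median, q75 = (float(freqs_khz[np.searchsorted(cdf, q)])
                        for q in (0.25, 0.5, 0.75))
    return {
        "meanfreq": float(np.sum(freqs_khz * w)),  # weighted mean frequency
        "median": median, "Q25": q25, "Q75": q75, "IQR": q75 - q25,
    }

freqs = np.linspace(0.0, 8.0, 9)                     # bin centers in kHz
mags = np.array([1, 2, 3, 3, 2, 1, 1, 1, 0], float)  # toy magnitude spectrum
stats = spectral_stats(freqs, mags)
print(stats)
```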
Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description:
This is the dataset of an environment-independent in-baggage object identification system leveraging low-cost WiFi. The dataset contains the extracted CSI features from 14 representative in-baggage objects of 4 different materials. The experiments were conducted in 3 different office environments of different sizes. We hope this dataset will help researchers reproduce prior work on in-baggage object identification through WiFi sensing.
Dataset Format:
.mat files
Section 1: Device Configuration:
Transmitter: Aaronia HyperLOG 7060 directional antenna with a Dell Inspiron 3910 desktop for control.
Receiver: Hawking HD9DP orthogonal antennas with a Dell Inspiron 3910 desktop for control.
NIC: Atheros QCA9590. The configuration and installation guide of CSI tool can be found at https://wands.sg/research/wifi/AtherosCSI/
WiFi Packet Rate: 1000 pkts/s
Section 2: Data Format
We provide the CSI features through .mat files. The details are shown in the following:
14 different objects made of 4 different materials are included, across 3 different environments and 3 different days.
Each object is tested for 60 seconds, repeated 3 times.
The dataset file name is presented as "Object_Number". The detailed information is:
Object: The object we involved in the experiment (e.g., book, laptop)
Number: The number of repeats.
Section 3: Experimental Setups
There are 3 different office experiment setups for our data collection. The detailed setups are shown in the paper. For the objects, we involve 14 types of objects made of 4 different materials.
Environments:
3 different environments are involved, including 3 office environments with the size of 15 ft × 13 ft, 16 ft × 12 ft, 28 ft × 23 ft, respectively.
For each room environment, data is collected on different days and with different furniture settings (i.e., 2 desks and 2 chairs are moved at least 3 ft).
Representative objects:
Data is collected using 14 representative objects of 4 different materials including fiber: book, magazine, newspaper; metal: thermal cup, laptop; cotton/polyester: cotton T-shirts (×2), cotton T-shirts (×4), hoodie, polyester T-shirts, polyester pants; water: 1L bottle with 1L water, 1L bottle with 500ml water, 500ml bottle with 500ml water.
Section 4: Data Description
For our data organization, we separate the data files into different folders based on different days and different environments. Under these folders, data are further distributed in terms of different objects and repeat times. All the files are .mat files, which can be directly read for further applications.
Features of CSI amplitude: We calculate 7 different types of statistical features, including mean, variance, median, skewness, kurtosis, interquartile range and range, and polarization feature from CSI amplitude. Particularly, we calculate the features for all 56 subcarriers with different operating frequencies and responses to the target object.
Features of CSI phase: For the features of CSI phase, the same features with CSI amplitude are extracted and stored in the dataset.
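The 7 per-subcarrier statistics described above can be sketched with numpy on synthetic CSI amplitudes (illustrative only; the dataset's polarization feature is not reproduced here):

```python
import numpy as np

def csi_features(amplitude):
    """amplitude: CSI amplitude array of shape (n_packets, n_subcarriers).
    Returns the 7 statistical features per subcarrier listed above."""
    m = amplitude.mean(axis=0)
    s = amplitude.std(axis=0)
    q75, q25 = np.percentile(amplitude, [75, 25], axis=0)
    skewness = ((amplitude - m) ** 3).mean(axis=0) / s ** 3
    kurt = ((amplitude - m) ** 4).mean(axis=0) / s ** 4 - 3.0  # excess kurtosis
    return np.stack([
        m,                                              # mean
        amplitude.var(axis=0),                          # variance
        np.median(amplitude, axis=0),                   # median
        skewness,                                       # skewness
        kurt,                                           # kurtosis
        q75 - q25,                                      # interquartile range
        amplitude.max(axis=0) - amplitude.min(axis=0),  # range
    ])

rng = np.random.default_rng(0)
amp = np.abs(rng.normal(size=(1000, 56)))  # 1 s of packets at 1000 pkts/s
features = csi_features(amp)
print(features.shape)  # 7 features for each of the 56 subcarriers
```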
Section 5: Citations
If your work is related to our work, please cite our papers as follows.
https://ieeexplore.ieee.org/document/9637801
Shi, Cong, Tianming Zhao, Yucheng Xie, Tianfang Zhang, Yan Wang, Xiaonan Guo, and Yingying Chen. "Environment-independent in-baggage object identification using wifi signals." In 2021 IEEE 18th International Conference on Mobile Ad Hoc and Smart Systems (MASS), pp. 71-79. IEEE, 2021.
IQR: interquartile range (25th percentile-75th percentile).
In brief, the genotype at MS-associated loci was used to assign each individual to the categories of risk identified using the simulated population from table 1. Furthermore, a weighted genetic risk score (wGRS) was calculated by multiplying the number of risk alleles by the weight of each SNP and then taking the sum across all associations (see Methods).
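The wGRS computation described above (sum over loci of risk-allele count times SNP weight) can be sketched as follows; the weights and genotype are illustrative, not taken from the study:

```python
import numpy as np

# Hypothetical per-SNP weights and risk-allele counts (0, 1 or 2 copies).
weights = np.array([0.30, 0.10, 0.25])     # one weight per associated SNP
risk_allele_counts = np.array([2, 1, 0])   # one individual's genotype

# wGRS: multiply allele counts by weights, then sum across associations.
wgrs = float(np.sum(risk_allele_counts * weights))
print(wgrs)
```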
n = number of observations for each task; see Methods. †As measured by MET values. Abbreviations: IQR, interquartile range.
*Baseline is the beginning of the poor-adherence period, which was defined as 2 consecutive self-reports of missed doses of at least 1 per week over at least a 12-week period. √See Methods section for definition. ¶Patients suppressed on the current regimen at baseline and throughout the study period. IQR = interquartile range, SD = standard deviation, BMI = body mass index, IDU = injecting drug use. 3TC = lamivudine, ABC = abacavir, AZT = zidovudine, d4T = stavudine, DDI = didanosine, FTC = emtricitabine, TDF = tenofovir.
Objective: The aim of this study is to determine the residual C-peptide level and to explore the clinical significance of preserved C-peptide secretion in glycemic control in Chinese individuals with type 1 diabetes (T1D).
Research design and methods: A total of 534 participants with T1D were enrolled and divided into two groups, a low-C-peptide group (fasting C-peptide ≤10 pmol/L) and a preserved-C-peptide group (fasting C-peptide >10 pmol/L), and clinical factors were compared between the two groups. In 174 participants who were followed, factors associated with C-peptide loss were also identified by Cox regression. In addition, glucose metrics derived from intermittently scanned continuous glucose monitoring were compared between individuals with low C-peptide and those with preserved C-peptide in 178 participants.
Results: The lack of preserved C-peptide was associated with longer diabetes duration, glutamic acid decarboxylase autoantibody, and higher daily insulin doses, after adjustment {OR, 1.10 [interquartile range (IQR), 1.06-1.14]; OR, 0.46 (IQR, 0.27-0.77); OR, 1.04 (IQR, 1.02-1.06)}. In the longitudinal analysis, the percentages of individuals with preserved C-peptide were 71.4%, 56.8%, 71.7%, 62.5%, and 22.2% over 5 years of follow-up. Preserved C-peptide was also associated with higher time in range after adjustment for diabetes duration [62.4 (IQR, 47.3-76.6) vs. 50.3 (IQR, 36.2-63.0) %, adjusted P = 0.003].
Conclusions: Our results indicate that a high proportion of Chinese patients with T1D had preserved C-peptide secretion. Meanwhile, residual C-peptide was associated with favorable glycemic control, suggesting the importance of research on adjunctive therapy to maintain β-cell function in T1D.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Study objective: To investigate the performance of a rapid RT-PCR assay to detect influenza A/B at emergency department admission.
Methods: This single-center prospective study recruited adult patients attending the emergency department for influenza-like illness. Triage nurses performed nasopharyngeal swab samples and ran rapid RT-PCR assays using a dedicated device (cobas Liat, Roche Diagnostics, Meylan, France) located at triage. The same swab sample was also analyzed in the department of virology using conventional RT-PCR techniques. Patients were included 24 hours a day, 7 days a week. The primary outcome was the diagnostic accuracy of the rapid RT-PCR assay performed at triage.
Results: A total of 187 patients were included over 11 days in January 2018. Median age was 70 years (interquartile range 44 to 84) and 95 (51%) were male. Nine (5%) assays had to be repeated due to failure of the first assay. The sensitivity of the rapid RT-PCR assay performed at triage was 0.98 (95% confidence interval (CI): 0.91-1.00) and the specificity was 0.99 (95% CI: 0.94-1.00). A total of 92 (49%) assays were performed at night-time or during the weekend. The median time from patient entry to rapid RT-PCR assay results was 46 [interquartile range 36-55] minutes.
Conclusion: Rapid RT-PCR assay performed by nurses at triage to detect influenza A/B is feasible and highly accurate.
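The diagnostic-accuracy measures reported in this entry can be illustrated with a small sketch; the 2×2 counts below are hypothetical, chosen only to reproduce figures of the same magnitude, not the study's data:

```python
# Hypothetical confusion-matrix counts, rapid test vs. reference RT-PCR.
tp, fn = 88, 2    # reference-positive patients: detected / missed
tn, fp = 96, 1    # reference-negative patients: correctly negative / false alarm

sensitivity = tp / (tp + fn)   # true-positive rate
specificity = tn / (tn + fp)   # true-negative rate
print(round(sensitivity, 2), round(specificity, 2))
```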
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; "Simulated_Dataset.RData".
Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code Abstract
We provide R statistical software code ("CWVS_LMC.txt") to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript.
We also provide R code ("Results_Summary.txt") to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description
"CWVS_LMC.txt": This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the "Simulated_Dataset.RData" workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
"Results_Summary.txt": This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the "CWVS_LMC.txt" code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Optional Information
Required R packages:
• For running "CWVS_LMC.txt":
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
• For running "Results_Summary.txt":
• plotrix: Plotting the posterior means and credible intervals
Instructions for Use
Reproducibility (Mandatory)
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the simulated datasets generated under setting E4 of the presented simulation study.
How to use the information:
• Load the "Simulated_Dataset.RData" workspace
• Run the code contained in "CWVS_LMC.txt"
• Once the "CWVS_LMC.txt" code is complete, run "Results_Summary.txt"
Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:
Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008.
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.
Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis. This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
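The weekly median/IQR standardization described in this entry can be sketched as follows (synthetic exposure matrix; in the provided dataset z is already standardized and the weekly medians/IQRs are withheld):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic exposure matrix z: one row per pregnancy, one column per week.
z = rng.gamma(shape=2.0, scale=3.0, size=(100, 4))

# Standardize each week: subtract the weekly median, divide by the weekly IQR.
med = np.median(z, axis=0)
q75, q25 = np.percentile(z, [75, 25], axis=0)
z_std = (z - med) / (q75 - q25)

# After standardization, each column has median 0 and IQR 1.
print(np.round(np.median(z_std, axis=0), 6))
```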