Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Parameters are shown separately for the three different datasets (1, 2, pathological gamblers [PG]).
We include a description of the data sets in the metadata as well as sample code and results from a simulated data set.

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The R code is available online here: https://github.com/warrenjl/SpGPCW.

Format:

Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects against identifiability of the spatial locations used in the analysis.

File format: R workspace file.

Metadata (including data dictionary):
• y: Vector of binary responses (1: preterm birth, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate).

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
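The median/IQR standardization described in this entry can be sketched as follows. This is a minimal illustration in Python, not the authors' R code: the function name `standardize_by_week`, the toy exposure matrix, and the quartile convention (`method="inclusive"`) are all assumptions, since the description does not say how the quartiles were computed.

```python
from statistics import median, quantiles


def standardize_by_week(z):
    """Center each column (week) on its median and scale by its IQR.

    z is a list of rows, one row per individual and one column per week.
    A week with zero IQR would cause a division by zero; this sketch does
    not handle that case.
    """
    n_weeks = len(z[0])
    out = [row[:] for row in z]
    for j in range(n_weeks):
        column = [row[j] for row in z]
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        week_median = median(column)
        week_iqr = q3 - q1
        for i in range(len(z)):
            out[i][j] = (z[i][j] - week_median) / week_iqr
    return out


# Toy raw exposures (hypothetical values): 5 individuals, 2 weeks.
z_raw = [
    [8.0, 12.0],
    [10.0, 14.0],
    [12.0, 16.0],
    [14.0, 18.0],
    [16.0, 20.0],
]
z_std = standardize_by_week(z_raw)
print(z_std)
```

Because the released `z` matrix is already standardized and the weekly medians and IQRs are withheld, a transformation like this cannot be inverted from the published file alone.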
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AL refers to the axial length, CCT to the central corneal thickness, ACD to the external phakic anterior chamber depth measured from the corneal front apex to the front apex of the crystalline lens, LT to the central thickness of the crystalline lens, R1 and R2 to the corneal radii of curvature for the flat and steep meridians, Rmean to the average of R1 and R2, PIOL to the refractive power of the intraocular lens implant, and SEQ to the spherical equivalent power achieved 5 to 12 weeks after cataract surgery.
By US Open Data Portal, data.gov [source]
This U.S. Household Pandemic Impacts dataset assesses the mental health care that households in America have been receiving over the past four weeks during the Covid-19 pandemic. Produced by a collaboration between the U.S. Census Bureau, and five other federal agencies, this survey was designed to measure both social and economic impacts of Covid-19 on American households, such as employment status, consumer spending trends, food security levels and housing disruptions among other important factors. The data collected was based on an internet questionnaire which was conducted through emails and text messages sent to randomly selected housing units from across America linked with email addresses or cell phone numbers from the Census Bureau Master Address File Data; all estimates comply with NCHS Data Presentation Standards for Proportions. Be sure to check out more about how U.S Government Works for further details!
This dataset can be useful to examine the impact of the Covid-19 pandemic on access to and utilization of mental health care by U.S. households in the last 4 weeks.
By studying this dataset, you can gain insight into how people’s mental health has been affected by the pandemic and identify trends based on population subgroups, states, phases of the survey and more.
Instructions for Use: - To get started, open up ‘csv-1’ found in this dataset. This file contains information on access to and utilization of mental health care by U.S households in the last 4 weeks, broken down into 14 different columns (e.g., Indicator, Group, State).
- Familiarize yourself with each column label (e.g., Time Period Start Date), data type (e
- Analyzing the impact of pandemic-induced stress on different demographic groups, such as age and race/ethnicity.
- Comparing the mental health care services received in different states over time.
- Investigating the correlation between socio-economic status and access to mental health care services during Covid-19 pandemic
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions under the same license as the original. - Keep intact - all notices that refer to this license, including copyright notices.
File: csv-1.csv

| Column name | Description |
|:---------------------------|:-------------------------------------------------------------------|
| Indicator | The type of indicator being measured. (String) |
| Group | The group (by age, gender or race) being measured. (String) |
| State | The state where the data was collected. (String) |
| Subgroup | A narrower level categorization within Group. (String) |
| Phase | Phase number reflective of survey iteration. (Integer) |
| Time Period | A label indicating duration captured by survey period. (String) |
| Time Period Label | A label indicating duration captured by survey period. (String) |
| Time Period Start Date | Beginning date for surveyed period. (DateFormat ‘YYYY-MM-DD’) |
| Time Period End Date | End date for surveyed period. (DateFormat ‘YYYY-MM-DD’) |
| Value | The value of the indicator being measured. (Float) |
| LowCI | The lower confidence interval of the value. (Float) |
| HighCI | The higher confidence interval of the value. (Float) |
| Quartile Range | The quartile range of the value. (String) |
| Suppression Flag | A f...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Median (interquartile range; IQR) demographic and clinical data of participants.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset includes one dataset which was custom ordered from Statistics Canada. The table includes information on housing suitability and shelter-cost-to-income ratio by number of bedrooms, housing tenure, status of primary household maintainer, household type, and income quartile ranges for census subdivisions in British Columbia. The dataset is in Beyond 20/20 (.ivt) format. The Beyond 20/20 browser is required in order to open it. This software can be freely downloaded from the Statistics Canada website: https://www.statcan.gc.ca/eng/public/beyond20-20 (Windows only). For information on how to use Beyond 20/20, please see:
http://odesi2.scholarsportal.info/documentation/Beyond2020/beyond20-quickstart.pdf
https://wiki.ubc.ca/Library:Beyond_20/20_Guide

Custom order from Statistics Canada includes the following dimensions and variables:

Geography: Non-reserve CSDs in British Columbia - 299 geographies

The global non-response rate (GNR) is an important measure of census data quality. It combines total non-response (households) and partial non-response (questions). A lower GNR indicates a lower risk of non-response bias and, as a result, a lower risk of inaccuracy. The counts and estimates for geographic areas with a GNR equal to or greater than 50% are not published in the standard products. The counts and estimates for these areas have a high risk of non-response bias, and in most cases, should not be released. All the geographies requested for this tabulation have been cleared for the release of income data and have a GNR under 50%.

Housing Tenure Including Presence of Mortgage (5)
1. Total – Private non-band non-farm off-reserve households with an income greater than zero by housing tenure
2. Households who own
3. With a mortgage (see note 1)
4. Without a mortgage
5. Households who rent
Note: 1) Presence of mortgage - Refers to whether the owner households reported mortgage or loan payments for their dwelling.

2015 Before-tax Household Income Quartile Ranges (5)
1. Total – Private households by quartile ranges (see notes 1, 2, 3)
2. Count of households under or at quartile 1
3. Count of households between quartile 1 and quartile 2 (median) (including at quartile 2)
4. Count of households between quartile 2 (median) and quartile 3 (including at quartile 3)
5. Count of households over quartile 3
Notes:
1) A private household will be assigned to a quartile range depending on its CSD-level location and depending on its tenure (owned and rented). Quartile ranges for owned households in a specific CSD are delimited by the 2015 before-tax income quartiles of owned households with an income greater than zero and residing in non-farm off-reserve dwellings in that CSD. Quartile ranges for rented households in a specific CSD are delimited by the 2015 before-tax income quartiles of rented households with an income greater than zero and residing in non-farm off-reserve dwellings in that CSD.
2) For the income quartiles dollar values (the delimiters) please refer to Table 1.
3) Quartiles 1 to 3 are suppressed if the number of actual records used in the calculation (not rounded or weighted) is less than 16. For cases in which the renters’ quartiles or the owners’ quartiles (figures from Table 1) of a CSD are suppressed the CSD is assigned to a quartile range depending on the provincial renters’ or owners’ quartile figures.

Number of Bedrooms (Unit Size) (6)
1. Total – Private households by number of bedrooms (see note 1)
2. 0 bedrooms (Bachelor/Studio)
3. 1 bedroom
4. 2 bedrooms
5. 3 bedrooms
6. 4 bedrooms
Note: 1) Dwellings with 5 bedrooms or more included in the total count only.

Housing Suitability (6)
1. Total - Housing suitability
2. Suitable
3. Not suitable
4. One bedroom shortfall
5. Two bedroom shortfall
6. Three or more bedroom shortfall
Note: 1) 'Housing suitability' refers to whether a private household is living in suitable accommodations according to the National Occupancy Standard (NOS); that is, whether the dwelling has enough bedrooms for the size and composition of the household. A household is deemed to be living in suitable accommodations if its dwelling has enough bedrooms, as calculated using the NOS. 'Housing suitability' assesses the required number of bedrooms for a household based on the age, sex, and relationships among household members. An alternative variable, 'persons per room,' considers all rooms in a private dwelling and the number of household members. Housing suitability and the National Occupancy Standard (NOS) on which it is based were developed by Canada Mortgage and Housing Corporation (CMHC) through consultations with provincial housing agencies.

Shelter-cost-to-income-ratio (4)
1. Total – Private non-band non-farm off-reserve households with an income greater than zero
2. Spending less than 30% of households total income on shelter costs
3. Spending 30% or more of households total income on shelter costs
4. Spending 50% or more of households total income on shelter costs
Note: 'Shelter-cost-to-income...
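The category boundaries listed for the income quartile ranges and the shelter-cost-to-income ratio can be made concrete with a short sketch. It is illustrative only: the function names and the dollar cutoffs are hypothetical (the real delimiters are in the Table 1 referenced above), and it assumes shelter cost and income are expressed over the same period.

```python
def quartile_range(income, q1, q2, q3):
    """Return which of the four income quartile ranges a household falls in.

    Follows the category labels in the text: a household exactly at a
    quartile is counted in the lower range.
    """
    if income <= q1:
        return 1  # under or at quartile 1
    if income <= q2:
        return 2  # between quartile 1 and quartile 2 (median), including at quartile 2
    if income <= q3:
        return 3  # between quartile 2 (median) and quartile 3, including at quartile 3
    return 4      # over quartile 3


def shelter_cost_flags(shelter_cost, income):
    """Classify a household by its shelter-cost-to-income ratio.

    The "50% or more" group is a subset of the "30% or more" group.
    """
    if income <= 0:
        raise ValueError("the tabulation covers only households with income greater than zero")
    ratio = shelter_cost / income
    return {
        "under_30": ratio < 0.30,
        "30_or_more": ratio >= 0.30,
        "50_or_more": ratio >= 0.50,
    }


# Hypothetical CSD-level cutoffs for one tenure group.
Q1, Q2, Q3 = 30_000, 55_000, 90_000
print(quartile_range(55_000, Q1, Q2, Q3))
print(shelter_cost_flags(18_000, 30_000))
```

Since quartiles are defined separately for owners and renters within each CSD, a real lookup would first select the cutoffs by CSD and tenure, falling back to the provincial figures where the CSD quartiles are suppressed.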
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects against identifiability of the spatial locations used in the analysis.

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means:

File format: R workspace file; “Simulated_Dataset.RData”.

Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code

Abstract: We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description:
“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Optional Information (complete as necessary)

Required R packages:
• For running “CWVS_LMC.txt”:
  • msm: Sampling from the truncated normal distribution
  • mnormt: Sampling from the multivariate normal distribution
  • BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
  • plotrix: Plotting the posterior means and credible intervals

Instructions for Use

Reproducibility (Mandatory)

What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.

How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.

Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:

Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects against identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
Supplementary figure 1: Rank abundance distributions for habitats at three taxonomic levels. (Suppl_fig_1.pdf)
Supplementary figure 2: Evenness and species richness of the four habitats at three taxonomic levels. (Suppl_fig_2.pdf)
Supplementary figure 3: Distribution of p-values from Mantel test for Spearman correlation between dissimilarity matrices representing different taxonomic and numerical levels. A-C, Correlation between taxonomic levels at different numerical resolutions. D-F, Correlation between proportional abundance data and higher levels of numerical transformation. Filled points represent median p-values across 1000 subsampling iterations, empty points are outliers that lie beyond 1.5 times the interquartile range from the upper quartile. (Suppl_fig_3.pdf)
Supplementary figure 4: NMDS ordination of a double-standardized subsample of the total dataset comparing individual habitats along the depth- and salinity gradient for species and families using proportional abundances and presence/absence ...
The dataset was derived by the Bioregional Assessment Programme from multiple datasets. The source dataset is identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
Hydrological Response Variables (HRVs) are the hydrological characteristics of the system that potentially change due to coal resource development. These data refer to the HRVs related to the AWRA-R model for the Namoi subregion for the 54 simulation nodes. The nine hydrological response variables (AF, P99, FD, IQR, ZFD, P01, LFD, LFS, LLFS) were computed under CRDP and Baseline conditions, respectively and the ACRD is the difference between the Baseline and CRDP.
Abbreviation meaning
AF - the annual streamflow volume (GL/year)
P01 - the daily streamflow rate at the first percentile (ML/day)
IQR - the inter-quartile range in daily streamflow (ML/day). That is, the difference between the daily streamflow rate at the 75th percentile and at the 25th percentile.
LFD - the number of low streamflow days per year. The threshold for low streamflow days is the 10th percentile from the simulated 90-year period (2013 to 2102)
LFS - the number of low streamflow spells per year (perennial streams only). A spell is defined as a period of contiguous days of streamflow below the 10th percentile threshold
LLFS - the length (days) of the longest low streamflow spell each year
P99 - the daily streamflow rate at the 99th percentile (ML/day)
FD - flood days, the number of days with streamflow greater than the 90th percentile from the simulated 90-year period (2013 to 2102)
ZFD - Zero flow days
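Several of the abbreviations above can be illustrated on a short daily series. The sketch below is not the AWRA-R implementation: the helper names, the toy flow values, and the low-flow threshold are hypothetical (in the dataset the threshold is the 10th percentile of the simulated 90-year period), and the quartile convention (`method="inclusive"`) is an assumption.

```python
from statistics import quantiles


def iqr(flows):
    """Difference between the 75th and 25th percentile of daily flow."""
    q1, _, q3 = quantiles(flows, n=4, method="inclusive")
    return q3 - q1


def low_flow_spells(flows, threshold):
    """Lengths of each run of contiguous days with flow below the threshold."""
    spells, run = [], 0
    for flow in flows:
        if flow < threshold:
            run += 1
        elif run:
            spells.append(run)
            run = 0
    if run:  # series ended while still in a spell
        spells.append(run)
    return spells


# Toy daily streamflow (ML/day) and a hypothetical low-flow threshold.
flows = [5, 0, 0, 3, 8, 1, 1, 1, 9, 12, 0, 7]
LOW_FLOW_THRESHOLD = 2.0

spells = low_flow_spells(flows, LOW_FLOW_THRESHOLD)
hrv = {
    "IQR": iqr(flows),                       # ML/day
    "LFD": sum(spells),                      # low streamflow days
    "LFS": len(spells),                      # low streamflow spells
    "LLFS": max(spells, default=0),          # longest low streamflow spell (days)
    "ZFD": sum(1 for f in flows if f == 0),  # zero flow days
}
print(hrv)
```

In the dataset these metrics are reported per year and per simulation node, so a real calculation would group the daily series by year before applying the helpers.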
This is the dataset used for the Namoi 2.6.1 product to evaluate additional coal mine and coal resource development impacts on hydrological response variables at 54 simulation nodes.
The Namoi AWRA-R model outputs were used to determine the impacts on the HRVs to produce these data. Readme files within the folders in the dataset provide an explanation on how the resource was created. The nine HRVs (AF, P99, FD, IQR, ZFD, P01, LFD, LFS, LLFS) were computed under CRDP and Baseline conditions, respectively. The difference between CRDP and Baseline is used for predicting ACRD impacts on hydrological response variables at 54 simulation nodes.
Bioregional Assessment Programme (2017) Namoi standard Hydrological Response Variables (HRVs). Bioregional Assessment Derived Dataset. Viewed 11 December 2018, http://data.bioregionalassessments.gov.au/dataset/189f4c7a-29e1-41f9-868d-b7f5184d829f.
Derived From Historical Mining Footprints DTIRIS NAM 20150914
Derived From Namoi AWRA-R (restricted input data implementation)
Derived From River Styles Spatial Layer for New South Wales
Derived From Namoi Surface Water Mine Footprints - digitised
Derived From Namoi AWRA-R model implementation (post groundwater input)
Derived From National Surface Water sites Hydstra
Derived From Namoi AWRA-L model
Derived From Namoi Hydstra surface water time series v1 extracted 140814
Derived From GEODATA 9 second DEM and D8: Digital Elevation Model Version 3 and Flow Direction Grid 2008
Derived From Namoi Environmental Impact Statements - Mine footprints
Derived From Namoi Existing Mine Development Surface Water Footprints
The dataset provides the median, 25th percentile, and 75th percentile of carbon monoxide (CO) concentrations in Delhi, measured in moles per square meter and vertically integrated over a 9-day mean period. This data offers insights into the distribution and variability of CO levels over time.
The data, collected from July 10, 2018, to August 10, 2024, is sourced from the Tropomi Explorer.
CO is a harmful gas that can significantly impact human health. High levels of CO can lead to respiratory issues, cardiovascular problems, and even be life-threatening in extreme cases. Forecasting CO levels helps in predicting and managing air quality to protect public health.
CO is often emitted from combustion processes, such as those in vehicles and industrial activities. Forecasting CO levels can help in monitoring the impact of these sources and evaluating the effectiveness of emission control measures.
Accurate CO forecasts can assist in urban planning and pollution control strategies, especially in densely populated areas where air quality issues are more pronounced.
Columns and Data Description:
system:time_start: This column represents the date when the CO measurements were taken.
p25: This likely represents the 25th percentile value of CO levels for the given date, providing insight into the lower range of the distribution.
Median: The median CO level for the given date, which is the middle value of the dataset and represents a typical value.
IQR: The Interquartile Range, which measures the spread of the middle 50% of the data. It’s calculated as the difference between the 75th percentile (p75) and the 25th percentile (p25) values.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A live version of the data record, which will be kept up-to-date with new estimates, can be downloaded from the Humanitarian Data Exchange: https://data.humdata.org/dataset/covid-19-mobility-italy.
If you find the data helpful or you use the data for your research, please cite our work:
Pepe, E., Bajardi, P., Gauvin, L., Privitera, F., Lake, B., Cattuto, C., & Tizzoni, M. (2020). COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown. Scientific Data 7, 230 (2020).
The data record is structured into 4 comma-separated value (CSV) files, as follows:
id_provinces_IT.csv. Table of the administrative codes of the 107 Italian provinces. The fields of the table are:
COD_PROV is an integer field that is used to identify a province in all other data records;
SIGLA is a two-letters code that identifies the province according to the ISO_3166-2 standard (https://en.wikipedia.org/wiki/ISO_3166-2:IT);
DEN_PCM is the full name of the province.
OD_Matrix_daily_flows_norm_full_2020_01_18_2020_04_17.csv. The file contains the daily fraction of users moving between Italian provinces. Each line corresponds to an entry of matrix (i, j). The fields of the table are:
p1: COD_PROV of origin,
p2: COD_PROV of destination,
day: in the format yyyy-mm-dd.
median_q1_q3_rog_2020_01_18_2020_04_17.csv. The file contains median and interquartile range (IQR) of users’ radius of gyration in a province by week. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
week: median value of the radius of gyration on week week, with week in the format dd/mm-DD/MM where dd/mm and DD/MM are the first and the last day of the week, respectively.
week Q1 first quartile (Q1) of the distribution of the radius of gyration on week week,
week Q3 third quartile (Q3) of the distribution of the radius of gyration on week week,
average_network_degree_2020_01_18_2020_04_17.csv. The file contains daily time-series of the average degree 〈k〉 of the proximity network. Each entry of the table is a value of 〈k〉 on a given day. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
day in the format yyyy-mm-dd.
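The weekly column labels of the radius-of-gyration file can be handled with a small helper. This is a sketch under stated assumptions: the year is taken to be 2020 (inferred from the file names), the quartile columns are assumed to be named by appending ` Q1` and ` Q3` to the week label, and the sample row values are hypothetical rather than taken from the data.

```python
from datetime import date


def parse_week_label(label, year=2020):
    """Turn a 'dd/mm-DD/MM' week label into (first_day, last_day) dates."""
    first, last = label.split("-")
    d1, m1 = (int(part) for part in first.split("/"))
    d2, m2 = (int(part) for part in last.split("/"))
    return date(year, m1, d1), date(year, m2, d2)


def week_iqr(row, week):
    """IQR of the radius of gyration for one province and week (Q3 minus Q1)."""
    return float(row[f"{week} Q3"]) - float(row[f"{week} Q1"])


# Hypothetical row, shaped like one record read with csv.DictReader.
row = {
    "COD_PROV": "1",
    "18/01-24/01": "5.2",
    "18/01-24/01 Q1": "2.0",
    "18/01-24/01 Q3": "9.5",
}
start, end = parse_week_label("18/01-24/01")
print(start, end, week_iqr(row, "18/01-24/01"))
```

The actual header spelling should be checked against the CSV before relying on the ` Q1` and ` Q3` suffixes.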
ESRI shapefiles of the Italian provinces updated to the most recent definition are available from the website of the Italian National Office of Statistics (ISTAT): https://www.istat.it/it/archivio/222527.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The GPR data were acquired with a MALA HDR GPR system with a 450 MHz shielded antenna. Data were acquired every 4 cm, tracked by a survey wheel for precise relative positioning. The GPR survey covered approximately 19 km of ridgeline and included 21 short (approximately 10 m) ridgetop profiles with cone penetrometer observations that serve as ground-truth for the depth of the soil-saprolite boundary inferred from the GPR data. The GPR data were processed using the open-source GPRPy software (Plattner, 2020). We constrained sub-surface velocities by fitting 364 diffraction hyperbolas in the GPR transects. The hyperbola fitting supports a spatially-uniform velocity of approximately 0.11 m/ns. The locations and fitted velocities of the individual hyperbolas used to construct this velocity model are included in the file “GPR_velocities.csv.”
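To show how the fitted velocity enters the interpretation, the sketch below converts two-way travel time to depth and evaluates the travel-time curve of a point diffractor. It is a generic textbook relation, not GPRPy's code: it assumes a zero-offset antenna and the spatially uniform velocity of about 0.11 m/ns quoted above, and the function names and sample values are hypothetical.

```python
import math

VELOCITY_M_PER_NS = 0.11  # approximate uniform velocity reported for this survey


def depth_from_twt(twt_ns, velocity=VELOCITY_M_PER_NS):
    """Reflector depth (m) from two-way travel time (ns): depth = v * t / 2."""
    return velocity * twt_ns / 2.0


def hyperbola_twt(x, x0, depth, velocity=VELOCITY_M_PER_NS):
    """Two-way travel time (ns) to a point diffractor at (x0, depth),
    recorded with the antenna at surface position x (m)."""
    return 2.0 * math.sqrt(depth ** 2 + (x - x0) ** 2) / velocity


# A reflection arriving at 20 ns maps to about 1.1 m depth.
print(depth_from_twt(20.0))

# Travel times along a transect over a diffractor at x0 = 0 m, 1.1 m deep.
for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(x, round(hyperbola_twt(x, 0.0, 1.1), 2))
```

Fitting a velocity amounts to finding the value for which this curve best matches an observed diffraction hyperbola; a lower velocity produces a narrower hyperbola for the same apex time.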
The folder “Radar450MHz_raw” includes the raw radar data. The folder “Radar450MHz_GPS” includes the GPS data associated with each radar dataset (saved as .cor files). The GPS observations are aggregated in the spreadsheet “GPS_all” in that folder. The shapefile folder includes the final processed GPR-derived soil thickness estimates (soil thickness reported in units of meters) as a .shp file. The coordinate system for the shapefile is WGS84 / UTM Zone 11 N.
Our target was to predict gender, age and emotion from audio. We found labeled audio datasets from Mozilla and RAVDESS. Using the R programming language, 20 statistical features were extracted, and these datasets were formed after adding the labels. Audio files were collected from "Mozilla Common Voice" and “Ryerson AudioVisual Database of Emotional Speech and Song (RAVDESS)”.
The datasets contain 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes a dataset’s dispersion relative to its mean and is calculated as the variance’s square root.
3) median - The median frequency (in kHz) is the middle number in the sorted (ascending or descending) list of numbers.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. This means that about 25 percent of the data set numbers are below Q1, and about 75 percent are above Q1.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the median of the upper half of the data set. This means that about 75 percent of the data set numbers are below Q3, and about 25 percent are above Q3.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure that determines how much the tails of a distribution differ from the tails of a normal distribution. It reflects the presence of outliers in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized signal’s spectral power.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is usually measured in decibels and quantifies how noise-like, as opposed to tone-like, a sound is.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the spectrum’s center of mass is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which calculates the degree of frequency modulation expressed numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
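Three of the spectral features listed above (centroid, sp.ent, sfm) can be sketched from a power spectrum using their common textbook definitions. This is not the R routine used to build the datasets, whose exact formulas are not given here: the function names and toy spectra are hypothetical, entropy is normalized by the log of the number of bins, and flatness is returned as a plain ratio between 0 and 1 rather than in decibels.

```python
import math


def spectral_centroid(freqs, power):
    """Power-weighted mean frequency (the spectrum's center of mass)."""
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)


def spectral_entropy(power):
    """Shannon entropy of the normalized spectrum, scaled to [0, 1]."""
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(power))


def spectral_flatness(power):
    """Geometric mean over arithmetic mean; requires strictly positive power."""
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic


freqs_khz = [1.0, 2.0, 3.0, 4.0]
flat = [1.0, 1.0, 1.0, 1.0]      # noise-like toy spectrum
peaked = [0.01, 0.01, 10.0, 0.01]  # tone-like toy spectrum

print(spectral_centroid(freqs_khz, flat))
print(spectral_entropy(flat), spectral_flatness(flat))
print(spectral_entropy(peaked), spectral_flatness(peaked))
```

A flat spectrum scores high on both entropy and flatness, while a spectrum dominated by one bin scores low on both, which is why these two features help separate noisy from tonal segments.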
Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en
Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.
Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 12+ and age 5+ denominators have been uploaded as archived tables.
Starting June 30, 2021, the dataset has been reconfigured so that all updates are appended to one dataset, which makes it easier to use through the API and other interfaces. In addition, historical data has been extended back to January 5, 2021.
This dataset shows full, partial, and at least 1 dose coverage rates by zip code tabulation area (ZCTA) for the state of California. Data sources include the California Immunization Registry and the American Community Survey’s 2015-2019 5-Year data.
This is the data table for the LHJ Vaccine Equity Performance dashboard. However, this data table also includes ZCTAs that do not have a VEM score.
This dataset also includes Vaccine Equity Metric score quartiles (when applicable), which combine the Public Health Alliance of Southern California’s Healthy Places Index (HPI) measure with CDPH-derived scores to estimate factors that impact health, like income, education, and access to health care. ZCTAs range from less healthy community conditions in Quartile 1 to more healthy community conditions in Quartile 4.
The Vaccine Equity Metric is for weekly vaccination allocation and reporting purposes only. CDPH-derived quartiles should not be considered as indicative of the HPI score for these zip codes. CDPH-derived quartiles were assigned to zip codes excluded from the HPI score produced by the Public Health Alliance of Southern California due to concerns with statistical reliability and validity in populations smaller than 1,500 or where more than 50% of the population resides in a group setting.
These data do not include doses administered by the following federal agencies, which received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.
For some ZCTAs, vaccination coverage may exceed 100%. This may happen because many people from outside the county come to that ZCTA to get their vaccine and providers report the county of administration as the county of residence, and/or because the DOF estimates of the population in that ZCTA are too low. Please note that population numbers provided by DOF are projections and so may not be accurate, especially given unprecedented shifts in population as a result of the pandemic.
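The arithmetic behind a coverage rate above 100% can be sketched as follows. This is an illustration only: the function name `coverage_rate` and the numbers are hypothetical and are not taken from the dataset.

```python
def coverage_rate(people_vaccinated, population_estimate):
    """Return coverage as a percentage of the population estimate."""
    if population_estimate <= 0:
        raise ValueError("population estimate must be positive")
    return 100.0 * people_vaccinated / population_estimate


# Hypothetical area: 1,200 people recorded as vaccinated residents,
# but the population projection used as the denominator is only 1,000.
rate = coverage_rate(1200, 1000)
print(f"{rate:.1f}%")  # prints 120.0%
```

The rate exceeds 100% whenever the numerator counts people who are not in the denominator, or when the projected denominator is lower than the true population.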
https://creativecommons.org/publicdomain/zero/1.0/
Walmart Inc. is a multinational retail corporation that operates a chain of hypermarkets, discount department stores, and grocery stores. It is one of the world's largest companies by revenue and a key player in the retail sector. Walmart's stock is actively traded on major stock exchanges, making it an interesting subject for financial analysis.
This dataset contains historical stock price data for Walmart, sourced directly from Yahoo Finance using the yfinance Python API. The data covers daily stock prices and includes multiple key financial indicators.
This notebook performs an extensive EDA to uncover insights into Walmart's stock price trends, volatility, and overall behavior in the stock market. The following analysis steps are included:
This dataset and analysis can be useful for:
- 📡 Stock Market Analysis – Evaluating Walmart’s stock price trends and volatility.
- 🏦 Investment Research – Assisting traders and investors in making informed decisions.
- 🎓 Educational Purposes – Teaching data science and financial analysis using real-world stock data.
- 📊 Algorithmic Trading – Developing trading strategies based on historical stock price trends.
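As a rough illustration of the trend and volatility analysis mentioned above, the sketch below computes daily returns, a simple moving average, and an annualised volatility from a list of closing prices. The prices are made up, the function names are hypothetical, and the figure of 252 trading days per year is a common convention assumed here rather than something stated by the dataset.

```python
import math
import statistics


def daily_returns(closes):
    """Simple day-over-day returns from a list of closing prices."""
    return [curr / prev - 1.0 for prev, curr in zip(closes, closes[1:])]


def moving_average(closes, window):
    """Trailing simple moving average over `window` days."""
    return [
        sum(closes[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(closes))
    ]


def annualised_volatility(closes, trading_days=252):
    """Sample standard deviation of daily returns, scaled to one year."""
    return statistics.stdev(daily_returns(closes)) * math.sqrt(trading_days)


# Hypothetical closing prices, not real Walmart data
closes = [60.0, 60.6, 60.3, 61.2, 61.0, 61.8]
print(daily_returns(closes))
print(moving_average(closes, window=3))
print(annualised_volatility(closes))
```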
📥 Download the dataset and explore Walmart’s stock performance today! 🚀
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The features possibly affecting ground-based PM2.5 from 2014 to 2023 in China were collected to make up our first version of the ML-CNPM2.5. Thanks to our filling and calibrating methods, over 5 million samples (5,076,608) have been obtained, which, to our knowledge, is more PM2.5 samples than have been covered in previous studies. Because filled AOD always shows lower accuracy than unfilled AOD, a dataset including only unfilled AOD, with 1,790,210 records, is also issued so that both primary and higher-accuracy ML-based models can be trained and assessed. To distinguish the two datasets, the filled AOD dataset is named ML-CNPM2.5-A and the unfilled one is named ML-CNPM2.5-B. There are twenty-four features in ML-CNPM2.5-A and twenty-three features in ML-CNPM2.5-B. Most of the features directly or indirectly affect the estimation of ground-based PM2.5 using remote sensing and ML technology, and are therefore widely used as inputs to ML-based models. The distribution of each feature in ML-CNPM2.5-A (ML-CNPM2.5-B) is shown in Fig. 1 (Fig. 2). The figures show each feature’s range of values, including the median, quartiles, and outliers. For example, the distribution of Terra MAIAC AOD changes markedly after calibration, from a range of 0-8 to a range of 0-3, which is more realistic. The discrete features, including year, month, day, Doy and LUC, are evenly distributed across their ranges of values, indicating the balance and comprehensiveness of our sample dataset. Detailed information about these features is listed in Table 2 (Table S1) for ML-CNPM2.5-A (ML-CNPM2.5-B). Overall, our sample dataset includes features commonly used in estimating PM2.5, with high-volume and comprehensive records that support the training and validation of different models.
This instructional activity introduces students to the application of statistical tools for analyzing biological data, with a focus on measures of center (mean, median, mode) and measures of spread (range, quartiles, standard deviation). Using real-world biological contexts, students learn how to summarize datasets, identify trends, and evaluate variability. The activity integrates the use of MS Excel and TI-84 Plus graphing calculators to calculate descriptive statistics and interpret results. By engaging with authentic biological data, students develop quantitative reasoning skills that enhance their ability to detect patterns, recognize variability, and draw meaningful conclusions about biological systems.
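The measures of center and spread named above can also be computed in a few lines of code. The sketch below uses Python's standard `statistics` module rather than the MS Excel or TI-84 Plus tools used in the activity; the leaf-length values and the function name `summarise` are hypothetical.

```python
import statistics


def summarise(values):
    """Return common measures of center and spread for a sample."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # 'exclusive' method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "Q1": q1,
        "Q3": q3,
        "sd": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical leaf lengths in millimetres
leaf_lengths_mm = [42, 45, 45, 47, 50, 52, 55, 61]
for name, value in summarise(leaf_lengths_mm).items():
    print(f"{name}: {value}")
```

Quartile conventions differ between tools, so the Q1 and Q3 values from this sketch may not match those reported by a spreadsheet or a graphing calculator for the same data.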
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Tuberculosis (TB) remains a significant public health challenge, particularly among vulnerable populations like children. This is especially true in Sub-Saharan Africa, where the burden of TB in children is substantial. Zambia ranks 21st among the top 30 high TB endemic countries globally. While studies have explored TB in adults in Zambia, the prevalence and associated factors in children are not well documented. This study aimed to determine the prevalence and the sociodemographic and clinical factors associated with active TB disease in hospitalized children under the age of 15 years at Livingstone University Teaching Hospital (LUTH), the largest referral center in Zambia’s Southern Province.
Methods: This retrospective cross-sectional study of 700 pediatric patients under 15 years old utilized programmatic data from the Pediatrics Department at LUTH. A systematic sampling method was used to select participants from medical records. Data on demographics, medical conditions, anthropometric measurements, and blood tests were collected. Data analysis included descriptive statistics, chi-square tests, and multivariable logistic regression to identify factors associated with TB.
Results: The median age was 24 months (interquartile range (IQR): 11, 60) and the majority were male (56.7%, n = 397/700). Most participants were from urban areas (59.9%, n = 419/700), and 9.2% (n = 62/675) were living with HIV. Malnutrition and comorbidities were present in a significant portion of the participants (19.0% and 25.1%, respectively). The prevalence of active TB cases was 9.4% (n = 66/700) among hospitalized children. Persons living with HIV (adjusted odds ratio (AOR): 6.30; 95% confidence interval (CI): 2.85, 13.89; p < 0.001) and those who were malnourished (AOR: 10.38; 95% CI: 4.78, 22.55; p < 0.001) had a significantly higher likelihood of developing active TB disease.
Conclusion: This study revealed a 9.4% prevalence of active TB among hospitalized children under 15 years at LUTH. HIV status and malnutrition emerged as significant factors associated with active TB disease. These findings emphasize the need for pediatric TB control strategies that prioritize addressing associated factors to effectively reduce the burden of tuberculosis in Zambian children.
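For readers unfamiliar with odds ratios, the sketch below shows how an unadjusted odds ratio and a Wald 95% confidence interval are computed from a 2x2 table. It is not the study's analysis: the study reports adjusted odds ratios from multivariable logistic regression, whereas this example is unadjusted, uses hypothetical counts, and the function name `odds_ratio_ci` is an assumption for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with disease       b: exposed without disease
    c: unexposed with disease     d: unexposed without disease
    All four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study
or_, lo, hi = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR = {or_:.2f}, 95% CI: {lo:.2f}, {hi:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant association at the chosen level, which is how the reported intervals for HIV status and malnutrition can be read.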
Introduction: The aim of this study was to determine patterns of physical activity in pet dogs using real-world data at a population scale aided by the use of accelerometers and electronic health records (EHRs).
Methods: A directed acyclic graph (DAG) was created to capture background knowledge and causal assumptions related to dog activity, and this was used to identify relevant data sources, which included activity data from commercially available accelerometers, and health and patient metadata from the EHRs. Linear mixed models (LMM) were fitted to the number of active minutes following log-transformation with the fixed effects tested based on the variables of interest and the adjustment sets indicated by the DAG.
Results: Activity was recorded on 8,726,606 days for 28,562 dogs with 136,876 associated EHRs, with the median number of activity records per dog being 162 [interquartile range (IQR) 60–390]. The average recorded activity per day of 51 min was much lower than previous estimates of physical activity, and there was wide variation in activity levels from less than 10 to over 600 min per day. Physical activity decreased with age, an effect that was dependent on breed size, whereby there was a greater decline in activity for age as breed size increased. Activity increased with breed size and owner age independently. Activity also varied independently with sex, location, climate, season and day of the week: males were more active than females, and dogs were more active in rural areas, in hot dry or marine climates, in spring, and on weekends.
Conclusion: Accelerometer-derived activity data gathered from pet dogs living in North America was used to determine associations with both dog and environmental characteristics. Knowledge of these associations could be used to inform daily exercise and caloric requirements for dogs, and how they should be adapted according to individual circumstances.
Median and inter-quartile range or absolute number and proportion. See S1 Table for more details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Parameters are shown separately for the three different datasets (1, 2, pathological gamblers [PG]).