Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file includes Report Card administrator experience status by school poverty quartile data for the 2017-18 through 2023-24 school years. Data is disaggregated by state, ESD, LEA, and school level. Please review the notes below for more information.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.
Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 12+ and age 5+ denominators have been uploaded as archived tables.
Starting June 30, 2021, the dataset has been reconfigured so that all updates are appended to one dataset to make it easier for API and other interfaces. In addition, historical data has been extended back to January 5, 2021.
This dataset shows full, partial, and at least 1 dose coverage rates by zip code tabulation area (ZCTA) for the state of California. Data sources include the California Immunization Registry and the American Community Survey’s 2015-2019 5-Year data.
This is the data table for the LHJ Vaccine Equity Performance dashboard. However, this data table also includes ZCTAs that do not have a VEM score.
This dataset also includes Vaccine Equity Metric score quartiles (when applicable), which combine the Public Health Alliance of Southern California’s Healthy Places Index (HPI) measure with CDPH-derived scores to estimate factors that impact health, like income, education, and access to health care. ZCTAs range from less healthy community conditions in Quartile 1 to more healthy community conditions in Quartile 4.
The Vaccine Equity Metric is for weekly vaccination allocation and reporting purposes only. CDPH-derived quartiles should not be considered as indicative of the HPI score for these zip codes. CDPH-derived quartiles were assigned to zip codes excluded from the HPI score produced by the Public Health Alliance of Southern California due to concerns with statistical reliability and validity in populations smaller than 1,500 or where more than 50% of the population resides in a group setting.
These data do not include doses administered by the following federal agencies who received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.
For some ZCTAs, vaccination coverage may exceed 100%. This may be a result of many people from outside the county coming to that ZCTA to get their vaccine and providers reporting the county of administration as the county of residence, and/or the DOF estimates of the population in that ZCTA being too low. Please note that population numbers provided by DOF are projections and so may not be accurate, especially given unprecedented shifts in population as a result of the pandemic.
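The note above can be illustrated with a minimal sketch. It assumes coverage is simply the count of vaccinated residents divided by the population estimate; the function name and the numbers are hypothetical and are not taken from the dataset.

```python
def coverage_rate(people_vaccinated: int, population_estimate: int) -> float:
    """Return vaccination coverage as a percentage of the estimated population."""
    if population_estimate <= 0:
        raise ValueError("population estimate must be positive")
    return 100.0 * people_vaccinated / population_estimate


# Hypothetical ZCTA: 1,200 people are recorded as vaccinated residents,
# but the population projection for the area is only 1,000.
rate = coverage_rate(1200, 1000)
print(f"{rate:.1f}%")  # exceeds 100% because the denominator is too small
```

When the numerator includes non-residents or the projection undercounts the population, the ratio can exceed 100% even though the arithmetic is correct.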
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Administrative disposable income is a third pillar of the income statistics that Statbel publishes, alongside "\2" and poverty indicators based on "\2", and allows answering other types of questions than SILC and tax statistics.
SILC uses "\2" at the household level as a concept of income, cumulating the incomes of all household members. In the next step, this disposable income is converted into equivalised disposable income to take into account the composition of the household. Based on the SILC, at-risk-of-poverty figures are published down to the provincial level, but the sample size does not allow for analyses at a more detailed geographical level. Statistics based on tax revenues, by contrast, are available down to the level of the statistical sector, but are limited to taxable income in the context of personal income tax returns. Non-taxable income is not taken into account and there is also no correction according to the composition of the household.
The variable "administrative equivalised disposable income" responds to a growing demand for income and poverty figures at the communal level. It uses an income concept based on administrative sources that tries to correspond as much as possible to that of SILC. For the population as a whole, both taxable and non-taxable income are taken into account. They are added together for all members of the household in order to obtain an administrative disposable income for the household. After adjusting for the composition of the household, the variable "administrative equivalised disposable income" is established. This can be used to calculate income and poverty figures at the communal level.
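As a rough illustration of the household adjustment described above, the sketch below divides household income by an equivalence weight. It assumes the modified OECD scale commonly used with EU-SILC (1.0 for the first person aged 14 or over, 0.5 for each further person aged 14 or over, 0.3 for each child under 14); the text does not state which scale Statbel applies, and the function name and figures are hypothetical.

```python
def equivalised_income(household_income: float, ages: list[int]) -> float:
    """Divide household income by an equivalence weight (assumed modified OECD scale)."""
    persons_14_plus = sum(1 for age in ages if age >= 14)
    children_under_14 = len(ages) - persons_14_plus
    if persons_14_plus == 0:
        raise ValueError("this sketch expects at least one member aged 14 or over")
    # Assumed weights: 1.0 first adult, 0.5 other members aged 14+, 0.3 children.
    weight = 1.0 + 0.5 * (persons_14_plus - 1) + 0.3 * children_under_14
    return household_income / weight


# Hypothetical household: two adults and two young children sharing 42,000.
income = equivalised_income(42000.0, [40, 38, 10, 6])
print(round(income, 2))
```

Every member of the household is then assigned the same equivalised income, which is what makes poverty figures comparable across households of different sizes.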
Indicators are not disseminated for an entity and a category when at least 15% of people have a missing equivalent administrative disposable income or when there are fewer than 100 people with a valid income.
More information on the page "\2" of Statbel
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study investigates the phenomenon of semantic drift through the lenses of language and situated simulation (LASS) and the word frequency effect (WFE) within a timed word association task. Our primary objectives were to determine whether semantic drift can be identified over the short time (25 seconds) of a free word association task (a predicted corollary of LASS), and whether more frequent terms are generated earlier in the process (as expected due to the WFE). Respondents were provided with five cue words (tree, dog, quality, plastic and love), and asked to write as many associations as they could. We hypothesized that terms generated later in the task (fourth time quartile, the last 19–25 seconds) would be semantically more distant (cosine similarity) from the cue word than those generated earlier (first quartile, the first 1–7 seconds), indicating semantic drift. Additionally, we explored the WFE by hypothesizing that earlier generated words would be more frequent and less diverse. Utilizing a dataset matched with GloVe 300B word embeddings, BERT and WordNet synsets, we analysed semantic distances among 1569 unique term pairs for all cue words across time. Our results supported the presence of semantic drift, with significant evidence of within-participant semantic drift from the first to fourth time (LASS) and frequency (WFE) quartiles. In terms of the WFE, we observed a notable decrease in the diversity of terms generated earlier in the task, while more unique terms (greater diversity and relative uniqueness) were generated in the 4th time quartile, aligning with our hypothesis that more frequently used words dominate early stages of a word association task. We also found that the size of effects varied substantially across cues, suggesting that some cues might invoke stronger and more idiosyncratic situated simulations. Theoretically, our study contributes to the understanding of LASS and the WFE. 
It suggests that semantic drift might serve as a scalable indicator of the invocation of language versus simulation systems in LASS and might also be used to explore cognition within word association tasks more generally. The findings also add a temporal and relational dimension to the WFE. Practically, our research highlights the utility of word association tasks in understanding semantic drift and the diffusion of word usage over a sub-minute task, arguably the shortest practically feasible timeframe, offering a scalable method to explore group and individual changes in semantic relationships, whether via the targeted diffusion of influence in a marketing campaign, or seeking to understand differences in cognition more generally. Possible practical uses and opportunities for future research are discussed.
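The semantic distance measure named in the abstract, cosine similarity, can be sketched as follows. The three-dimensional vectors are toy values invented for illustration; they are not GloVe or BERT embeddings and do not come from the study's data.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: a cue word, an early association, a late association.
cue = [1.0, 0.0, 0.0]
early_term = [0.9, 0.1, 0.0]
late_term = [0.2, 0.9, 0.1]

# Under the drift hypothesis, the later term is less similar to the cue.
print(cosine_similarity(cue, early_term) > cosine_similarity(cue, late_term))
```

In this toy setup, drift would show up as a lower similarity to the cue for terms produced in the last time quartile than for terms produced in the first.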
Our goal was to predict gender, age, and emotion from audio. We used labeled audio datasets from Mozilla and RAVDESS: the audio files were collected from "Mozilla Common Voice" and the "Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)". Using the R programming language, we extracted 20 statistical features from the audio and then added the labels to form these datasets.
The datasets contain 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes a dataset’s dispersion relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle number in the sorted (ascending or descending) list of numbers.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. This means that about 25 percent of the data set numbers are below Q1, and about 75 percent are above Q1.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the central point between the median and the highest value of the distribution.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure that determines how much the tails of a distribution differ from the tails of a normal distribution. It is effectively a measure of the outliers present in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized signal’s spectral power.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is usually measured in decibels and offers a way to quantify how tone-like, as opposed to noise-like, a sound is.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the spectrum’s center of mass is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of the dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of the dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which expresses the degree of frequency modulation numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
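A few of the spectrum-based features above (meanfreq, Q25, Q75, IQR) can be sketched in a handful of lines. This is not the authors' R code: it assumes the spectrum is already available as paired frequency and amplitude lists, treats the normalized amplitudes as a distribution over frequency, and uses hypothetical function names and toy values.

```python
def mean_frequency(freqs_khz: list[float], amplitudes: list[float]) -> float:
    """Amplitude-weighted mean frequency (kHz) of a spectrum."""
    total = sum(amplitudes)
    return sum(f * a for f, a in zip(freqs_khz, amplitudes)) / total


def spectral_quantile(freqs_khz: list[float], amplitudes: list[float], q: float) -> float:
    """Lowest frequency at which the cumulative share of amplitude reaches q."""
    total = sum(amplitudes)
    cumulative = 0.0
    for f, a in zip(freqs_khz, amplitudes):
        cumulative += a
        if cumulative / total >= q:
            return f
    return freqs_khz[-1]


# Toy spectrum (hypothetical): four frequency bins with equal amplitude.
freqs = [0.1, 0.2, 0.3, 0.4]
amps = [1, 1, 1, 1]

q25 = spectral_quantile(freqs, amps, 0.25)
q75 = spectral_quantile(freqs, amps, 0.75)
print(mean_frequency(freqs, amps), q25, q75, q75 - q25)  # meanfreq, Q25, Q75, IQR
```

Real feature extraction would start from an FFT of the recording; the sketch only shows how quartile-style summaries follow once a spectrum is in hand.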
Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en
Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
Background: Non-alcoholic fatty liver disease (NAFLD) is independently associated with atrial fibrillation (AF) risk. The uric acid (UA) to high-density lipoprotein cholesterol (HDL-C) ratio (UHR) has been shown to be closely associated with cardiovascular disease (CVD) and NAFLD. The aim of this study is to clarify whether elevated UHR is associated with the occurrence of AF in patients with NAFLD and to determine whether UHR predicts AF.
Methods: Patients diagnosed with NAFLD in the Department of Cardiovascular Medicine of the Second Hospital of Shanxi Medical University from January 1, 2020, to December 31, 2021, were retrospectively enrolled in this study. The study subjects were categorized into an AF group and a non-AF group based on the presence or absence of combined AF. Logistic regression was performed to evaluate the correlation between UHR and AF. Sensitivity analysis and subgroup interaction analysis were performed to verify the robustness of the study results. Receiver operating characteristic (ROC) curve analysis was used to determine the optimal cutoff value for UHR to predict the development of AF in patients with NAFLD.
Results: A total of 421 patients with NAFLD were included, including 171 in the AF group and 250 in the non-AF group. In the univariate regression analysis, NAFLD patients with higher UHR were more likely to experience AF, and the risk of AF persisted after confounding factors were adjusted for (OR: 1.010, 95% CI: 1.007–1.013, P < 0.001). AF risk increased with increasing UHR quartile (P for trend < 0.001). Even when serum UA and HDL-C were normal, UHR was still associated with AF in patients with NAFLD. None of the subgroup variables interacted significantly with UHR in the subgroup analysis. The ROC curve analysis showed that the areas under the curve for UA, HDL-C, and UHR were 0.702, 0.606, and 0.720, respectively, suggesting that UHR has a higher predictive value for AF occurrence in NAFLD patients compared to HDL-C or UA alone.
Conclusion: Increased UHR level was independently correlated with a high risk of AF in NAFLD patients.
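For readers unfamiliar with the area under the ROC curve reported above, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive case has a higher marker value than a randomly chosen negative case, with ties counted as one half. The marker values are invented for illustration and are not patient data from the study.

```python
def roc_auc(scores_positive: list[float], scores_negative: list[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical marker values for an AF group and a non-AF group.
af_group = [310.0, 420.0, 390.0, 280.0]
non_af_group = [250.0, 300.0, 330.0, 270.0]
print(roc_auc(af_group, non_af_group))
```

An AUC of 0.5 means the marker ranks the two groups no better than chance, while 1.0 means every positive case outranks every negative case.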
Fig1-needleleaf forest.txt contains all the observation data with each reference given for Figure 1. The deposition velocity vd and diameter dp are shown in ordered arrays. vd_err and dp_err define the deposition velocity and diameter error bars.
Fig 2-needleleaf.txt contains the same observation data as Fig1-needleleaf forest.txt.
Fig3-Broadleaf forest.txt contains all the observation data with each reference given for broadleaf forests in Figure 3. Data format is the same as Fig1.
Fig4-Grasst.txt contains all the observation data with each reference given for grass in Figure 4. Data format is the same as Fig1.
Fig5.txt contains data from Zhang et al. 2014 for three different U* values.
Fig6-Watert.txt contains all the observation data with each reference given for water in Figure 6. Data format is the same as Fig1.
DataFig7,TXT is a tab-delimited text file containing the data in tabular form for Figure 7.
DataFig8,TXT is a tab-delimited text file containing the data in tabular form for Figure 8.
Fig14a-133_P6p3_add_newadd_PM25_TOT_126719_boxplot_hourly_data.csv is a CSV file containing data for the hourly average median and 1st and 3rd quartiles of observation and two 1.33 km model runs that are represented by boxes in Figure 14a.
Fig14b-12US1_P6p3_add_PM25_TOT_211556_boxplot_hourly_data.csv is a CSV file containing data for the hourly average median and 1st and 3rd quartiles of observation and two 12 km model runs that are represented by boxes in Figure 14b.
Fig15-133_P6p3_add_newadd_PM25_TOT_728997_spatialplot_diff.csv is a CSV file containing all the data for the bias and error for NEW and BASE 1.33 km model runs and the differences in bias and error between the models at AQS sites.
Fig16-12US1_P6p3_add_PM25_TOT_971641_spatialplot_diff.csv is a CSV file containing all the data for the bias and error for NEW and BASE 12 km model runs and the differences in bias and error between the models at AQS sites.
Fig17-12US1_P6p3_add_PM25_TOT_104554_spatialplot_diff.csv is a CSV file containing all the data for the bias and error for NEW and BASE 12 km model runs and the differences in bias and error between the models at IMPROVE sites.
Portions of this dataset are inaccessible because Figs 9-13 are all plots made directly from CMAQ output files, which are far too large. To access the data, contact the primary author, Jon Pleim. Format: CMAQ netcdf output files.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Welcome to the direct oral anticoagulant (DOAC) Reanalysis Dataset.
Sheet 1: Exact references to the FDA reviews from which we extracted all data points. You will also find links to the FDA drug approval packages, where one also finds all other published documents pertaining to the approvals, such as statistical reviews. In Sheet 1, we also cite the primary trial reports for each of the four pivotal DOAC trials.
Sheet 2: Basic overview of the 4 pivotal DOAC trials with an emphasis on time in therapeutic range (TTR) characteristics.
Sheet 3: Summary results from each of the 4 DOAC trials for the outcomes of stroke/systemic embolism, major bleed, and mortality (including outcome definitions from each trial).
Sheet 4: The full TTR dataset with outcomes stratified into quartiles (Q1 to Q4), including exact references to each data point in the FDA reviews.
Sheet 5: Q4 thresholds and conclusions in the industry TTR analyses.
This dataset contains gender pay gap figures for all employees in London and large employers in London. The pay gap figures for GLA group organisations can be found on their respective websites. The gender pay gap is the difference in the average hourly wage of all men and women across a workforce. If women do more of the less well paid jobs within an organisation than men, the gender pay gap is usually bigger. The UK government publish gender pay gap figures for all employers with 250 or more employees. A cut of this dataset that only shows employers that are registered in London can be found below. Read a report by the Local Government Association (LGA) that summarises the mean and median pay gaps in local authorities, as well as the distribution of staff across pay quartiles. This dataset is one of the Greater London Authority's measures of Economic Fairness. This dataset is one of the Greater London Authority's measures of Economic Development strategy.
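The pay gap definition above can be sketched as a small calculation. The sketch assumes the gap is expressed as a percentage of men's average hourly pay, which is how mean and median gaps are usually reported; the function name and the wage figures are hypothetical.

```python
from statistics import mean, median


def pay_gap_percent(men_hourly: list[float], women_hourly: list[float], average=mean) -> float:
    """Difference between men's and women's average hourly pay, as % of men's pay."""
    men_avg = average(men_hourly)
    women_avg = average(women_hourly)
    return 100.0 * (men_avg - women_avg) / men_avg


# Hypothetical hourly wages for a small workforce.
men = [12.0, 20.0, 28.0]
women = [11.0, 18.0, 22.0]

print(pay_gap_percent(men, women))                  # mean pay gap
print(pay_gap_percent(men, women, average=median))  # median pay gap
```

A positive value means men are paid more on average; passing `median` instead of `mean` gives the median gap, which is less sensitive to a few very high earners.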
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Conferences are an essential tool for scientific communication. In disciplines such as Computer Science, over 50% of original research results are published in conference proceedings. This dataset contains a list of conference proceedings, categorized Q1 - Q4 by analogy with SJR journal quartiles. We have analyzed the role of conference proceedings in various disciplines and propose an alternative approach to research evaluation based on conference proceedings and Scimago Journal Rank (SJR). Comparison of the resulting list in Computer Science with the CORE ranking showed a 62% match, as well as an average rank correlation of the distribution by category.
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Affordability ratios calculated by dividing house prices by gross annual workplace-based earnings. Based on the median and lower quartiles of both house prices and earnings in England and Wales.
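The ratio described above is a simple division, sketched here for both the median and the lower quartile. The prices and earnings are invented figures, and the quartile is taken with the default method of Python's `statistics.quantiles`, which may differ from the method used in the published statistics.

```python
from statistics import median, quantiles


def affordability_ratio(house_price: float, annual_earnings: float) -> float:
    """House price divided by gross annual earnings."""
    return house_price / annual_earnings


# Hypothetical house prices and gross annual earnings for one area.
prices = [200000.0, 250000.0, 300000.0]
earnings = [25000.0, 30000.0, 35000.0]

median_ratio = affordability_ratio(median(prices), median(earnings))
lower_quartile_ratio = affordability_ratio(
    quantiles(prices, n=4)[0],    # first cut point = lower quartile
    quantiles(earnings, n=4)[0],
)
print(round(median_ratio, 2), round(lower_quartile_ratio, 2))
```

A higher ratio means housing is less affordable relative to earnings; the lower-quartile version focuses on the cheaper end of the market and lower earners.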
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Affordability ratios calculated by dividing house prices by gross annual residence-based earnings. Based on the median and lower quartiles of both house prices and earnings in England and Wales.
This Atlas presents more than 80,000 plots of the empirical frequency distributions of temperature and salinity for each 5-degree square area of the North Atlantic Ocean (80N to 30S) at all standard depth levels based on World Ocean Database 1998 data. Additional empirical statistical plots include the mean and standard deviation based on the arithmetic mean, the median and Median Absolute Deviation (MAD), winsorized estimates of the mean and standard deviation, quartiles, and skewness estimated from the quartiles. Some of these statistics are presented in both "normalized" and "natural" coordinates. Disc 1 contains seasonal distributions for the upper (0 m to 400 m) ocean. Disc 2 contains annual distributions for the deep (500 m - 5500 m) ocean.
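Several of the robust statistics listed above can be sketched with the standard library. The quartile-based skewness shown here is Bowley's coefficient, (Q3 + Q1 - 2·Q2) / (Q3 - Q1), which is an assumption since the Atlas description does not name the formula it uses; the winsorizing depth `k` and the sample values are likewise hypothetical.

```python
from statistics import mean, median, quantiles


def median_absolute_deviation(values: list[float]) -> float:
    """Median of the absolute deviations from the median (MAD)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def winsorized_mean(values: list[float], k: int = 1) -> float:
    """Mean after clamping the k smallest and k largest values to their neighbours."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return mean([min(max(v, low), high) for v in ordered])


def quartile_skewness(values: list[float]) -> float:
    """Bowley skewness from the three quartiles (assumed formula)."""
    q1, q2, q3 = quantiles(values, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)


# Hypothetical temperature sample with one outlier.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print(median_absolute_deviation(sample), winsorized_mean(sample), quartile_skewness(sample))
```

Unlike the arithmetic mean and standard deviation, these estimates change little when a single extreme observation is added, which is why they are useful for checking oceanographic distributions.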
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset captures historical financial market data and macroeconomic indicators spanning over three decades, from 1990 onwards. It is designed for financial analysis, time series forecasting, and exploring relationships between market volatility, stock indices, and macroeconomic factors. This dataset is particularly relevant for researchers, data scientists, and enthusiasts interested in studying:
- Volatility forecasting (VIX)
- Stock market trends (S&P 500, DJIA, HSI)
- Macroeconomic influences on markets (joblessness, interest rates, etc.)
- The effect of geopolitical and economic uncertainty (EPU, GPRD)
The data has been aggregated from a mix of historical financial records and publicly available macroeconomic datasets:
- VIX (Volatility Index): Chicago Board Options Exchange (CBOE).
- Stock Indices (S&P 500, DJIA, HSI): Yahoo Finance and historical financial databases.
- Volume Data: Extracted from official exchange reports.
- Macroeconomic Indicators: Bureau of Economic Analysis (BEA), Federal Reserve, and other public records.
- Uncertainty Metrics (EPU, GPRD): Economic Policy Uncertainty Index and Global Policy Uncertainty Database.
dt: Date of observation in YYYY-MM-DD format.
vix: VIX (Volatility Index), a measure of expected market volatility.
sp500: S&P 500 index value, a benchmark of the U.S. stock market.
sp500_volume: Daily trading volume for the S&P 500.
djia: Dow Jones Industrial Average (DJIA), another key U.S. market index.
djia_volume: Daily trading volume for the DJIA.
hsi: Hang Seng Index, representing the Hong Kong stock market.
ads: Aruoba-Diebold-Scotti (ADS) Business Conditions Index, reflecting U.S. economic activity.
us3m: U.S. Treasury 3-month bond yield, a short-term interest rate proxy.
joblessness: U.S. unemployment rate, reported as quartiles (1 represents lowest quartile and so on).
epu: Economic Policy Uncertainty Index, quantifying policy-related economic uncertainty.
GPRD: Geopolitical Risk Index (Daily), measuring geopolitical risk levels.
prev_day: Previous day’s S&P 500 closing value, added for lag-based time series analysis.
Feel free to use this dataset for academic, research, or personal projects.
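A lag column such as prev_day could be derived along the following lines. This is only a sketch of the idea, not the procedure used to build the dataset: it assumes rows are dictionaries keyed by the column names dt and sp500, and the index values are invented.

```python
def add_prev_day(rows: list[dict]) -> list[dict]:
    """Return rows sorted by date, each with the previous row's sp500 value as prev_day."""
    ordered = sorted(rows, key=lambda row: row["dt"])  # ISO dates sort correctly as text
    result = []
    previous_close = None  # the first observation has no previous day
    for row in ordered:
        result.append({**row, "prev_day": previous_close})
        previous_close = row["sp500"]
    return result


# Hypothetical observations, deliberately out of order.
observations = [
    {"dt": "1990-01-03", "sp500": 358.76},
    {"dt": "1990-01-02", "sp500": 359.69},
    {"dt": "1990-01-04", "sp500": 355.67},
]
for row in add_prev_day(observations):
    print(row["dt"], row["sp500"], row["prev_day"])
```

Because the lag is taken from the preceding row rather than the preceding calendar day, weekends and holidays are skipped automatically.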
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics (minimum, first quartile, median, mean, third quartile, maximum) of probabilities for the RPDLomax and Logistic models by class (Wilt dataset).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AL refers to the axial length, CCT to the central corneal thickness, ACD to the external phakic anterior chamber depth measured from the corneal front apex to the front apex of the crystalline lens, LT to the central thickness of the crystalline lens, R1 and R2 to the corneal radii of curvature for the flat and steep meridians, Rmean to the average of R1 and R2, PIOL to the refractive power of the intraocular lens implant, and SEQ to the spherical equivalent power achieved 5 to 12 weeks after cataract surgery.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data collected for all 100 blocks assessed includes total number of units, number of units with more than 5 plants, number of units with more than 5 containers, corridor and public cleanliness rating, number of times out of the 10 public spots assessed that gully traps, open and covered drains and plants were present, median house price, year built and abundance status. (XLSX)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Feature preparation
Preprocessing was applied to the data, such as creating dummy variables and performing transformations (centering, scaling, YeoJohnson) using the preProcess() function from the “caret” package in R. The correlation among the variables was examined and no serious multicollinearity problems were found. A stepwise variable selection was performed using a logistic regression model. The final set of variables included:
Demographic: age, body mass index, sex, ethnicity, smoking
History of disease: heart disease, migraine, insomnia, gastrointestinal disease
COVID-19 history: covid vaccination, rashes, conjunctivitis, shortness of breath, chest pain, cough, runny nose, dysgeusia, muscle and joint pain, fatigue, fever, COVID-19 reinfection, and ICU admission
These variables were used to train and test various machine-learning models.

Model selection and training
The data was randomly split into 80% training and 20% testing subsets. The “h2o” package in R version 4.3.1 was employed to implement different algorithms. AutoML was first used, which automatically explored a range of models with different configurations. Gradient Boosting Machines (GBM), Random Forest (RF), and Regularized Generalized Linear Model (GLM) were identified as the best-performing models on our data and their parameters were fine-tuned. An ensemble method that stacked different models together was also used, as it could sometimes improve the accuracy. The models were evaluated using the area under the curve (AUC) and C-statistics as diagnostic measures. The model with the highest AUC was selected for further analysis using the confusion matrix, accuracy, sensitivity, specificity, and F1 and F2 scores. The optimal prediction threshold was determined by plotting the sensitivity, specificity, and accuracy and choosing the point of intersection as it balanced the trade-off between the three metrics. The model’s predictions were also plotted, and the quantile ranges were used to classify the model’s prediction as follows: > 1st quantile, > 2nd quantile, > 3rd quartile and < 3rd quartile (very low, low, moderate, high) respectively.

Metric | Formula
C-statistics | (TPR + TNR - 1) / 2
Sensitivity/Recall | TP / (TP + FN)
Specificity | TN / (TN + FP)
Accuracy | (TP + TN) / (TP + TN + FP + FN)
F1 score | 2 * (precision * recall) / (precision + recall)

Model interpretation
We used the variable importance plot, which is a measure of how much each variable contributes to the prediction power of a machine learning model. In the H2O package, variable importance for GBM and RF is calculated by measuring the decrease in the model's error when a variable is split on. The more a variable's split decreases the error, the more important that variable is considered to be. The error is calculated using the following formula: SE = MSE * N = VAR * N, and then it is scaled between 0 and 1 and plotted. We also used the SHAP summary plot, which is a graphical tool to visualize the impact of input features on the prediction of a machine learning model. SHAP stands for SHapley Additive exPlanations, a method to calculate the contribution of each feature to the prediction by averaging over all possible subsets of features [28]. The SHAP summary plot shows the distribution of the SHAP values for each feature across the data instances. We use the h2o.shap_summary_plot() function in R to generate the SHAP summary plot for our GBM model. We pass the model object and the test data as arguments, and optionally specify the columns (features) we want to include in the plot. The plot shows the SHAP values for each feature on the x-axis, and the features on the y-axis. The color indicates whether the feature value is low (blue) or high (red). The plot also shows the distribution of the feature values as a density plot on the right.
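The confusion-matrix metrics listed in the formulas above can be sketched as follows. The code is written in Python for illustration rather than in the R/h2o setup used by the authors, the counts are hypothetical, and the F-beta expression (which yields F1 for beta = 1 and F2 for beta = 2) is the standard definition rather than a formula quoted from the text.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta=1 gives F1, beta=2 weights recall more heavily (F2)."""
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, accuracy, F1 and F2 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall, TP / (TP + FN)
    specificity = tn / (tn + fp)          # TN / (TN + FP)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f_beta(precision, sensitivity, beta=1.0),
        "f2": f_beta(precision, sensitivity, beta=2.0),
    }


# Hypothetical confusion matrix for a test set of 100 cases.
for name, value in classification_metrics(tp=40, fp=10, tn=40, fn=10).items():
    print(f"{name}: {value:.3f}")
```

Recomputing these values over a grid of thresholds is one way to locate the point where sensitivity, specificity, and accuracy intersect, as described in the model selection step.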
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
By statewide vaccine equity metric (VEM) quartiles.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quartiles of the cell-specific probability of tuberculosis transmission.