23 datasets found

Report Card Administrators by School Poverty Quartile School Years 2017-18...
data.wa.gov
catalog.data.gov
application/rdfxml +5
Updated Jan 16, 2025
Cite
OSPI (2025). Report Card Administrators by School Poverty Quartile School Years 2017-18 to 2023-24 [Dataset]. https://data.wa.gov/resource/fhnj-yqpr
Explore at:
Available download formats: csv, json, application/rssxml, xml, tsv, application/rdfxml
Dataset updated
Jan 16, 2025
Dataset authored and provided by
OSPI
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This file includes Report Card administrator experience status by school poverty quartile data for the 2017-18 through 2023-24 school years. Data is disaggregated by state, ESD, LEA, and school level. Please review the notes below for more information.
COVID-19 Vaccine Progress Dashboard Data by ZIP Code
data.ca.gov
data.chhs.ca.gov
csv, xlsx, zip
Updated Aug 9, 2025
Cite
California Department of Public Health (2025). COVID-19 Vaccine Progress Dashboard Data by ZIP Code [Dataset]. https://data.ca.gov/dataset/covid-19-vaccine-progress-dashboard-data-by-zip-code
Explore at:
Available download formats: csv, xlsx, zip
Dataset updated
Aug 9, 2025
Dataset authored and provided by
California Department of Public Health https://www.cdph.ca.gov/
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.

Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 12+ and age 5+ denominators have been uploaded as archived tables.

Starting June 30, 2021, the dataset has been reconfigured so that all updates are appended to one dataset to make it easier for API and other interfaces. In addition, historical data has been extended back to January 5, 2021.

This dataset shows full, partial, and at least 1 dose coverage rates by zip code tabulation area (ZCTA) for the state of California. Data sources include the California Immunization Registry and the American Community Survey’s 2015-2019 5-Year data.

This is the data table for the LHJ Vaccine Equity Performance dashboard. However, this data table also includes ZCTAs that do not have a VEM score.

This dataset also includes Vaccine Equity Metric score quartiles (when applicable), which combine the Public Health Alliance of Southern California’s Healthy Places Index (HPI) measure with CDPH-derived scores to estimate factors that impact health, like income, education, and access to health care. ZCTAs range from less healthy community conditions in Quartile 1 to more healthy community conditions in Quartile 4.

The Vaccine Equity Metric is for weekly vaccination allocation and reporting purposes only. CDPH-derived quartiles should not be considered as indicative of the HPI score for these zip codes. CDPH-derived quartiles were assigned to zip codes excluded from the HPI score produced by the Public Health Alliance of Southern California due to concerns with statistical reliability and validity in populations smaller than 1,500 or where more than 50% of the population resides in a group setting.

These data do not include doses administered by the following federal agencies who received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.

For some ZCTAs, vaccination coverage may exceed 100%. This may be a result of many people from outside the county coming to that ZCTA to get their vaccine and providers reporting the county of administration as the county of residence, and/or the DOF estimates of the population in that ZCTA being too low. Please note that population numbers provided by DOF are projections and so may not be accurate, especially given unprecedented shifts in population as a result of the pandemic.
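As a rough illustration of why coverage can exceed 100%, the sketch below divides a vaccinated count by a population estimate. The function name and the numbers are hypothetical and are not taken from the dataset; the actual CDPH calculation may differ.

```python
def coverage_rate(vaccinated: int, population: int) -> float:
    """Return vaccinated / population as a percentage."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * vaccinated / population


# Hypothetical ZCTA: 1,200 people recorded as vaccinated residents,
# but the projected population (the denominator) is only 1,000.
rate = coverage_rate(1200, 1000)
print(f"{rate:.1f}%")  # prints "120.0%"
```

When the denominator is a projection that underestimates the true population, or the numerator includes non-residents, the ratio is no longer bounded by 100%.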
3rd quartile of the equivalent disposable administrative income of couples...
data.europa.eu
csv, json
Updated Jul 16, 2025
Cite
IWEPS (2025). 3rd quartile of the equivalent disposable administrative income of couples with at least one spouse aged 65 or over [Dataset]. https://data.europa.eu/88u/dataset/831110-50
Explore at:
Available download formats: json, csv
Dataset updated
Jul 16, 2025
Dataset provided by
Walloon Institute for Evaluation, Prospective Studies and Statistics
Authors
IWEPS
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Administrative disposable income is a third pillar of the income statistics that Statbel publishes, alongside "\2" and poverty indicators based on "\2", and allows answering other types of questions than SILC and tax statistics.

SILC uses "\2" at the household level as a concept of income, cumulating the incomes of all household members. In the next step, this disposable income is converted into equivalised disposable income to take into account the composition of the household. Based on the SILC, at-risk-of-poverty figures are published down to the provincial level, but the sample size does not allow for analyses at a more detailed geographical level. Statistics based on tax revenues, by contrast, are available down to the level of the statistical sector, but are limited to taxable income in the context of personal income tax returns. Non-taxable income is not taken into account, and there is no correction for the composition of the household.

The variable "administrative equivalised disposable income" responds to a growing demand for income and poverty figures at the communal level. It uses an income concept based on administrative sources that tries to correspond as much as possible to that of SILC. For the population as a whole, both taxable and non-taxable income are taken into account. They are added together for all members of the household in order to obtain an administrative disposable income for the household. After adjusting for the composition of the household, the variable "administrative equivalised disposable income" is established. This can be used to calculate income and poverty figures at the communal level.
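The adjustment for household composition can be sketched as follows. The description does not say which equivalence scale is applied; this example assumes the modified OECD scale used for EU-SILC (1.0 for the first adult, 0.5 for each further person aged 14 or over, 0.3 for each child under 14). The function names and figures are hypothetical.

```python
def equivalence_factor(members_14_plus: int, children_under_14: int) -> float:
    """Modified OECD scale (an assumption; not stated in the description)."""
    if members_14_plus < 1:
        raise ValueError("a household needs at least one member aged 14+")
    return 1.0 + 0.5 * (members_14_plus - 1) + 0.3 * children_under_14


def equivalised_income(household_income: float,
                       members_14_plus: int,
                       children_under_14: int) -> float:
    """Divide the summed household income by the equivalence factor."""
    return household_income / equivalence_factor(members_14_plus,
                                                 children_under_14)


# Hypothetical household: two adults, two young children, 42,000 total income.
print(round(equivalised_income(42_000, 2, 2), 2))
```

Every member of the household is then assigned the same equivalised income, which is what makes households of different sizes comparable.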

Indicators are not disseminated for an entity and a category when at least 15% of people are missing an equivalent administrative disposable income or when fewer than 100 people have a valid income.
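The dissemination rule above can be expressed as a small check. This is only a sketch of the stated thresholds; the function name and the use of `None` for a missing income are assumptions made for illustration.

```python
from typing import Optional, Sequence


def can_disseminate(incomes: Sequence[Optional[float]]) -> bool:
    """Apply the two stated thresholds to one entity/category cell.

    `None` marks a person whose income is missing (an assumed encoding).
    """
    total = len(incomes)
    if total == 0:
        return False
    missing = sum(1 for value in incomes if value is None)
    valid = total - missing
    # Suppress when at least 15% are missing (integer arithmetic avoids
    # floating-point edge cases) or when fewer than 100 incomes are valid.
    too_many_missing = missing * 100 >= 15 * total
    too_few_valid = valid < 100
    return not (too_many_missing or too_few_valid)


print(can_disseminate([20_000.0] * 860 + [None] * 140))  # prints "True"
print(can_disseminate([20_000.0] * 850 + [None] * 150))  # prints "False"
```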

More information on the page "\2" of Statbel
Relative uniqueness by quartile with Z tests.
plos.figshare.com
xls
Updated Jul 1, 2024
Cite
Sean MacNiven; Ralph Tench (2024). Relative uniqueness by quartile with Z tests. [Dataset]. http://doi.org/10.1371/journal.pone.0305568.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0305568.t004
Dataset updated
Jul 1, 2024
Dataset provided by
PLOS ONE
Authors
Sean MacNiven; Ralph Tench
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This study investigates the phenomena of semantic drift through the lenses of language and situated simulation (LASS) and the word frequency effect (WFE) within a timed word association task. Our primary objectives were to determine whether semantic drift can be identified over the short time (25 seconds) of a free word association task (a predicted corollary of LASS), and whether more frequent terms are generated earlier in the process (as expected due to the WFE). Respondents were provided with five cue words (tree, dog, quality, plastic and love), and asked to write as many associations as they could. We hypothesized that terms generated later in the task (fourth time quartile, the last 19–25 seconds) would be semantically more distant (cosine similarity) from the cue word than those generated earlier (first quartile, the first 1–7 seconds), indicating semantic drift. Additionally, we explored the WFE by hypothesizing that earlier generated words would be more frequent and less diverse. Utilizing a dataset matched with GloVe 300B word embeddings, BERT and WordNet synsets, we analysed semantic distances among 1569 unique term pairs for all cue words across time. Our results supported the presence of semantic drift, with significant evidence of within-participant, semantic drift from the first to fourth time (LASS) and frequency (WFE) quartiles. In terms of the WFE, we observed a notable decrease in the diversity of terms generated earlier in the task, while more unique terms (greater diversity and relative uniqueness) were generated in the 4th time quartile, aligning with our hypothesis that more frequently used words dominate early stages of a word association task. We also found that the size of effects varied substantially across cues, suggesting that some cues might invoke stronger and more idiosyncratic situated simulations. Theoretically, our study contributes to the understanding of LASS and the WFE. 
It suggests that semantic drift might serve as a scalable indicator of the invocation of language versus simulation systems in LASS and might also be used to explore cognition within word association tasks more generally. The findings also add a temporal and relational dimension to the WFE. Practically, our research highlights the utility of word association tasks in understanding semantic drift and the diffusion of word usage over a sub-minute task, arguably the shortest practically feasible timeframe, offering a scalable method to explore group and individual changes in semantic relationships, whether via the targeted diffusion of influence in a marketing campaign, or seeking to understand differences in cognition more generally. Possible practical uses and opportunities for future research are discussed.
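The semantic-distance measure mentioned above, cosine similarity between a cue word and a response, can be sketched as follows. The three-dimensional vectors are invented for illustration; the study itself used GloVe and BERT embeddings, which are not reproduced here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical): a cue, an early response, a late response.
cue = [1.0, 0.0, 0.0]
early_response = [0.9, 0.1, 0.0]
late_response = [0.2, 0.9, 0.3]

early = cosine_similarity(cue, early_response)
late = cosine_similarity(cue, late_response)
print(early > late)  # prints "True": the late response is further from the cue
```

Under the drift hypothesis, responses from the fourth time quartile would on average show lower similarity to the cue than responses from the first.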
Gender, Age, and Emotion Detection from Voice
kaggle.com
Updated May 29, 2021
Cite
Rohit Zaman (2021). Gender, Age, and Emotion Detection from Voice [Dataset]. https://www.kaggle.com/datasets/rohitzaman/gender-age-and-emotion-detection-from-voice/suggestions
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
May 29, 2021
Dataset provided by
Kaggle http://kaggle.com/
Authors
Rohit Zaman
Description
Context

Our target was to predict gender, age, and emotion from audio. We found labeled audio datasets from Mozilla and RAVDESS. Using the R programming language, 20 statistical features were extracted, and these datasets were formed after adding the labels. Audio files were collected from "Mozilla Common Voice" and the "Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)".

Content

The datasets contain 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes a dataset’s dispersion relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle number in the sorted list of numbers.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. This means that about 25 percent of the data set numbers are below Q1, and about 75 percent are above Q1.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the central point between the median and the highest value of the distribution.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure that determines how much the tails of a distribution differ from the tails of a normal distribution. It is effectively a measure of the outliers present in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized signal’s spectral power.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is usually measured in decibels and offers a way to quantify how tone-like, as opposed to noise-like, a sound is.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the spectrum’s center of mass is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which calculates the degree of frequency modulation expressed numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
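The Q25, Q75, and IQR features can be illustrated with Python's standard library. This is a sketch on made-up frequency values, not the authors' R extraction; quartile conventions differ between tools, so the exact numbers from R may not match `statistics.quantiles`.

```python
import statistics
from typing import Sequence, Tuple


def quartile_summary(freqs_khz: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Q25, Q75, IQR) using the default 'exclusive' method."""
    q25, _median, q75 = statistics.quantiles(freqs_khz, n=4)
    return q25, q75, q75 - q25


# Hypothetical frequency values in kHz.
sample = [1, 2, 3, 4, 5, 6, 7]
print(quartile_summary(sample))  # prints "(2.0, 6.0, 4.0)"
```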

Acknowledgements

Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en
Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
Correlation between UHR quartiles and AF.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jun 24, 2024
Cite
Zhao, Jianqi; Bai, Rui; Liu, Gaizhen; Song, Xiaosu; Zhou, Meng; Zhang, Qi; Qin, Weiwei; Zhang, Yonglai; Li, Baojie (2024). Correlation between UHR quartiles and AF. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001456462
Explore at:
Dataset updated
Jun 24, 2024
Authors
Zhao, Jianqi; Bai, Rui; Liu, Gaizhen; Song, Xiaosu; Zhou, Meng; Zhang, Qi; Qin, Weiwei; Zhang, Yonglai; Li, Baojie
Description
Background: Non-alcoholic fatty liver disease (NAFLD) is independently associated with atrial fibrillation (AF) risk. The uric acid (UA) to high-density lipoprotein cholesterol (HDL-C) ratio (UHR) has been shown to be closely associated with cardiovascular disease (CVD) and NAFLD. The aim of this study is to clarify whether elevated UHR is associated with the occurrence of AF in patients with NAFLD and to determine whether UHR predicted AF.

Methods: Patients diagnosed with NAFLD in the Department of Cardiovascular Medicine of the Second Hospital of Shanxi Medical University from January 1, 2020, to December 31, 2021, were retrospectively enrolled in this study. The study subjects were categorized into AF group and non-AF group based on the presence or absence of combined AF. Logistic regression was performed to evaluate the correlation between UHR and AF. Sensitivity analysis and subgroup interaction analysis were performed to verify the robustness of the study results. Receiver operating characteristic (ROC) curve analysis was used to determine the optimal cutoff value for UHR to predict the development of AF in patients with NAFLD.

Results: A total of 421 patients with NAFLD were included, including 171 in the AF group and 250 in the non-AF group. In the univariate regression analysis, NAFLD patients with higher UHR were more likely to experience AF, and the risk of AF persisted after confounding factors were adjusted for (OR: 1.010, 95%CI: 1.007–1.013, P<0.001). AF risk increased with increasing UHR quartile (P for trend < 0.001). Despite normal serum UA and HDL-C, UHR was still connected with AF in patients with NAFLD. All subgroup variables did not interact significantly with UHR in the subgroup analysis. The ROC curve analysis showed that the areas under the curve for UA, HDL-C, and UHR were 0.702, 0.606, and 0.720, respectively, suggesting that UHR has a higher predictive value for AF occurrence in NAFLD patients compared to HDL-C or UA alone.

Conclusion: Increased UHR level was independently correlated with a high risk of AF in NAFLD patients.
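Grouping patients by UHR quartile, as in the trend analysis above, could look like the sketch below. The abstract does not state the units or the quartile convention the authors used, so the function names, the values, and the choice of `statistics.quantiles` cut points are all assumptions.

```python
import bisect
import statistics
from typing import List, Sequence


def uhr(uric_acid: float, hdl_c: float) -> float:
    """Ratio of uric acid to HDL cholesterol (units left to the caller)."""
    if hdl_c <= 0:
        raise ValueError("HDL-C must be positive")
    return uric_acid / hdl_c


def quartile_groups(values: Sequence[float]) -> List[int]:
    """Assign each value to a quartile group numbered 1 (lowest) to 4."""
    cuts = statistics.quantiles(values, n=4)  # three cut points
    # A value equal to a cut point falls into the lower group.
    return [bisect.bisect_left(cuts, value) + 1 for value in values]


# Hypothetical ratio values for eight patients.
ratios = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(quartile_groups(ratios))  # prints "[1, 1, 2, 2, 3, 3, 4, 4]"
```

The resulting group labels could then serve as an ordinal predictor when testing for a trend across quartiles.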
Data for New Aerosol Dry Deposition Model
catalog.data.gov
s.cnmilf.com
Updated Jul 24, 2022
Cite
U.S. EPA Office of Research and Development (ORD) (2022). Data for New Aerosol Dry Deposition Model [Dataset]. https://catalog.data.gov/dataset/data-for-new-aerosol-dry-deposition-model
Explore at:
Dataset updated
Jul 24, 2022
Dataset provided by
United States Environmental Protection Agency http://www.epa.gov/
Description
Fig1-needleleaf forest.txt contains all the observation data with each reference given for Figure 1. The deposition velocity vd and diameter dp are shown in ordered arrays. vd_err and dp_err define the deposition velocity and diameter error bars.
Fig 2-needleleaf.txt contains the same observation data as Fig1-needleleaf forest.txt.
Fig3-Broadleaf forest.txt contains all the observation data with each reference given for broadleaf forests in Fig 3. Data format same as Fig1.
Fig4-Grasst.txt contains all the observation data with each reference given for grass in Fig 4. Data format same as Fig1.
Fig5.txt contains data from Zhang et al. 2014 for three different U* values.
Fig6-Watert.txt contains all the observation data with each reference given for water in Fig 6. Data format same as Fig1.
DataFig7,TXT is a tab-delimited text file containing the data in tabular form for Figure 7.
DataFig8,TXT is a tab-delimited text file containing the data in tabular form for Figure 8.
Fig14a-133_P6p3_add_newadd_PM25_TOT_126719_boxplot_hourly_data.csv is a CSV file containing data for the hourly average median and 1st and 3rd quartiles of observation and two 1.33 km model runs that are represented by boxes in Figure 14a.
Fig14b-12US1_P6p3_add_PM25_TOT_211556_boxplot_hourly_data.csv is a CSV file containing data for the hourly average median and 1st and 3rd quartiles of observation and two 12 km model runs that are represented by boxes in Figure 14b.
Fig15-133_P6p3_add_newadd_PM25_TOT_728997_spatialplot_diff.csv is a CSV file containing all the data for the bias and error for NEW and BASE 1.33 km model runs and the differences in bias and error between the models at AQS sites.
Fig16-12US1_P6p3_add_PM25_TOT_971641_spatialplot_diff.csv is a CSV file containing all the data for the bias and error for NEW and BASE 12 km model runs and the differences in bias and error between the models at AQS sites.
Fig17-12US1_P6p3_add_PM25_TOT_104554_spatialplot_diff.csv is a CSV file containing all the data for the bias and error for NEW and BASE 12 km model runs and the differences in bias and error between the models at IMPROVE sites.
Portions of this dataset are inaccessible because Figs 9-13 are all plots made directly from CMAQ output files, which are far too large. These files can be accessed by contacting the primary author, Jon Pleim. Format: CMAQ netcdf output files.
DOAC Reanalysis Dataset
zenodo.org
bin
Updated Oct 21, 2024
Cite
Kim Boesen; Luis Carlos Saiz; Peter C Gøtzsche; Juan Erviti (2024). DOAC Reanalysis Dataset [Dataset]. http://doi.org/10.5281/zenodo.13960575
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.13960575
Dataset updated
Oct 21, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Kim Boesen; Luis Carlos Saiz; Peter C Gøtzsche; Juan Erviti
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Welcome to the direct oral anticoagulant (DOAC) Reanalysis Dataset.

Sheet 1: Exact references to the FDA reviews from which we extracted all data points. You will also find links to the FDA drug approval packages, where one also finds all other published documents pertaining to the approvals, such as statistical reviews. In Sheet 1, we also cite the primary trial reports for each of the four pivotal DOAC trials.

Sheet 2: Basic overview of the 4 pivotal DOAC trials with an emphasis on time in therapeutic range (TTR) characteristics.

Sheet 3: Summary results from each of the 4 DOAC trials for the outcomes of stroke/systemic embolism, major bleed, and mortality (including outcome definitions from each trial).

Sheet 4: The full TTR dataset with outcomes stratified into quartiles (Q1 to Q4), including exact references to each data point in the FDA reviews.

Sheet 5: Q4 thresholds and conclusions in the industry TTR analyses.
Gender Pay Gaps in London | gimi9.com
gimi9.com
Updated Jun 14, 2024
Cite
(2024). Gender Pay Gaps in London | gimi9.com [Dataset]. https://gimi9.com/dataset/london_gender-pay-gaps
Explore at:
Dataset updated
Jun 14, 2024
Area covered
London
Description
This dataset contains gender pay gap figures for all employees in London and for large employers in London. The pay gap figures for GLA group organisations can be found on their respective websites. The gender pay gap is the difference in the average hourly wage of all men and women across a workforce. If women do more of the less well paid jobs within an organisation than men, the gender pay gap is usually bigger. The UK government publishes gender pay gap figures for all employers with 250 or more employees. A cut of this dataset that only shows employers registered in London is also provided. A report by the Local Government Association (LGA) summarises the mean and median pay gaps in local authorities, as well as the distribution of staff across pay quartiles. This dataset is one of the Greater London Authority's measures of Economic Fairness and of its Economic Development strategy.
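The pay gap definition above can be sketched numerically. The example assumes the common convention of expressing the gap as a percentage of men's average hourly pay; the description itself only says "difference", so that normalisation, the function name, and the wages are assumptions.

```python
import statistics
from typing import Callable, Sequence


def pay_gap_percent(men_hourly: Sequence[float],
                    women_hourly: Sequence[float],
                    average: Callable[[Sequence[float]], float] = statistics.mean
                    ) -> float:
    """Gap between men's and women's average hourly pay, as % of men's pay."""
    men_avg = average(men_hourly)
    women_avg = average(women_hourly)
    if men_avg == 0:
        raise ValueError("men's average pay must be non-zero")
    return 100.0 * (men_avg - women_avg) / men_avg


# Hypothetical hourly wages.
men = [20.0, 30.0]
women = [18.0, 22.0]
print(pay_gap_percent(men, women))                     # mean gap: 20.0
print(pay_gap_percent(men, women, statistics.median))  # median gap: 20.0
```

Passing `statistics.median` as the averaging function gives the median pay gap, the second figure the LGA report summarises.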
The Importance of Conference Proceedings in Research Evaluation: a...
elsevier.digitalcommonsdata.com
Updated Apr 22, 2020
Cite
Dmitry Kochetkov (2020). The Importance of Conference Proceedings in Research Evaluation: a Methodology Based on Scimago Journal Rank (SJR) [Dataset]. http://doi.org/10.17632/hswn9y67rn.1
Explore at:
Unique identifier
https://doi.org/10.17632/hswn9y67rn.1
Dataset updated
Apr 22, 2020
Authors
Dmitry Kochetkov
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Conferences are an essential tool for scientific communication. In disciplines such as Computer Science, over 50% of original research results are published in conference proceedings. This dataset contains a list of conference proceedings, categorized Q1 - Q4 by analogy with SJR journal quartiles. We have analyzed the role of conference proceedings in various disciplines and propose an alternative approach to research evaluation based on conference proceedings and Scimago Journal Rank (SJR). Comparison of the resulting list in Computer Science with the CORE ranking showed a 62% match, as well as an average rank correlation of the distribution by category.
House price to workplace-based earnings ratio
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Mar 24, 2025
Cite
Office for National Statistics (2025). House price to workplace-based earnings ratio [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/housing/datasets/ratioofhousepricetoworkplacebasedearningslowerquartileandmedian
Explore at:
Available download formats: xlsx
Dataset updated
Mar 24, 2025
Dataset provided by
Office for National Statistics http://www.ons.gov.uk/
License
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Affordability ratios calculated by dividing house prices by gross annual workplace-based earnings. Based on the median and lower quartiles of both house prices and earnings in England and Wales.
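A minimal sketch of the affordability ratio, using invented prices and earnings: the median (or lower-quartile) house price is divided by the median (or lower-quartile) gross annual earnings. The function name is hypothetical, and the quartile method used by ONS is not stated here, so `statistics.quantiles` is only a stand-in.

```python
import statistics
from typing import Sequence


def affordability_ratio(prices: Sequence[float],
                        earnings: Sequence[float],
                        lower_quartile: bool = False) -> float:
    """House price divided by annual earnings at the chosen summary point."""
    if lower_quartile:
        price = statistics.quantiles(prices, n=4)[0]
        earning = statistics.quantiles(earnings, n=4)[0]
    else:
        price = statistics.median(prices)
        earning = statistics.median(earnings)
    return price / earning


# Hypothetical area: three sale prices and three annual salaries.
prices = [200_000, 250_000, 300_000]
earnings = [25_000, 30_000, 35_000]
print(round(affordability_ratio(prices, earnings), 2))          # 8.33
print(affordability_ratio(prices, earnings, lower_quartile=True))  # 8.0
```

A ratio of 8 means a typical home costs eight years of typical gross earnings.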
House price to residence-based earnings ratio
ons.gov.uk
cloud.csiss.gmu.edu
xlsx
Updated Mar 24, 2025
Cite
Office for National Statistics (2025). House price to residence-based earnings ratio [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/housing/datasets/ratioofhousepricetoresidencebasedearningslowerquartileandmedian
Explore at:
Available download formats: xlsx
Dataset updated
Mar 24, 2025
Dataset provided by
Office for National Statistics http://www.ons.gov.uk/
License
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Affordability ratios calculated by dividing house prices by gross annual residence-based earnings. Based on the median and lower quartiles of both house prices and earnings in England and Wales.
NODC Standard Product: International Ocean Atlas Volume 4 - Atlas of...
catalog.data.gov
search.dataone.org
Updated Aug 1, 2025
Cite
(Point of Contact) (2025). NODC Standard Product: International Ocean Atlas Volume 4 - Atlas of temperature / salinity frequency distributions (2 disc set) (NCEI Accession 0101473) [Dataset]. https://catalog.data.gov/dataset/nodc-standard-product-international-ocean-atlas-volume-4-atlas-of-temperature-salinity-frequenc1
Explore at:
Dataset updated
Aug 1, 2025
Dataset provided by
(Point of Contact)
Description
This Atlas presents more than 80,000 plots of the empirical frequency distributions of temperature and salinity for each 5-degree square area of the North Atlantic Ocean (80N to 30S) at all standard depth levels based on World Ocean Database 1998 data. Additional empirical statistical plots include the mean and standard deviation based on the arithmetic mean, the median and Median Absolute Deviation (MAD), winsorized estimates of the mean and standard deviation, quartiles, and skewness estimated from the quartiles. Some of these statistics are presented in both "normalized" and "natural" coordinates. Disc 1 contains seasonal distributions for the upper (0 m to 400 m) ocean. Disc 2 contains annual distributions for the deep (500 m - 5500 m) ocean.
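Two of the robust statistics named above can be sketched in a few lines. The Atlas does not say which quartile-based skewness estimator it uses; the example assumes Bowley's coefficient, (Q3 + Q1 - 2·Q2) / (Q3 - Q1), and the temperature values are made up.

```python
import statistics
from typing import Sequence


def median_absolute_deviation(values: Sequence[float]) -> float:
    """Median of absolute deviations from the median (unscaled MAD)."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


def quartile_skewness(values: Sequence[float]) -> float:
    """Bowley's skewness from the three quartiles (an assumed estimator)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    if q3 == q1:
        raise ValueError("skewness is undefined when Q1 equals Q3")
    return (q3 + q1 - 2 * q2) / (q3 - q1)


# Hypothetical temperatures (degrees C) from one 5-degree square.
temps = [1, 2, 3, 4, 5, 6, 7]
print(median_absolute_deviation(temps))  # prints "2"
print(quartile_skewness(temps))          # prints "0.0"
```

Both measures depend only on order statistics, so a single extreme observation affects them far less than it affects the mean and standard deviation.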
34-year Daily Stock Data (1990-2024)
kaggle.com
Updated Dec 10, 2024
Cite
Shivesh Prakash (2024). 34-year Daily Stock Data (1990-2024) [Dataset]. https://www.kaggle.com/datasets/shiveshprakash/34-year-daily-stock-data
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Dec 10, 2024
Dataset provided by
Kaggle http://kaggle.com/
Authors
Shivesh Prakash
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Description: 34-year Daily Stock Data (1990-2024)

Context and Inspiration

This dataset captures historical financial market data and macroeconomic indicators spanning over three decades, from 1990 onwards. It is designed for financial analysis, time series forecasting, and exploring relationships between market volatility, stock indices, and macroeconomic factors. This dataset is particularly relevant for researchers, data scientists, and enthusiasts interested in studying:
- Volatility forecasting (VIX)
- Stock market trends (S&P 500, DJIA, HSI)
- Macroeconomic influences on markets (joblessness, interest rates, etc.)
- The effect of geopolitical and economic uncertainty (EPU, GPRD)

Sources

The data has been aggregated from a mix of historical financial records and publicly available macroeconomic datasets:
- VIX (Volatility Index): Chicago Board Options Exchange (CBOE).
- Stock Indices (S&P 500, DJIA, HSI): Yahoo Finance and historical financial databases.
- Volume Data: Extracted from official exchange reports.
- Macroeconomic Indicators: Bureau of Economic Analysis (BEA), Federal Reserve, and other public records.
- Uncertainty Metrics (EPU, GPRD): Economic Policy Uncertainty Index and Global Policy Uncertainty Database.

Columns

dt: Date of observation in YYYY-MM-DD format.

vix: VIX (Volatility Index), a measure of expected market volatility.

sp500: S&P 500 index value, a benchmark of the U.S. stock market.

sp500_volume: Daily trading volume for the S&P 500.

djia: Dow Jones Industrial Average (DJIA), another key U.S. market index.

djia_volume: Daily trading volume for the DJIA.

hsi: Hang Seng Index, representing the Hong Kong stock market.

ads: Aruoba-Diebold-Scotti (ADS) Business Conditions Index, reflecting U.S. economic activity.

us3m: U.S. Treasury 3-month bond yield, a short-term interest rate proxy.

joblessness: U.S. unemployment rate, reported as quartiles (1 represents lowest quartile and so on).

epu: Economic Policy Uncertainty Index, quantifying policy-related economic uncertainty.

GPRD: Geopolitical Risk Index (Daily), measuring geopolitical risk levels.

prev_day: Previous day’s S&P 500 closing value, added for lag-based time series analysis.
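A lag column such as prev_day can be derived as sketched below. This uses plain dictionaries and invented index values rather than the dataset's actual file, and the helper name is hypothetical; only the column names dt, sp500, and prev_day come from the list above.

```python
from typing import Dict, List


def add_prev_day(rows: List[Dict]) -> List[Dict]:
    """Return rows sorted by date with the previous row's sp500 attached."""
    result = []
    previous_close = None  # the first observation has no predecessor
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    for row in sorted(rows, key=lambda r: r["dt"]):
        result.append({**row, "prev_day": previous_close})
        previous_close = row["sp500"]
    return result


# Hypothetical observations, deliberately out of order.
observations = [
    {"dt": "2024-01-03", "sp500": 4700.0},
    {"dt": "2024-01-02", "sp500": 4740.0},
    {"dt": "2024-01-04", "sp500": 4690.0},
]
for row in add_prev_day(observations):
    print(row["dt"], row["sp500"], row["prev_day"])
```

Sorting before shifting matters: a lag computed on unsorted rows would pair each day with an arbitrary neighbour instead of the preceding trading day.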

Key Features

Cross-Market Analysis: Compare U.S. markets (S&P 500, DJIA) with international benchmarks like HSI.

Macroeconomic Insights: Assess how external factors like joblessness, interest rates, and economic uncertainty affect markets.

Temporal Scope: Longitudinal data facilitates trend analysis and machine learning model training.

Potential Use Cases

Forecasting market indices using machine learning or statistical models.

Building volatility trading strategies with VIX Futures.

Economic research on relationships between policy uncertainty and market behavior.

Educational material for financial data visualization and analysis tutorials.

Feel free to use this dataset for academic, research, or personal projects.
Descriptive statistics (minimum, first quartile, median, mean, third...
plos.figshare.com
xls
Updated Oct 16, 2024
Cite
Letícia F. M. Reis; Diego C. Nascimento; Paulo H. Ferreira; Francisco Louzada (2024). Descriptive statistics (minimum, first quartile, median, mean, third quartile, maximum) of probabilities for the RPDLomax and Logistic models by class (Wilt dataset). [Dataset]. http://doi.org/10.1371/journal.pone.0311246.t007
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0311246.t007
Dataset updated
Oct 16, 2024
Dataset provided by
PLOS ONE
Authors
Letícia F. M. Reis; Diego C. Nascimento; Paulo H. Ferreira; Francisco Louzada
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Descriptive statistics (minimum, first quartile, median, mean, third quartile, maximum) of probabilities for the RPDLomax and Logistic models by class (Wilt dataset).
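The six summary statistics named in this description (minimum, first quartile, median, mean, third quartile, maximum) can be computed with Python's standard library. This is only a sketch: the function name six_number_summary is hypothetical, and the choice of method="inclusive" is an assumption, since the dataset does not state which quartile convention its authors used.

```python
import statistics
from typing import Dict, Sequence


def six_number_summary(values: Sequence[float]) -> Dict[str, float]:
    """Compute min, Q1, median, mean, Q3 and max for a sample.

    Quartiles use statistics.quantiles with method="inclusive",
    which is an assumption; other conventions give slightly
    different Q1/Q3 values on small samples.
    Needs at least two data points.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.mean(values),
        "q3": q3,
        "max": max(values),
    }


if __name__ == "__main__":
    probabilities = [0.10, 0.25, 0.40, 0.55, 0.90]  # hypothetical values
    for name, value in six_number_summary(probabilities).items():
        print(f"{name}: {value}")
```

Applying the function separately to each class's probabilities would give one row of such a table per class.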
Descriptive statistics of the 2 datasets with mean, standard deviation (SD),...
plos.figshare.com
xls
Updated Jun 18, 2023
Cite
Achim Langenbucher; Nóra Szentmáry; Alan Cayless; Jascha Wendelstein; Peter Hoffmann (2023). Descriptive statistics of the 2 datasets with mean, standard deviation (SD), median, the lower (quantile 2.5%) and upper (quantile 97.5%) boundary of the 95% confidence interval, and the interquartile range IQR (quartile 75%—quartile 25%). [Dataset]. http://doi.org/10.1371/journal.pone.0282213.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0282213.t001
Dataset updated
Jun 18, 2023
Dataset provided by
PLOS ONE
Authors
Achim Langenbucher; Nóra Szentmáry; Alan Cayless; Jascha Wendelstein; Peter Hoffmann
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
AL refers to the axial length, CCT to the central corneal thickness, ACD to the external phakic anterior chamber depth measured from the corneal front apex to the front apex of the crystalline lens, LT to the central thickness of the crystalline lens, R1 and R2 to the corneal radii of curvature for the flat and steep meridians, Rmean to the average of R1 and R2, PIOL to the refractive power of the intraocular lens implant, and SEQ to the spherical equivalent power achieved 5 to 12 weeks after cataract surgery.
Dataset used and analysed for this study.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated May 30, 2023
Cite
Stephanie A. Fernandez; Haoyang Sun; Borame L. Dickens; Lee Ching Ng; Alex R. Cook; Jue Tao Lim (2023). Dataset used and analysed for this study. [Dataset]. http://doi.org/10.1371/journal.pntd.0011075.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pntd.0011075.s001
Dataset updated
May 30, 2023
Dataset provided by
PLOS Neglected Tropical Diseases
Authors
Stephanie A. Fernandez; Haoyang Sun; Borame L. Dickens; Lee Ching Ng; Alex R. Cook; Jue Tao Lim
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data collected for all 100 blocks assessed includes total number of units, number of units with more than 5 plants, number of units with more than 5 containers, corridor and public cleanliness rating, number of times out of the 10 public spots assessed that gully traps, open and covered drains and plants were present, median house price, year built and abundance status. (XLSX)
Long Covid Risk
figshare.com
txt
Updated Apr 13, 2024
Cite
Ahmed Shaheen (2024). Long Covid Risk [Dataset]. http://doi.org/10.6084/m9.figshare.25599591.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.25599591.v1
Dataset updated
Apr 13, 2024
Dataset provided by
figshare
Authors
Ahmed Shaheen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Feature preparation

Preprocessing was applied to the data, including creating dummy variables and applying transformations (centering, scaling, Yeo-Johnson) with the preProcess() function from the “caret” package in R. Correlations among the variables were examined and no serious multicollinearity problems were found. Stepwise variable selection was performed using a logistic regression model. The final set of variables included:

Demographic: age, body mass index, sex, ethnicity, smoking
History of disease: heart disease, migraine, insomnia, gastrointestinal disease
COVID-19 history: COVID-19 vaccination, rashes, conjunctivitis, shortness of breath, chest pain, cough, runny nose, dysgeusia, muscle and joint pain, fatigue, fever, COVID-19 reinfection, and ICU admission

These variables were used to train and test various machine-learning models.

Model selection and training

The data was randomly split into 80% training and 20% testing subsets. The “h2o” package in R version 4.3.1 was used to implement the different algorithms. AutoML was run first to automatically explore a range of models with different configurations. Gradient Boosting Machines (GBM), Random Forest (RF), and Regularized Generalized Linear Model (GLM) were identified as the best-performing models on our data, and their parameters were fine-tuned. An ensemble method that stacked different models together was also used, as it can sometimes improve accuracy. The models were evaluated using the area under the curve (AUC) and C-statistics as diagnostic measures. The model with the highest AUC was selected for further analysis using the confusion matrix, accuracy, sensitivity, specificity, and F1 and F2 scores. The optimal prediction threshold was determined by plotting sensitivity, specificity, and accuracy and choosing their point of intersection, which balances the trade-off between the three metrics.

The model’s predictions were also plotted, and the quantile ranges were used to classify the model’s prediction as follows: > 1st quantile, > 2nd quantile, > 3rd quartile and < 3rd quartile (very low, low, moderate, high), respectively.

Metric                Formula
C-statistics          (TPR + TNR - 1) / 2
Sensitivity/Recall    TP / (TP + FN)
Specificity           TN / (TN + FP)
Accuracy              (TP + TN) / (TP + TN + FP + FN)
F1 score              2 * (precision * recall) / (precision + recall)

Model interpretation

We used the variable importance plot, which measures how much each variable contributes to the predictive power of a machine learning model. In the H2O package, variable importance for GBM and RF is calculated by measuring the decrease in the model's error when a variable is split on. The more a variable's split decreases the error, the more important that variable is considered to be. The error is calculated using the formula SE = MSE × N = VAR × N; it is then scaled between 0 and 1 and plotted.

We also used the SHAP summary plot, a graphical tool for visualizing the impact of input features on the prediction of a machine learning model. SHAP stands for SHapley Additive exPlanations, a method that calculates the contribution of each feature to the prediction by averaging over all possible subsets of features [28]. The SHAP summary plot shows the distribution of the SHAP values for each feature across the data instances. We use the h2o.shap_summary_plot() function in R to generate the SHAP summary plot for our GBM model. We pass the model object and the test data as arguments, and optionally specify the columns (features) we want to include in the plot. The plot shows the SHAP values for each feature on the x-axis and the features on the y-axis. The color indicates whether the feature value is low (blue) or high (red). The plot also shows the distribution of the feature values as a density plot on the right.
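The confusion-matrix metrics listed in the description above (sensitivity, specificity, accuracy, F1) can be sketched in Python as follows. This is an illustrative sketch rather than the authors' R/h2o code: the Confusion class, the function names, and the example counts are hypothetical, and precision is taken as TP / (TP + FP), its standard definition, which the description does not spell out. The C-statistics formula is left out because the sketch does not try to interpret it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def sensitivity(c: Confusion) -> float:
    # Sensitivity/Recall = TP / (TP + FN)
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Specificity = TN / (TN + FP)
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    # Accuracy = (TP + TN) / (TP + TN + FP + FN)
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


def f1_score(c: Confusion) -> float:
    # F1 = 2 * (precision * recall) / (precision + recall)
    precision = c.tp / (c.tp + c.fp)  # standard definition, assumed here
    recall = sensitivity(c)
    return 2 * (precision * recall) / (precision + recall)


if __name__ == "__main__":
    counts = Confusion(tp=40, fp=10, tn=45, fn=5)  # made-up counts
    print(f"sensitivity={sensitivity(counts):.3f}")
    print(f"specificity={specificity(counts):.3f}")
    print(f"accuracy={accuracy(counts):.3f}")
    print(f"f1={f1_score(counts):.3f}")
```

The functions do not guard against empty denominators (for example, a test set with no positive cases), so a real evaluation script would need to handle that case explicitly.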
Summaries and cumulative COVID-19 metrics by statewide HPI quartile from Feb...
figshare.com
plos.figshare.com
xls
Updated Mar 6, 2025
Cite
Ada T. Kwan; Jason Vargo; Caroline Kurtz; Mayuri Panditrao; Christopher M. Hoover; Tomás M. León; David Rocha; William Wheeler; Seema Jain; Erica S. Pan; Priya B. Shete (2025). Summaries and cumulative COVID-19 metrics by statewide HPI quartile from Feb 1, 2020 through Jun 30, 2021. [Dataset]. http://doi.org/10.1371/journal.pone.0316517.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316517.t004
Dataset updated
Mar 6, 2025
Dataset provided by
PLOS ONE
Authors
Ada T. Kwan; Jason Vargo; Caroline Kurtz; Mayuri Panditrao; Christopher M. Hoover; Tomás M. León; David Rocha; William Wheeler; Seema Jain; Erica S. Pan; Priya B. Shete
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
By statewide vaccine equity metric (VEM) quartiles.
Quartiles of the cell-specific probability of tuberculosis transmission.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jul 19, 2024
Cite
Nithinan Mahawan; Thanapoom Rattananupong; Puchong Sri-Uam; Wiroj Jiamjarasrangsi (2024). Quartiles of the cell-specific probability of tuberculosis transmission. [Dataset]. http://doi.org/10.1371/journal.pone.0305264.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0305264.t002
Dataset updated
Jul 19, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Nithinan Mahawan; Thanapoom Rattananupong; Puchong Sri-Uam; Wiroj Jiamjarasrangsi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Quartiles of the cell-specific probability of tuberculosis transmission.