23 datasets found
  1. Report Card Administrators by School Poverty Quartile School Years 2017-18...

    • data.wa.gov
    • catalog.data.gov
    application/rdfxml +5
    Updated Jan 16, 2025
    Cite
    OSPI (2025). Report Card Administrators by School Poverty Quartile School Years 2017-18 to 2023-24 [Dataset]. https://data.wa.gov/resource/fhnj-yqpr
    Explore at:
    csv, json, application/rssxml, xml, tsv, application/rdfxml
    Dataset updated
    Jan 16, 2025
    Dataset authored and provided by
    OSPI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file includes Report Card administrator experience status by school poverty quartile data for the 2017-18 through 2023-24 school years. Data is disaggregated by state, ESD, LEA, and school level. Please review the notes below for more information.

  2. COVID-19 Vaccine Progress Dashboard Data by ZIP Code

    • data.ca.gov
    • data.chhs.ca.gov
    • +2more
    csv, xlsx, zip
    Updated Aug 9, 2025
    Cite
    California Department of Public Health (2025). COVID-19 Vaccine Progress Dashboard Data by ZIP Code [Dataset]. https://data.ca.gov/dataset/covid-19-vaccine-progress-dashboard-data-by-zip-code
    Explore at:
    csv, xlsx, zip
    Dataset updated
    Aug 9, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.

    Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 12+ and age 5+ denominators have been uploaded as archived tables.

    Starting June 30, 2021, the dataset has been reconfigured so that all updates are appended to one dataset, making it easier to use via the API and other interfaces. In addition, historical data has been extended back to January 5, 2021.

    This dataset shows full, partial, and at least 1 dose coverage rates by zip code tabulation area (ZCTA) for the state of California. Data sources include the California Immunization Registry and the American Community Survey’s 2015-2019 5-Year data.

    This is the data table for the LHJ Vaccine Equity Performance dashboard. However, this data table also includes ZCTAs that do not have a VEM score.

    This dataset also includes Vaccine Equity Metric score quartiles (when applicable), which combine the Public Health Alliance of Southern California’s Healthy Places Index (HPI) measure with CDPH-derived scores to estimate factors that impact health, like income, education, and access to health care. ZCTAs range from less healthy community conditions in Quartile 1 to more healthy community conditions in Quartile 4.

    The Vaccine Equity Metric is for weekly vaccination allocation and reporting purposes only. CDPH-derived quartiles should not be considered as indicative of the HPI score for these zip codes. CDPH-derived quartiles were assigned to zip codes excluded from the HPI score produced by the Public Health Alliance of Southern California due to concerns with statistical reliability and validity in populations smaller than 1,500 or where more than 50% of the population resides in a group setting.

    These data do not include doses administered by the following federal agencies who received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.

    For some ZCTAs, vaccination coverage may exceed 100%. This may be a result of many people from outside the county coming to that ZCTA to get their vaccine and providers reporting the county of administration as the county of residence, and/or the California Department of Finance (DOF) estimates of the population in that ZCTA being too low. Please note that population numbers provided by DOF are projections and so may not be accurate, especially given unprecedented shifts in population as a result of the pandemic.
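
    A minimal R sketch of how a coverage rate like the one described above is formed and how over-100% ZCTAs can be flagged; the file and column names below are assumptions, not the dataset's actual schema:

      # Assumed file and column names; check the published CSV schema before use.
      zcta <- read.csv("covid19vaccinesbyzipcode.csv")
      zcta$coverage_pct <- 100 * zcta$persons_fully_vaccinated / zcta$population
      # Flag ZCTAs whose reported coverage exceeds 100% (see the caveat above).
      over_100 <- subset(zcta, coverage_pct > 100)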

  3. 3rd quartile of the equivalent disposable administrative income of couples...

    • data.europa.eu
    csv, json
    Updated Jul 16, 2025
    Cite
    IWEPS (2025). 3rd quartile of the equivalent disposable administrative income of couples with at least one spouse aged 65 or over [Dataset]. https://data.europa.eu/88u/dataset/831110-50
    Explore at:
    json, csv
    Dataset updated
    Jul 16, 2025
    Dataset provided by
    Walloon Institute for Evaluation, Prospective Studies and Statistics
    Authors
    IWEPS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Administrative disposable income is a third pillar of the income statistics that Statbel publishes, alongside the tax statistics and the poverty indicators based on SILC, and it allows answering other types of questions than SILC and tax statistics.

    SILC uses disposable income at the household level as a concept of income, cumulating the incomes of all household members. In the next step, this disposable income is converted into equivalised disposable income to take into account the composition of the household. Based on the SILC, at-risk-of-poverty figures are published up to the provincial level. However, the sample size does not allow for analyses at a more detailed geographical level. Statistics based on tax revenues, by contrast, are available up to the level of the statistical sector, but are limited to taxable income in the context of personal income tax returns. Non-taxable income is not taken into account and there is also no correction according to the composition of the household.

    The variable "administrative equivalised disposable income" responds to a growing demand for income and poverty figures at the communal level. It uses an income concept based on administrative sources that tries to correspond as much as possible to that of SILC. For the population as a whole, both taxable and non-taxable income are taken into account. They are added together for all members of the household in order to obtain an administrative disposable income for the household. After adjusting for the composition of the household, the variable "administrative equivalised disposable income" is established. This can be used to calculate income and poverty figures at the communal level.
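
    A minimal R sketch of the equivalisation step described in the paragraph above; the modified OECD scale used here is an assumption, since the text does not name the equivalence scale:

      # Sum the incomes of all household members, then divide by an equivalence
      # scale reflecting household composition (modified OECD weights assumed).
      equivalised_income <- function(member_incomes, n_adults, n_children) {
        scale <- 1 + 0.5 * (n_adults - 1) + 0.3 * n_children
        sum(member_incomes) / scale
      }
      equivalised_income(c(28000, 14000), n_adults = 2, n_children = 1)  # about 23333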

    Indicators are not disseminated for an entity and a category when at least 15% of people have a missing equivalent administrative disposable income or when fewer than 100 people have a valid income.

    More information is available on the corresponding Statbel page.

  4. Relative uniqueness by quartile with Z tests.

    • plos.figshare.com
    xls
    Updated Jul 1, 2024
    Cite
    Sean MacNiven; Ralph Tench (2024). Relative uniqueness by quartile with Z tests. [Dataset]. http://doi.org/10.1371/journal.pone.0305568.t004
    Explore at:
    xls
    Dataset updated
    Jul 1, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Sean MacNiven; Ralph Tench
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study investigates the phenomenon of semantic drift through the lenses of language and situated simulation (LASS) and the word frequency effect (WFE) within a timed word association task. Our primary objectives were to determine whether semantic drift can be identified over the short time (25 seconds) of a free word association task (a predicted corollary of LASS), and whether more frequent terms are generated earlier in the process (as expected due to the WFE). Respondents were provided with five cue words (tree, dog, quality, plastic and love), and asked to write as many associations as they could. We hypothesized that terms generated later in the task (fourth time quartile, the last 19–25 seconds) would be semantically more distant (cosine similarity) from the cue word than those generated earlier (first quartile, the first 1–7 seconds), indicating semantic drift. Additionally, we explored the WFE by hypothesizing that earlier generated words would be more frequent and less diverse. Utilizing a dataset matched with GloVe 300B word embeddings, BERT and WordNet synsets, we analysed semantic distances among 1569 unique term pairs for all cue words across time. Our results supported the presence of semantic drift, with significant evidence of within-participant semantic drift from the first to fourth time (LASS) and frequency (WFE) quartiles. In terms of the WFE, we observed a notable decrease in the diversity of terms generated earlier in the task, while more unique terms (greater diversity and relative uniqueness) were generated in the fourth time quartile, aligning with our hypothesis that more frequently used words dominate the early stages of a word association task. We also found that the size of effects varied substantially across cues, suggesting that some cues might invoke stronger and more idiosyncratic situated simulations. Theoretically, our study contributes to the understanding of LASS and the WFE. It suggests that semantic drift might serve as a scalable indicator of the invocation of language versus simulation systems in LASS and might also be used to explore cognition within word association tasks more generally. The findings also add a temporal and relational dimension to the WFE. Practically, our research highlights the utility of word association tasks in understanding semantic drift and the diffusion of word usage over a sub-minute task, arguably the shortest practically feasible timeframe, offering a scalable method to explore group and individual changes in semantic relationships, whether via the targeted diffusion of influence in a marketing campaign, or seeking to understand differences in cognition more generally. Possible practical uses and opportunities for future research are discussed.
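
    A minimal base-R sketch of the cosine-based distance used to quantify semantic drift; the vectors below are toy values, not the GloVe or BERT embeddings analysed in the study:

      # Semantic distance between a cue and a generated term, defined as
      # 1 - cosine similarity of their embedding vectors.
      cosine_similarity <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
      semantic_distance <- function(a, b) 1 - cosine_similarity(a, b)
      cue  <- c(0.12, -0.40, 0.33, 0.08)   # toy embedding for a cue word
      term <- c(0.10, -0.25, 0.41, 0.02)   # toy embedding for a generated term
      semantic_distance(cue, term)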

  5. Gender, Age, and Emotion Detection from Voice

    • kaggle.com
    Updated May 29, 2021
    Cite
    Rohit Zaman (2021). Gender, Age, and Emotion Detection from Voice [Dataset]. https://www.kaggle.com/datasets/rohitzaman/gender-age-and-emotion-detection-from-voice/suggestions
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 29, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rohit Zaman
    Description

    Context

    Our target was to predict gender, age and emotion from audio. We found labeled audio datasets from Mozilla and RAVDESS, extracted 20 statistical features using the R programming language, and formed these datasets after adding the labels. Audio files were collected from "Mozilla Common Voice" and the "Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)".

    Content

    The datasets contain 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
    1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
    2) sd - The standard deviation of frequency describes the dataset's dispersion relative to its mean and is calculated as the square root of the variance.
    3) median - The median frequency (in kHz) is the middle number in the sorted (ascending or descending) list of frequencies.
    4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set: about 25 percent of the values lie below Q1 and about 75 percent above it.
    5) Q75 - The third quartile (in kHz), referred to as Q3, is the central point between the median and the highest value of the distribution.
    6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles (upper and lower quartiles).
    7) skew - The skewness is the degree of distortion from the normal distribution; it measures the lack of symmetry in the data distribution.
    8) kurt - The kurtosis determines how much the tails of the distribution differ from the tails of a normal distribution; it is effectively a measure of the outliers present in the distribution.
    9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized spectral power of the signal.
    10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is used in digital signal processing to characterize an audio spectrum. It is usually measured in decibels and indicates how tone-like, as opposed to noise-like, a sound is.
    11) mode - The mode frequency is the most frequently observed value in the data set.
    12) centroid - The spectral centroid describes where the center of mass of the spectrum is located.
    13) meanfun - The average fundamental frequency measured across the acoustic signal.
    14) minfun - The minimum fundamental frequency measured across the acoustic signal.
    15) maxfun - The maximum fundamental frequency measured across the acoustic signal.
    16) meandom - The average dominant frequency measured across the acoustic signal.
    17) mindom - The minimum dominant frequency measured across the acoustic signal.
    18) maxdom - The maximum dominant frequency measured across the acoustic signal.
    19) dfrange - The range of dominant frequency measured across the acoustic signal.
    20) modindx - The modulation index, which calculates the degree of frequency modulation, expressed numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
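
    A minimal sketch of a few of the quartile-based features above computed in R; the input here is a toy vector of frequencies, not the full frequency-spectrum analysis used to build the dataset:

      freqs_khz <- c(0.21, 0.18, 0.25, 0.19, 0.22, 0.30, 0.17)   # toy frequency values in kHz
      q <- quantile(freqs_khz, probs = c(0.25, 0.50, 0.75))
      features <- list(
        median = unname(q[2]),         # median frequency
        Q25    = unname(q[1]),         # first quartile
        Q75    = unname(q[3]),         # third quartile
        IQR    = unname(q[3] - q[1])   # interquartile range
      )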

    Acknowledgements

    Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en
    Emotion Audio Data Source: https://smartlaboratory.org/ravdess/

  6. Correlation between UHR quartiles and AF.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 24, 2024
    Cite
    Zhao, Jianqi; Bai, Rui; Liu, Gaizhen; Song, Xiaosu; Zhou, Meng; Zhang, Qi; Qin, Weiwei; Zhang, Yonglai; Li, Baojie (2024). Correlation between UHR quartiles and AF. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001456462
    Explore at:
    Dataset updated
    Jun 24, 2024
    Authors
    Zhao, Jianqi; Bai, Rui; Liu, Gaizhen; Song, Xiaosu; Zhou, Meng; Zhang, Qi; Qin, Weiwei; Zhang, Yonglai; Li, Baojie
    Description

    Background: Non-alcoholic fatty liver disease (NAFLD) is independently associated with atrial fibrillation (AF) risk. The uric acid (UA) to high-density lipoprotein cholesterol (HDL-C) ratio (UHR) has been shown to be closely associated with cardiovascular disease (CVD) and NAFLD. The aim of this study is to clarify whether elevated UHR is associated with the occurrence of AF in patients with NAFLD and to determine whether UHR predicted AF.
    Methods: Patients diagnosed with NAFLD in the Department of Cardiovascular Medicine of the Second Hospital of Shanxi Medical University from January 1, 2020, to December 31, 2021, were retrospectively enrolled in this study. The study subjects were categorized into AF group and non-AF group based on the presence or absence of combined AF. Logistic regression was performed to evaluate the correlation between UHR and AF. Sensitivity analysis and subgroup interaction analysis were performed to verify the robustness of the study results. Receiver operating characteristic (ROC) curve analysis was used to determine the optimal cutoff value for UHR to predict the development of AF in patients with NAFLD.
    Results: A total of 421 patients with NAFLD were included, including 171 in the AF group and 250 in the non-AF group. In the univariate regression analysis, NAFLD patients with higher UHR were more likely to experience AF, and the risk of AF persisted after confounding factors were adjusted for (OR: 1.010, 95%CI: 1.007–1.013, P<0.001). AF risk increased with increasing UHR quartile (P for trend < 0.001). Despite normal serum UA and HDL-C, UHR was still connected with AF in patients with NAFLD. All subgroup variables did not interact significantly with UHR in the subgroup analysis. The ROC curve analysis showed that the areas under the curve for UA, HDL-C, and UHR were 0.702, 0.606, and 0.720, respectively, suggesting that UHR has a higher predictive value for AF occurrence in NAFLD patients compared to HDL-C or UA alone.
    Conclusion: Increased UHR level was independently correlated with a high risk of AF in NAFLD patients.
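
    A minimal R sketch of the quartile construction and logistic regression described above; the simulated variables below stand in for the study's clinical data:

      set.seed(1)
      df <- data.frame(ua  = runif(421, 150, 550),   # serum uric acid (simulated)
                       hdl = runif(421, 0.8, 1.8),   # HDL-C (simulated)
                       af  = rbinom(421, 1, 0.4))    # atrial fibrillation, 0/1 (simulated)
      df$uhr <- df$ua / df$hdl                       # UA to HDL-C ratio (UHR)
      df$uhr_q <- cut(df$uhr, breaks = quantile(df$uhr, 0:4 / 4),
                      include.lowest = TRUE, labels = paste0("Q", 1:4))
      fit <- glm(af ~ uhr_q, data = df, family = binomial)
      summary(fit)   # odds of AF across UHR quartiles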

  7. Data for New Aerosol Dry Deposition Model

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jul 24, 2022
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). Data for New Aerosol Dry Deposition Model [Dataset]. https://catalog.data.gov/dataset/data-for-new-aerosol-dry-deposition-model
    Explore at:
    Dataset updated
    Jul 24, 2022
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Fig1-needleleaf forest.txt contains all the observation data, with each reference given, for Figure 1. The deposition velocity vd and diameter dp are shown in ordered arrays; vd_err and dp_err define the deposition velocity and diameter error bars.
    Fig 2-needleleaf.txt contains the same observation data as Fig1-needleleaf forest.txt.
    Fig3-Broadleaf forest.txt contains all the observation data, with each reference given, for broadleaf forests in Figure 3. Data format is the same as Fig1.
    Fig4-Grasst.txt contains all the observation data, with each reference given, for grass in Figure 4. Data format is the same as Fig1.
    Fig5.txt contains data from Zhang et al. 2014 for three different U* values.
    Fig6-Watert.txt contains all the observation data, with each reference given, for water in Figure 6. Data format is the same as Fig1.
    DataFig7,TXT is a tab-delimited text file containing the data in tabular form for Figure 7.
    DataFig8,TXT is a tab-delimited text file containing the data in tabular form for Figure 8.
    Fig14a-133_P6p3_add_newadd_PM25_TOT_126719_boxplot_hourly_data.csv is a CSV file containing data for the hourly average median and 1st and 3rd quartiles of observations and the two 1.33 km model runs that are represented by boxes in Figure 14a.
    Fig14b-12US1_P6p3_add_PM25_TOT_211556_boxplot_hourly_data.csv is a CSV file containing data for the hourly average median and 1st and 3rd quartiles of observations and the two 12 km model runs that are represented by boxes in Figure 14b.
    Fig15-133_P6p3_add_newadd_PM25_TOT_728997_spatialplot_diff.csv is a CSV file containing all the data for the bias and error for the NEW and BASE 1.33 km model runs and the differences in bias and error between the models at AQS sites.
    Fig16-12US1_P6p3_add_PM25_TOT_971641_spatialplot_diff.csv is a CSV file containing all the data for the bias and error for the NEW and BASE 12 km model runs and the differences in bias and error between the models at AQS sites.
    Fig17-12US1_P6p3_add_PM25_TOT_104554_spatialplot_diff.csv is a CSV file containing all the data for the bias and error for the NEW and BASE 12 km model runs and the differences in bias and error between the models at IMPROVE sites.
    Portions of this dataset are inaccessible because Figs 9-13 are plots made directly from CMAQ output files, which are far too large to include. They can be accessed by contacting the primary author, Jon Pleim. Format: CMAQ netcdf output files.
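
    A minimal R sketch of loading one of the CSV files named above; the column layout is not specified in the description, so inspect it before analysis:

      fig14a <- read.csv("Fig14a-133_P6p3_add_newadd_PM25_TOT_126719_boxplot_hourly_data.csv")
      str(fig14a)   # hourly median and 1st/3rd quartile columns for observations and model runs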

  8. DOAC Reanalysis Dataset

    • zenodo.org
    bin
    Updated Oct 21, 2024
    Cite
    Kim Boesen; Luis Carlos Saiz; Peter C Gøtzsche; Juan Erviti; Kim Boesen; Luis Carlos Saiz; Peter C Gøtzsche; Juan Erviti (2024). DOAC Reanalysis Dataset [Dataset]. http://doi.org/10.5281/zenodo.13960575
    Explore at:
    bin
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kim Boesen; Luis Carlos Saiz; Peter C Gøtzsche; Juan Erviti; Kim Boesen; Luis Carlos Saiz; Peter C Gøtzsche; Juan Erviti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Welcome to the direct oral anticoagulant (DOAC) Reanalysis Dataset.

    Sheet 1: Exact references to the FDA reviews from which we extracted all data points. You will also find links to the FDA drug approval packages, where one also finds all other published documents pertaining to the approvals, such as statistical reviews. In Sheet 1, we also cite the primary trial reports for each of the four pivotal DOAC trials.

    Sheet 2: Basic overview of the 4 pivotal DOAC trials with an emphasis on time in therapeutic range (TTR) characteristics.

    Sheet 3: Summary results from each of the 4 DOAC trials for the outcomes of stroke/systemic embolism, major bleed, and mortality (including outcome definitions from each trial).

    Sheet 4: The full TTR dataset with outcomes stratified into quartiles (Q1 to Q4), including exact references to each data point in the FDA reviews.

    Sheet 5: Q4 thresholds and conclusions in the industry TTR analyses.

  9. Gender Pay Gaps in London

    • gimi9.com
    Updated Jun 14, 2024
    Cite
    (2024). Gender Pay Gaps in London | gimi9.com [Dataset]. https://gimi9.com/dataset/london_gender-pay-gaps
    Explore at:
    Dataset updated
    Jun 14, 2024
    Area covered
    London
    Description

    This dataset contains gender pay gap figures for all employees in London and large employers in London. The pay gap figures for GLA group organisations can be found on their respective websites. The gender pay gap is the difference in the average hourly wage of all men and women across a workforce. If women do more of the less well paid jobs within an organisation than men, the gender pay gap is usually bigger. The UK government publishes gender pay gap figures for all employers with 250 or more employees; a cut of this dataset that only shows employers registered in London can be found below. A report by the Local Government Association (LGA) summarises the mean and median pay gaps in local authorities, as well as the distribution of staff across pay quartiles. This dataset is one of the Greater London Authority's measures of Economic Fairness and of its Economic Development strategy.
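
    A minimal R sketch of the headline figure as defined above; the hourly wages are illustrative, not values from this dataset:

      mean_hourly_men   <- 19.20   # illustrative mean hourly wage, men
      mean_hourly_women <- 16.70   # illustrative mean hourly wage, women
      pay_gap_pct <- 100 * (mean_hourly_men - mean_hourly_women) / mean_hourly_men   # about 13%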

  10. The Importance of Conference Proceedings in Research Evaluation: a...

    • elsevier.digitalcommonsdata.com
    Updated Apr 22, 2020
    Cite
    Dmitry Kochetkov (2020). The Importance of Conference Proceedings in Research Evaluation: a Methodology Based on Scimago Journal Rank (SJR) [Dataset]. http://doi.org/10.17632/hswn9y67rn.1
    Explore at:
    Dataset updated
    Apr 22, 2020
    Authors
    Dmitry Kochetkov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Conferences are an essential tool for scientific communication. In disciplines such as Computer Science, over 50% of original research results are published in conference proceedings. This dataset contains a list of conference proceedings, categorized Q1 - Q4 by analogy with SJR journal quartiles. We have analyzed the role of conference proceedings in various disciplines and propose an alternative approach to research evaluation based on conference proceedings and Scimago Journal Rank (SJR). Comparison of the resulting list in Computer Science with the CORE ranking showed a 62% match, as well as an average rank correlation of the distribution by category.

  11. House price to workplace-based earnings ratio

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Mar 24, 2025
    Cite
    Office for National Statistics (2025). House price to workplace-based earnings ratio [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/housing/datasets/ratioofhousepricetoworkplacebasedearningslowerquartileandmedian
    Explore at:
    xlsx
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Affordability ratios calculated by dividing house prices by gross annual workplace-based earnings. Based on the median and lower quartiles of both house prices and earnings in England and Wales.
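
    A minimal R sketch of the ratio as defined above, with illustrative figures rather than values from the dataset:

      median_house_price     <- 285000   # illustrative
      median_annual_earnings <- 33000    # illustrative, workplace-based gross earnings
      affordability_ratio <- median_house_price / median_annual_earnings   # about 8.6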

  12. House price to residence-based earnings ratio

    • ons.gov.uk
    • cloud.csiss.gmu.edu
    • +2more
    xlsx
    Updated Mar 24, 2025
    Cite
    Office for National Statistics (2025). House price to residence-based earnings ratio [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/housing/datasets/ratioofhousepricetoresidencebasedearningslowerquartileandmedian
    Explore at:
    xlsx
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Affordability ratios calculated by dividing house prices by gross annual residence-based earnings. Based on the median and lower quartiles of both house prices and earnings in England and Wales.

  13. NODC Standard Product: International Ocean Atlas Volume 4 - Atlas of...

    • catalog.data.gov
    • search.dataone.org
    • +1more
    Updated Aug 1, 2025
    Cite
    (Point of Contact) (2025). NODC Standard Product: International Ocean Atlas Volume 4 - Atlas of temperature / salinity frequency distributions (2 disc set) (NCEI Accession 0101473) [Dataset]. https://catalog.data.gov/dataset/nodc-standard-product-international-ocean-atlas-volume-4-atlas-of-temperature-salinity-frequenc1
    Explore at:
    Dataset updated
    Aug 1, 2025
    Dataset provided by
    (Point of Contact)
    Description

    This Atlas presents more than 80,000 plots of the empirical frequency distributions of temperature and salinity for each 5-degree square area of the North Atlantic Ocean (80N to 30S) at all standard depth levels based on World Ocean Database 1998 data. Additional empirical statistical plots include the mean and standard deviation based on the arithmetic mean, the median and Median Absolute Deviation (MAD), winsorized estimates of the mean and standard deviation, quartiles, and skewness estimated from the quartiles. Some of these statistics are presented in both "normalized" and "natural" coordinates. Disc 1 contains seasonal distributions for the upper (0 m to 400 m) ocean. Disc 2 contains annual distributions for the deep (500 m - 5500 m) ocean.
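
    A minimal R sketch of the robust statistics listed above; reading "skewness estimated from the quartiles" as Bowley's quartile skewness coefficient is an assumption:

      x <- c(12.1, 12.4, 12.9, 13.3, 13.8, 14.6, 15.9)   # toy temperature values
      q <- quantile(x, probs = c(0.25, 0.50, 0.75))
      mad_raw <- mad(x, constant = 1)   # raw median absolute deviation (mad() scales by 1.4826 by default)
      bowley_skew <- (q[3] + q[1] - 2 * q[2]) / (q[3] - q[1])   # quartile-based skewness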

  14. 34-year Daily Stock Data (1990-2024)

    • kaggle.com
    Updated Dec 10, 2024
    Cite
    Shivesh Prakash (2024). 34-year Daily Stock Data (1990-2024) [Dataset]. https://www.kaggle.com/datasets/shiveshprakash/34-year-daily-stock-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 10, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shivesh Prakash
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description: 34-year Daily Stock Data (1990-2024)

    Context and Inspiration

    This dataset captures historical financial market data and macroeconomic indicators spanning over three decades, from 1990 onwards. It is designed for financial analysis, time series forecasting, and exploring relationships between market volatility, stock indices, and macroeconomic factors. This dataset is particularly relevant for researchers, data scientists, and enthusiasts interested in studying:
    • Volatility forecasting (VIX)
    • Stock market trends (S&P 500, DJIA, HSI)
    • Macroeconomic influences on markets (joblessness, interest rates, etc.)
    • The effect of geopolitical and economic uncertainty (EPU, GPRD)

    Sources

    The data has been aggregated from a mix of historical financial records and publicly available macroeconomic datasets:
    • VIX (Volatility Index): Chicago Board Options Exchange (CBOE)
    • Stock Indices (S&P 500, DJIA, HSI): Yahoo Finance and historical financial databases
    • Volume Data: Extracted from official exchange reports
    • Macroeconomic Indicators: Bureau of Economic Analysis (BEA), Federal Reserve, and other public records
    • Uncertainty Metrics (EPU, GPRD): Economic Policy Uncertainty Index and Global Policy Uncertainty Database

    Columns

    1. dt: Date of observation in YYYY-MM-DD format.
    2. vix: VIX (Volatility Index), a measure of expected market volatility.
    3. sp500: S&P 500 index value, a benchmark of the U.S. stock market.
    4. sp500_volume: Daily trading volume for the S&P 500.
    5. djia: Dow Jones Industrial Average (DJIA), another key U.S. market index.
    6. djia_volume: Daily trading volume for the DJIA.
    7. hsi: Hang Seng Index, representing the Hong Kong stock market.
    8. ads: Aruoba-Diebold-Scotti (ADS) Business Conditions Index, reflecting U.S. economic activity.
    9. us3m: U.S. Treasury 3-month bond yield, a short-term interest rate proxy.
    10. joblessness: U.S. unemployment rate, reported as quartiles (1 represents lowest quartile and so on).
    11. epu: Economic Policy Uncertainty Index, quantifying policy-related economic uncertainty.
    12. GPRD: Geopolitical Risk Index (Daily), measuring geopolitical risk levels.
    13. prev_day: Previous day’s S&P 500 closing value, added for lag-based time series analysis.
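
    A minimal R sketch of a lag-based setup using the columns listed above; the file name is an assumption, and prev_day already supplies the one-day S&P 500 lag:

      stocks <- read.csv("daily_stock_data.csv")   # assumed file name
      stocks$dt <- as.Date(stocks$dt)              # YYYY-MM-DD dates
      stocks$sp500_return <- (stocks$sp500 - stocks$prev_day) / stocks$prev_day
      fit <- lm(vix ~ sp500_return + us3m + epu, data = stocks)   # simple volatility regression
      summary(fit)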

    Key Features

    • Cross-Market Analysis: Compare U.S. markets (S&P 500, DJIA) with international benchmarks like HSI.
    • Macroeconomic Insights: Assess how external factors like joblessness, interest rates, and economic uncertainty affect markets.
    • Temporal Scope: Longitudinal data facilitates trend analysis and machine learning model training.

    Potential Use Cases

    • Forecasting market indices using machine learning or statistical models.
    • Building volatility trading strategies with VIX Futures.
    • Economic research on relationships between policy uncertainty and market behavior.
    • Educational material for financial data visualization and analysis tutorials.

    Feel free to use this dataset for academic, research, or personal projects.

  15. Descriptive statistics (minimum, first quartile, median, mean, third...

    • plos.figshare.com
    xls
    Updated Oct 16, 2024
    Cite
    Letícia F. M. Reis; Diego C. Nascimento; Paulo H. Ferreira; Francisco Louzada (2024). Descriptive statistics (minimum, first quartile, median, mean, third quartile, maximum) of probabilities for the RPDLomax and Logistic models by class (Wilt dataset). [Dataset]. http://doi.org/10.1371/journal.pone.0311246.t007
    Explore at:
    xls
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Letícia F. M. Reis; Diego C. Nascimento; Paulo H. Ferreira; Francisco Louzada
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Descriptive statistics (minimum, first quartile, median, mean, third quartile, maximum) of probabilities for the RPDLomax and Logistic models by class (Wilt dataset).

  16. Descriptive statistics of the 2 datasets with mean, standard deviation (SD),...

    • plos.figshare.com
    xls
    Updated Jun 18, 2023
    Cite
    Achim Langenbucher; Nóra Szentmáry; Alan Cayless; Jascha Wendelstein; Peter Hoffmann (2023). Descriptive statistics of the 2 datasets with mean, standard deviation (SD), median, the lower (quantile 2.5%) and upper (quantile 97.5%) boundary of the 95% confidence interval, and the interquartile range IQR (quartile 75%—quartile 25%). [Dataset]. http://doi.org/10.1371/journal.pone.0282213.t001
    Explore at:
    xls
    Dataset updated
    Jun 18, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Achim Langenbucher; Nóra Szentmáry; Alan Cayless; Jascha Wendelstein; Peter Hoffmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AL refers to the axial length, CCT to the central corneal thickness, ACD to the external phakic anterior chamber depth measured from the corneal front apex to the front apex of the crystalline lens, LT to the central thickness of the crystalline lens, R1 and R2 to the corneal radii of curvature for the flat and steep meridians, Rmean to the average of R1 and R2, PIOL to the refractive power of the intraocular lens implant, and SEQ to the spherical equivalent power achieved 5 to 12 weeks after cataract surgery.

  17. Dataset used and analysed for this study.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated May 30, 2023
    Cite
    Stephanie A. Fernandez; Haoyang Sun; Borame L. Dickens; Lee Ching Ng; Alex R. Cook; Jue Tao Lim (2023). Dataset used and analysed for this study. [Dataset]. http://doi.org/10.1371/journal.pntd.0011075.s001
    Explore at:
    xlsx
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS Neglected Tropical Diseases
    Authors
    Stephanie A. Fernandez; Haoyang Sun; Borame L. Dickens; Lee Ching Ng; Alex R. Cook; Jue Tao Lim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data collected for all 100 blocks assessed includes total number of units, number of units with more than 5 plants, number of units with more than 5 containers, corridor and public cleanliness rating, number of times out of the 10 public spots assessed that gully traps, open and covered drains and plants were present, median house price, year built and abundance status. (XLSX)

  18. Long Covid Risk

    • figshare.com
    txt
    Updated Apr 13, 2024
    Cite
    Ahmed Shaheen (2024). Long Covid Risk [Dataset]. http://doi.org/10.6084/m9.figshare.25599591.v1
    Explore at:
    txt
    Dataset updated
    Apr 13, 2024
    Dataset provided by
    figshare
    Authors
    Ahmed Shaheen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Feature preparation

    Preprocessing was applied to the data, such as creating dummy variables and performing transformations (centering, scaling, Yeo-Johnson) using the preProcess() function from the "caret" package in R. The correlation among the variables was examined and no serious multicollinearity problems were found. A stepwise variable selection was performed using a logistic regression model. The final set of variables included:
    • Demographic: age, body mass index, sex, ethnicity, smoking
    • History of disease: heart disease, migraine, insomnia, gastrointestinal disease
    • COVID-19 history: covid vaccination, rashes, conjunctivitis, shortness of breath, chest pain, cough, runny nose, dysgeusia, muscle and joint pain, fatigue, fever, COVID-19 reinfection, and ICU admission
    These variables were used to train and test various machine-learning models.

    Model selection and training

    The data was randomly split into 80% training and 20% testing subsets. The "h2o" package in R version 4.3.1 was employed to implement different algorithms. AutoML was first used, which automatically explored a range of models with different configurations. Gradient Boosting Machines (GBM), Random Forest (RF), and Regularized Generalized Linear Model (GLM) were identified as the best-performing models on our data and their parameters were fine-tuned. An ensemble method that stacked different models together was also used, as it can sometimes improve accuracy. The models were evaluated using the area under the curve (AUC) and C-statistics as diagnostic measures. The model with the highest AUC was selected for further analysis using the confusion matrix, accuracy, sensitivity, specificity, and F1 and F2 scores. The optimal prediction threshold was determined by plotting the sensitivity, specificity, and accuracy and choosing their point of intersection, as it balanced the trade-off between the three metrics. The model's predictions were also plotted, and the quantile ranges were used to classify the model's prediction as follows: > 1st quantile, > 2nd quantile, > 3rd quartile and < 3rd quartile (very low, low, moderate, high) respectively.

    Metric              Formula
    C-statistics        (TPR + TNR - 1) / 2
    Sensitivity/Recall  TP / (TP + FN)
    Specificity         TN / (TN + FP)
    Accuracy            (TP + TN) / (TP + TN + FP + FN)
    F1 score            2 * (precision * recall) / (precision + recall)

    Model interpretation

    We used the variable importance plot, which is a measure of how much each variable contributes to the prediction power of a machine learning model. In the H2O package, variable importance for GBM and RF is calculated by measuring the decrease in the model's error when a variable is split on. The more a variable's split decreases the error, the more important that variable is considered to be. The error is calculated using the formula SE = MSE * N = VAR * N, and then it is scaled between 0 and 1 and plotted. We also used the SHAP summary plot, a graphical tool to visualize the impact of input features on the prediction of a machine learning model. SHAP stands for SHapley Additive exPlanations, a method to calculate the contribution of each feature to the prediction by averaging over all possible subsets of features [28]. The SHAP summary plot shows the distribution of the SHAP values for each feature across the data instances. We use the h2o.shap_summary_plot() function in R to generate the SHAP summary plot for our GBM model. We pass the model object and the test data as arguments, and optionally specify the columns (features) we want to include in the plot. The plot shows the SHAP values for each feature on the x-axis, and the features on the y-axis. The color indicates whether the feature value is low (blue) or high (red). The plot also shows the distribution of the feature values as a density plot on the right.
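
    A minimal base-R sketch of quartile-based banding of predicted probabilities, in the spirit of the classification described above; the predictions here are placeholders, not outputs of the study's h2o models:

      pred <- runif(200)   # toy predicted probabilities; the study used the selected h2o model
      qs <- quantile(pred, probs = c(0.25, 0.50, 0.75))
      bands <- cut(pred, breaks = c(-Inf, qs, Inf),
                   labels = c("very low", "low", "moderate", "high"))
      table(bands)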

  19. Summaries and cumulative COVID-19 metrics by statewide HPI quartile from Feb...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Mar 6, 2025
    Cite
    Ada T. Kwan; Jason Vargo; Caroline Kurtz; Mayuri Panditrao; Christopher M. Hoover; Tomás M. León; David Rocha; William Wheeler; Seema Jain; Erica S. Pan; Priya B. Shete (2025). Summaries and cumulative COVID-19 metrics by statewide HPI quartile from Feb 1, 2020 through Jun 30, 2021. [Dataset]. http://doi.org/10.1371/journal.pone.0316517.t004
    Explore at:
    xls
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Ada T. Kwan; Jason Vargo; Caroline Kurtz; Mayuri Panditrao; Christopher M. Hoover; Tomás M. León; David Rocha; William Wheeler; Seema Jain; Erica S. Pan; Priya B. Shete
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    By statewide vaccine equity metric (VEM) quartiles.

  20. Quartiles of the cell-specific probability of tuberculosis transmission.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jul 19, 2024
    Cite
    Nithinan Mahawan; Thanapoom Rattananupong; Puchong Sri-Uam; Wiroj Jiamjarasrangsi (2024). Quartiles of the cell-specific probability of tuberculosis transmission. [Dataset]. http://doi.org/10.1371/journal.pone.0305264.t002
    Explore at:
    xls
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nithinan Mahawan; Thanapoom Rattananupong; Puchong Sri-Uam; Wiroj Jiamjarasrangsi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Quartiles of the cell-specific probability of tuberculosis transmission.
