16 datasets found
  1. Gender, Age, and Emotion Detection from Voice

    • kaggle.com
    zip
    Updated May 29, 2021
    Cite
    Rohit Zaman (2021). Gender, Age, and Emotion Detection from Voice [Dataset]. https://www.kaggle.com/rohitzaman/gender-age-and-emotion-detection-from-voice
    Available download formats: zip (967820 bytes)
    Dataset updated
    May 29, 2021
    Authors
    Rohit Zaman
    Description

    Context

    Our goal was to predict gender, age, and emotion from audio. We found labeled audio datasets in "Mozilla Common Voice" and the "Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)". Using the R programming language, we extracted 20 statistical features from the audio files and then added the labels to form these datasets.

    Content

    Each dataset contains 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:

    1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
    2) sd - The standard deviation of frequency is a statistical measure that describes the dispersion of the frequencies relative to their mean and is calculated as the square root of the variance.
    3) median - The median frequency (in kHz) is the middle value in the sorted list of frequencies.
    4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. This means that about 25 percent of the values are below Q1, and about 75 percent are above Q1.
    5) Q75 - The third quartile (in kHz), referred to as Q3, is the central point between the median and the highest value of the distribution.
    6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles.
    7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
    8) kurt - The kurtosis is a statistical measure that determines how much the tails of a distribution differ from the tails of a normal distribution. It indicates how prone the distribution is to outliers.
    9) sp.ent - The spectral entropy is a measure of signal irregularity computed from the normalized spectral power of the signal.
    10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is usually measured in decibels and offers a way to quantify how tone-like, as opposed to noise-like, a sound is.
    11) mode - The mode frequency is the most frequently observed value in a data set.
    12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the spectrum's center of mass is located.
    13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
    14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
    15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
    16) meandom - The meandom is the average of the dominant frequency measured across the acoustic signal.
    17) mindom - The mindom is the minimum of the dominant frequency measured across the acoustic signal.
    18) maxdom - The maxdom is the maximum of the dominant frequency measured across the acoustic signal.
    19) dfrange - The dfrange is the range of the dominant frequency measured across the acoustic signal.
    20) modindx - The modindx is the modulation index, which expresses the degree of frequency modulation numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
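
    The first six features in the list above can be illustrated with a small sketch. It is written in Python rather than R and is not the authors' extraction code. It assumes the spectrum is given as frequency bins with non-negative amplitudes, and it treats the normalized amplitudes as a probability distribution over frequency. The function name and the quantile rule (first bin where the cumulative weight reaches the target) are assumptions made for illustration.

```python
from math import sqrt

def spectral_properties(freqs_khz, amplitudes):
    """Weighted summary statistics of a frequency spectrum (illustrative only)."""
    total = sum(amplitudes)
    weights = [a / total for a in amplitudes]  # normalize so the weights sum to 1

    # Weighted mean and standard deviation of frequency.
    meanfreq = sum(f * w for f, w in zip(freqs_khz, weights))
    sd = sqrt(sum(w * (f - meanfreq) ** 2 for f, w in zip(freqs_khz, weights)))

    def quantile(p):
        # First frequency bin at which the cumulative weight reaches p.
        cumulative = 0.0
        for f, w in zip(freqs_khz, weights):
            cumulative += w
            if cumulative >= p:
                return f
        return freqs_khz[-1]

    q25, median, q75 = quantile(0.25), quantile(0.5), quantile(0.75)
    return {
        "meanfreq": meanfreq,
        "sd": sd,
        "median": median,
        "Q25": q25,
        "Q75": q75,
        "IQR": q75 - q25,
    }

# Hypothetical flat spectrum over four bins (values invented for the example).
props = spectral_properties([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
```

    With a flat spectrum the mean frequency sits at the middle of the band, which is a quick sanity check on the weighting.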

    Acknowledgements

    Gender and age audio data source: https://commonvoice.mozilla.org/en
    Emotion audio data source: https://smartlaboratory.org/ravdess/

  2. Share of income tax returns from 40.001 up to EUR 50.000

    • data.europa.eu
    csv, json
    Cite
    IWEPS, Share of income tax returns from 40.001 up to EUR 50.000 [Dataset]. http://data.europa.eu/88u/dataset/831100-6
    Available download formats: csv, json
    Dataset authored and provided by
    IWEPS
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Tax statistics are compiled on the basis of personal tax returns at the place of residence. The income year is the year for which taxes are due. Total taxable net income consists of all net professional income, net real estate income, net movable income and miscellaneous net income.

    To measure the dispersion of the income distribution, tax returns are sorted in ascending order of income and divided into 4 equal parts separated by 3 quartiles (Q1: 25 % of returns have income less than Q1; Q2 = median income: 50 % of returns have income less than Q2; Q3: 75 % of returns have income less than Q3). Tax returns with zero taxable income are not included in the calculations.

    The indicator reports the ratio of the difference between the 3rd and 1st quartiles to the median: (Q3-Q1)/Q2. The higher the interquartile coefficient, the higher the degree of income inequality. Because it is expressed relative to the median value, it makes it possible to compare the dispersion of series with very different median values.
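
    The interquartile coefficient described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample incomes are invented, and the quartile interpolation rule (the "inclusive" method of the standard library) is an assumption, since the description does not say which rule IWEPS applies.

```python
from statistics import quantiles

def interquartile_coefficient(incomes):
    """Return (Q3 - Q1) / Q2 for returns with non-zero taxable income."""
    # Returns with zero taxable income are left out, as stated in the description.
    taxable = [x for x in incomes if x > 0]
    q1, q2, q3 = quantiles(taxable, n=4, method="inclusive")
    return (q3 - q1) / q2

# Hypothetical incomes in EUR (thousands); the zero return is ignored.
coefficient = interquartile_coefficient([0, 10, 20, 30, 40, 50])
```

    Dividing by the median makes the result unit-free, which is why two series with very different income levels can be compared directly.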

  3. Tonal languages from mozilla common voice 10

    • kaggle.com
    zip
    Updated Jan 4, 2023
    Cite
    Enrique Díaz-Ocampo (2023). Tonal languages from mozilla common voice 10 [Dataset]. https://www.kaggle.com/datasets/enriquedazocampo/tonal-languages-mozilla-common-voice
    Available download formats: zip (23999125 bytes)
    Dataset updated
    Jan 4, 2023
    Authors
    Enrique Díaz-Ocampo
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
    License information was derived automatically

    Description

    The following dataset is intended to be used for gender recognition using audio files recorded in uncontrolled environments, taken from the Mozilla Common Voice Dataset 10.0. It consists of a table of descriptive statistical characteristics of the fundamental frequency for six tonal languages: Chinese (China), Chinese (Hong Kong), Chinese (Taiwan), Thai, Vietnamese, and Punjabi. It also includes an estimate of the vocal tract length of each speaker.

    This dataset contains 18 columns:
    • 'client_id': Speaker ID from Mozilla Common Voice
    • 'path': Name of the mp3 file
    • 'sentence': The sentence spoken by the speaker
    • 'age': Age in decades (teens, twenties, etc.)
    • 'gender': Binary gender (male or female)
    • 'duration': Duration of the mp3 in seconds
    • 'vocal_tract_length': Vocal tract length in cm
    • 'mean_F4': Mean of the fourth formant in Hz
    • 'min_pitch': Minimum pitch of the whole pitch contour in Hz
    • 'mean_pitch': Mean pitch of the whole pitch contour in Hz
    • 'q1_pitch': First quartile of the whole pitch contour in Hz
    • 'median_pitch': Median pitch of the whole pitch contour in Hz
    • 'q3_pitch': Third quartile of the whole pitch contour in Hz
    • 'max_pitch': Maximum pitch of the whole pitch contour in Hz
    • 'stddev_pitch': Standard deviation of the whole pitch contour in Hz
    • 'estimated_age': Nominal value (adult or teen)
    • 'estimated_age_gender': Nominal value (adult-male, adult-female, teen-male, and teen-female)
    • 'language': Nominal value (Chinese (China), Chinese (Hong Kong), Chinese (Taiwan), Thai, Vietnamese, and Punjabi)

    The methodology for the extraction of these characteristics was the following:

    1) Only the audio files listed in the valid.tsv file of the respective language were analyzed (this file is contained in the Mozilla Common Voice Dataset, https://commonvoice.mozilla.org/en/datasets ). The voiced speech was extracted using the Praat Vocal Toolkit algorithm ( https://www.praatvocaltoolkit.com/extract-voiced-and-unvoiced.html ).

    2) The vocal tract length was calculated with the Praat Vocal Toolkit algorithm ( https://www.praatvocaltoolkit.com/calculate-vocal-tract-length.html ) as follows: if the audio came from a teen, the maximum formant was set to 8000 Hz; otherwise it was set to 5000 Hz for men and 5500 Hz for women. Finally, the mean of the fourth formant was calculated for the windows with voiced speech only.

    3) The fundamental frequency was calculated using the Praat software with the To Pitch (ac) option and the following settings: a) Time step (s) = 0.0 (=auto) b) Pitch floor (Hz) = 75.0 c) Max. number of candidates = 15 d) Very accurate = True e) Silence threshold = 0.03 f) Voicing threshold = 0.45 g) Octave cost = 0.01 h) Octave-jump cost = 0.35 i) Voiced/unvoiced cost = 0.14 j) Pitch ceiling (Hz) = 350

    4) The statistical characteristics of the fundamental frequency were calculated only in the windows that were detected as voiced speech.
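
    Step 4 above can be sketched as follows. This is a minimal illustration in Python, not the author's Praat script. It assumes the pitch contour is a list of per-frame values in Hz in which unvoiced frames are encoded as 0.0 or None, that quartiles use the standard library's "inclusive" rule, and that the standard deviation is the population form; none of these choices is confirmed by the description.

```python
from statistics import mean, pstdev, quantiles

def pitch_contour_stats(contour_hz):
    """Summarize a pitch contour using only its voiced frames."""
    # Assumption: 0.0 or None marks an unvoiced frame, so both are dropped.
    voiced = [f for f in contour_hz if f]
    if len(voiced) < 2:
        return None  # not enough voiced frames to compute quartiles
    q1, median, q3 = quantiles(voiced, n=4, method="inclusive")
    return {
        "min_pitch": min(voiced),
        "mean_pitch": mean(voiced),
        "q1_pitch": q1,
        "median_pitch": median,
        "q3_pitch": q3,
        "max_pitch": max(voiced),
        "stddev_pitch": pstdev(voiced),
    }

# Hypothetical contour: three unvoiced frames and five voiced frames.
stats = pitch_contour_stats([0.0, 100.0, 110.0, None, 120.0, 130.0, 140.0, 0.0])
```

    The returned keys mirror the pitch column names listed in the dataset description, so a row of the table corresponds to one such dictionary per audio file.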

  4. Northern Ireland Annual Descriptive House Price Statistics (Electoral Ward...

    • ckan.publishing.service.gov.uk
    Updated Feb 29, 2020
    Cite
    (2020). Northern Ireland Annual Descriptive House Price Statistics (Electoral Ward Level) - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/northern-ireland-annual-descriptive-house-price-statistics-electoral-ward-level
    Dataset updated
    Feb 29, 2020
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    Ireland, Northern Ireland
    Description

    Annual descriptive price statistics for each calendar year 2005 – 2024 for 462 electoral wards within 11 Local Government Districts. The statistics include:
    • Minimum sale price
    • Lower quartile sale price
    • Median sale price
    • Simple mean sale price
    • Upper quartile sale price
    • Maximum sale price
    • Number of verified sales

    Prices are available where at least 30 sales that could be included in the regression model were recorded in the area within the calendar year, i.e. the following sales are excluded:
    • Non-arm's-length sales
    • Sales of properties where the habitable space is less than 30 m² or greater than 1000 m²
    • Sales of less than £20,000

    Annual median or simple mean prices should not be used to calculate the property price change over time. The quality of the properties that are sold (where quality refers to the combination of all characteristics of a residential property, both physical and locational) may differ from one time period to another. For example, sales in one quarter could be disproportionately skewed towards low-quality properties, therefore producing a biased estimate of average price. The median and simple mean prices are not 'standardised', and so the varying mix of properties sold in each quarter could give a false impression of the actual change in prices. In order to calculate the pure property price change over time it is necessary to compare like with like, and this can only be achieved if the 'characteristics-mix' of properties traded is standardised. To calculate pure property price change over time, please use the standardised prices in the NI House Price Index Detailed Statistics file.
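
    The publication rules above (three exclusions, then a minimum of 30 eligible sales) can be sketched as follows. This is an illustration only: the record fields `price`, `floor_area_m2`, and `arms_length` are hypothetical names, the quartile rule is an assumption, and the sample sales are invented.

```python
from statistics import mean, quantiles

MIN_SALES = 30  # threshold stated in the description

def is_eligible(sale):
    """Apply the three exclusions listed in the description."""
    return (
        sale["arms_length"]
        and 30 <= sale["floor_area_m2"] <= 1000
        and sale["price"] >= 20_000
    )

def ward_price_stats(sales):
    """Descriptive statistics for one ward and year, or None if unpublished."""
    prices = sorted(s["price"] for s in sales if is_eligible(s))
    if len(prices) < MIN_SALES:
        return None  # fewer than 30 eligible sales: no prices are reported
    lower_q, median, upper_q = quantiles(prices, n=4, method="inclusive")
    return {
        "minimum": prices[0],
        "lower_quartile": lower_q,
        "median": median,
        "simple_mean": mean(prices),
        "upper_quartile": upper_q,
        "maximum": prices[-1],
        "verified_sales": len(prices),
    }

# Hypothetical ward: 31 eligible sales plus two that are excluded.
sales = [
    {"price": 100_000 + 1_000 * i, "floor_area_m2": 90, "arms_length": True}
    for i in range(31)
]
sales.append({"price": 5_000_000, "floor_area_m2": 90, "arms_length": False})
sales.append({"price": 10_000, "floor_area_m2": 90, "arms_length": True})
ward = ward_price_stats(sales)
```

    Note that these unstandardised figures describe one year only; as the description warns, comparing them across years does not measure pure price change.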

  5. Voice Gender recognition in Spanish language

    • kaggle.com
    zip
    Updated Jan 16, 2023
    Cite
    Enrique Díaz-Ocampo (2023). Voice Gender recognition in Spanish language [Dataset]. https://www.kaggle.com/datasets/enriquedazocampo/spanish-gender-recognition-mozilla
    Available download formats: zip (16862585 bytes)
    Dataset updated
    Jan 16, 2023
    Authors
    Enrique Díaz-Ocampo
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
    License information was derived automatically

    Description

    The following dataset is intended to be used for gender recognition using audio files recorded in uncontrolled environments, taken from the Mozilla Common Voice Dataset 10.0. It consists of a table of descriptive statistical characteristics of the fundamental frequency for the Spanish language. It also includes an estimate of the vocal tract length of each speaker.

    This dataset contains 18 columns:
    • 'client_id': Speaker ID from Mozilla Common Voice
    • 'path': Name of the mp3 file
    • 'age': Age in decades (teens, twenties, etc.)
    • 'gender': Binary gender (male or female)
    • 'duration': Duration of the mp3 in seconds
    • 'vocal_tract_length': Vocal tract length in cm
    • 'mean_F4': Mean of the fourth formant in Hz
    • 'min_pitch': Minimum pitch of the whole pitch contour in Hz
    • 'mean_pitch': Mean pitch of the whole pitch contour in Hz
    • 'q1_pitch': First quartile of the whole pitch contour in Hz
    • 'median_pitch': Median pitch of the whole pitch contour in Hz
    • 'q3_pitch': Third quartile of the whole pitch contour in Hz
    • 'max_pitch': Maximum pitch of the whole pitch contour in Hz
    • 'stddev_pitch': Standard deviation of the whole pitch contour in Hz
    • 'estimated_age': Nominal value (adult or teen)
    • 'estimated_age_gender': Nominal value (adult-male, adult-female, teen-male, and teen-female)
    • 'language': Nominal value (Chinese (China), Chinese (Hong Kong), Chinese (Taiwan), Thai, Vietnamese, and Punjabi)

    The methodology for the extraction of these characteristics was the following:

    1) Only the audio files listed in the valid.tsv file of the respective language were analyzed (this file is contained in the Mozilla Common Voice Dataset, https://commonvoice.mozilla.org/en/datasets ). The voiced speech was extracted using the Praat Vocal Toolkit algorithm ( https://www.praatvocaltoolkit.com/extract-voiced-and-unvoiced.html ).

    2) The vocal tract length was calculated with the Praat Vocal Toolkit algorithm ( https://www.praatvocaltoolkit.com/calculate-vocal-tract-length.html ) as follows: if the audio came from a teen, the maximum formant was set to 8000 Hz; otherwise it was set to 5000 Hz for men and 5500 Hz for women. Finally, the mean of the fourth formant was calculated for the windows with voiced speech only.

    3) The fundamental frequency was calculated using the Praat software with the To Pitch (ac) option and the following settings: a) Time step (s) = 0.0 (=auto) b) Pitch floor (Hz) = 75.0 c) Max. number of candidates = 15 d) Very accurate = True e) Silence threshold = 0.03 f) Voicing threshold = 0.45 g) Octave cost = 0.01 h) Octave-jump cost = 0.35 i) Voiced/unvoiced cost = 0.14 j) Pitch ceiling (Hz) = 350

    4) The statistical characteristics of the fundamental frequency were calculated only in the windows that were detected as voiced speech.
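
    Step 2 above derives the vocal tract length from the mean fourth formant. One common textbook approximation, shown here as a hedged sketch, models the vocal tract as a uniform tube closed at one end, for which the n-th resonance is Fn = (2n - 1) * c / (4 * L). Whether the Praat Vocal Toolkit uses exactly this formula and this speed of sound is an assumption, not something the description states; the function name and sample value are also invented.

```python
SPEED_OF_SOUND_CM_PER_S = 35_000  # assumed value for warm, humid air

def vocal_tract_length_cm(mean_formant_hz, formant_number=4):
    """Uniform-tube estimate: L = (2n - 1) * c / (4 * Fn)."""
    if mean_formant_hz <= 0:
        raise ValueError("formant frequency must be positive")
    return (2 * formant_number - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * mean_formant_hz)

# Hypothetical mean F4 of 3500 Hz.
length_cm = vocal_tract_length_cm(3500.0)
```

    Under this model a higher mean F4 implies a shorter vocal tract, which is the relationship that makes the 'vocal_tract_length' column informative for gender recognition.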

  6. 2016 Census of Canada - Housing Suitability and Shelter-cost-to-income Ratio...

    • borealisdata.ca
    Updated Apr 9, 2021
    Cite
    Statistics Canada (2021). 2016 Census of Canada - Housing Suitability and Shelter-cost-to-income Ratio by Status of Primary Household Maintainer for BC CSDs [custom tabulation] [Dataset]. http://doi.org/10.5683/SP2/6OEKPA
    Dataset updated
    Apr 9, 2021
    Dataset provided by
    Borealis
    Authors
    Statistics Canada
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Area covered
    Canada, British Columbia
    Description

    This dataset includes one dataset which was custom ordered from Statistics Canada. The table includes information on housing suitability and shelter-cost-to-income ratio by number of bedrooms, housing tenure, status of primary household maintainer, household type, and income quartile ranges for census subdivisions in British Columbia.

    The dataset is in Beyond 20/20 (.ivt) format. The Beyond 20/20 browser is required in order to open it. This software can be freely downloaded from the Statistics Canada website: https://www.statcan.gc.ca/eng/public/beyond20-20 (Windows only). For information on how to use Beyond 20/20, please see:
    • http://odesi2.scholarsportal.info/documentation/Beyond2020/beyond20-quickstart.pdf
    • https://wiki.ubc.ca/Library:Beyond_20/20_Guide

    Custom order from Statistics Canada includes the following dimensions and variables:

    Geography: Non-reserve CSDs in British Columbia - 299 geographies

    The global non-response rate (GNR) is an important measure of census data quality. It combines total non-response (households) and partial non-response (questions). A lower GNR indicates a lower risk of non-response bias and, as a result, a lower risk of inaccuracy. The counts and estimates for geographic areas with a GNR equal to or greater than 50% are not published in the standard products. The counts and estimates for these areas have a high risk of non-response bias, and in most cases, should not be released. All the geographies requested for this tabulation have been cleared for the release of income data and have a GNR under 50%.

    Housing Tenure Including Presence of Mortgage (5)
    1. Total – Private non-band non-farm off-reserve households with an income greater than zero by housing tenure
    2. Households who own
    3. With a mortgage (see note 1)
    4. Without a mortgage
    5. Households who rent
    Note: 1) Presence of mortgage - Refers to whether the owner households reported mortgage or loan payments for their dwelling.

    2015 Before-tax Household Income Quartile Ranges (5)
    1. Total – Private households by quartile ranges (see notes 1, 2, 3)
    2. Count of households under or at quartile 1
    3. Count of households between quartile 1 and quartile 2 (median) (including at quartile 2)
    4. Count of households between quartile 2 (median) and quartile 3 (including at quartile 3)
    5. Count of households over quartile 3
    Notes: 1) A private household will be assigned to a quartile range depending on its CSD-level location and depending on its tenure (owned and rented). Quartile ranges for owned households in a specific CSD are delimited by the 2015 before-tax income quartiles of owned households with an income greater than zero and residing in non-farm off-reserve dwellings in that CSD. Quartile ranges for rented households in a specific CSD are delimited by the 2015 before-tax income quartiles of rented households with an income greater than zero and residing in non-farm off-reserve dwellings in that CSD. 2) For the income quartiles dollar values (the delimiters) please refer to Table 1. 3) Quartiles 1 to 3 are suppressed if the number of actual records used in the calculation (not rounded or weighted) is less than 16. For cases in which the renters' quartiles or the owners' quartiles (figures from Table 1) of a CSD are suppressed, the CSD is assigned to a quartile range depending on the provincial renters' or owners' quartile figures.

    Number of Bedrooms (Unit Size) (6)
    1. Total – Private households by number of bedrooms (see note 1)
    2. 0 bedrooms (Bachelor/Studio)
    3. 1 bedroom
    4. 2 bedrooms
    5. 3 bedrooms
    6. 4 bedrooms
    Note: 1) Dwellings with 5 bedrooms or more included in the total count only.

    Housing Suitability (6)
    1. Total - Housing suitability
    2. Suitable
    3. Not suitable
    4. One bedroom shortfall
    5. Two bedroom shortfall
    6. Three or more bedroom shortfall
    Note: 1) 'Housing suitability' refers to whether a private household is living in suitable accommodations according to the National Occupancy Standard (NOS); that is, whether the dwelling has enough bedrooms for the size and composition of the household. A household is deemed to be living in suitable accommodations if its dwelling has enough bedrooms, as calculated using the NOS. 'Housing suitability' assesses the required number of bedrooms for a household based on the age, sex, and relationships among household members. An alternative variable, 'persons per room,' considers all rooms in a private dwelling and the number of household members. Housing suitability and the National Occupancy Standard (NOS) on which it is based were developed by Canada Mortgage and Housing Corporation (CMHC) through consultations with provincial housing agencies.

    Shelter-cost-to-income-ratio (4)
    1. Total – Private non-band non-farm off-reserve households with an income greater than zero
    2. Spending less than 30% of household's total income on shelter costs
    3. Spending 30% or more of household's total income on shelter costs
    4. Spending 50% or more of household's total income on shelter costs
    Note: 'Shelter-cost-to-income...
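
    The income quartile ranges above have inclusive upper bounds ("under or at quartile 1", "including at quartile 2", and so on). A minimal sketch of that assignment rule is shown below. It is not Statistics Canada's code: the function names, the delimiter values, and the sample incomes are invented, and the real delimiters come from Table 1 of the custom tabulation.

```python
from bisect import bisect_left
from collections import Counter

RANGE_LABELS = (
    "under or at quartile 1",
    "between quartile 1 and quartile 2 (including at quartile 2)",
    "between quartile 2 and quartile 3 (including at quartile 3)",
    "over quartile 3",
)

def quartile_range(income, delimiters):
    """Index 0-3 of the range; an income equal to a delimiter stays in the lower range."""
    return bisect_left(delimiters, income)

def count_by_range(incomes, delimiters):
    """Count households per range, keeping only incomes greater than zero."""
    counts = Counter(quartile_range(x, delimiters) for x in incomes if x > 0)
    return [counts.get(i, 0) for i in range(len(RANGE_LABELS))]

# Hypothetical CSD-level delimiters (Q1, Q2, Q3) and household incomes.
delimiters = (30_000, 60_000, 90_000)
counts = count_by_range([0, 30_000, 30_001, 60_000, 75_000, 90_000, 120_000], delimiters)
```

    `bisect_left` is used because it places a value that equals a delimiter before that delimiter, which matches the "at quartile" wording of each range.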

  7. Arctic Ocean and Climate Atlas (1950-1989)

    • access.earthdata.nasa.gov
    • cmr.earthdata.nasa.gov
    Updated Jun 24, 2019
    Cite
    (2019). Arctic Ocean and Climate Atlas (1950-1989) [Dataset]. https://access.earthdata.nasa.gov/collections/C1214587227-SCIOPS
    Dataset updated
    Jun 24, 2019
    Time period covered
    Jan 1, 1950 - Dec 31, 1989
    Area covered
    Arctic Ocean
    Description

    This CD-ROM atlas contains statistics of 45 years of summer (July through September) temperature and salinity data. All temperatures in this atlas are shown as potential temperatures. Salinities are reported in standard salinity units. Densities are reported as potential density. Minimum, maximum, mean, standard deviation, skewness, kurtosis, 1st quartile, median, and 3rd quartile of the temperature and salinity data were calculated for each decade in the data set and over each 200 km by 200 km grid cell within the central Arctic region. In the Nordic and Siberian Seas, the statistics were calculated over 50 km by 50 km cells, as well as 100 km by 100 km cells, due to the smaller spatial scales in this region. These statistics were compiled for depths (in meters below the ocean surface) of 5, 10, 25, 50, 75, 100, 150, 200, 250, 300, 400, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, and 4000. The ocean bottom bathymetry was defined using the National Geophysical Data Center ETOPO5 digital data set. The decadal periods are: 1950-1959, 1960-1969, 1970-1979, 1980-1989. The number of original data points used to calculate the statistics is recorded for each location, and a planar fit to the data, at each depth, was also generated to indicate linear trends within each cell. The three coefficients of the planar fit and the root-mean-square error of the fit are also recorded in the atlas. It was agreed to publish these statistics because most of the original Russian data could not be released. The CD-ROM atlas also contains a complete set of statistics for the periods June, October, and November. Statistics for December through May are available in the atlas for the winter period.
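
    The planar fit with three coefficients and a root-mean-square error can be illustrated with an ordinary least-squares sketch. The atlas description does not give its exact formulation, so the model z = a + b*x + c*y over positions (x, y) within a cell is an assumption, as are the function names and the sample points. The normal equations are solved with Cramer's rule to keep the example dependency-free.

```python
from math import sqrt

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_plane(points):
    """Least-squares fit of z = a + b*x + c*y; returns (a, b, c, rmse)."""
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)

    normal = [[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]]
    rhs = [sz, sxz, syz]
    d = det3(normal)
    if d == 0:
        raise ValueError("points are collinear or too few for a planar fit")

    coefficients = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = rhs[r]
        coefficients.append(det3(m) / d)

    a, b, c = coefficients
    rmse = sqrt(sum((z - (a + b * x + c * y)) ** 2 for x, y, z in points) / n)
    return a, b, c, rmse

# Hypothetical samples lying exactly on z = 1 + 2x + 3y.
a, b, c, rmse = fit_plane([(0, 0, 1), (1, 0, 3), (0, 1, 4), (1, 1, 6), (2, 1, 8)])
```

    The slope coefficients b and c indicate the linear trend across the cell, and the RMSE shows how well a plane describes the observations.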

  8. House price to residence-based earnings ratio

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Mar 24, 2025
    Cite
    Office for National Statistics (2025). House price to residence-based earnings ratio [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/housing/datasets/ratioofhousepricetoresidencebasedearningslowerquartileandmedian
    Available download formats: xlsx
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Affordability ratios calculated by dividing house prices by gross annual residence-based earnings. Based on the median and lower quartiles of both house prices and earnings in England and Wales.
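
    The ratio described above is a simple division, sketched here for both the median and the lower quartile. This is an illustration with invented figures: ONS publishes the ratios from its own price and earnings series, and the function name and quartile rule used here are assumptions.

```python
from statistics import median, quantiles

def affordability_ratios(house_prices, annual_earnings):
    """House price divided by gross annual earnings, at the median and lower quartile."""
    price_lq = quantiles(house_prices, n=4, method="inclusive")[0]
    earnings_lq = quantiles(annual_earnings, n=4, method="inclusive")[0]
    return {
        "median_ratio": median(house_prices) / median(annual_earnings),
        "lower_quartile_ratio": price_lq / earnings_lq,
    }

# Hypothetical prices and earnings in GBP for one area.
ratios = affordability_ratios(
    [150_000, 200_000, 250_000, 300_000, 350_000],
    [20_000, 25_000, 30_000, 35_000, 40_000],
)
```

    A ratio of 8 means that a house at that point of the price distribution costs eight times the corresponding annual earnings.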

  9. House price to workplace-based earnings ratio

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Mar 24, 2025
    Cite
    Office for National Statistics (2025). House price to workplace-based earnings ratio [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/housing/datasets/ratioofhousepricetoworkplacebasedearningslowerquartileandmedian
    Available download formats: xlsx
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Affordability ratios calculated by dividing house prices by gross annual workplace-based earnings. Based on the median and lower quartiles of both house prices and earnings in England and Wales.

  10. NY State Community Health Indicators

    • kaggle.com
    zip
    Updated Jan 23, 2023
    Cite
    The Devastator (2023). NY State Community Health Indicators [Dataset]. https://www.kaggle.com/datasets/thedevastator/ny-state-community-health-indicators
    Available download formats: zip (51836 bytes)
    Dataset updated
    Jan 23, 2023
    Authors
    The Devastator
    Area covered
    New York
    Description

    NY State Community Health Indicators

    Obesity and Diabetes Related Indicators 2008–2012

    By Health Data New York [source]

    About this dataset

    This dataset contains New York State county-level data on obesity and diabetes related indicators from 2008 to 2012. It includes information about counties' population health status, such as the number of events, percentage/rate, 95% confidence interval, measurement units, and more. Analyzing this data provides insight into how communities across New York State are affected by these diseases and how healthier living environments might be created. This dataset is released under a Terms of Service license agreement; make sure to read and understand the details if you plan to use it in any research or commercial application.

    How to use the dataset

    This dataset contains county-level data on obesity and diabetes related indicators in New York State. As such, it can be used to research indicators related to general health in various counties of the state.

    To use this dataset effectively, first become familiar with the columns included and their meanings:
    • County Name: The name of the county. (String)
    • County Code: The code of the county. (Integer)
    • Region Name: The name of the region. (String)
    • Indicator Number: The number of the indicator. (Integer)
    • Total Event Counts: The total number of events related to the indicator. (Integer)
    • Denominator: The denominator used to calculate the percentage/rate. (Integer)
    • Denominator Note: Any additional notes related to the denominator. (String)
    • Measure Unit: The unit of measure used for this rate/percentage. (String)
    • Percentage/Rate: The percentage/rate calculated using the denominator and the observed count data. (Float)
    • 95% CI: The 95% confidence interval associated with any defined rate or percentage. (Float)
    • Data Comments: Any additional comments relevant to this data source or indicator. (String)
    • Data Years: Years covered by this particular indicator observation. (String)
    • Data Sources: Sources from which the data for indicators involving counties from different regions were drawn. (String)
    • Quartile: Quartiles are derived when all geographic entities are ranked according to a specific metric score and then cut into groups based on that score: 0 = bottom quarter; 1 = middle two quarters combined; 2 = top quarter. (Integer)
    • Mapping Distribution: A visual representation that includes mapping details on how indicators relating to either disease rates or characteristics are positioned across states, regions, and counties, as well as any trends and other pertinent mapping information, such as health resource availability. (Pair-plot form; otherwise an informational text string.)
    • Location: The area where the distribution occurs in space, i.e. a point feature with a single location ID retrieved from the GeoPlanet proxy service. (String)

    Using these columns, you can look up information about a chosen county, such as its obesity rate and diabetes incidence, to better understand its overall health situation. The dataset also provides comparison features such as quartile rankings.
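
    The three-level Quartile coding described in the column list (0 = bottom quarter, 1 = middle two quarters combined, 2 = top quarter) can be sketched as follows. The exact cut points used by the data provider are not documented here, so the boundary handling, the quartile rule, the function name, and the sample rates are all assumptions.

```python
from statistics import quantiles

def quartile_codes(rates_by_county):
    """Map each county to 0 (bottom quarter), 1 (middle half), or 2 (top quarter)."""
    q1, _, q3 = quantiles(rates_by_county.values(), n=4, method="inclusive")
    codes = {}
    for county, rate in rates_by_county.items():
        if rate <= q1:
            codes[county] = 0  # assumption: a value equal to Q1 counts as bottom quarter
        elif rate > q3:
            codes[county] = 2
        else:
            codes[county] = 1
    return codes

# Hypothetical rates for five placeholder counties.
codes = quartile_codes(
    {"County A": 10.0, "County B": 20.0, "County C": 30.0, "County D": 40.0, "County E": 50.0}
)
```

    Because the code depends on the ranking of all counties for one indicator, it should be recomputed separately for each Indicator Number rather than across the whole file.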

    Research Ideas

    • Analysing the geographic distribution of obesity and diabetes related indicators by county in New York State, in order to identify areas which may require greater levels of intervention and preventative health measures.

    • Evaluating trends over time for different counties to assess whether policies or programs have had an impact on indicators relating to obesity and diabetes within the given area.

    • Using machine learning techniques such as clustering analysis or predictive modelling, to identify patterns within the data which can be used to better inform preventative health interventions across New York State

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: community-health-obesity-and-diabetes-related-indicators-2008-2012-1.csv | Column name | Description | |:-------------------------|:-----------------------------------------------------------------------------------------| | **Count...

  11. Earnings by Workplace, Borough - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jun 9, 2025
    Cite
    ckan.publishing.service.gov.uk (2025). Earnings by Workplace, Borough - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/earnings-by-workplace-borough
    Explore at:
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    CKAN (https://ckan.org/)
    Description

    This dataset provides information about earnings of employees who are working in an area, who are on adult rates and whose pay for the survey pay-period was not affected by absence. Tables provided here include total gross weekly earnings, and full-time weekly earnings with breakdowns by gender, and annual median, mean and lower quartile earnings by borough and UK region. These are provided both in nominal and real terms. Real earnings figures are on sheets labelled "real", are in 2016 prices, and are calculated by applying ONS’s annual CPI index series for April to ASHE data. The Annual Survey of Hours and Earnings (ASHE) is based on a sample of employee jobs taken from HM Revenue & Customs PAYE records. Information on earnings and hours is obtained in confidence from employers. ASHE does not cover the self-employed, nor does it cover employees not paid during the reference period. The earnings information presented relates to gross pay before tax, National Insurance or other deductions, and excludes payments in kind. The confidence figure is the coefficient of variation (CV) of that estimate. The CV is the ratio of the standard error of an estimate to the estimate itself and is expressed as a percentage. The smaller the coefficient of variation, the greater the accuracy of the estimate. The true value is likely to lie within +/- twice the CV. Results for 2003 and earlier exclude supplementary surveys. In 2006 a number of methodological changes were made. For further details, go to http://www.nomisweb.co.uk/articles/341.aspx. The headline statistics for ASHE are based on the median rather than the mean. The median is the value below which 50 per cent of employees fall. It is ONS's preferred measure of average earnings as it is less affected by a relatively small number of very high earners and the skewed distribution of earnings. It therefore gives a better indication of typical pay than the mean.
These are survey data from a sample frame; use caution if using them for performance measurement and trend analysis. '#' These figures are suppressed as statistically unreliable. '!' Estimate and confidence interval not available since the group sample size is zero or disclosive (0-2). Furthermore, data from the Abstract of Regional Statistics, the New Earnings Survey and ASHE have been combined to create a long-run historical series of full-time weekly earnings data for London and Great Britain, stretching back to 1965 and broken down by sex.
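
The CV definition in the description above can be illustrated with a short Python sketch. The earnings figure and standard error below are hypothetical values chosen for illustration, not numbers from the ASHE tables, and the function names are assumptions. The "+/- twice the CV" range is read here as twice the CV applied as a percentage of the estimate.

```python
def cv_percent(estimate: float, standard_error: float) -> float:
    """Coefficient of variation: standard error as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for a zero estimate")
    return 100.0 * standard_error / abs(estimate)


def likely_range(estimate: float, cv_pct: float) -> tuple[float, float]:
    """Range of estimate +/- twice the CV, with the CV taken as a % of the estimate."""
    half_width = 2.0 * (cv_pct / 100.0) * abs(estimate)
    return estimate - half_width, estimate + half_width


# Hypothetical figures, not taken from the published tables.
weekly_median = 600.0  # estimated median gross weekly earnings
std_error = 12.0       # standard error of that estimate

cv = cv_percent(weekly_median, std_error)
low, high = likely_range(weekly_median, cv)
print(f"CV = {cv:.1f}%, likely range = {low:.0f} to {high:.0f}")
```

A smaller CV narrows the range, which matches the statement that a smaller CV indicates a more accurate estimate.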

  12. Summary of predictive performance per dataset when using clinical...

    • figshare.com
    • plos.figshare.com
    xlsx
    Updated Jun 8, 2023
    Cite
    Stephen R. Piccolo; Avery Mecham; Nathan P. Golightly; Jérémie L. Johnson; Dustin B. Miller (2023). Summary of predictive performance per dataset when using clinical predictors. [Dataset]. http://doi.org/10.1371/journal.pcbi.1009926.s035
    Explore at:
    xlsx (available download format)
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Stephen R. Piccolo; Avery Mecham; Nathan P. Golightly; Jérémie L. Johnson; Dustin B. Miller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We predicted patient states using clinical predictors only (Analysis 2). For each combination of dataset, class variable, and classification algorithm, we calculated the arithmetic mean of area under the receiver operating characteristic curve (AUROC) values across 50 iterations of Monte Carlo cross-validation. Next, we calculated the minimum, first quartile (Q1), median, third quartile (Q3), and maximum for these values across the algorithms. Finally, we sorted the algorithms in descending order based on median values. Each row represents a particular dataset/class combination. For some dataset/class combinations, no clinical predictors were available; these combinations are excluded from this file. (XLSX)
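
The first two summary steps described above (mean AUROC per algorithm across iterations, then a five-number summary across algorithms) might look roughly like the following Python sketch. The algorithm names and AUROC values are hypothetical, only three iterations are shown instead of 50, and the "inclusive" quartile method is an assumption, since the description does not say which quartile convention the authors used.

```python
from statistics import mean, quantiles

# Hypothetical AUROC values: algorithm -> one value per cross-validation iteration.
auroc_by_algorithm = {
    "algo_a": [0.70, 0.74, 0.72],
    "algo_b": [0.80, 0.82, 0.78],
    "algo_c": [0.60, 0.66, 0.63],
    "algo_d": [0.75, 0.77, 0.73],
}

# Step 1: arithmetic mean across iterations, per algorithm.
mean_auroc = {name: mean(values) for name, values in auroc_by_algorithm.items()}


# Step 2: five-number summary of those means across the algorithms.
def five_number_summary(values):
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "Q1": q1, "median": q2, "Q3": q3, "max": max(values)}


summary = five_number_summary(list(mean_auroc.values()))
for label, value in summary.items():
    print(f"{label}: {value:.4f}")
```

In the published file, one such summary would correspond to a single row, that is, one dataset/class combination.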

  13. Raw data of the width and depth estimates.

    • figshare.com
    • plos.figshare.com
    zip
    Updated Jun 5, 2023
    Cite
    Christoph von Castell; Heiko Hecht; Daniel Oberfeld (2023). Raw data of the width and depth estimates. [Dataset]. http://doi.org/10.1371/journal.pone.0201976.s001
    Explore at:
    zip (available download format)
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Christoph von Castell; Heiko Hecht; Daniel Oberfeld
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Archive file containing the raw datasets (i.e., uncorrected and not aggregated across repetitions per condition) for the width estimates (widthEstimates.csv) and depth estimates (depthEstimates.csv). When the variable 'typoExperimenter' has the value 'true', this indicates that the experimenter noted an uncorrected typo for the entered estimate on the given trial. These trials were excluded before performing the outlier analysis. When the variable 'outlier' has the value 'true', this indicates that the estimate on the given trial was more than 1.5 times the interquartile range below the first quartile or above the third quartile, relative to the set of ten trials collected for the given combination of subject and experimental condition (judged dimension × surface-luminance configuration × spatial extent). These trials were excluded from our data analyses. (ZIP)
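
The outlier rule described above (more than 1.5 times the interquartile range below the first quartile or above the third quartile) can be sketched in Python as follows. The ten trial values are hypothetical, the function name is an assumption, and the "inclusive" quartile method is also an assumption, because the description does not state which quartile convention the authors applied.

```python
from statistics import quantiles


def flag_outliers(estimates):
    """Return one boolean per estimate: True if it lies outside the 1.5 x IQR fences."""
    q1, _, q3 = quantiles(estimates, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x < lower_fence or x > upper_fence for x in estimates]


# Ten hypothetical estimates for one subject x condition combination.
trials = [100, 102, 98, 101, 99, 103, 97, 100, 150, 101]

for value, is_outlier in zip(trials, flag_outliers(trials)):
    print(value, "outlier" if is_outlier else "ok")
```

Applying the rule within each set of ten trials, rather than across the whole file, keeps the fences specific to one subject and condition, as the description specifies.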

  14. Differences between lower and upper quartiles of scales.

    • plos.figshare.com
    xls
    Updated May 8, 2024
    Cite
    David William Evans (2024). Differences between lower and upper quartiles of scales. [Dataset]. http://doi.org/10.1371/journal.pone.0303102.t006
    Explore at:
    xls (available download format)
    Dataset updated
    May 8, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    David William Evans
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Differences between lower and upper quartiles of scales.

  15. Descriptive statistics of the 2 datasets with mean, standard deviation (SD),...

    • plos.figshare.com
    xls
    Updated Jun 18, 2023
    Cite
    Achim Langenbucher; Nóra Szentmáry; Alan Cayless; Jascha Wendelstein; Peter Hoffmann (2023). Descriptive statistics of the 2 datasets with mean, standard deviation (SD), median, the lower (quantile 2.5%) and upper (quantile 97.5%) boundary of the 95% confidence interval, and the interquartile range IQR (quartile 75%—quartile 25%). [Dataset]. http://doi.org/10.1371/journal.pone.0282213.t001
    Explore at:
    xls (available download format)
    Dataset updated
    Jun 18, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Achim Langenbucher; Nóra Szentmáry; Alan Cayless; Jascha Wendelstein; Peter Hoffmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AL refers to the axial length, CCT to the central corneal thickness, ACD to the external phakic anterior chamber depth measured from the corneal front apex to the front apex of the crystalline lens, LT to the central thickness of the crystalline lens, R1 and R2 to the corneal radii of curvature for the flat and steep meridians, Rmean to the average of R1 and R2, PIOL to the refractive power of the intraocular lens implant, and SEQ to the spherical equivalent power achieved 5 to 12 weeks after cataract surgery.

  16. Median DIP-indices with first (Q1) and third (Q3) quartile, split for...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Stefan A. Lipman; Arthur E. Attema (2023). Median DIP-indices with first (Q1) and third (Q3) quartile, split for positive and negative discounting. [Dataset]. http://doi.org/10.1371/journal.pone.0229784.t004
    Explore at:
    xls (available download format)
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Stefan A. Lipman; Arthur E. Attema
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Median DIP-indices with first (Q1) and third (Q3) quartile, split for positive and negative discounting.


Cite
Rohit Zaman (2021). Gender, Age, and Emotion Detection from Voice [Dataset]. https://www.kaggle.com/rohitzaman/gender-age-and-emotion-detection-from-voice

Gender, Age, and Emotion Detection from Voice

Extracted statistical features from audios and added labels to form the datasets

Explore at:
36 scholarly articles cite this dataset (View in Google Scholar)
zip (967820 bytes; available download format)
Dataset updated
May 29, 2021
Authors
Rohit Zaman
Description

Context

Our target was to predict gender, age and emotion from audio. We found labeled audio datasets on Mozilla and RAVDESS. Using the R programming language, 20 statistical features were extracted, and these datasets were formed after the labels were added. Audio files were collected from "Mozilla Common Voice" and the “Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)”.

Content

The datasets contain 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes a dataset’s dispersion relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle number in the sorted (ascending or descending) list of numbers.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. This means that about 25 percent of the data set numbers are below Q1, and about 75 percent are above Q1.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the median of the upper half of the data set. This means that about 75 percent of the data set numbers are below Q3, and about 25 percent are above Q3.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure that determines how much the tails of a distribution differ from the tails of a normal distribution. It is effectively a measure of the outliers present in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized signal’s spectral power.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is usually measured in decibels and provides a way to quantify how tone-like, as opposed to noise-like, a sound is.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the spectrum’s center of mass is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of the dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of the dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which expresses the degree of frequency modulation numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
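
To make a few of these features concrete, here is a minimal Python sketch that treats a power spectrum as a distribution over frequency and derives a mean frequency, quartile frequencies, the IQR, and a spectral flatness value. It is an illustration under stated assumptions, not the R code used to build the dataset: the function name, the toy spectra, and the cumulative-weight quartile rule are all hypothetical. Flatness is computed as the ratio of the geometric mean to the arithmetic mean of the power values, on a linear scale rather than in decibels.

```python
import math


def spectral_features(freqs_khz, power):
    """Summarise a power spectrum treated as a distribution over frequency."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no power")
    weights = [p / total for p in power]  # normalised power per frequency bin

    # Power-weighted mean frequency (also the spectrum's centre of mass).
    mean_freq = sum(f * w for f, w in zip(freqs_khz, weights))

    def quantile_freq(q):
        # First frequency at which the cumulative normalised power reaches q.
        cumulative = 0.0
        for f, w in zip(freqs_khz, weights):
            cumulative += w
            if cumulative >= q:
                return f
        return freqs_khz[-1]

    q25, med, q75 = quantile_freq(0.25), quantile_freq(0.50), quantile_freq(0.75)

    # Spectral flatness: geometric mean / arithmetic mean of the power values.
    if min(power) <= 0:
        flatness = 0.0  # any empty bin drives the geometric mean to zero
    else:
        geometric = math.exp(sum(math.log(p) for p in power) / len(power))
        flatness = geometric / (total / len(power))

    return {"meanfreq": mean_freq, "Q25": q25, "median": med, "Q75": q75,
            "IQR": q75 - q25, "sfm": flatness}


# Toy spectra (hypothetical): a flat, noise-like one and a single-peak, tone-like one.
freqs = [0.1, 0.2, 0.3, 0.4]
noise_like = spectral_features(freqs, [1.0, 1.0, 1.0, 1.0])
tone_like = spectral_features(freqs, [0.0, 0.0, 1.0, 0.0])
print(noise_like)
print(tone_like)
```

The flat spectrum yields a flatness of 1 and the single-peak spectrum a flatness of 0, which is the noise-like versus tone-like contrast described for sfm.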

Acknowledgements

Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en
Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
