Our goal was to predict gender, age, and emotion from audio. We collected labeled audio files from "Mozilla Common Voice" and the "Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)", extracted 20 statistical features from each file using the R programming language, and then added the labels to form these datasets.
Each dataset contains 20 feature columns and 1 label column. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes the dispersion of the data relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle value of the sorted list of values.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. About 25 percent of the values are below Q1, and about 75 percent are above Q1.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the median of the upper half of the data set. About 75 percent of the values are below Q3.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure of how much the tails of the distribution differ from the tails of a normal distribution. It indicates how prone the distribution is to outliers.
9) sp.ent - The spectral entropy is a measure of signal irregularity computed from the normalized spectral power of the signal.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. It is usually measured in decibels and quantifies how noise-like, as opposed to tone-like, a sound is.
11) mode - The mode frequency is the most frequently observed value in the data set.
12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the center of mass of the spectrum is located.
13) meanfun - The average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maximum fundamental frequency measured across the acoustic signal.
16) meandom - The average of the dominant frequency measured across the acoustic signal.
17) mindom - The minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The range of the dominant frequency measured across the acoustic signal.
20) modindx - The modulation index, which expresses the degree of frequency modulation numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
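As a rough illustration of how several of these spectral features relate to one another, the sketch below summarises a spectrum in Python. It is not the R code used to build the datasets. It assumes the normalised amplitude spectrum can be treated as a probability mass over frequency bins, and the function name `spectral_summary` and its inputs are hypothetical.

```python
import math


def spectral_summary(freqs_khz, amplitudes):
    """Summarise a spectrum, treating normalised amplitude as a probability
    mass over frequency bins (an assumption of this sketch).

    Amplitudes must be strictly positive so the geometric mean is defined.
    """
    total = sum(amplitudes)
    weights = [a / total for a in amplitudes]

    # Weighted mean and standard deviation of frequency.
    meanfreq = sum(f * w for f, w in zip(freqs_khz, weights))
    variance = sum(w * (f - meanfreq) ** 2 for f, w in zip(freqs_khz, weights))
    sd = math.sqrt(variance)

    def quantile(p):
        # First frequency bin at which the cumulative weight reaches p.
        cumulative = 0.0
        for f, w in zip(freqs_khz, weights):
            cumulative += w
            if cumulative >= p:
                return f
        return freqs_khz[-1]

    q25, median, q75 = quantile(0.25), quantile(0.50), quantile(0.75)

    # Spectral flatness: geometric mean divided by arithmetic mean.
    geometric = math.exp(sum(math.log(a) for a in amplitudes) / len(amplitudes))
    arithmetic = total / len(amplitudes)

    return {
        "meanfreq": meanfreq,
        "sd": sd,
        "median": median,
        "Q25": q25,
        "Q75": q75,
        "IQR": q75 - q25,
        "sfm": geometric / arithmetic,
    }
```

A perfectly flat spectrum gives a flatness of 1, while a spectrum dominated by a single bin gives a value close to 0, which matches the noise-like versus tone-like reading of sfm.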
Gender and age audio data source: https://commonvoice.mozilla.org/en
Emotion audio data source: https://smartlaboratory.org/ravdess/
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Tax statistics are compiled from personal tax returns at the place of residence. The income year is the year for which taxes are due. Total taxable net income consists of all net professional income, net real estate income, net movable income, and miscellaneous net income.
To measure the dispersion of the income distribution, tax returns are sorted in ascending order of income and divided into 4 equal parts separated by 3 quartiles (Q1: 25% of returns have an income below Q1; Q2, the median income: 50% of returns have an income below Q2; Q3: 75% of returns have an income below Q3). Tax returns with zero taxable income are not included in the calculations.
The indicator reports the ratio of the difference between the 3rd and 1st quartiles to the median: (Q3-Q1)/Q2. The higher the interquartile coefficient, the higher the degree of income inequality. Because it is expressed relative to the median, it makes it possible to compare the dispersion of series with very different median values.
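The interquartile coefficient described above can be sketched in a few lines of Python. This is only an illustration of the formula (Q3-Q1)/Q2. The function name is hypothetical, and the quartile interpolation method (`method="inclusive"`) is an assumption, since the description does not say which quartile convention the statistics office uses.

```python
from statistics import quantiles


def interquartile_coefficient(incomes):
    """Return (Q3 - Q1) / Q2 for the returns with non-zero taxable income."""
    # Returns with zero taxable income are left out, as the description states.
    taxable = [x for x in incomes if x != 0]
    # The "inclusive" quartile convention is an assumption of this sketch.
    q1, q2, q3 = quantiles(taxable, n=4, method="inclusive")
    return (q3 - q1) / q2
```

Because the result is divided by the median, multiplying every income by the same factor leaves the coefficient unchanged, which is what allows series with very different medians to be compared.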
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
The following dataset is intended for gender recognition from audio files recorded in uncontrolled environments, taken from the Mozilla Common Voice Dataset 10.0. It consists of a table of descriptive statistics of the fundamental frequency for six tonal languages: Chinese (China), Chinese (Hong Kong), Chinese (Taiwan), Thai, Vietnamese, and Punjabi. It also includes an estimate of the vocal tract length of each speaker.
This dataset contains 18 columns:
'client_id': Speaker id from Mozilla Common Voice.
'path': Name of the mp3 file.
'sentence': The sentence spoken by the speaker.
'age': Age in decades (teens, twenties, etc.).
'gender': Binary gender (male or female).
'duration': Duration of the mp3 in seconds.
'vocal_tract_length': Vocal tract length in cm.
'mean_F4': Mean of the fourth formant in Hz.
'min_pitch': Minimum pitch of the whole pitch contour in Hz.
'mean_pitch': Mean pitch of the whole pitch contour in Hz.
'q1_pitch': First quartile of the whole pitch contour in Hz.
'median_pitch': Median pitch of the whole pitch contour in Hz.
'q3_pitch': Third quartile of the whole pitch contour in Hz.
'max_pitch': Maximum pitch of the whole pitch contour in Hz.
'stddev_pitch': Standard deviation of the whole pitch contour in Hz.
'estimated_age': Nominal value (adult or teen).
'estimated_age_gender': Nominal value (adult-male, adult-female, teen-male, and teen-female).
'language': Nominal value (Chinese (China), Chinese (Hong Kong), Chinese (Taiwan), Thai, Vietnamese, and Punjabi).
The methodology for the extraction of these characteristics was the following:
1) Only the audio files listed in the valid.tsv file of the respective language were analyzed (this file is contained in the Mozilla Common Voice Dataset, https://commonvoice.mozilla.org/en/datasets ). The voiced speech was extracted using the Praat Vocal Toolkit algorithm (https://www.praatvocaltoolkit.com/extract-voiced-and-unvoiced.html).
2) The vocal tract length was calculated with the Praat Vocal Toolkit algorithm ( https://www.praatvocaltoolkit.com/calculate-vocal-tract-length.html ) as follows: if the audio came from a teen, the maximum formant was set to 8000 Hz; otherwise it was set to 5000 Hz for men and 5500 Hz for women. Finally, the mean of the fourth formant was calculated for the windows with voiced speech only.
3) The fundamental frequency was calculated using the Praat software with the To Pitch (ac) option and the following settings: a) Time step (s) = 0.0 (auto) b) Pitch floor (Hz) = 75.0 c) Max. number of candidates = 15 d) Very accurate = True e) Silence threshold = 0.03 f) Voicing threshold = 0.45 g) Octave cost = 0.01 h) Octave-jump cost = 0.35 i) Voiced/unvoiced cost = 0.14 j) Pitch ceiling (Hz) = 350
4) The statistical characteristics of the fundamental frequency were calculated only in the windows that were detected as voiced speech.
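Step 4 can be illustrated with a short Python sketch that computes the pitch columns from a contour while ignoring unvoiced frames. This is not the Praat script used for the dataset. It assumes that unvoiced frames are encoded as 0 or None, that quartiles use the "inclusive" convention, and that the standard deviation is the sample standard deviation. The function name is hypothetical.

```python
from statistics import mean, median, quantiles, stdev


def pitch_statistics(f0_hz):
    """Summarise a pitch contour using voiced frames only.

    Assumption of this sketch: unvoiced frames are encoded as 0 or None.
    """
    voiced = [f for f in f0_hz if f]  # drops 0, 0.0 and None
    q1, _, q3 = quantiles(voiced, n=4, method="inclusive")
    return {
        "min_pitch": min(voiced),
        "mean_pitch": mean(voiced),
        "q1_pitch": q1,
        "median_pitch": median(voiced),
        "q3_pitch": q3,
        "max_pitch": max(voiced),
        "stddev_pitch": stdev(voiced),  # sample standard deviation (assumed)
    }
```

Filtering before computing the statistics matters: leaving the zeros in would drag the minimum, mean, and lower quartile toward 0 Hz.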
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Annual descriptive price statistics for each calendar year 2005 – 2024 for 462 electoral wards within 11 Local Government Districts. The statistics include:
• Minimum sale price
• Lower quartile sale price
• Median sale price
• Simple mean sale price
• Upper quartile sale price
• Maximum sale price
• Number of verified sales
Prices are available where at least 30 sales that could be included in the regression model were recorded in the area within the calendar year, i.e. the following sales are excluded:
• Non-arm's-length sales
• Sales of properties where the habitable space is less than 30 m² or greater than 1000 m²
• Sales of less than £20,000
Annual median or simple mean prices should not be used to calculate the property price change over time. The quality of the properties that are sold (where quality refers to the combination of all characteristics of a residential property, both physical and locational) may differ from one time period to another. For example, sales in one quarter could be disproportionately skewed towards low-quality properties, therefore producing a biased estimate of average price. The median and simple mean prices are not 'standardised', and so the varying mix of properties sold in each quarter could give a false impression of the actual change in prices. To calculate the pure property price change over time it is necessary to compare like with like, and this can only be achieved if the 'characteristics-mix' of properties traded is standardised. To calculate pure property price change over time, please use the standardised prices in the NI House Price Index Detailed Statistics file.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
The following dataset is intended for gender recognition from audio files recorded in uncontrolled environments, taken from the Mozilla Common Voice Dataset 10.0. It consists of a table of descriptive statistics of the fundamental frequency for the Spanish language. It also includes an estimate of the vocal tract length of each speaker.
This dataset contains 18 columns:
'client_id': Speaker id from Mozilla Common Voice.
'path': Name of the mp3 file.
'age': Age in decades (teens, twenties, etc.).
'gender': Binary gender (male or female).
'duration': Duration of the mp3 in seconds.
'vocal_tract_length': Vocal tract length in cm.
'mean_F4': Mean of the fourth formant in Hz.
'min_pitch': Minimum pitch of the whole pitch contour in Hz.
'mean_pitch': Mean pitch of the whole pitch contour in Hz.
'q1_pitch': First quartile of the whole pitch contour in Hz.
'median_pitch': Median pitch of the whole pitch contour in Hz.
'q3_pitch': Third quartile of the whole pitch contour in Hz.
'max_pitch': Maximum pitch of the whole pitch contour in Hz.
'stddev_pitch': Standard deviation of the whole pitch contour in Hz.
'estimated_age': Nominal value (adult or teen).
'estimated_age_gender': Nominal value (adult-male, adult-female, teen-male, and teen-female).
'language': Nominal value (Chinese (China), Chinese (Hong Kong), Chinese (Taiwan), Thai, Vietnamese, and Punjabi).
The methodology for the extraction of these characteristics was the following:
1) Only the audio files listed in the valid.tsv file of the respective language were analyzed (this file is contained in the Mozilla Common Voice Dataset, https://commonvoice.mozilla.org/en/datasets ). The voiced speech was extracted using the Praat Vocal Toolkit algorithm (https://www.praatvocaltoolkit.com/extract-voiced-and-unvoiced.html).
2) The vocal tract length was calculated with the Praat Vocal Toolkit algorithm ( https://www.praatvocaltoolkit.com/calculate-vocal-tract-length.html ) as follows: if the audio came from a teen, the maximum formant was set to 8000 Hz; otherwise it was set to 5000 Hz for men and 5500 Hz for women. Finally, the mean of the fourth formant was calculated for the windows with voiced speech only.
3) The fundamental frequency was calculated using the Praat software with the To Pitch (ac) option and the following settings: a) Time step (s) = 0.0 (auto) b) Pitch floor (Hz) = 75.0 c) Max. number of candidates = 15 d) Very accurate = True e) Silence threshold = 0.03 f) Voicing threshold = 0.45 g) Octave cost = 0.01 h) Octave-jump cost = 0.35 i) Voiced/unvoiced cost = 0.14 j) Pitch ceiling (Hz) = 350
4) The statistical characteristics of the fundamental frequency were calculated only in the windows that were detected as voiced speech.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset includes one dataset that was custom ordered from Statistics Canada. The table includes information on housing suitability and shelter-cost-to-income ratio by number of bedrooms, housing tenure, status of primary household maintainer, household type, and income quartile ranges for census subdivisions in British Columbia. The dataset is in Beyond 20/20 (.ivt) format. The Beyond 20/20 browser is required in order to open it. This software can be freely downloaded from the Statistics Canada website: https://www.statcan.gc.ca/eng/public/beyond20-20 (Windows only). For information on how to use Beyond 20/20, please see: http://odesi2.scholarsportal.info/documentation/Beyond2020/beyond20-quickstart.pdf and https://wiki.ubc.ca/Library:Beyond_20/20_Guide
Custom order from Statistics Canada includes the following dimensions and variables:
Geography: Non-reserve CSDs in British Columbia - 299 geographies
The global non-response rate (GNR) is an important measure of census data quality. It combines total non-response (households) and partial non-response (questions). A lower GNR indicates a lower risk of non-response bias and, as a result, a lower risk of inaccuracy. The counts and estimates for geographic areas with a GNR equal to or greater than 50% are not published in the standard products. The counts and estimates for these areas have a high risk of non-response bias, and in most cases, should not be released. All the geographies requested for this tabulation have been cleared for the release of income data and have a GNR under 50%.
Housing Tenure Including Presence of Mortgage (5)
1. Total – Private non-band non-farm off-reserve households with an income greater than zero by housing tenure
2. Households who own
3. With a mortgage (note 1)
4. Without a mortgage
5. Households who rent
Note: 1) Presence of mortgage - Refers to whether the owner households reported mortgage or loan payments for their dwelling.
2015 Before-tax Household Income Quartile Ranges (5)
1. Total – Private households by quartile ranges (notes 1, 2, 3)
2. Count of households under or at quartile 1
3. Count of households between quartile 1 and quartile 2 (median) (including at quartile 2)
4. Count of households between quartile 2 (median) and quartile 3 (including at quartile 3)
5. Count of households over quartile 3
Notes: 1) A private household will be assigned to a quartile range depending on its CSD-level location and depending on its tenure (owned and rented). Quartile ranges for owned households in a specific CSD are delimited by the 2015 before-tax income quartiles of owned households with an income greater than zero and residing in non-farm off-reserve dwellings in that CSD. Quartile ranges for rented households in a specific CSD are delimited by the 2015 before-tax income quartiles of rented households with an income greater than zero and residing in non-farm off-reserve dwellings in that CSD. 2) For the income quartiles dollar values (the delimiters) please refer to Table 1. 3) Quartiles 1 to 3 are suppressed if the number of actual records used in the calculation (not rounded or weighted) is less than 16. For cases in which the renters' quartiles or the owners' quartiles (figures from Table 1) of a CSD are suppressed, the CSD is assigned to a quartile range depending on the provincial renters' or owners' quartile figures.
Number of Bedrooms (Unit Size) (6)
1. Total – Private households by number of bedrooms (note 1)
2. 0 bedrooms (Bachelor/Studio)
3. 1 bedroom
4. 2 bedrooms
5. 3 bedrooms
6. 4 bedrooms
Note: 1) Dwellings with 5 bedrooms or more included in the total count only.
Housing Suitability (6)
1. Total - Housing suitability
2. Suitable
3. Not suitable
4. One bedroom shortfall
5. Two bedroom shortfall
6. Three or more bedroom shortfall
Note: 1) 'Housing suitability' refers to whether a private household is living in suitable accommodations according to the National Occupancy Standard (NOS); that is, whether the dwelling has enough bedrooms for the size and composition of the household. A household is deemed to be living in suitable accommodations if its dwelling has enough bedrooms, as calculated using the NOS. 'Housing suitability' assesses the required number of bedrooms for a household based on the age, sex, and relationships among household members. An alternative variable, 'persons per room,' considers all rooms in a private dwelling and the number of household members. Housing suitability and the National Occupancy Standard (NOS) on which it is based were developed by Canada Mortgage and Housing Corporation (CMHC) through consultations with provincial housing agencies.
Shelter-cost-to-income-ratio (4)
1. Total – Private non-band non-farm off-reserve households with an income greater than zero
2. Spending less than 30% of households total income on shelter costs
3. Spending 30% or more of households total income on shelter costs
4. Spending 50% or more of households total income on shelter costs
Note: 'Shelter-cost-to-income...
This CD-ROM atlas contains statistics of 45 years of summer (July through September) temperature and salinity data. All temperatures in this atlas are shown as potential temperatures. Salinities are reported in standard salinity units. Densities are reported as potential density. Minimum, maximum, mean, standard deviation, skewness, kurtosis, 1st quartile, median, and 3rd quartile of the temperature and salinity data were calculated for each decade in the data set and over each 200 km by 200 km grid cell within the central Arctic region. In the Nordic and Siberian Seas, the statistics were calculated over 50 km by 50 km cells, as well as 100 km by 100 km cells, due to the smaller spatial scales in this region. These statistics were compiled for depths (in meters below the ocean surface) of 5, 10, 25, 50, 75, 100, 150, 200, 250, 300, 400, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, and 4000. The ocean bottom bathymetry was defined using the National Geophysical Data Center ETOPO5 digital data set. The decadal periods are: 1950-1959, 1960-1969, 1970-1979, 1980-1989. The number of original data points used to calculate the statistics is recorded for each location, and a planar fit to the data at each depth was also generated to indicate linear trends within each cell. The three coefficients of the planar fit and the root-mean-square error of the fit are also recorded in the atlas. It was agreed to publish these statistics because most of the original Russian data could not be released. The CD-ROM atlas also contains a complete set of statistics for the periods June, October and November. Statistics for December through May are available in the atlas for the winter period.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Affordability ratios calculated by dividing house prices by gross annual residence-based earnings. Based on the median and lower quartiles of both house prices and earnings in England and Wales.
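The arithmetic behind such ratios can be sketched as follows. This is an illustration only: the published figures are built from official house price and earnings statistics, whereas this sketch takes two plain lists of hypothetical values, and the function name and the "inclusive" quartile convention are assumptions.

```python
from statistics import median, quantiles


def affordability_ratios(house_prices, annual_earnings):
    """Return (median ratio, lower-quartile ratio) of house prices to earnings."""
    # Lower quartile of each series; the quartile convention is assumed.
    price_q1 = quantiles(house_prices, n=4, method="inclusive")[0]
    earnings_q1 = quantiles(annual_earnings, n=4, method="inclusive")[0]
    return (
        median(house_prices) / median(annual_earnings),
        price_q1 / earnings_q1,
    )
```

A ratio of 10 means that the median house price equals ten years of median gross annual earnings.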
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Affordability ratios calculated by dividing house prices by gross annual workplace-based earnings. Based on the median and lower quartiles of both house prices and earnings in England and Wales.
By Health Data New York
This dataset contains New York State county-level data on obesity and diabetes related indicators from 2008 to 2012. It includes information about each county's population health status, such as the number of events, percentage/rate, 95% confidence interval, and measure units. Analyzing this data provides insight into how communities across New York State are affected by these diseases and how healthier living environments can be created. This dataset is released under a Terms of Service license agreement; read and understand its details if you plan to use the data in any research or commercial application.
This dataset contains county-level data on obesity and diabetes related indicators in New York State. As such, it can be used to research indicators related to general health in various counties of the state.
To use this dataset effectively, first become familiar with the columns included and their meanings:
- County Name: The name of the county. (String)
- County Code: The code of the county. (Integer)
- Region Name: The name of the region. (String)
- Indicator Number: The number of the indicator. (Integer)
- Total Event Counts: The total number of events related to the indicator. (Integer)
- Denominator: The denominator used to calculate the percentage/rate. (Integer)
- Denominator Note: Any additional notes related to the denominator. (String)
- Measure Unit: The unit of measure used for this rate/percentage. (String)
- Percentage/Rate: The percentage/rate calculated using the denominator and observed count data. (Float)
- 95% CI: The 95% confidence interval associated with the rate or percentage. (Float)
- Data Comments: Any additional comments relevant to this data source or indicator. (String)
- Data Years: Years covered by this particular indicator observation. (String)
- Data Sources: Sources from which the data for the indicator were drawn. (String)
- Quartile: Quartiles are derived by ranking all geographic entities according to a specific metric score and then cutting them into quartiles based on that score: 0 = bottom quarter; 1 = middle two quarters combined; 2 = top quarter. (Integer)
- Mapping Distribution: Mapping details describing how indicators of disease rates or characteristics are distributed across states, regions, and counties, together with any trends and other pertinent mapping information such as health resource availability. (Given in plot form; otherwise an informational text string)
- Location: The area the record refers to, given as a point feature with a single location ID retrieved from the geoplanet proxy service. (String)
Using these columns, you can find health information about your chosen county, such as its obesity rate and diabetes incidence, enabling you to better understand its overall health situation. This dataset also provides comparison features such as quartile rankings.
Analysing the geographic distribution of obesity and diabetes related indicators by county in New York State, in order to identify areas which may require greater levels of intervention and preventative health measures.
Evaluating trends over time for different counties to assess whether policies or programs have had an impact on indicators relating to obesity and diabetes within the given area.
Using machine learning techniques such as clustering analysis or predictive modelling, to identify patterns within the data which can be used to better inform preventative health interventions across New York State
If you use this dataset in your research, please credit the original authors.
See the dataset description for more information.
File: community-health-obesity-and-diabetes-related-indicators-2008-2012-1.csv | Column name | Description | |:-------------------------|:-----------------------------------------------------------------------------------------| | **Count...
This dataset provides information about earnings of employees who are working in an area, who are on adult rates and whose pay for the survey pay-period was not affected by absence. Tables provided here include total gross weekly earnings, and full-time weekly earnings with breakdowns by gender, and annual median, mean and lower quartile earnings by borough and UK region. These are provided both in nominal and real terms. Real earnings figures are on sheets labelled "real", are in 2016 prices, and are calculated by applying ONS's annual CPI index series for April to ASHE data. The Annual Survey of Hours and Earnings (ASHE) is based on a sample of employee jobs taken from HM Revenue & Customs PAYE records. Information on earnings and hours is obtained in confidence from employers. ASHE does not cover the self-employed, nor does it cover employees not paid during the reference period. The earnings information presented relates to gross pay before tax, National Insurance or other deductions, and excludes payments in kind. The confidence figure is the coefficient of variation (CV) of that estimate. The CV is the ratio of the standard error of an estimate to the estimate itself and is expressed as a percentage. The smaller the coefficient of variation, the greater the accuracy of the estimate. The true value is likely to lie within +/- twice the CV. Results for 2003 and earlier exclude supplementary surveys. In 2006 a number of methodological changes were made. For further details, go to: http://www.nomisweb.co.uk/articles/341.aspx. The headline statistics for ASHE are based on the median rather than the mean. The median is the value below which 50 per cent of employees fall. It is ONS's preferred measure of average earnings as it is less affected by a relatively small number of very high earners and the skewed distribution of earnings. It therefore gives a better indication of typical pay than the mean.
These are survey data from a sample frame; use caution if using them for performance measurement and trend analysis.
'#' These figures are suppressed as statistically unreliable.
'!' Estimate and confidence interval not available since the group sample size is zero or disclosive (0-2).
Furthermore, data from the Abstract of Regional Statistics, the New Earnings Survey, and ASHE have been combined to create a long-run historical series of full-time weekly earnings data for London and Great Britain, stretching back to 1965 and broken down by sex.
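The coefficient of variation and the "+/- twice the CV" rule of thumb described above can be written out as a small Python sketch. The function names and the example figures are hypothetical, and the sketch only restates the arithmetic given in the description.

```python
def coefficient_of_variation(estimate, standard_error):
    """CV as a percentage: the standard error divided by the estimate."""
    return 100.0 * standard_error / estimate


def likely_range(estimate, cv_percent):
    """Range of +/- twice the CV around the estimate."""
    half_width = estimate * 2.0 * cv_percent / 100.0
    return estimate - half_width, estimate + half_width
```

For example, a weekly earnings estimate of 500 with a standard error of 10 has a CV of 2%, so the true value is likely to lie between 480 and 520.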
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We predicted patient states using clinical predictors only (Analysis 2). For each combination of dataset, class variable, and classification algorithm, we calculated the arithmetic mean of area under the receiver operating characteristic curve (AUROC) values across 50 iterations of Monte Carlo cross-validation. Next, we calculated the minimum, first quartile (Q1), median, third quartile (Q3), and maximum for these values across the algorithms. Finally, we sorted the algorithms in descending order based on median values. Each row represents a particular dataset/class combination. For some dataset/class combinations, no clinical predictors were available; these combinations are excluded from this file. (XLSX)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Archive file containing the raw datasets (i.e., uncorrected and not aggregated across repetitions per condition) for the width estimates (widthEstimates.csv) and depth estimates (depthEstimates.csv). When the variable 'typoExperimenter' has the value 'true', this indicates that the experimenter noted an uncorrected typo for the entered estimate on the given trial. These trials were excluded before performing the outlier analysis. When the variable 'outlier' has the value 'true', this indicates that the estimate on the given trial was more than 1.5 times the interquartile range below the first quartile or above the third quartile, relative to the set of ten trials collected for the given combination of subject and experimental condition (judged dimension × surface-luminance configuration × spatial extent). These trials were excluded from our data analyses. (ZIP)
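The outlier criterion described above (more than 1.5 times the interquartile range below the first quartile or above the third quartile) can be sketched in Python as follows. The function name is hypothetical and the quartile convention (`method="inclusive"`) is an assumption, since the description does not specify one. The function is meant to be applied separately to each set of ten trials for one subject and condition.

```python
from statistics import quantiles


def flag_outliers(estimates):
    """Return one boolean per trial: True if the estimate is an outlier."""
    # Quartile convention is an assumption of this sketch.
    q1, _, q3 = quantiles(estimates, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x < low or x > high for x in estimates]
```

Computing the fences within each subject-by-condition group, rather than over the pooled data, keeps a legitimately large estimate in one condition from being flagged because other conditions produce small ones.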
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Differences between lower and upper quartiles of scales.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AL refers to the axial length, CCT to the central corneal thickness, ACD to the external phakic anterior chamber depth measured from the corneal front apex to the front apex of the crystalline lens, LT to the central thickness of the crystalline lens, R1 and R2 to the corneal radii of curvature for the flat and steep meridians, Rmean to the average of R1 and R2, PIOL to the refractive power of the intraocular lens implant, and SEQ to the spherical equivalent power achieved 5 to 12 weeks after cataract surgery.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Median DIP-indices with first (Q1) and third (Q3) quartile, split for positive and negative discounting.