https://creativecommons.org/publicdomain/zero/1.0/
According to Wikipedia, an academic journal or research journal is a periodical publication in which research articles relating to a particular academic discipline are published. Currently, more than 25,000 peer-reviewed journals are indexed in citation index databases such as Scopus and Web of Science. The journals in these indexes are ranked on the basis of various metrics such as CiteScore and H-index, which are calculated from the journal's yearly citation data. Considerable effort goes into designing metrics that reflect a journal's quality.
This is a comprehensive dataset on academic journals, covering their metadata as well as citation, metrics, and ranking information. Detailed data on their subject areas is also included. The dataset was collected from the following indexing databases:
- Scimago Journal Ranking
- Scopus
- Web of Science Master Journal List
The data was collected by scraping and then cleaned; details can be found HERE.
The rest of the features provide further details on the journal's subject area or category:
- Life Sciences: Top level subject area.
- Social Sciences: Top level subject area.
- Physical Sciences: Top level subject area.
- Health Sciences: Top level subject area.
- 1000 General: ASJC main category.
- 1100 Agricultural and Biological Sciences: ASJC main category.
- 1200 Arts and Humanities: ASJC main category.
- 1300 Biochemistry, Genetics and Molecular Biology: ASJC main category.
- 1400 Business, Management and Accounting: ASJC main category.
- 1500 Chemical Engineering: ASJC main category.
- 1600 Chemistry: ASJC main category.
- 1700 Computer Science: ASJC main category.
- 1800 Decision Sciences: ASJC main category.
- 1900 Earth and Planetary Sciences: ASJC main category.
- 2000 Economics, Econometrics and Finance: ASJC main category.
- 2100 Energy: ASJC main category.
- 2200 Engineering: ASJC main category.
- 2300 Environmental Science: ASJC main category.
- 2400 Immunology and Microbiology: ASJC main category.
- 2500 Materials Science: ASJC main category.
- 2600 Mathematics: ASJC main category.
- 2700 Medicine: ASJC main category.
- 2800 Neuroscience: ASJC main category.
- 2900 Nursing: ASJC main category.
- 3000 Pharmacology, Toxicology and Pharmaceutics: ASJC main category.
- 3100 Physics and Astronomy: ASJC main category.
- 3200 Psychology: ASJC main category.
- 3300 Social Sciences: ASJC main category.
- 3400 Veterinary: ASJC main category.
- 3500 Dentistry: ASJC main category.
- 3600 Health Professions: ASJC main category.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents 2016 information at the household level: the percentage of households within each Index of Household Advantage and Disadvantage (IHAD) quartile for Local Government Area (LGA) 2017 boundaries.
The IHAD is an experimental analytical index developed by the Australian Bureau of Statistics (ABS) that provides a summary measure of relative socio-economic advantage and disadvantage for households. It utilises information from the 2016 Census of Population and Housing.
IHAD quartiles: All households are ordered from lowest to highest disadvantage. The lowest 25% of households are given a quartile number of 1, the next lowest 25% a quartile number of 2, and so on, up to the highest 25% of households, which are given a quartile number of 4. Households are thus divided into four groups, depending on their score.
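The quartile numbering described above can be sketched as a rank-based assignment. This is a minimal illustration only: the function name and the sample scores are hypothetical, and the ABS handling of ties or weights is not described here, so ties simply keep their input order.

```python
def assign_quartiles(scores):
    """Return a quartile number (1-4) for each score, based on rank order."""
    n = len(scores)
    # Sort positions by score; sorted() is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: scores[i])
    quartiles = [0] * n
    for rank, idx in enumerate(order):
        # The lowest quarter of ranks maps to 1, the next quarter to 2, and so on.
        quartiles[idx] = rank * 4 // n + 1
    return quartiles


# Hypothetical household scores, for illustration only.
example_scores = [12, 47, 3, 88, 25, 61, 39, 74]
print(assign_quartiles(example_scores))
```

With eight households, each quartile receives exactly two of them; when the count is not divisible by four, the group sizes differ by at most one.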
This data is ABS data (catalogue number: 4198.0) used with permission from the Australian Bureau of Statistics.
For more information please visit the Australian Bureau of Statistics.
Please note:
AURIN has generated this dataset through aggregating the original SA1 level data (with calculated number of households/quartile) to LGA level.
Aggregation was achieved through calculating the centroid for each SA1 and assigning it to the LGA it fell within.
The number of occupied private dwellings and the number of households in each of the IHAD quartiles were calculated for each LGA by aggregating the previously assigned SA1 values of each of those columns from the SA1 dataset. Percentages of households in each of the IHAD quartiles were calculated for each LGA from these aggregated totals.
A household is defined as one or more persons, at least one of whom is at least 15 years of age, usually resident in the same private dwelling. All occupants of a dwelling form a household. For Census purposes, the total number of households is equal to the total number of occupied private dwellings (Census of Population and Housing: Census Dictionary, 2016 cat. no. 2901.0).
IHAD output has been confidentialised to meet ABS requirements. In line with standard ABS procedures to minimise the risk of identifying individuals, a technique has been applied to randomly adjust cell values of the output tables. These adjustments may cause the sum of rows or columns to differ by small amounts from table totals.
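The SA1-to-LGA aggregation described in the notes above can be sketched as follows. The sketch assumes the centroid step has already assigned each SA1 to an LGA; the field names (`lga`, `quartile_counts`) and the sample rows are hypothetical, and using the sum of the four quartile counts as the percentage denominator is an assumption.

```python
from collections import defaultdict


def aggregate_to_lga(sa1_rows):
    """Sum SA1 household counts per LGA and derive quartile percentages."""
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for row in sa1_rows:
        for q in range(4):
            totals[row["lga"]][q] += row["quartile_counts"][q]

    result = {}
    for lga, counts in totals.items():
        households = sum(counts)  # assumed denominator for the percentages
        result[lga] = {
            "households": households,
            "pct": [
                round(100 * c / households, 1) if households else None
                for c in counts
            ],
        }
    return result


# Hypothetical SA1 records, already assigned to an LGA.
sa1_rows = [
    {"sa1": "A", "lga": "LGA1", "quartile_counts": [10, 20, 30, 40]},
    {"sa1": "B", "lga": "LGA1", "quartile_counts": [30, 20, 10, 40]},
    {"sa1": "C", "lga": "LGA2", "quartile_counts": [5, 5, 5, 5]},
]
print(aggregate_to_lga(sa1_rows))
```

Because the published cell values are randomly adjusted for confidentiality, percentages computed this way from released counts may not sum to exactly 100.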
Our goal was to predict gender, age, and emotion from audio. We found labeled audio datasets from Mozilla Common Voice and RAVDESS. Using the R programming language, 20 statistical features were extracted, and the labels were then added to form these datasets. Audio files were collected from "Mozilla Common Voice" and the "Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)".
Each dataset contains 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency describes the dispersion of the frequency distribution relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle value in the sorted list of values.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. About 25 percent of the values are below Q1, and about 75 percent are above it.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the median of the upper half of the data set. About 75 percent of the values are below Q3, and about 25 percent are above it.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure of how much the tails of a distribution differ from the tails of a normal distribution. It reflects the presence of outliers in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized signal's spectral power.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. It is usually measured in decibels and quantifies how tone-like, as opposed to noise-like, a sound is.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the spectrum's center of mass is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of the dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of the dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which expresses the degree of frequency modulation numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
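To make the first few features above concrete, the sketch below treats a spectrum as a distribution of power over frequency and computes its mean, standard deviation, quartiles, and IQR. It is written in Python for illustration and is not the R routine used to build the datasets; the function name, the quantile rule (first frequency bin at which the cumulative weight reaches the target), and the sample spectrum are all assumptions.

```python
import math


def spectral_summary(freqs_khz, amplitudes):
    """Summary statistics of a spectrum treated as a distribution over frequency."""
    total = sum(amplitudes)
    if total <= 0:
        raise ValueError("spectrum must contain positive amplitude")
    weights = [a / total for a in amplitudes]  # normalize so weights sum to 1

    mean = sum(f * w for f, w in zip(freqs_khz, weights))
    variance = sum(w * (f - mean) ** 2 for f, w in zip(freqs_khz, weights))

    def quantile(p):
        # First frequency bin at which the cumulative weight reaches p.
        cumulative = 0.0
        for f, w in zip(freqs_khz, weights):
            cumulative += w
            if cumulative >= p:
                return f
        return freqs_khz[-1]

    q25, median, q75 = quantile(0.25), quantile(0.50), quantile(0.75)
    return {
        "meanfreq": mean,
        "sd": math.sqrt(variance),
        "median": median,
        "Q25": q25,
        "Q75": q75,
        "IQR": q75 - q25,
    }


# Hypothetical five-bin spectrum, symmetric around 0.3 kHz.
print(spectral_summary([0.1, 0.2, 0.3, 0.4, 0.5], [1, 2, 4, 2, 1]))
```

For a symmetric spectrum like this one, the weighted mean (which is also the usual definition of the spectral centroid) and the median coincide at the center bin.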
Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en
Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Affordability ratios are calculated by dividing house prices by gross annual workplace-based earnings. They are based on the median and lower quartiles of both house prices and earnings in England and Wales.
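As a rough sketch of the calculation described above, the ratio divides a house-price statistic by the matching earnings statistic, once for the median and once for the lower quartile. The function name, the sample figures, and the choice of the "inclusive" quartile method are assumptions; the published series is derived from official price and earnings statistics rather than from raw lists like these.

```python
import statistics


def affordability_ratios(house_prices, annual_earnings):
    """Median and lower-quartile price-to-earnings ratios."""
    median_ratio = statistics.median(house_prices) / statistics.median(annual_earnings)
    # quantiles(..., n=4) returns the three quartile cut points; [0] is the lower quartile.
    lq_price = statistics.quantiles(house_prices, n=4, method="inclusive")[0]
    lq_earnings = statistics.quantiles(annual_earnings, n=4, method="inclusive")[0]
    return {"median": median_ratio, "lower_quartile": lq_price / lq_earnings}


# Hypothetical prices and gross annual earnings for one area.
prices = [150000, 200000, 250000, 300000, 350000]
earnings = [20000, 25000, 30000, 35000, 40000]
print(affordability_ratios(prices, earnings))
```

A higher ratio means housing is less affordable relative to earnings in that area.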
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents 2016 information at the household level: the percentage of households within each Index of Household Advantage and Disadvantage (IHAD) quartile for Statistical Area Level 2 (SA2) 2016 boundaries.
The IHAD is an experimental analytical index developed by the Australian Bureau of Statistics (ABS) that provides a summary measure of relative socio-economic advantage and disadvantage for households. It utilises information from the 2016 Census of Population and Housing.
IHAD quartiles: All households are ordered from lowest to highest disadvantage. The lowest 25% of households are given a quartile number of 1, the next lowest 25% a quartile number of 2, and so on, up to the highest 25% of households, which are given a quartile number of 4. Households are thus divided into four groups, depending on their score.
This data is ABS data (catalogue number: 4198.0) used with permission from the Australian Bureau of Statistics.
For more information please visit the Australian Bureau of Statistics.
Please note:
AURIN has generated this dataset through aggregating the original SA1 level data (with calculated number of households/quartile) to SA2 level.
The number of occupied private dwellings and the number of households in each of the IHAD quartiles were calculated for each SA2 by aggregating the values of each of those columns from the SA1 dataset. Percentages of households in each of the IHAD quartiles were calculated for each SA2 from these aggregated totals.
A household is defined as one or more persons, at least one of whom is at least 15 years of age, usually resident in the same private dwelling. All occupants of a dwelling form a household. For Census purposes, the total number of households is equal to the total number of occupied private dwellings (Census of Population and Housing: Census Dictionary, 2016 cat. no. 2901.0).
IHAD output has been confidentialised to meet ABS requirements. In line with standard ABS procedures to minimise the risk of identifying individuals, a technique has been applied to randomly adjust cell values of the output tables. These adjustments may cause the sum of rows or columns to differ by small amounts from table totals.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Affordability ratios are calculated by dividing house prices by gross annual residence-based earnings. They are based on the median and lower quartiles of both house prices and earnings in England and Wales.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The project aims to investigate the relationship between temperature, humidity, TDS value, pH level, and growth days to understand how these factors influence lettuce growth. The project also aims to calculate summary statistics such as mean, median, quartiles, and min/max values for each variable to gain insights into the distribution of the data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AL refers to the axial length, CCT to the central corneal thickness, ACD to the external phakic anterior chamber depth measured from the corneal front apex to the front apex of the crystalline lens, LT to the central thickness of the crystalline lens, R1 and R2 to the corneal radii of curvature for the flat and steep meridians, Rmean to the average of R1 and R2, PIOL to the refractive power of the intraocular lens implant, and SEQ to the spherical equivalent power achieved 5 to 12 weeks after cataract surgery.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset includes one table that was custom ordered from Statistics Canada. The table includes information on housing suitability and shelter-cost-to-income ratio by number of bedrooms, housing tenure, status of primary household maintainer, household type, and income quartile ranges for census subdivisions in British Columbia.
The dataset is in Beyond 20/20 (.ivt) format. The Beyond 20/20 browser is required in order to open it. This software can be freely downloaded from the Statistics Canada website:
https://www.statcan.gc.ca/eng/public/beyond20-20 (Windows only).
For information on how to use Beyond 20/20, please see:
http://odesi2.scholarsportal.info/documentation/Beyond2020/beyond20-quickstart.pdf
https://wiki.ubc.ca/Library:Beyond_20/20_Guide
Custom order from Statistics Canada includes the following dimensions and variables:
Geography:
Non-reserve CSDs in British Columbia - 299 geographies
The global non-response rate (GNR) is an important measure of census data quality. It combines total non-response (households) and partial non-response (questions). A lower GNR indicates a lower risk of non-response bias and, as a result, a lower risk of inaccuracy. The counts and estimates for geographic areas with a GNR equal to or greater than 50% are not published in the standard products. The counts and estimates for these areas have a high risk of non-response bias, and in most cases, should not be released. All the geographies requested for this tabulation have been cleared for the release of income data and have a GNR under 50%.
Housing Tenure Including Presence of Mortgage (5)
1. Total – Private non-band non-farm off-reserve households with an income greater than zero by housing tenure
2. Households who own
3. With a mortgage (note 1)
4. Without a mortgage
5. Households who rent
Note: 1) Presence of mortgage - Refers to whether the owner households reported mortgage or loan payments for their dwelling.
2015 Before-tax Household Income Quartile Ranges (5)
1. Total – Private households by quartile ranges (notes 1, 2, 3)
2. Count of households under or at quartile 1
3. Count of households between quartile 1 and quartile 2 (median) (including at quartile 2)
4. Count of households between quartile 2 (median) and quartile 3 (including at quartile 3)
5. Count of households over quartile 3
Notes: 1) A private household will be assigned to a quartile range depending on its CSD-level location and depending on its tenure (owned and rented). Quartile ranges for owned households in a specific CSD are delimited by the 2015 before-tax income quartiles of owned households with an income greater than zero and residing in non-farm off-reserve dwellings in that CSD. Quartile ranges for rented households in a specific CSD are delimited by the 2015 before-tax income quartiles of rented households with an income greater than zero and residing in non-farm off-reserve dwellings in that CSD.
2) For the income quartiles dollar values (the delimiters) please refer to Table 1.
3) Quartiles 1 to 3 are suppressed if the number of actual records used in the calculation (not rounded or weighted) is less than 16. For cases in which the renters’ quartiles or the owners’ quartiles (figures from Table 1) of a CSD are suppressed the CSD is assigned to a quartile range depending on the provincial renters’ or owners’ quartile figures.
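The quartile-range assignment in the notes above, including the fallback to provincial figures when a CSD's quartiles are suppressed, can be sketched like this. The function name, the use of `None` to represent suppressed quartiles, and the dollar values are hypothetical; the caller is assumed to pass the delimiters that match the household's tenure (owned or rented).

```python
def quartile_range(income, csd_quartiles, provincial_quartiles):
    """Return the quartile range (1-4) for a household income.

    csd_quartiles and provincial_quartiles are (Q1, Q2, Q3) tuples for the
    household's tenure; csd_quartiles is None when the CSD values are suppressed.
    """
    # Fall back to the provincial delimiters when the CSD quartiles are suppressed.
    q1, q2, q3 = csd_quartiles if csd_quartiles is not None else provincial_quartiles
    if income <= q1:
        return 1  # under or at quartile 1
    if income <= q2:
        return 2  # between quartile 1 and quartile 2, including at quartile 2
    if income <= q3:
        return 3  # between quartile 2 and quartile 3, including at quartile 3
    return 4      # over quartile 3


# Hypothetical delimiters for renter households.
csd = (30000, 50000, 80000)
province = (25000, 45000, 70000)
print(quartile_range(50000, csd, province))   # uses the CSD delimiters
print(quartile_range(72000, None, province))  # CSD suppressed, provincial fallback
```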
Number of Bedrooms (Unit Size) (6)
1. Total – Private households by number of bedrooms (note 1)
2. 0 bedrooms (Bachelor/Studio)
3. 1 bedroom
4. 2 bedrooms
5. 3 bedrooms
6. 4 bedrooms
Note: 1) Dwellings with 5 bedrooms or more included in the total count only.
Housing Suitability (6)
1. Total - Housing suitability
2. Suitable
3. Not suitable
4. One bedroom shortfall
5. Two bedroom shortfall
6. Three or more bedroom shortfall
Note: 1) 'Housing suitability' refers to whether a private household is living in suitable accommodations according to the National Occupancy Standard (NOS); that is, whether the dwelling has enough bedrooms for the size and composition of the household. A household is deemed to be living in suitable accommodations if its dwelling has enough bedrooms, as calculated using the NOS.
'Housing suitability' assesses the required number of bedrooms for a household based on the age, sex, and relationships among household members. An alternative variable, 'persons per room,' considers all rooms in a private dwelling and the number of household members.
Housing suitability and the National Occupancy Standard (NOS) on which it is based were developed by Canada Mortgage and Housing Corporation (CMHC) through consultations with provincial housing agencies.
Shelter-cost-to-income-ratio (4)
1. Total – Private non-band non-farm off-reserve households with an income greater than zero
2. Spending less than 30% of household total income on shelter costs
3. Spending 30% or more of household total income on shelter costs
4. Spending 50% or more of household total income on shelter costs
Note: 'Shelter-cost-to-income ratio' refers to the proportion of average total income of household which is spent on shelter costs.
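A minimal sketch of how a single household could be placed into the categories above. The function name and the sample figures are hypothetical, and comparing twelve months of shelter costs against annual income is an assumption about how the two quantities are put on the same time basis.

```python
def shelter_cost_categories(annual_income, monthly_shelter_cost):
    """Classify a household by its shelter-cost-to-income ratio."""
    if annual_income <= 0:
        # The tabulation only covers households with an income greater than zero.
        raise ValueError("income must be greater than zero")
    ratio = 12 * monthly_shelter_cost / annual_income
    return {
        "ratio": ratio,
        "under_30": ratio < 0.30,
        "30_or_more": ratio >= 0.30,
        # Households at 50% or more are also counted in the 30%-or-more group.
        "50_or_more": ratio >= 0.50,
    }


# Hypothetical household: $60,000 annual income, $1,500 monthly shelter costs.
print(shelter_cost_categories(60000, 1500))
```

Note that the 50%-or-more category is a subset of the 30%-or-more category, so the two counts should not be added together.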
Household Statistics (8)
1. Total – Private non-band non-farm off-reserve households with an income greater than zero (note 1)
2. Average household income in 2015 ($) (note 2)
3. Median household income in 2015 ($) (note 3)
4. Quartile 1 of household income in 2015 ($) (note 4)
5. Quartile 2 (median) of household income in 2015 ($) (note 4)
6. Quartile 3 of household income in 2015 ($) (note 4)
7. Average monthly shelter costs ($) (notes 2, 5)
8. Median monthly shelter costs ($) (notes 3, 5)
Notes: 1) All households statistics are calculated based on the distribution of private households in non-farm off-reserve non-band occupied private dwellings with a before-tax household income greater than zero.
2) The average is suppressed if the number of actual records used in the calculation (not rounded or weighted) is less than 4.
3) The median is suppressed if the number of actual records used in the calculation (not rounded or weighted) is less than 8.
4) Quartiles 1 to 3 are suppressed if the number of actual records used in the calculation (not rounded or weighted) is less than 16.
5) Shelter costs for owner households include, where applicable, mortgage payments, property taxes and condominium fees, along with the costs of electricity, heat, water and other municipal services. For renter households, shelter costs include, where applicable, the rent and the costs of electricity, heat, water and other municipal services.
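The suppression thresholds in notes 2 to 4 above (fewer than 4 records for the average, 8 for the median, 16 for the quartiles) can be illustrated with a short sketch. The function name and sample values are hypothetical, `None` stands in for a suppressed value, and the quartile method used by `statistics.quantiles` is not necessarily the one Statistics Canada applies.

```python
import statistics


def household_statistics(incomes):
    """Compute income statistics, suppressing those with too few records."""
    n = len(incomes)  # actual record count, not rounded or weighted
    stats = {"average": None, "median": None, "quartiles": None}
    if n >= 4:
        stats["average"] = statistics.mean(incomes)
    if n >= 8:
        stats["median"] = statistics.median(incomes)
    if n >= 16:
        stats["quartiles"] = statistics.quantiles(incomes, n=4)
    return stats


# Hypothetical area with only five records: the average is released,
# while the median and quartiles are suppressed.
print(household_statistics([42000, 55000, 61000, 38000, 70000]))
```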
Status of Primary Household Maintainer (11)
1. Total – Private households by Aboriginal identity of the primary household maintainer
2. PHM is Aboriginal (note 2)
3. PHM is not Aboriginal
4. Total – Private households by immigration status of the primary household maintainer
5. PHM is a non-immigrant (note 3)
6. PHM is an immigrant or a non-permanent resident
7. PHM is a non-permanent resident (note 4)
8. PHM is an immigrant (notes 5, 6)
9. Officially landed in Canada between 2011 and 2016 (note 7)
10. Officially landed in Canada between 2006 and 2010
11. Officially landed in Canada before 2006
Notes: 1) The Primary Household Maintainer is the first person in the household identified as someone who pays the rent or the mortgage, or the taxes, or the electricity bill, and so on, for the dwelling.
In the case of a household where two or more people are listed as household maintainers, the first person listed is chosen as the primary household maintainer.
2) 'Aboriginal identity' includes persons who are First Nations (North American Indian), Métis or Inuk (Inuit) and/or those who are Registered or Treaty Indians (that is, registered under the Indian Act of Canada) and/or those who have membership in a First Nation or Indian band. Aboriginal peoples of Canada are defined in the Constitution Act, 1982, section 35 (2) as including the Indian, Inuit and Métis peoples of Canada.
3) 'Non-immigrants' includes persons who are Canadian citizens by birth.
4) 'Non-permanent residents' includes persons from another country who have a work or study permit or who are refugee claimants, and their family members sharing the same permit and living in Canada with them.
5) 'Immigrants' includes persons who are, or who have ever been, landed immigrants or permanent residents. Such persons have been granted the right to live in Canada permanently by immigration authorities. Immigrants who have obtained Canadian citizenship by naturalization are included in this category. In the 2016 Census of Population, 'Immigrants' includes immigrants who landed in Canada on or prior to May 10, 2016.
6) Immigrants may not have a complete year of applicable income. The income data for the 2016 Census of Population are for the year 2015.
7) Includes immigrants who landed in Canada on or prior to May 10, 2016.
Original file name: CRO0163850_CT.5 (BC_Cultural).ivt
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Feature preparation
Preprocessing was applied to the data, including creating dummy variables and performing transformations (centering, scaling, YeoJohnson) using the preProcess() function from the "caret" package in R. The correlation among the variables was examined and no serious multicollinearity problems were found. A stepwise variable selection was performed using a logistic regression model. The final set of variables included:
- Demographic: age, body mass index, sex, ethnicity, smoking
- History of disease: heart disease, migraine, insomnia, gastrointestinal disease
- COVID-19 history: covid vaccination, rashes, conjunctivitis, shortness of breath, chest pain, cough, runny nose, dysgeusia, muscle and joint pain, fatigue, fever, COVID-19 reinfection, and ICU admission
These variables were used to train and test various machine-learning models.
Model selection and training
The data was randomly split into 80% training and 20% testing subsets. The "h2o" package in R version 4.3.1 was employed to implement different algorithms. AutoML was used first, which automatically explored a range of models with different configurations. Gradient Boosting Machines (GBM), Random Forest (RF), and Regularized Generalized Linear Model (GLM) were identified as the best-performing models on our data, and their parameters were fine-tuned. An ensemble method that stacked different models together was also used, as it can sometimes improve accuracy. The models were evaluated using the area under the curve (AUC) and C-statistics as diagnostic measures. The model with the highest AUC was selected for further analysis using the confusion matrix, accuracy, sensitivity, specificity, and F1 and F2 scores. The optimal prediction threshold was determined by plotting the sensitivity, specificity, and accuracy and choosing their point of intersection, as it balances the trade-off between the three metrics. The model's predictions were also plotted, and the quartile ranges were used to classify the model's predictions as follows: > 1st quartile, > 2nd quartile, > 3rd quartile and < 3rd quartile (very low, low, moderate, high), respectively.
Metric: Formula
- C-statistics: (TPR + TNR - 1) / 2
- Sensitivity/Recall: TP / (TP + FN)
- Specificity: TN / (TN + FP)
- Accuracy: (TP + TN) / (TP + TN + FP + FN)
- F1 score: 2 * (precision * recall) / (precision + recall)
Model interpretation
We used the variable importance plot, which measures how much each variable contributes to the predictive power of a machine-learning model. In the H2O package, variable importance for GBM and RF is calculated by measuring the decrease in the model's error when a variable is split on. The more a variable's split decreases the error, the more important that variable is considered to be. The error is calculated using the formula SE = MSE × N = VAR × N, and it is then scaled between 0 and 1 and plotted. We also used the SHAP summary plot, a graphical tool for visualizing the impact of input features on the prediction of a machine-learning model. SHAP stands for SHapley Additive exPlanations, a method that calculates the contribution of each feature to the prediction by averaging over all possible subsets of features [28]. The SHAP summary plot shows the distribution of the SHAP values for each feature across the data instances. We use the h2o.shap_summary_plot() function in R to generate the SHAP summary plot for our GBM model. We pass the model object and the test data as arguments, and optionally specify the columns (features) we want to include in the plot. The plot shows the SHAP values for each feature on the x-axis and the features on the y-axis. The color indicates whether the feature value is low (blue) or high (red). The plot also shows the distribution of the feature values as a density plot on the right.
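The sensitivity, specificity, accuracy, and F-score formulas listed above can be computed directly from confusion-matrix counts. The sketch below is in Python for illustration (the study itself used R and h2o); the function name and the counts are hypothetical, and the general F-beta formula is used so that both the F1 (beta = 1) and F2 (beta = 2) scores mentioned in the text come from one function.

```python
def classification_metrics(tp, fp, tn, fn, beta=1.0):
    """Confusion-matrix metrics; beta=1 gives F1 and beta=2 gives F2."""
    sensitivity = tp / (tp + fn)  # recall, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f_beta": f_beta,
    }


# Hypothetical confusion-matrix counts, for illustration only.
print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
print(classification_metrics(tp=40, fp=10, tn=45, fn=5, beta=2.0))
```

F2 weights recall more heavily than precision, which can be preferable when missing a positive case is costlier than raising a false alarm.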
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
P for trend was calculated across quartiles of formula/cows' milk intake.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Baseline characteristics of the study population according to CMI quartiles.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quartile values of the number of tandem repeats at each locus in each major clade using all strains isolated in Chiba prefecture.