Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file includes Report Card administrator experience status by school poverty quartile data for the 2017-18 through 2023-24 school years. Data is disaggregated by state, ESD, LEA, and school level. Please review the notes below for more information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.
Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 12+ and age 5+ denominators have been uploaded as archived tables.
Starting June 30, 2021, the dataset has been reconfigured so that all updates are appended to one dataset, which makes it easier to access through the API and other interfaces. In addition, historical data has been extended back to January 5, 2021.
This dataset shows full, partial, and at least 1 dose coverage rates by zip code tabulation area (ZCTA) for the state of California. Data sources include the California Immunization Registry and the American Community Survey’s 2015-2019 5-Year data.
This is the data table for the LHJ Vaccine Equity Performance dashboard. However, this data table also includes ZCTAs that do not have a Vaccine Equity Metric (VEM) score.
This dataset also includes Vaccine Equity Metric score quartiles (when applicable), which combine the Public Health Alliance of Southern California's Healthy Places Index (HPI) measure with CDPH-derived scores to estimate factors that impact health, like income, education, and access to health care. ZCTAs range from less healthy community conditions in Quartile 1 to more healthy community conditions in Quartile 4.
The Vaccine Equity Metric is for weekly vaccination allocation and reporting purposes only. CDPH-derived quartiles should not be considered as indicative of the HPI score for these zip codes. CDPH-derived quartiles were assigned to zip codes excluded from the HPI score produced by the Public Health Alliance of Southern California due to concerns with statistical reliability and validity in populations smaller than 1,500 or where more than 50% of the population resides in a group setting.
These data do not include doses administered by the following federal agencies that received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.
For some ZCTAs, vaccination coverage may exceed 100%. This may happen when many people from outside the county come to that ZCTA to be vaccinated and providers report the county of administration as the county of residence, and/or when the DOF population estimates for that ZCTA are too low. Please note that population numbers provided by DOF are projections and so may not be accurate, especially given unprecedented shifts in population as a result of the pandemic.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Annual descriptive price statistics for each calendar year 2005–2024 for 462 electoral wards within 11 Local Government Districts. The statistics include:
• Minimum sale price
• Lower quartile sale price
• Median sale price
• Simple mean sale price
• Upper quartile sale price
• Maximum sale price
• Number of verified sales
Prices are available where at least 30 sales were recorded in the area within the calendar year that could be included in the regression model, i.e. the following sales are excluded:
• Non-arm's-length sales
• Sales of properties where the habitable space is less than 30 m² or greater than 1,000 m²
• Sales of less than £20,000
Annual median or simple mean prices should not be used to calculate property price change over time. The quality of the properties sold (where quality refers to the combination of all characteristics of a residential property, both physical and locational) may differ from one time period to another. For example, sales in one quarter could be disproportionately skewed towards low-quality properties, producing a biased estimate of the average price. The median and simple mean prices are not 'standardised', so the varying mix of properties sold in each quarter could give a false impression of the actual change in prices. Calculating pure property price change over time requires comparing like with like, which can only be achieved if the 'characteristics-mix' of properties traded is standardised. To calculate pure property price change over time, please use the standardised prices in the NI House Price Index Detailed Statistics file.
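The publication rules above can be sketched in Python. The sketch applies the exclusions first, then the 30-sale minimum, then computes the seven statistics. It is a minimal illustration, not the method used to produce the NI statistics. The `Sale` record, its field names, and the use of the "inclusive" quartile method from Python's `statistics` module are all assumptions.

```python
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sale:
    # Hypothetical record layout; the published files do not define one.
    price: float            # sale price in GBP
    floor_area_m2: float    # habitable floor space in square metres
    arms_length: bool       # False for non-arm's-length sales


MIN_SALES = 30  # statistics are only published with at least 30 qualifying sales


def is_eligible(sale: Sale) -> bool:
    """Apply the three exclusion rules listed in the description."""
    return (
        sale.arms_length
        and 30 <= sale.floor_area_m2 <= 1000
        and sale.price >= 20_000
    )


def describe_prices(sales: list[Sale]) -> Optional[dict]:
    """Return the seven annual statistics, or None if too few sales qualify."""
    prices = sorted(s.price for s in sales if is_eligible(s))
    if len(prices) < MIN_SALES:
        return None  # suppressed: fewer than 30 qualifying sales
    # Quartile method is an assumption; official outputs may interpolate differently.
    lower_q, _, upper_q = statistics.quantiles(prices, n=4, method="inclusive")
    return {
        "min": prices[0],
        "lower_quartile": lower_q,
        "median": statistics.median(prices),
        "mean": statistics.fmean(prices),
        "upper_quartile": upper_q,
        "max": prices[-1],
        "verified_sales": len(prices),
    }
```

As the description stresses, these per-year medians and means are not mix-adjusted. They should therefore not be differenced across years to estimate price change.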
This data set contains surface elevation data over Greenland measured by the NASA Land, Vegetation, and Ice Sensor (LVIS), an airborne lidar scanning laser altimeter.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics (minimum, first quartile, median, mean, third quartile, maximum) of probabilities for the RPDLomax and Logistic models by class (Wilt dataset).
The DataPusher+ extension for CKAN is a comprehensive data loading and analysis solution that combines the speed and reliability of ckanext-xloader with advanced data type inference and metadata generation capabilities. This major evolution of the original Datapusher turns it from a standalone web service into a full-fledged CKAN extension. It uses the ultra-fast qsv data-wrangling engine to provide "bullet-proof" data type inference and extensive preprocessing capabilities. The extension introduces formula-based metadata generation using Jinja2 templates, enabling automatic calculation and suggestion of complex metadata properties, including DCAT 3 compliance fields.
Key Features:
- Ultra-Fast Data Type Inference with qsv: Uses the Rust-based qsv engine to scan entire datasets (not just sample rows) for guaranteed accurate data type detection. It completes analysis of 100MB+ files in under 5 seconds while calculating comprehensive summary statistics, including cardinality, sparsity, quartiles, and other statistical measures.
- Formula-Based Metadata Generation: Supports Jinja2 formula expressions in scheming configuration files. These expressions can access dataset statistics (dpps), frequency tables (dppf), and calculated metadata (dpp) to automatically generate or suggest complex metadata properties. Both immediate assignment (formula) and suggested values (suggest_formula) are supported.
- Advanced Data Preprocessing: Automatically handles Excel/ODS conversion, SHP/GeoJSON transformation, ZIP archive processing, date format normalization (supporting 19+ formats), CSV dialect standardization, duplicate detection and removal, PII screening with quarantine capabilities, and geometry simplification for spatial data.
- Production-Ready Robustness: Addresses common Datapusher failure scenarios with comprehensive error handling, actionable error messages, and the ability to recover from data quality issues without losing entire processing jobs, making it suitable for enterprise production environments.
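Scanning every row, rather than a sample, matters for type inference. A single non-numeric value deep in a file changes the correct type of its column, and a sample can miss it. The stdlib-only Python sketch below infers a column type from all values and computes cardinality and sparsity. It is not qsv's algorithm or DataPusher+'s API; the type names and the `infer_type` and `profile_csv` helpers are hypothetical.

```python
import csv
import io


def infer_type(values: list[str]) -> str:
    """Narrowest type that fits every non-empty value: NULL, Integer, Float, or String."""
    kind = "NULL"
    for v in values:
        if v == "":
            continue  # empty cells do not constrain the type
        if kind in ("NULL", "Integer"):
            try:
                int(v)
                kind = "Integer"
                continue
            except ValueError:
                kind = "Float"  # not an integer; fall through to the float check
        if kind == "Float":
            try:
                float(v)
                continue
            except ValueError:
                return "String"  # one non-numeric value anywhere makes the column a String
    return kind


def profile_csv(text: str) -> dict[str, dict]:
    """Scan every row (no sampling) and profile each column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    profile = {}
    for column in reader.fieldnames or []:
        values = [(row[column] or "") for row in rows]  # missing trailing cells become ""
        non_empty = [v for v in values if v != ""]
        profile[column] = {
            "type": infer_type(values),
            "cardinality": len(set(non_empty)),  # number of distinct non-empty values
            "sparsity": (len(values) - len(non_empty)) / len(values) if values else 0.0,
        }
    return profile
```

For example, `profile_csv("id,score\n1,2.5\n2,\n")` types `id` as Integer and `score` as Float, with a sparsity of 0.5 for `score`.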
By Health Data New York
This dataset contains New York State county-level data on obesity- and diabetes-related indicators from 2008 to 2012. It includes information about the population health status of counties, such as the number of events, percentage/rate, 95% confidence interval, and measure units. Analyzing this data provides insight into how communities across New York State are affected by these diseases and how we can work together to create healthier living environments for everyone. This dataset is released under a Terms of Service license agreement; make sure to read and understand the details if you plan to use it in any research or commercial application.
This dataset contains county-level data on obesity and diabetes related indicators in New York State. As such, it can be used to research indicators related to general health in various counties of the state.
To use this dataset effectively, first become familiar with the columns and their meanings:
- County Name: The name of the county. (String)
- County Code: The code of the county. (Integer)
- Region Name: The name of the region. (String)
- Indicator Number: The number of the indicator. (Integer)
- Total Event Counts: The total number of events related to the indicator. (Integer)
- Denominator: The denominator used to calculate the percentage/rate. (Integer)
- Denominator Note: Any additional notes related to the denominator. (String)
- Measure Unit: The unit of measure used for this rate/percentage. (String)
- Percentage/Rate: The percentage/rate calculated from the observed event count and the denominator. (Float)
- 95% CI: The 95% confidence interval associated with the rate or percentage. (Float)
- Data Comments: Any additional comments relevant to this data source or indicator. (String)
- Data Years: The years covered by this indicator observation. (String)
- Data Sources: The sources from which the indicator data for counties in different regions were drawn. (String)
- Quartile: Quartiles are derived by ranking all geographic entities on a specific metric score and cutting the ranking into groups: 0 = bottom quarter; 1 = middle two quarters combined; 2 = top quarter. (Integer)
- Mapping Distribution: Mapping details describing how indicator rates or characteristics are distributed across the state, regions, and counties, along with any trends and other pertinent mapping information, such as health resource availability. (String)
- Location: A point feature with a single location ID retrieved from a geoplanet proxy service. (String)
Using these columns, you can look up health information for a chosen county, such as its obesity rate and diabetes incidence, to better understand its overall health situation. The quartile rankings also support comparisons between counties.
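The Quartile coding above (0 = bottom quarter, 1 = middle two quarters, 2 = top quarter) can be sketched in Python. The sketch makes several assumptions. Entities are ranked in ascending order of score. The 25% and 75% cut points are taken on rank position. Ties are broken by input order. The data provider's actual ranking and tie rules are not stated in the description, and `quartile_groups` is a hypothetical helper.

```python
def quartile_groups(scores: dict[str, float]) -> dict[str, int]:
    """Map each entity to 0 (bottom quarter), 1 (middle two quarters) or 2 (top quarter)."""
    ranked = sorted(scores, key=scores.get)  # ascending score; ties keep input order
    n = len(ranked)
    groups = {}
    for position, name in enumerate(ranked, start=1):
        share = position / n  # fraction of entities ranked at or below this one
        if share <= 0.25:
            groups[name] = 0
        elif share <= 0.75:
            groups[name] = 1
        else:
            groups[name] = 2
    return groups
```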
Analysing the geographic distribution of obesity and diabetes related indicators by county in New York State, in order to identify areas which may require greater levels of intervention and preventative health measures.
Evaluating trends over time for different counties to assess whether policies or programs have had an impact on indicators relating to obesity and diabetes within the given area.
Using machine learning techniques, such as clustering analysis or predictive modelling, to identify patterns in the data that can better inform preventive health interventions across New York State.
If you use this dataset in your research, please credit the original authors (Health Data New York).
See the dataset description for more information.
File: community-health-obesity-and-diabetes-related-indicators-2008-2012-1.csv
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Annual descriptive price statistics for each calendar year 2005–2024 for 11 Local Government Districts in Northern Ireland. The statistics include:
• Minimum sale price
• Lower quartile sale price
• Median sale price
• Simple mean sale price
• Upper quartile sale price
• Maximum sale price
• Number of verified sales
Prices are available where at least 30 sales were recorded in the area within the calendar year that could be included in the regression model, i.e. the following sales are excluded:
• Non-arm's-length sales
• Sales of properties where the habitable space is less than 30 m² or greater than 1,000 m²
• Sales of less than £20,000
Annual median or simple mean prices should not be used to calculate property price change over time. The quality of the properties sold (where quality refers to the combination of all characteristics of a residential property, both physical and locational) may differ from one time period to another. For example, sales in one quarter could be disproportionately skewed towards low-quality properties, producing a biased estimate of the average price. The median and simple mean prices are not 'standardised', so the varying mix of properties sold in each quarter could give a false impression of the actual change in prices. Calculating pure property price change over time requires comparing like with like, which can only be achieved if the 'characteristics-mix' of properties traded is standardised. To calculate pure property price change over time, please use the standardised prices in the NI House Price Index Detailed Statistics file.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 11/15/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis was revised throughout the peer review process.
#Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study. These include global arrays summarizing five-year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data, available both as an array (.nc) and as a data table (.csv). These data were produced in Python using the scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data" folders primarily contain ground-based estimates of storage and transit found in public databases or through a literature search; these have been extensively processed and filtered here. The "supporting_data" folder also contains the annual (2016-2020) MODIS land cover data used in the analysis, with separate folders for the original data (.hdf) and the final processed (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in Python.
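One plausible reading of "five-year averages of mean (annual) and minimum (monthly) transit time" is that each grid cell has one value per month. Each year's monthly values are reduced to a mean and a minimum, and those yearly values are then averaged across 2016-2020. The Python sketch below illustrates that reading for a single grid cell. The aggregation order is an assumption, not the study's documented code, and `five_year_summary` is a hypothetical helper.

```python
from statistics import fmean
from typing import Optional


def five_year_summary(monthly: dict[int, list[Optional[float]]]) -> dict:
    """Summarize one grid cell.

    monthly maps each year (e.g. 2016-2020) to its monthly transit times;
    None marks a month without data.
    """
    annual_means, annual_minima = [], []
    months_with_data = 0
    for year in sorted(monthly):
        valid = [v for v in monthly[year] if v is not None]
        months_with_data += len(valid)
        if valid:
            annual_means.append(fmean(valid))  # mean (annual) transit time for this year
            annual_minima.append(min(valid))   # minimum (monthly) transit time for this year
    return {
        "mean_annual": fmean(annual_means) if annual_means else None,
        "min_monthly": fmean(annual_minima) if annual_minima else None,
        "months_of_data": months_with_data,
    }
```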
#Code information:
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a role:
"01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).
"02_functions.R": This script contains custom functions. Load this using the `source()` function in the 01_start.R script.
"03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the `source()` function in the 01_start.R script.
"04_figures_tables.R": This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study, which are then saved in the "manuscript_figures" folder. Note that all maps were produced using Python code found in the "supporting_code" folder. Also note that the "manuscript_figures" folder contains an "extended_data" folder with tables of the summary statistics (e.g., quartiles and sample sizes) behind figures that contain box plots or depict regression coefficients.
"supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.
"supporting_process_land_cover.R": This script takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in the preprocessing of datasets in Python.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Index of Household Advantage and Disadvantage (IHAD) provides a summary measure of relative socio-economic advantage and disadvantage for households, based on the characteristics of dwellings and the people living within them, using 2021 Census data.
All in-scope households are ordered from lowest to highest score. A low score indicates relatively greater disadvantage and a lack of advantage in general. A high score indicates a relative lack of disadvantage and greater advantage in general.
This dataset presents IHAD data in quartiles. The lowest 25% of households are given a quartile number of 1, the next lowest 25% of households are given a quartile number of 2 and so on, up to the highest 25% of households which are given a quartile number of 4. This means that households are divided into four equal sized groups, depending on their score. In practice these groups won’t each be exactly 25% of households as it depends on the distribution of the IHAD scores. The data is grouped by Statistical Area Level 2 (SA2 2021). SA2s are defined by the Australian Statistical Geography Standard (ASGS) Edition 3.
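The quartile step can be sketched in Python. All in-scope households are ranked together, not within each SA2. Each SA2's share of households in quartiles 1 to 4 is then reported, as in the IHAD_QUARTILE1 to IHAD_QUARTILE4 fields. This is a simplified illustration, not the ABS method. It breaks tied scores arbitrarily, so its groups are as close to 25% as the household count allows. The published groups can deviate from 25% because of the score distribution. The helper names and the SA2 codes in the test are hypothetical.

```python
from collections import Counter, defaultdict


def assign_quartiles(scores: list[float]) -> list[int]:
    """Rank all households by score: lowest 25% get quartile 1, ..., highest 25% get 4."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices from lowest to highest score
    quartiles = [0] * n
    for rank, index in enumerate(order):
        quartiles[index] = rank * 4 // n + 1
    return quartiles


def sa2_quartile_proportions(households: list[tuple[str, float]]) -> dict[str, list[float]]:
    """households: (sa2_code, ihad_score) pairs; returns [p1, p2, p3, p4] per SA2."""
    quartiles = assign_quartiles([score for _, score in households])
    counts: dict[str, Counter] = defaultdict(Counter)
    for (sa2_code, _), quartile in zip(households, quartiles):
        counts[sa2_code][quartile] += 1
    return {
        sa2_code: [c[q] / sum(c.values()) for q in range(1, 5)]
        for sa2_code, c in counts.items()
    }
```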
Key Attributes (field alias, field name, description):
- Statistical Areas Level 2 2021 code (SA2_CODE_2021): 2021 Statistical Areas Level 2 (SA2) codes from the Australian Statistical Geography Standard (ASGS), Edition 3. SA2s are medium-sized general purpose areas built to represent communities that interact together socially and economically.
- Statistical Areas Level 2 2021 name (SA2_NAME_2021): 2021 Statistical Areas Level 2 (SA2) names from the Australian Statistical Geography Standard (ASGS), Edition 3. SA2s are medium-sized general purpose areas built to represent communities that interact together socially and economically.
- Area in square kilometres (AREA_ALBERS_SQKM): The area of a region in square kilometres, based on the Albers equal area conic projection.
- Uniform Resource Identifier (ASGS_LOCI_URI_2021): A uniform resource identifier can be used in web linked applications for data integration.
- IHAD quartile 1 (IHAD_QUARTILE1): Proportion of in-scope dwellings in the SA2 that fall into IHAD quartile 1, indicating relatively greater disadvantage and a lack of advantage in general.
- IHAD quartile 2 (IHAD_QUARTILE2): Proportion of in-scope dwellings in the SA2 that fall into IHAD quartile 2.
- IHAD quartile 3 (IHAD_QUARTILE3): Proportion of in-scope dwellings in the SA2 that fall into IHAD quartile 3.
- IHAD quartile 4 (IHAD_QUARTILE4): Proportion of in-scope dwellings in the SA2 that fall into IHAD quartile 4, indicating a relative lack of disadvantage and greater advantage in general.
- Occupied private dwellings (OPD_2021): Dwellings in scope of the IHAD, i.e. classifiable occupied private dwellings.
- SEIFA IRSAD quartile (IRSAD_QUARTILE): Index of Relative Socio-economic Advantage and Disadvantage quartile. All SA2s are ordered from lowest to highest score, the lowest 25% of SA2s are given a quartile number of 1, the next lowest 25% of SA2s are given a quartile number of 2 and so on, up to the highest 25% of SA2s which are given a quartile number of 4. This means that SA2s are divided into four equal sized groups, depending on their score. In practice these groups won't each be exactly 25% of SA2s as it depends on the distribution of SEIFA scores.
- Usual resident population (URP_2021): Population counts in this column are based on place of usual residence as reported on Census Night. These include persons out of scope of the IHAD.
- Dwellings (DWELLING): Total dwellings at Census time, including dwellings out of scope of the IHAD, e.g. unoccupied private dwellings.
Please note: Proportional totals may equal more than 100% due to rounding and random adjustments made to the data. When calculating proportions, percentages, or ratios from cross-classified or small area tables, the random error introduced can be ignored except when very small cells are involved, in which case the impact on percentages and ratios can be significant. Refer to the Introduced random error / perturbation Census page on the ABS website for more information.
Data and geography references
Source data publication: Index of Household Advantage and Disadvantage
Geographic boundary information: Australian Statistical Geography Standard (ASGS) Edition 3
Further information: Index of Household Advantage and Disadvantage methodology, 2021
Source: Australian Bureau of Statistics (ABS)
Contact the Australian Bureau of Statistics
Email geography@abs.gov.au if you have any questions or feedback about this web service.
Our goal was to predict gender, age, and emotion from audio. We found labeled audio datasets from Mozilla and RAVDESS, used the R programming language to extract 20 statistical features, and then added the labels to form these datasets. The audio files were collected from "Mozilla Common Voice" and the "Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)".
Each dataset contains 20 feature columns and 1 column for the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes a dataset's dispersion relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle value when the values are sorted in ascending or descending order.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. This means that about 25 percent of the values are below Q1 and about 75 percent are above it.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the median of the upper half of the data set, the central point between the median and the highest values; about 75 percent of the values are below it.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, i.e. between the upper and lower quartiles.
7) skew - The skewness measures the lack of symmetry in the data distribution, i.e. how far it departs from a symmetric shape such as the normal distribution.
8) kurt - The kurtosis measures how heavy the tails of a distribution are compared with the tails of a normal distribution, and so reflects the presence of outliers in the data.
9) sp.ent - The spectral entropy is a measure of signal irregularity, computed as the Shannon entropy of the normalized power spectrum.
10) sfm - The spectral flatness, or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. It is often expressed in decibels and quantifies how tone-like, as opposed to noise-like, a sound is.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used in digital signal processing to describe a spectrum; it indicates where the spectrum's center of mass is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of the dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of the dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which expresses the degree of frequency modulation numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
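Several of the listed features can be computed from a single power spectrum by treating normalized power as a distribution over frequency. The Python sketch below assumes the spectrum (bin frequencies in kHz plus power per bin) has already been computed, for example with an FFT. It covers only a subset of the features (meanfreq, sd, median, Q25, Q75, IQR, sp.ent, sfm, centroid). The exact definitions used by the R tooling behind the dataset may differ, and `spectral_features` is a hypothetical helper.

```python
import math


def spectral_features(freqs_khz: list[float], power: list[float]) -> dict[str, float]:
    """Compute a subset of the listed features from one power spectrum.

    freqs_khz: bin frequencies in ascending order (kHz).
    power: non-negative power in each bin (at least one bin must be > 0).
    """
    total = sum(power)
    p = [x / total for x in power]  # normalize so the spectrum sums to 1

    meanfreq = sum(f * w for f, w in zip(freqs_khz, p))
    sd = math.sqrt(sum(w * (f - meanfreq) ** 2 for f, w in zip(freqs_khz, p)))

    def quantile(q: float) -> float:
        # First frequency at which cumulative normalized power reaches q.
        cumulative = 0.0
        for f, w in zip(freqs_khz, p):
            cumulative += w
            if cumulative >= q - 1e-12:  # tolerance for floating-point rounding
                return f
        return freqs_khz[-1]

    q25, median, q75 = quantile(0.25), quantile(0.5), quantile(0.75)

    # Shannon entropy of the normalized spectrum, scaled to [0, 1] by log(N).
    entropy = -sum(w * math.log(w) for w in p if w > 0)
    sp_ent = entropy / math.log(len(p)) if len(p) > 1 else 0.0

    # Spectral flatness: geometric mean / arithmetic mean of the power (linear, not dB).
    if all(x > 0 for x in power):
        geometric = math.exp(sum(math.log(x) for x in power) / len(power))
        sfm = geometric / (total / len(power))
    else:
        sfm = 0.0  # any empty bin drives the geometric mean to zero

    return {
        "meanfreq": meanfreq,
        "sd": sd,
        "median": median,
        "Q25": q25,
        "Q75": q75,
        "IQR": q75 - q25,
        "sp.ent": sp_ent,
        "sfm": sfm,
        # With power weighting, the spectral centroid coincides with meanfreq in this sketch.
        "centroid": meanfreq,
    }
```

A flat spectrum gives sp.ent and sfm close to 1 (noise-like). A single spectral peak gives both close to 0 (tone-like).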
Gender and age audio data source: https://commonvoice.mozilla.org/en
Emotion audio data source: https://smartlaboratory.org/ravdess/
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains financial information for the top 500 companies in India, including their market capitalization and quarterly sales. The data is categorized based on market cap and sales quartiles, allowing for detailed analysis and comparison. This dataset can be used to identify trends, patterns, and key metrics that are crucial for understanding the competitive landscape in the Indian market.
Annual descriptive price statistics for each calendar year 2005–2023 for 11 Local Government Districts in Northern Ireland. The statistics include:
• Minimum sale price
• Lower quartile sale price
• Median sale price
• Simple mean sale price
• Upper quartile sale price
• Maximum sale price
• Number of verified sales
Prices are available where at least 30 sales were recorded in the area within the calendar year that could be included in the regression model, i.e. the following sales are excluded:
• Non-arm's-length sales
• Sales of properties where the habitable space is less than 30 m² or greater than 1,000 m²
• Sales of less than £20,000
Annual median or simple mean prices should not be used to calculate property price change over time. The quality of the properties sold (where quality refers to the combination of all characteristics of a residential property, both physical and locational) may differ from one time period to another. For example, sales in one quarter could be disproportionately skewed towards low-quality properties, producing a biased estimate of the average price. The median and simple mean prices are not 'standardised', so the varying mix of properties sold in each quarter could give a false impression of the actual change in prices. Calculating pure property price change over time requires comparing like with like, which can only be achieved if the 'characteristics-mix' of properties traded is standardised. To calculate pure property price change over time, please use the standardised prices in the NI House Price Index Detailed Statistics file.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Administrative disposable income is a third pillar of the income statistics that Statbel publishes, alongside tax statistics and poverty indicators based on SILC. It can answer different types of questions than SILC and tax statistics.
SILC uses disposable income at the household level as its concept of income, cumulating the incomes of all household members. In the next step, this disposable income is converted into equivalised disposable income to take the composition of the household into account. Based on SILC, at-risk-of-poverty figures are published down to the provincial level, but the sample size does not allow analyses at a more detailed geographical level. Statistics based on tax data, by contrast, are available down to the level of the statistical sector. However, they are limited to taxable income reported in personal income tax returns: non-taxable income is not taken into account, and there is no correction for household composition.
The variable "administrative equivalised disposable income" responds to a growing demand for income and poverty figures at the municipal level. It uses an income concept based on administrative sources that tries to correspond as closely as possible to that of SILC. For the population as a whole, both taxable and non-taxable income are taken into account. These are added together for all members of the household to obtain an administrative disposable income for the household. After adjusting for the composition of the household, the variable "administrative equivalised disposable income" is established. This variable can be used to calculate income and poverty figures at the municipal level.
Indicators are not disseminated for an entity and a category when the equivalised administrative disposable income is missing for at least 15% of people, or when fewer than 100 people have a valid income.
More information on the page "\2" of Statbel
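The equivalisation step and the dissemination rule described above can be sketched as follows. The equivalence scale here is an assumption: it uses the modified OECD scale (1.0 for the first person aged 14+, 0.5 for each other person aged 14+, 0.3 per child under 14), which is the usual SILC convention, but the text does not say which scale Statbel applies to the administrative variable. All function names are hypothetical.

```python
from typing import Optional, Sequence

def modified_oecd_scale(ages: Sequence[int]) -> float:
    """Assumed scale: 1.0 first person 14+, 0.5 each other person 14+, 0.3 per child <14."""
    adults = sum(1 for age in ages if age >= 14)
    children = len(ages) - adults
    if adults == 0:
        raise ValueError("household needs at least one member aged 14 or over")
    return 1.0 + 0.5 * (adults - 1) + 0.3 * children

def equivalised_income(member_incomes: Sequence[float], ages: Sequence[int]) -> float:
    """Each entry is one member's taxable plus non-taxable income; sum, then equivalise."""
    return sum(member_incomes) / modified_oecd_scale(ages)

def can_disseminate(incomes: Sequence[Optional[float]]) -> bool:
    """Suppress if >= 15% of people lack an income or fewer than 100 have a valid one."""
    n_total = len(incomes)
    n_valid = sum(1 for x in incomes if x is not None)
    n_missing = n_total - n_valid
    # Integer comparison avoids floating-point trouble exactly at the 15% boundary.
    return n_valid >= 100 and n_missing * 100 < 15 * n_total
```

For example, a household of two adults and two children has a scale of 2.1, so a combined income of 50,000 becomes an equivalised income of about 23,810 per person.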
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
To analyze the salaries of company employees using Pandas, NumPy, and other tools, you can structure the analysis process into several steps:
Case Study: Employee Salary Analysis
In this case study, we aim to analyze the salaries of employees across different departments and levels within a company. Our goal is to uncover key patterns, identify outliers, and provide insights that can support decisions related to compensation and workforce management.
Step 1: Data Collection and Preparation
Data Sources: The dataset typically includes employee ID, name, department, position, years of experience, salary, and additional compensation (bonuses, stock options, etc.).
Data Cleaning: We use Pandas to handle missing or incomplete data, remove duplicates, and standardize formats. Example: df.dropna(subset=['salary']) drops rows with missing salary information (a bare df.dropna() would also drop rows missing any other field), and df.drop_duplicates() removes duplicate entries.
Step 2: Data Exploration and Descriptive Statistics
Exploratory Data Analysis (EDA): Use Pandas to calculate basic statistics such as the mean, median, mode, and standard deviation of employee salaries. Example: df['salary'].describe() gives an overview of the salary distribution (count, mean, standard deviation, minimum, quartiles, and maximum), and df['salary'].mode() returns the mode.
Data Visualization: Use Matplotlib or Seaborn to visualize salary distributions, with box plots to detect outliers and bar charts for department-wise salary breakdowns. Example: sns.boxplot(x='department', y='salary', data=df) shows how salaries vary by department.
Step 3: Analysis Using NumPy
Calculating Salary Ranges: NumPy can calculate the range, variance, and percentiles of the salary data to show the spread and skewness of the distribution. Example: np.percentile(df['salary'], [25, 50, 75]) returns the salary quartiles.
Correlation Analysis: Measure the relationship between variables such as experience and salary by computing correlation coefficients with NumPy. Example: np.corrcoef(df['years_of_experience'], df['salary'])[0, 1] gives the Pearson correlation between experience and salary; a value near 1 indicates a strong positive linear relationship, although correlation alone does not show that experience determines salary.
Step 4: Grouping and Aggregation
Salary by Department and Position: Pandas' groupby function summarizes salary information by department and job title to reveal trends or inequalities. Example: df.groupby('department')['salary'].mean() calculates the average salary per department.
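Steps 1 to 4 can be chained on one DataFrame. This is a minimal sketch on a small hypothetical table; the column names (`employee_id`, `department`, `years_of_experience`, `salary`) are assumptions, and a real dataset will have its own schema and more cleaning needs.

```python
import numpy as np
import pandas as pd

# Hypothetical employee records, including one missing salary and one duplicate row.
df = pd.DataFrame({
    "employee_id": [1, 2, 3, 4, 5, 6, 6],
    "department": ["Sales", "Sales", "IT", "IT", "HR", "HR", "HR"],
    "years_of_experience": [2, 8, 3, 10, 5, 1, 1],
    "salary": [42000, 61000, 55000, 90000, None, 38000, 38000],
})

# Step 1: drop rows without a salary, then remove exact duplicate rows.
df = df.dropna(subset=["salary"]).drop_duplicates()

# Step 2: count, mean, std, min, quartiles and max of salaries.
summary = df["salary"].describe()

# Step 3: salary quartiles and the experience-salary Pearson correlation.
quartiles = np.percentile(df["salary"], [25, 50, 75])
r = np.corrcoef(df["years_of_experience"], df["salary"])[0, 1]

# Step 4: average salary per department.
dept_mean = df.groupby("department")["salary"].mean()

print(summary, quartiles, round(r, 3), dept_mean, sep="\n")
```

After cleaning, five rows remain; the quartiles are 42,000, 55,000 and 61,000, and the IT average is 72,500.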
Step 5: Salary Forecasting (Optional)
Predictive Analysis: Using tools such as Scikit-learn, we could build a regression model to predict future salary increases based on factors like experience, education level, and performance ratings.
Step 6: Insights and Recommendations
Outlier Identification: Detect employees earning significantly more or less than their peers, which could signal inequities or high performers.
Salary Discrepancies: Highlight salary discrepancies between departments or genders that may require further investigation.
Compensation Planning: Based on the analysis, suggest changes to the salary structure or bonus allocations to ensure fair compensation across the organization.
Tools Used:
Pandas: for data manipulation, grouping, and descriptive analysis.
NumPy: for numerical operations such as percentiles and correlations.
Matplotlib/Seaborn: for data visualization to highlight key patterns and trends.
Scikit-learn (Optional): for building predictive models if salary forecasting is included in the analysis.
This approach gives a comprehensive analysis of employee salaries and provides actionable insights for human resource planning and compensation strategy.
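Outlier identification and a simple salary trend can be sketched without extra libraries. Two choices here are assumptions rather than the case study's stated method: outliers are flagged with Tukey's 1.5 × IQR fences, and a one-variable least-squares line from `np.polyfit` stands in for a Scikit-learn regression model. The function names are hypothetical.

```python
import numpy as np
import pandas as pd

def iqr_outliers(salaries: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag salaries outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = salaries.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (salaries < q1 - k * iqr) | (salaries > q3 + k * iqr)

def fit_salary_trend(experience, salary):
    """Least-squares line salary ~ slope * experience + intercept (NumPy stand-in for Scikit-learn)."""
    slope, intercept = np.polyfit(experience, salary, deg=1)
    return slope, intercept

salaries = pd.Series([40000, 42000, 45000, 47000, 50000, 250000])
print(iqr_outliers(salaries).tolist())
print(fit_salary_trend([0, 1, 2, 3], [30000, 32000, 34000, 36000]))
```

A flagged salary is only a prompt for review: it may reflect a high performer or a senior role rather than an inequity.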
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Affordability ratios are calculated by dividing house prices by gross annual residence-based earnings. They are based on the median and lower quartiles of both house prices and earnings in England and Wales.
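The ratio itself is a single division of matching statistics: median price over median earnings, or lower-quartile price over lower-quartile earnings. This sketch assumes raw lists of prices and earnings and uses Python's `statistics` module with an assumed quartile method; the official figures are derived from published price and earnings statistics, so the function is illustrative only.

```python
import statistics
from typing import Sequence

def affordability_ratio(house_prices: Sequence[float],
                        annual_earnings: Sequence[float],
                        measure: str = "median") -> float:
    """House price statistic divided by the same statistic of gross annual earnings."""
    if measure == "median":
        return statistics.median(house_prices) / statistics.median(annual_earnings)
    if measure == "lower_quartile":
        # Quartile interpolation method is an assumption.
        lq_price = statistics.quantiles(house_prices, n=4, method="inclusive")[0]
        lq_earnings = statistics.quantiles(annual_earnings, n=4, method="inclusive")[0]
        return lq_price / lq_earnings
    raise ValueError("measure must be 'median' or 'lower_quartile'")

print(affordability_ratio([150000, 200000, 250000], [25000, 30000, 35000]))
```

A ratio of about 6.7 means a median-priced home costs roughly 6.7 times median annual earnings.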
The Human Sciences Research Council (HSRC) carried out the Migration and Remittances Survey in South Africa for the World Bank in collaboration with the African Development Bank. The primary mandate of the HSRC in this project was to develop a migration database that covers both immigrants and emigrants. The specific activities included:
· a household survey with a view to producing a detailed demographic and economic database of immigrants, emigrants and non-migrants;
· the collation and preparation of a data set based on the survey;
· the production of basic primary statistics for the analysis of migration and remittance behaviour in South Africa.
Like many other African countries, South Africa lacks reliable census or other data on migrants (immigrants and emigrants) and on the flows of resources that accompany the movement of people, because a large proportion of African immigrants in the country are undocumented. A special effort was therefore made to design a household survey that would cover sufficient numbers and proportions of immigrants while still conforming to the principles of probability sampling. The approach followed gives a representative picture of migration in two provinces, Limpopo and Gauteng, which should reflect migration behaviour and its impacts in South Africa.
Two provinces: Gauteng and Limpopo
Limpopo is the main corridor for migration from African countries to the north of South Africa, while Gauteng is the main port of entry, as it has the largest airport in Africa. Gauteng is a destination for internal and international migrants because it has three large metropolitan cities with great economic potential and a reputation for offering employment, accommodation and access to many different opportunities within a distance of 56 km. These two provinces were therefore expected to accommodate most African migrants in South Africa, co-existing with a large host population.
The target group consists of households in all communities. The survey will be conducted among metro and non-metro households. Non-metro households include those in:
- small towns,
- secondary cities,
- peri-urban settlements and
- deep rural areas.
From each selected household, one adult respondent will be selected to participate in the study.
Sample survey data [ssd]
Migration data for South Africa were available for 2007 only at the level of local governments or municipalities, from the 2007 Community Survey; for smaller areas called "sub-places" (SPs), the most recent data came from the 2001 Census; and for the desired enumeration areas (EAs), only from the 1996 Census. In sum, no single source provided recent data on the five types of migrants of principal interest at the EA level. This was the level at which data were needed to draw the sample, since migrant and non-migrant households had to be identified in the sample areas so that those with migrants could be oversampled for interview.
To overcome these data limitations, a novel approach was adopted for designing the sample for the World Bank's household migration survey in South Africa, aimed at identifying EAs with a high probability of finding immigrants and those with a low probability. This required the combined use of the three data sources described above. The starting point was the CS 2007 survey, which provided migration data at the local government level; each local government cluster was classified by migration level, taking into account the types of migrants identified. The researchers then zoomed in spatially from these clusters to the sub-places (SPs) of the 2001 Census to classify SP clusters by migration level. Finally, the 1996 Census data on migration levels of various types were used to zoom in further to the EA level and identify the final level of clusters for the survey: the spatially small EAs (each typically containing about 200 households, and hence amenable to the listing operation in the field).
A higher score or weight was attached to the 2007 Community Survey municipality-level (MN) data than to the Census 2001 sub-place (SP) data, which in turn was given a greater weight than the 1996 enumeration area (EA) data. The latter was derived exclusively from the Census 1996 EA data but was then reallocated to the 2001 EAs in proportion to geographical size. Although these weights are arbitrary, since they combine different sources, they indicate the relative importance attached to the different migrant categories. These weighted migrant proportions (secondary strata) therefore constituted the second level of clusters for sampling purposes.
In addition, a system of weighting or scoring persons by migrant type was applied to maximise the likelihood of finding migrants. As part of this procedure, recent migrants (who had migrated in the preceding five years) received a higher score than lifetime migrants (who had not migrated during the preceding five years). Similarly, a higher score was attached to international immigrants (both recent and lifetime, who had come to SA from abroad) than to internal migrants (who had only moved within SA's borders). A greater weight was also applied to inter-provincial (internal) migrants than to intra-provincial migrants (who only moved within the same South African province).
How the three data sources were combined to provide overall scores for each EA can be briefly described. First, in each of the two provinces, all local government units were given migration scores according to the numbers or relative proportions of the population classified in the various categories of migrants (with non-migrants given a score of 1.0). Migrants were assigned higher scores according to their priority, with international migrants scoring higher than internal migrants and recent migrants higher than lifetime migrants. Then, within the local governments, sub-places were assigned scores on the basis of inter- vs. intra-provincial migrants using the 2001 Census data. Each SP in a local government was thus assigned a value equal to the product of its local government score (the same for all SPs in that local government) and its own SP score. The third and final stage was to develop relative migration scores for all the EAs from the 1996 Census by similarly weighting the proportions of migrants (and non-migrants, always assigned 1.0) of each type. The final migration score for an EA is the product of its own EA score from 1996, the score of the SP of which it is a part (assigned to all the EAs within that SP), and the local government score from the 2007 survey.
Based on all the above principles the set of weights or scores was developed.
In sum, we multiplied the proportion of populations of each migrant type, or their incidence, by the appropriate final corresponding EA scores for persons of each type in the EA (based on multiplying the three weights together), to obtain the overall score for each EA. This takes into account the distribution of persons in the EA according to migration status in 1996, the SP score of the EA in 2001, and the local government score (in which the EA is located) from 2007. Finally, all EAs in each province were then classified into quartiles, prior to sampling from the quartiles.
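The scoring scheme described above (a weighted sum of population shares at each level, a product across the three levels, then quartile classification) can be sketched as follows. The weight values are hypothetical: the text gives only their ordering (non-migrants 1.0, recent above lifetime, international above internal), and for brevity a single weight table is reused at every level, whereas the survey weighted different migrant categories at each level. Quartile cut points use Python's default `statistics.quantiles` method, which is also an assumption.

```python
import statistics
from typing import Dict

# Hypothetical scores per person type; only the ordering is taken from the text.
TYPE_WEIGHTS = {
    "non_migrant": 1.0,
    "internal_lifetime": 2.0,
    "internal_recent": 3.0,
    "international_lifetime": 4.0,
    "international_recent": 6.0,
}

def level_score(shares: Dict[str, float], weights: Dict[str, float] = TYPE_WEIGHTS) -> float:
    """Weighted sum of population shares by migrant type at one geographic level."""
    return sum(share * weights[kind] for kind, share in shares.items())

def final_ea_score(lg_score: float, sp_score: float, ea_score: float) -> float:
    """Product of the 2007 local-government, 2001 sub-place and 1996 EA scores."""
    return lg_score * sp_score * ea_score

def assign_quartiles(scores: Dict[str, float]) -> Dict[str, int]:
    """Quartile 1 holds the lowest migration scores, quartile 4 the highest."""
    cuts = statistics.quantiles(scores.values(), n=4)
    return {ea: 1 + sum(score > cut for cut in cuts) for ea, score in scores.items()}

# An area that is 80% non-migrant and 20% recent international migrants scores 2.0.
print(level_score({"non_migrant": 0.8, "international_recent": 0.2}))
```

Multiplying the three levels lets a high-migration municipality raise every EA within it, while the EA-level term still separates EAs inside the same sub-place.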
From the EAs so classified, the sampling took the form of selecting EAs, i.e., primary sampling units (PSUs, which in this case are also Ultimate Sampling Units, since this is a single stage sample), according to their classification into quartiles. The proportions selected from each quartile are based on the range of EA-level scores which are assumed to reflect weighted probabilities of finding desired migrants in each EA. To enhance the likelihood of finding migrants, much higher proportions of EAs were selected into the sample from the quartiles with the higher scores compared to the lower scores (disproportionate sampling). The decision on the most appropriate categorisations was informed by the observed migration levels in the two provinces of the study area during 2007, 2001 and 1996, analysed at the lowest spatial level for which migration data was available in each case.
Because of the differences in their characteristics it was decided that the provinces of Gauteng and Limpopo should each be regarded as an explicit stratum for sampling purposes. These two provinces therefore represented the primary explicit strata. It was decided to select an equal number of EAs from these two primary strata.
The migration-level categories referred to above were treated as secondary explicit strata to ensure optimal coverage of each in the sample. The distribution of migration levels was then used to draw EAs in such a way that greater preference could be given to areas with higher proportions of migrants in general, but especially immigrants (note the relative scores assigned to each type of person above). The proportion of EAs selected into the sample from the quartiles draws upon the relative mean weighted migrant scores (referred to as proportions) found below the table, but this is a coincidence and not necessary, as any disproportionate sampling of EAs from the quartiles could be done, since it would be rectified in the weighting at the end for the analysis.
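Disproportionate selection from the quartiles, and the weighting that later corrects for it, can be sketched as follows. Only the Quartile 1 (5%) and Quartile 2 (15%) shares come from the text; the Quartile 3 and Quartile 4 shares are invented for illustration. The sketch assumes a simple random sample of EAs within each quartile, and it would be run separately for Gauteng and Limpopo with the same total, since an equal number of EAs was drawn from each province. The function names are hypothetical.

```python
import random
from typing import Dict, List

# Share of the sampled EAs drawn from each quartile. Q1 and Q2 follow the text;
# Q3 and Q4 are hypothetical values chosen so the shares sum to 1.
QUARTILE_SHARES = {1: 0.05, 2: 0.15, 3: 0.30, 4: 0.50}

def allocate(total_eas: int, shares: Dict[int, float] = QUARTILE_SHARES) -> Dict[int, int]:
    """Number of EAs to draw from each quartile within one province."""
    return {q: round(total_eas * share) for q, share in shares.items()}

def draw_eas(eas_by_quartile: Dict[int, List[str]],
             n_by_quartile: Dict[int, int],
             seed: int = 0) -> Dict[str, float]:
    """Simple random sample of EAs within each quartile; returns EA -> design weight."""
    rng = random.Random(seed)
    weights: Dict[str, float] = {}
    for q, eas in eas_by_quartile.items():
        n = n_by_quartile[q]
        for ea in rng.sample(eas, n):
            # Selection probability is n / N within the quartile, so the weight is N / n.
            weights[ea] = len(eas) / n
    return weights
```

Because each sampled EA carries the inverse of its selection probability, the weighted totals still represent the whole province, which is why any disproportionate allocation can be corrected at the analysis stage.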
The resultant proportions of migrants then led to the following proportional allocation of sampled EAs (Quartile 1: 5 per cent (instead of 25% as in an equal distribution), Quartile 2: 15 per cent (instead
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Affordability ratios are calculated by dividing house prices by gross annual workplace-based earnings. They are based on the median and lower quartiles of both house prices and earnings in England and Wales.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AL refers to the axial length, CCT to the central corneal thickness, ACD to the external phakic anterior chamber depth measured from the corneal front apex to the front apex of the crystalline lens, LT to the central thickness of the crystalline lens, R1 and R2 to the corneal radii of curvature for the flat and steep meridians, Rmean to the average of R1 and R2, PIOL to the refractive power of the intraocular lens implant, and SEQ to the spherical equivalent power achieved 5 to 12 weeks after cataract surgery.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file includes Report Card administrator experience status by school poverty quartile data for the 2017-18 through 2023-24 school years. Data is disaggregated by state, ESD, LEA, and school level. Please review the notes below for more information.