Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Annual descriptive price statistics for each calendar year 2005 – 2024 for 462 electoral wards within 11 Local Government Districts. The statistics include:
• Minimum sale price
• Lower quartile sale price
• Median sale price
• Simple mean sale price
• Upper quartile sale price
• Maximum sale price
• Number of verified sales
Prices are available where at least 30 sales were recorded in the area within the calendar year which could be included in the regression model, i.e. the following sales are excluded:
• Non-arms-length sales
• Sales of properties where the habitable space is less than 30 m² or greater than 1000 m²
• Sales of less than £20,000
Annual median or simple mean prices should not be used to calculate the property price change over time. The quality (where quality refers to the combination of all characteristics of a residential property, both physical and locational) of the properties that are sold may differ from one time period to another. For example, sales in one quarter could be disproportionately skewed towards low-quality properties, therefore producing a biased estimate of average price. The median and simple mean prices are not ‘standardised’, so the varying mix of properties sold in each quarter could give a false impression of the actual change in prices. To calculate the pure property price change over time it is necessary to compare like with like, and this can only be achieved if the ‘characteristics-mix’ of properties traded is standardised. To calculate the pure property price change over time, please use the standardised prices in the NI House Price Index Detailed Statistics file.
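As a rough illustration of how the published figures relate to the exclusion rules, the Python sketch below filters one area's sales for one calendar year and computes the seven statistics, suppressing them when fewer than 30 eligible sales remain. The dictionary keys (`price`, `floor_area_m2`, `arms_length`) are hypothetical, and the quartile convention (`statistics.quantiles` with its default 'exclusive' method) is an assumption, since the official method is not stated.

```python
import statistics

MIN_SALES = 30  # minimum number of eligible sales for figures to be published

def is_eligible(sale):
    """Apply the stated exclusions; the dict keys are hypothetical."""
    return (
        sale["arms_length"]                      # drop non-arms-length sales
        and 30 <= sale["floor_area_m2"] <= 1000  # habitable space between 30 and 1000 m2
        and sale["price"] >= 20_000              # drop sales under 20,000 GBP
    )

def describe_area_year(sales):
    """Descriptive statistics for one area and calendar year, or None if suppressed."""
    prices = sorted(s["price"] for s in sales if is_eligible(s))
    if len(prices) < MIN_SALES:
        return None
    q1, _, q3 = statistics.quantiles(prices, n=4)  # assumption: 'exclusive' quartile method
    return {
        "min": prices[0],
        "lower_quartile": q1,
        "median": statistics.median(prices),
        "mean": statistics.mean(prices),
        "upper_quartile": q3,
        "max": prices[-1],
        "verified_sales": len(prices),
    }
```

Returning `None` rather than partial figures mirrors the rule that nothing is published for an area-year below the 30-sale threshold.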
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Annual descriptive price statistics for each calendar year 2005 – 2024 for 11 Local Government Districts in Northern Ireland. The statistics include:
• Minimum sale price
• Lower quartile sale price
• Median sale price
• Simple mean sale price
• Upper quartile sale price
• Maximum sale price
• Number of verified sales
Prices are available where at least 30 sales were recorded in the area within the calendar year which could be included in the regression model, i.e. the following sales are excluded:
• Non-arms-length sales
• Sales of properties where the habitable space is less than 30 m² or greater than 1000 m²
• Sales of less than £20,000
Annual median or simple mean prices should not be used to calculate the property price change over time. The quality (where quality refers to the combination of all characteristics of a residential property, both physical and locational) of the properties that are sold may differ from one time period to another. For example, sales in one quarter could be disproportionately skewed towards low-quality properties, therefore producing a biased estimate of average price. The median and simple mean prices are not ‘standardised’, so the varying mix of properties sold in each quarter could give a false impression of the actual change in prices. To calculate the pure property price change over time it is necessary to compare like with like, and this can only be achieved if the ‘characteristics-mix’ of properties traded is standardised. To calculate the pure property price change over time, please use the standardised prices in the NI House Price Index Detailed Statistics file.
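The composition problem described above can be made concrete with a small Python sketch on hypothetical prices: the price of each property type stays fixed, but the share of detached sales rises, so the pooled mean and median both appear to increase. Combining type-specific mean prices with fixed base-period weights removes that mix effect. This simple stratification is only an illustrative stand-in; it is not the regression-based standardisation used for the NI House Price Index.

```python
from statistics import mean, median

def pooled(sales_by_type):
    """Flatten {type: [prices]} into one list, ignoring property type."""
    return [p for prices in sales_by_type.values() for p in prices]

def fixed_weight_ratio(base, current, weights):
    """Ratio of type-mean prices combined with fixed (base-period) type weights."""
    base_level = sum(w * mean(base[t]) for t, w in weights.items())
    current_level = sum(w * mean(current[t]) for t, w in weights.items())
    return current_level / base_level

# Hypothetical data: prices per type are unchanged, only the mix of sales shifts.
base = {"detached": [300_000] * 10, "terraced": [100_000] * 30}
current = {"detached": [300_000] * 30, "terraced": [100_000] * 10}

naive_mean_ratio = mean(pooled(current)) / mean(pooled(base))        # about 1.67
naive_median_ratio = median(pooled(current)) / median(pooled(base))  # 3.0
standardised_ratio = fixed_weight_ratio(
    base, current, {"detached": 0.25, "terraced": 0.75}
)  # 1.0: no change once the mix is held fixed
```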
Our target was to predict gender, age and emotion from audio. We found labeled audio datasets from Mozilla and RAVDESS. Using the R programming language, 20 statistical features were extracted from the audio, and these datasets were then formed by adding the labels. Audio files were collected from "Mozilla Common Voice" and the "Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)".
Each dataset contains 20 feature columns and 1 label column. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz), a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency, a statistical measure that describes the dispersion of the data relative to its mean; it is calculated as the square root of the variance.
3) median - The median frequency (in kHz), the middle value of the sorted list of values.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. About 25 percent of the values lie below Q1 and about 75 percent above it.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the median of the upper half of the data set, i.e. the central point between the median and the highest value. About 75 percent of the values lie below Q3.
6) IQR - The interquartile range (in kHz), a measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, i.e. between the upper and lower quartiles.
7) skew - The skewness, the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis, a statistical measure of how much the tails of the distribution differ from the tails of a normal distribution. In effect, it measures the presence of outliers in the data distribution.
9) sp.ent - The spectral entropy, a measure of signal irregularity computed from the normalized spectral power of the signal.
10) sfm - The spectral flatness, also known as the tonality coefficient or Wiener entropy, a measure used in digital signal processing to characterize an audio spectrum. It is usually expressed in decibels and provides a way to quantify how tone-like, as opposed to noise-like, a sound is.
11) mode - The mode frequency, the most frequently observed value in the data set.
12) centroid - The spectral centroid, a metric used in digital signal processing to describe a spectrum. It indicates where the spectrum's center of mass is located.
13) meanfun - The average fundamental frequency measured across the acoustic signal.
14) minfun - The minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maximum fundamental frequency measured across the acoustic signal.
16) meandom - The average dominant frequency measured across the acoustic signal.
17) mindom - The minimum dominant frequency measured across the acoustic signal.
18) maxdom - The maximum dominant frequency measured across the acoustic signal.
19) dfrange - The range of dominant frequency measured across the acoustic signal.
20) modindx - The modulation index, which expresses the degree of frequency modulation numerically as the ratio of the frequency deviation to the frequency of the modulating signal, for pure-tone modulation.
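To make the spectrum-based definitions above more tangible, here is a short numpy sketch that computes several of them from the amplitude spectrum of a mono signal. It is not the R code used to build the datasets; the normalisation of the spectrum, the quantile rule, the small epsilon in the flatness ratio (returned as a plain ratio, not in decibels) and the scaling of the entropy to 0-1 are all assumptions. Features that need fundamental or dominant frequency tracking (meanfun, minfun, maxfun, meandom, mindom, maxdom, dfrange, modindx) are omitted.

```python
import numpy as np

def spectral_features(signal, sr):
    """Approximate a subset of the spectral features for a mono signal sampled at sr Hz."""
    amp = np.abs(np.fft.rfft(signal))                         # amplitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr) / 1000.0  # bin frequencies in kHz
    p = amp / amp.sum()                                       # spectrum normalised to sum to 1
    cdf = np.cumsum(p)

    def quantile(q):
        # first frequency bin at which the cumulative spectrum reaches q
        return float(freqs[min(np.searchsorted(cdf, q), len(freqs) - 1)])

    meanfreq = float(np.sum(freqs * p))
    sd = float(np.sqrt(np.sum(p * (freqs - meanfreq) ** 2)))
    q25, median, q75 = quantile(0.25), quantile(0.5), quantile(0.75)
    nz = p[p > 0]
    eps = 1e-12  # avoids log(0) in the geometric mean
    return {
        "meanfreq": meanfreq,
        "sd": sd,
        "median": median,
        "Q25": q25,
        "Q75": q75,
        "IQR": q75 - q25,
        "skew": float(np.sum(p * (freqs - meanfreq) ** 3) / sd ** 3),
        "kurt": float(np.sum(p * (freqs - meanfreq) ** 4) / sd ** 4),
        "sp.ent": float(-np.sum(nz * np.log(nz)) / np.log(len(p))),  # 0 = pure tone, 1 = flat
        "sfm": float(np.exp(np.mean(np.log(amp + eps))) / np.mean(amp + eps)),
        "mode": float(freqs[np.argmax(p)]),
        "centroid": meanfreq,  # amplitude-weighted mean frequency
    }
```

For a pure 1 kHz tone, the sketch places the mean, median and mode at about 1 kHz and gives flatness and entropy near zero, which matches the intuition of a perfectly tone-like sound.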
Gender and age audio data source: https://commonvoice.mozilla.org/en
Emotion audio data source: https://smartlaboratory.org/ravdess/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The objective was to identify horn fly-susceptible and horn fly-resistant animals in a Sindhi herd using two different methods. The number of horn flies on 25 adult cows from a Sindhi herd was counted every 14 days. As it was an open herd, the trial period was divided into three stages based on cow composition, with the same cows maintained within each period: 2011-2012 (36 biweekly observations); 2012-2013 (26 biweekly observations); and 2013-2014 (22 biweekly observations). Only ten cows were present in the herd throughout the entire 2011-2014 period (84 biweekly observations). The variables evaluated were the number of horn flies on each cow, the sampling date and a binary variable for rainy or dry season. Descriptive statistics, including the median, the interquartile range, and the minimum and maximum number of horn flies, were calculated for each observation day.
For the present analysis, fly-susceptible cows were defined as those whose fly counts fell in the upper quartile in more than 50% of the weeks and in the lower quartile in less than 20% of the weeks. Conversely, fly-resistant cows were defined as those whose fly counts fell in the lower quartile in more than 50% of the weeks and in the upper quartile in less than 20% of the weeks.
To identify resistant and susceptible cows for the best linear unbiased predictions (BLUPs) analysis, three repeated-measures linear mixed models (one for each period) were constructed with cow as a random intercept. The response variable was the log10-transformed count of horn flies per cow, and the explanatory variables were the observation date and season. As the trial took place in a semiarid region with two well-established seasons, season was evaluated monthly as a binary outcome: a month was considered rainy if rainfall was at least 50 mm and dry if it was less than 50 mm.
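The quartile-based classification rule can be sketched in Python. The sketch assumes a matrix of counts with one row per cow and one column per observation day, uses numpy's default (linearly interpolated) percentiles as the per-day quartile cut-offs, and treats a count equal to a cut-off as falling inside that quartile; the source does not specify these details. The "intermediate" label for cows meeting neither definition is hypothetical.

```python
import numpy as np

def classify_cows(counts, hi=0.5, lo=0.2):
    """counts: array of shape (n_cows, n_observations) of horn-fly counts."""
    counts = np.asarray(counts, dtype=float)
    q1 = np.percentile(counts, 25, axis=0)   # lower-quartile cut-off for each observation day
    q3 = np.percentile(counts, 75, axis=0)   # upper-quartile cut-off for each observation day
    in_upper = counts >= q3                  # assumption: ties at the cut-off count as inside
    in_lower = counts <= q1
    frac_upper = in_upper.mean(axis=1)       # share of observations in the upper quartile
    frac_lower = in_lower.mean(axis=1)       # share of observations in the lower quartile
    labels = []
    for fu, fl in zip(frac_upper, frac_lower):
        if fu > hi and fl < lo:
            labels.append("susceptible")
        elif fl > hi and fu < lo:
            labels.append("resistant")
        else:
            labels.append("intermediate")    # hypothetical label for all other cows
    return labels, frac_upper, frac_lower
```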
The standardized residuals and the BLUPs of the random effects were obtained and assessed for normality, heteroscedasticity and outlying observations. Each cow's BLUPs were plotted against its average quantile rank value, determined as the difference between the number of weeks in the high-risk quartile group and the number of weeks in the low-risk quartile group, divided by the total number of weeks in each of the observation periods. A linear model was fitted to the BLUPs against the average rank values, and the correlation between the two methods was tested using Spearman's correlation coefficient. The animal effect values (BLUPs) were evaluated by percentiles, with 0 representing the lowest counts (the more resistant cows) and 10 representing the highest counts (the more susceptible cows). These BLUPs represented only the effect of cow, not the effect of day, season or other unmeasured confounders.
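The average quantile rank value and the Spearman comparison with the BLUPs can be sketched as follows. The BLUPs themselves would come from the fitted mixed models, which this sketch does not fit; any BLUP values passed in are hypothetical. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values given their average rank.

```python
import numpy as np

def average_rank_value(in_upper, in_lower):
    """(weeks in upper quartile - weeks in lower quartile) / total weeks, per cow.

    in_upper, in_lower: boolean arrays of shape (n_cows, n_weeks).
    """
    in_upper = np.asarray(in_upper, dtype=bool)
    in_lower = np.asarray(in_lower, dtype=bool)
    return (in_upper.sum(axis=1) - in_lower.sum(axis=1)) / in_upper.shape[1]

def _average_ranks(x):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        tied = x == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(_average_ranks(x), _average_ranks(y))[0, 1])
```

Because it works on ranks, a perfectly monotone but non-linear relationship between BLUPs and rank values still gives a coefficient of 1.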
Phylogenetic metrics are essential tools used in the study of ecology, evolution and conservation. Phylogenetic diversity (PD) in particular is one of the most prominent measures of biodiversity, and is based on the idea that biological features accumulate along the edges of phylogenetic trees, whose lengths are summed. We argue that PD and many other phylogenetic biodiversity metrics fail to capture an essential process that we term attrition. Attrition is the gradual loss of features and other sources of variety through causes other than extinction. Here we introduce 'EvoHeritage', a generalisation of PD that is founded on the joint processes of accumulation and attrition of features. We argue that whilst PD measures evolutionary history, EvoHeritage is required to capture a more pertinent subset of evolutionary history including only components that have survived attrition. We show that EvoHeritage is not the same as PD on a tree with scaled edges; instead, accumulation and attrition interact ...
Data was reprocessed from published sources as described in the methods section of the associated manuscript. There is no primary data included. The supplementary material and glossary contain mathematical details and proofs rather than any primary data sets.
R is needed to use the code; otherwise, PDF and CSV readers are needed to access the files.
# Title of Dataset
This dataset contains
Data was derived from the following sources (as described in the associated manuscript methods)
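For reference, the summing of edge lengths that PD is built on can be sketched in a few lines of Python. This is a rooted version of PD only, not EvoHeritage, whose attrition model is defined in the associated manuscript; the parent and edge-length dictionaries are a hypothetical tree encoding chosen for brevity.

```python
def phylogenetic_diversity(parent, edge_length, tips):
    """Rooted PD: total length of the edges linking the chosen tips to the root.

    parent: dict node -> parent node (the root maps to None)
    edge_length: dict node -> length of the edge from node to its parent
    """
    counted = set()
    for tip in tips:
        node = tip
        # climb towards the root, stopping once we reach an edge already counted
        while parent[node] is not None and node not in counted:
            counted.add(node)
            node = parent[node]
    return sum(edge_length[n] for n in counted)
```

Each shared edge is counted once, so two sister tips contribute their own edges plus the single edge they share.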
Community-based HIV testing offers an alternative approach to encourage HIV testing among men in sub-Saharan Africa. In this study, we evaluated a community-based HIV testing strategy targeting male bar patrons in northern Tanzania to assess factors predictive of prior HIV testing and factors predictive of accepting a real-time HIV test offer. Participants completed a detailed survey and were offered HIV testing upon survey completion. Poisson regression was used to estimate prevalence ratios for the association between potential predictors and prior HIV testing or real-time testing uptake. Of 359 participants analyzed, the median age was 41 (range 19–82) years, 257 (71.6%) reported a previous HIV test, and 321 (89.4%) accepted the real-time testing offer. Factors associated with previous testing for HIV (adjusted prevalence ratio [aPR], 95% CI) were wealth scores in the upper-middle quartile (1.25, 1.03–1.52) or upper quartile (1.35, 1.12–1.62) and HIV knowledge (1.04, 1.01–1.07). Factors that predicted real-time testing uptake were lower scores on the Gender-Equitable Men scale (0.99, 0.98–0.99), never testing for HIV (1.16, 1.03–1.31), and testing for HIV > 12 months prior (1.18, 1.06–1.31). We show that individual-level factors that influence the test-seeking behaviors of men are not likely to impact their acceptance of an HIV test offer.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Index of Household Advantage and Disadvantage (IHAD) provides a summary measure of relative socio-economic advantage and disadvantage for households, based on the characteristics of dwellings and the people living within them, using 2021 Census data.
All in-scope households are ordered from lowest to highest score. A low score indicates relatively greater disadvantage and a lack of advantage in general. A high score indicates a relative lack of disadvantage and greater advantage in general.
This dataset presents IHAD data in quartiles. The lowest 25% of households are given a quartile number of 1, the next lowest 25% of households are given a quartile number of 2 and so on, up to the highest 25% of households, which are given a quartile number of 4. This means that households are divided into four equal-sized groups, depending on their score. In practice these groups won’t each be exactly 25% of households, as it depends on the distribution of the IHAD scores. The data is grouped by Statistical Area Level 2 (SA2 2021). SA2s are defined by the Australian Statistical Geography Standard (ASGS) Edition 3.
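A minimal Python sketch of quartile assignment from scores, assuming cut-offs at numpy's default 25th, 50th and 75th percentiles and that a score equal to a cut-off stays in the lower group. The ABS's exact procedure is not described here, so treat this only as an illustration of why tied scores stop the groups from being exactly 25% each.

```python
import numpy as np

def assign_quartiles(scores):
    """Quartile number (1-4) for each score, cutting at the 25th/50th/75th percentiles."""
    scores = np.asarray(scores, dtype=float)
    cuts = np.percentile(scores, [25, 50, 75])
    # side="left": a score equal to a cut-off stays in the lower group
    return np.searchsorted(cuts, scores, side="left") + 1
```

With eight distinct scores the groups hold two households each, but with the scores `[1, 1, 1, 1, 1, 2, 3, 4]` the five tied lowest scores all land in quartile 1 and quartile 2 is empty.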
Key Attributes:
Field alias (Field name): Description
Statistical Areas Level 2 2021 code (SA2_CODE_2021): 2021 Statistical Areas Level 2 (SA2) codes from the Australian Statistical Geography Standard (ASGS), Edition 3. SA2s are medium-sized general purpose areas built to represent communities that interact together socially and economically.
Statistical Areas Level 2 2021 name (SA2_NAME_2021): 2021 Statistical Areas Level 2 (SA2) names from the Australian Statistical Geography Standard (ASGS), Edition 3. SA2s are medium-sized general purpose areas built to represent communities that interact together socially and economically.
Area in square kilometres (AREA_ALBERS_SQKM): The area of a region in square kilometres, based on the Albers equal area conic projection.
Uniform Resource Identifier (ASGS_LOCI_URI_2021): A uniform resource identifier that can be used in web-linked applications for data integration.
IHAD quartile 1 (IHAD_QUARTILE1): Proportion of in-scope dwellings in the SA2 that fall into IHAD quartile 1, indicating relatively greater disadvantage and a lack of advantage in general.
IHAD quartile 2 (IHAD_QUARTILE2): Proportion of in-scope dwellings in the SA2 that fall into IHAD quartile 2.
IHAD quartile 3 (IHAD_QUARTILE3): Proportion of in-scope dwellings in the SA2 that fall into IHAD quartile 3.
IHAD quartile 4 (IHAD_QUARTILE4): Proportion of in-scope dwellings in the SA2 that fall into IHAD quartile 4, indicating a relative lack of disadvantage and greater advantage in general.
Occupied private dwellings (OPD_2021): Dwellings in scope of the IHAD, i.e. classifiable occupied private dwellings.
SEIFA IRSAD quartile (IRSAD_QUARTILE): Index of Relative Socio-economic Advantage and Disadvantage quartile. All SA2s are ordered from lowest to highest score; the lowest 25% of SA2s are given a quartile number of 1, the next lowest 25% of SA2s are given a quartile number of 2 and so on, up to the highest 25% of SA2s, which are given a quartile number of 4. This means that SA2s are divided into four equal-sized groups, depending on their score. In practice these groups won’t each be exactly 25% of SA2s, as it depends on the distribution of SEIFA scores.
Usual resident population (URP_2021): Population counts in this column are based on place of usual residence as reported on Census Night. These include persons out of scope of the IHAD.
Dwellings (DWELLING): Total dwellings at Census time, including dwellings out of scope of the IHAD, e.g. unoccupied private dwellings.
Please note: Proportional totals may sum to more than 100% due to rounding and random adjustments made to the data. When calculating proportions, percentages, or ratios from cross-classified or small area tables, the random error introduced can be ignored except when very small cells are involved, in which case the impact on percentages and ratios can be significant. Refer to the Introduced random error / perturbation Census page on the ABS website for more information.
Data and geography references
Source data publication: Index of Household Advantage and Disadvantage
Geographic boundary information: Australian Statistical Geography Standard (ASGS) Edition 3
Further information: Index of Household Advantage and Disadvantage methodology, 2021
Source: Australian Bureau of Statistics (ABS)
Contact the Australian Bureau of Statistics
Email geography@abs.gov.au if you have any questions or feedback about this web service.
The Human Sciences Research Council (HSRC) carried out the Migration and Remittances Survey in South Africa for the World Bank in collaboration with the African Development Bank. The primary mandate of the HSRC in this project was to create a migration database that includes both immigrants and emigrants. The specific activities included:
· A household survey with a view to producing a detailed demographic/economic database of immigrants, emigrants and non-migrants
· The collation and preparation of a data set based on the survey
· The production of basic primary statistics for the analysis of migration and remittance behaviour in South Africa.
Like many other African countries, South Africa lacks reliable census or other data on migrants (immigrants and emigrants) and on the flows of resources that accompany the movement of people. This is because a large proportion of African immigrants are in the country undocumented. A special effort was therefore made to design a household survey that would cover sufficient numbers and proportions of immigrants and still conform to the principles of probability sampling. The approach followed gives a representative picture of migration in two provinces, Limpopo and Gauteng, which should be reflective of migration behaviour and its impacts in South Africa.
Two provinces: Gauteng and Limpopo
Limpopo is the main corridor for migration from African countries to the north of South Africa, while Gauteng is the main port of entry as it has the largest airport in Africa. Gauteng is a destination for internal and international migrants because it has three large metropolitan cities with great economic potential and a reputation for offering employment, accommodation and access to many different opportunities within a distance of 56 km. These two provinces were therefore expected to accommodate most African migrants in South Africa, co-existing with a large host population.
The target group consists of households in all communities. The survey will be conducted among metro and non-metro households. Non-metro households include those in:
- small towns,
- secondary cities,
- peri-urban settlements and
- deep rural areas.
From each selected household, one adult respondent will be selected to participate in the study.
Sample survey data [ssd]
Migration data for South Africa are available for 2007 only at the level of local governments or municipalities, from the 2007 Community Survey; for smaller areas called "sub-places" (SPs), the most recent data are from the 2001 Census; and for the desired EAs, data go back only as far as the 1996 Census. In sum, there was no single source that provided recent data on the five types of migrants of principal interest at the level of the Enumeration Area (EA). This was the level at which data were needed to draw the sample, since it was going to be necessary to identify migrant and non-migrant households in the sample areas in order to oversample those with migrants for interview.
In an attempt to overcome the data limitations referred to above, it was necessary to adopt a novel approach to the design of the sample for the World Bank's household migration survey in South Africa, in order to identify EAs with a high probability of finding immigrants and those with a low probability. This required the combined use of the three sources of data described above. The starting point was the CS 2007 survey, which provided data on migration at the local government level, classifying each local government cluster in terms of migration level and taking into account the types of migrants identified. The researchers then zoomed in spatially from these clusters to the so-called sub-places (SPs) from the 2001 Census, classifying SP clusters by migration level. Finally, the 1996 Census data on migration levels of various types were used to zoom in even further, down to the EA level, to identify the final level of clusters for the survey, namely the spatially small EAs (each typically containing about 200 households, and hence amenable to the listing operation in the field).
A higher score or weight was attached to the 2007 Community Survey municipality-level (MN) data than to the Census 2001 sub-place (SP) data, which in turn was given a greater weight than the 1996 enumerator area (EA) data. The latter was derived exclusively from the Census 1996 EA data, but was then reallocated to the 2001 EAs in proportion to geographical size. Although these weights are purely arbitrary, since they were composed from different sources, they give an indication of the relative importance attached to the different migrant categories. These weighted migrant proportions (secondary strata) therefore constituted the second level of clusters for sampling purposes.
In addition, a system of weighting or scoring the different persons by migrant type was applied to ensure that the likelihood of finding migrants would be optimised. As part of this procedure, recent migrants (who had migrated in the preceding five years) received a higher score than lifetime migrants (who had not migrated during the preceding five years). Similarly, a higher score was attached to international immigrants (both recent and lifetime, who had come to SA from abroad) than to internal migrants (who had only moved within SA's borders). A greater weight was also applied to inter-provincial (internal) migrants than to intra-provincial migrants (who only moved within the same South African province).
How the three data sources were combined to provide overall scores for each EA can be briefly described. First, in each of the two provinces, all local government units were given migration scores according to the numbers or relative proportions of the population classified in the various categories of migrants (with non-migrants given a score of 1.0). Migrants were assigned higher scores according to their priority, with international migrants given higher scores than internal migrants and recent migrants higher scores than lifetime migrants. Then, within the local governments, sub-places were assigned scores on the basis of inter- vs. intra-provincial migrants using the 2001 census data. Each SP area in a local government was thus assigned a value which was the product of its local government score (the same for all SPs in the local government) and its own SP score. The third and final stage was to develop relative migration scores for all the EAs from the 1996 census by similarly weighting the proportions of migrants (and non-migrants, always assigned 1.0) of each type. The final migration score for an EA is the product of its own EA score from 1996, the score of the SP of which it is a part (assigned to all the EAs within the SP), and the local government score from the 2007 survey.
Based on all the above principles the set of weights or scores was developed.
In sum, we multiplied the proportion of the population of each migrant type, or its incidence, by the appropriate final EA score for persons of that type (obtained by multiplying the three weights together), to obtain the overall score for each EA. This takes into account the distribution of persons in the EA according to migration status in 1996, the SP score of the EA in 2001, and the 2007 score of the local government in which the EA is located. Finally, all EAs in each province were classified into quartiles prior to sampling from the quartiles.
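The multiplicative scoring can be sketched in Python, following the description in which an EA's final score is the product of its local government, sub-place and EA scores, and each level score is a proportion-weighted sum over person types with non-migrants scored 1.0. All other per-type scores below are hypothetical, since the survey's actual weights are not given, and the rank-based quartile cut is an assumption.

```python
# Hypothetical per-type scores; the survey's actual weights are not reproduced here.
LG_EA_SCORES = {
    "non_migrant": 1.0,               # non-migrants always score 1.0
    "internal_lifetime": 1.5,
    "internal_recent": 2.0,
    "international_lifetime": 3.0,
    "international_recent": 5.0,
}
SP_SCORES = {"non_migrant": 1.0, "intra_provincial": 1.5, "inter_provincial": 2.5}

def level_score(proportions, scores):
    """Proportion-weighted score for one area: sum of (share of type) * (type score)."""
    return sum(share * scores[kind] for kind, share in proportions.items())

def ea_final_score(lg_props, sp_props, ea_props):
    """Product of the local government (2007), sub-place (2001) and EA (1996) scores."""
    return (level_score(lg_props, LG_EA_SCORES)
            * level_score(sp_props, SP_SCORES)
            * level_score(ea_props, LG_EA_SCORES))

def quartiles_by_rank(values):
    """Assign quartile 1 (lowest scores) to 4 (highest) by rank within a province."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    quartile = [0] * len(values)
    for rank, i in enumerate(order):
        quartile[i] = rank * 4 // len(values) + 1
    return quartile
```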
From the EAs so classified, the sampling took the form of selecting EAs, i.e. primary sampling units (PSUs, which in this case are also ultimate sampling units, since this is a single-stage sample), according to their classification into quartiles. The proportions selected from each quartile are based on the range of EA-level scores, which are assumed to reflect weighted probabilities of finding desired migrants in each EA. To enhance the likelihood of finding migrants, much higher proportions of EAs were selected into the sample from the quartiles with the higher scores than from those with the lower scores (disproportionate sampling). The decision on the most appropriate categorisations was informed by the observed migration levels in the two provinces of the study area during 2007, 2001 and 1996, analysed at the lowest spatial level for which migration data were available in each case.
Because of the differences in their characteristics it was decided that the provinces of Gauteng and Limpopo should each be regarded as an explicit stratum for sampling purposes. These two provinces therefore represented the primary explicit strata. It was decided to select an equal number of EAs from these two primary strata.
The migration-level categories referred to above were treated as secondary explicit strata to ensure optimal coverage of each in the sample. The distribution of migration levels was then used to draw EAs in such a way that greater preference could be given to areas with higher proportions of migrants in general, but especially immigrants (note the relative scores assigned to each type of person above). The proportion of EAs selected into the sample from each quartile draws upon the relative mean weighted migrant scores (referred to as proportions) found below the table, but this correspondence is coincidental and not necessary: any disproportionate sampling of EAs from the quartiles could be used, since it would be corrected by the weighting applied at the end for the analysis.
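Why disproportionate sampling can be "rectified in the weighting" is easiest to see with design weights: each sampled EA is weighted by the inverse of its quartile's selection probability. The numbers below are purely illustrative (100 EAs per quartile in the frame, with hypothetical sample sizes and migrant shares); the sketch shows the weighted mean recovering the equal-quartile average that the unweighted mean overstates.

```python
def design_weights(frame_eas, sampled_eas):
    """Weight for each quartile = 1 / selection probability = N_q / n_q."""
    return {q: frame_eas[q] / sampled_eas[q] for q in frame_eas}

def weighted_mean(values_by_quartile, weights):
    """Design-weighted mean of an EA-level variable."""
    num = sum(weights[q] * v for q, vals in values_by_quartile.items() for v in vals)
    den = sum(weights[q] * len(vals) for q, vals in values_by_quartile.items())
    return num / den

# Illustrative frame and (disproportionate) sample: far more EAs drawn from quartile 4.
frame = {1: 100, 2: 100, 3: 100, 4: 100}
sampled = {1: 5, 2: 15, 3: 30, 4: 50}
share = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40}  # hypothetical migrant share per EA
values = {q: [share[q]] * sampled[q] for q in sampled}

weighted = weighted_mean(values, design_weights(frame, sampled))  # 0.1875
unweighted = weighted_mean(values, {q: 1.0 for q in sampled})     # 0.2775, biased upwards
```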
The resultant proportions of migrants then led to the following proportional allocation of sampled EAs (Quartile 1: 5 per cent (instead of 25% as in an equal distribution), Quartile 2: 15 per cent (instead
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Gender pay gap reporting is due to be introduced nationally for all employers from 2017. This dataset shows a snapshot of the Council as at March 2016. All staff are included in the calculation of the mean and median hourly earnings. The quartile salary information shows the number of men and women in each quartile, where the quartiles are formed by ranking employees from the lowest paid to the highest paid and splitting them into four equal parts.
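A short Python sketch of the two calculations described above. It assumes the common UK convention of expressing a pay gap as the difference between men's and women's hourly pay as a percentage of men's pay, and splits the ranked staff list into four parts of as-equal-as-possible size; the data structures are hypothetical.

```python
from statistics import mean, median

def pay_gaps(men, women):
    """Mean and median hourly pay gaps as a percentage of men's pay."""
    mean_gap = (mean(men) - mean(women)) / mean(men) * 100
    median_gap = (median(men) - median(women)) / median(men) * 100
    return mean_gap, median_gap

def quartile_counts(staff):
    """staff: list of (hourly_pay, sex) tuples; counts of 'M' and 'F' per pay quartile."""
    ranked = sorted(staff, key=lambda s: s[0])  # lowest paid to highest paid
    n = len(ranked)
    counts = []
    for q in range(4):
        band = ranked[q * n // 4:(q + 1) * n // 4]
        counts.append({
            "M": sum(1 for _, sex in band if sex == "M"),
            "F": sum(1 for _, sex in band if sex == "F"),
        })
    return counts
```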
This study investigates the rate of erosion during the 1951-2006 period on the Bykovsky Peninsula, located north-east of the harbour town of Tiksi, north Siberia. Its coastline, which is characterized by the presence of ice-rich sediment (Ice Complex) and the vicinity of the Lena River Delta, retreated at a mean rate of 0.59 m/yr between 1951 and 2006. Total erosion ranged from 434 m of erosion to 92 m of accretion during these 56 years and exhibited large variability (sigma = 45.4). Ninety-seven percent of the rates observed were less than 2 m/yr and 81.6% were less than 1 m/yr. No significant trend in erosion could be detected despite the study of five temporal subperiods within 1951-2006. Erosion modes and rates appear to be strongly dependent on the nature of the backshore material, with erosion being stronger along low-lying coastal stretches affected by past or current thermokarst activity. Comparing wind records monitored at the town of Tiksi with the erosion records yielded no significant relationship, despite strong record amplitude for both data sets. We attribute this poor relationship to the only rough incorporation of sea-ice cover in our storm extraction algorithm, the use of land-based rather than offshore wind records, the proximity of the peninsula to the Lena River Delta freshwater and sediment plume, and the local topographical constraints on wave development.
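The summary figures above (mean retreat rate, spread, and the share of rates below 1 and 2 m/yr) can be reproduced from per-transect shoreline movement with a few lines of Python. This is a minimal sketch assuming net movement per transect in metres (positive for erosion, negative for accretion) over a single period; the population standard deviation is used as the spread statistic, which is an assumption about how sigma was computed.

```python
from statistics import mean, pstdev

def erosion_rates(net_movement_m, years):
    """Per-transect rate in m/yr; positive = erosion, negative = accretion."""
    return [d / years for d in net_movement_m]

def summarise(rates):
    """Mean rate, spread, and shares of rates below 2 and 1 m/yr."""
    return {
        "mean_rate": mean(rates),
        "sd": pstdev(rates),
        "share_below_2": sum(r < 2 for r in rates) / len(rates),
        "share_below_1": sum(r < 1 for r in rates) / len(rates),
    }
```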
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains three RData files, each containing MCMC simulations from the posterior distributions of a Bayesian hierarchical model used to estimate European migration flows, developed as part of the project Quantifying Migration Scenarios for Better Policy (QuantMig, www.quantmig.eu); see Aristotelous, Smith and Bijak (2022) for details. The first set of estimates is disaggregated by origin, destination and time, a breakdown which we denote as ODT. The second and third sets of estimates are again disaggregated by origin, destination and time, but they are additionally disaggregated by other factors: the second set by age and sex, and the third by birth region only. We denote these breakdowns as ODAST and ODBT respectively. Also included in this folder is an R script to calculate summaries of these posterior distributions (means, and lower and upper quartiles) and output them to CSV files, which are also in the folder. For further details, see deliverable_6_4_v1_2.pdf, also in the folder.
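The kind of summary the folder's R script produces can be sketched in Python for readers working outside R. This is not that script: it assumes the posterior draws have already been loaded as a matrix with one row per MCMC iteration and one column per flow, uses numpy's default percentile interpolation for the quartiles, and the column names are hypothetical.

```python
import csv
import io

import numpy as np

FIELDS = ["quantity", "mean", "lower_quartile", "upper_quartile"]

def summarise_draws(draws, names):
    """draws: array (n_iterations, n_quantities) of posterior samples."""
    draws = np.asarray(draws, dtype=float)
    means = draws.mean(axis=0)
    q25, q75 = np.percentile(draws, [25, 75], axis=0)
    return [
        {"quantity": name, "mean": float(m), "lower_quartile": float(lo), "upper_quartile": float(hi)}
        for name, m, lo, hi in zip(names, means, q25, q75)
    ]

def to_csv(rows):
    """Render summary rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```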
The household registration system known as ho khau has been a part of the fabric of life in Vietnam for over 50 years. The system was used as an instrument of public security, economic planning, and control of migration, at a time when the state played a stronger role in direct management of the economy and the life of its citizens. Although the system has become less rigid over time, concerns persist that ho khau limits the rights and access to public services of those who lack permanent registration in their place of residence. Due largely to data constraints, however, previous discussions about the system have relied largely on anecdotal or partial information.
Drawing from historical roots as well as the similar model of China’s hukou, the ho khau system was established in Vietnam in 1964. The 1964 law established the basic parameters of the system: every citizen was to be registered as a resident in one and only one household at the place of permanent residence, and movements could take place only with the permission of authorities. Controlling migration to cities was part of the system’s early motivation, and the system’s ties to rationing, public services, and employment made it an effective check on unsanctioned migration. Transfer of one’s ho khau from one place to another was possible in principle but challenging in practice.
The force of the system has diminished since the launch of Doi Moi and a series of reforms beginning in 2006. Most critically, it is no longer necessary to obtain permission from the local authorities in the place of departure to register in a new location. Additionally, obtaining temporary registration status in a new location is no longer difficult. In recent years, however, the direction of policy changes regarding ho khau has been mixed. A 2013 law explicitly recognized the right of local authorities to set their own registration policies, and some cities have tightened the requirements for obtaining permanent status.
Understanding of the system has been hampered by the fact that people without permanent registration do not appear in most conventional sources of socioeconomic data. To gather data for this project, a survey of 5000 households in five provinces was conducted in June–July 2015. The sample is representative of the population of five provinces: Ho Chi Minh City, Ha Noi, Da Nang, Binh Duong and Dak Nong. These five provinces/cities are among those with the highest migration rates, as estimated from the 2009 Population Census.
5 provinces – Ho Chi Minh City, Ha Noi, Da Nang, Binh Duong and Dak Nong.
Household
Sample survey data [ssd]
Sampling for the Household Registration Survey was conducted in two stages: selection of 250 enumeration areas (EAs; 50 in each of the 5 provinces), followed by selection of 20 households in each selected EA, giving a total sample size of 5000 households. EAs were selected using the Probability Proportional to Size (PPS) method, with size measured as the square of the number of migrants in each EA, so that EAs with more migrants had a higher probability of selection. “Migrants” were defined from the census data as people who had lived in a different province five years before the census. The 2009 Population Census data was used as the sampling frame for the selection of EAs. To make sure the sampling frame was accurate and up to date, leaders of the sampled EAs were asked, a month before the actual fieldwork, to collect information on all households in their ward regardless of registration status. The information collected included the name, gender and age of the household head, the household’s address and phone number, its residence registration status, and its place of registration five years earlier. All households on the resulting lists were found to have either temporary or permanent registration in their current place of residence.
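The first-stage selection (PPS with size equal to the squared migrant count) can be illustrated with a minimal sketch. The description does not say which PPS variant was used; systematic PPS sampling is assumed here, the function names are hypothetical, and in the survey the procedure would be run separately in each province to draw 50 EAs.

```python
import bisect
import itertools
import random


def pps_systematic(migrant_counts, n, start=None, rng=random):
    """Systematic PPS selection of n EAs with size measure = (number of migrants) ** 2.

    Returns the indices of the selected EAs. An EA whose size exceeds the
    sampling interval can be hit more than once (a certainty selection).
    """
    sizes = [m * m for m in migrant_counts]
    total = sum(sizes)
    interval = total / n
    if start is None:
        start = rng.random() * interval  # random start in [0, interval)
    cumulative = list(itertools.accumulate(sizes))
    points = [start + k * interval for k in range(n)]
    # bisect_right finds the EA whose cumulative-size band contains each point;
    # min() guards against floating-point overshoot at the very end.
    last = len(sizes) - 1
    return [min(bisect.bisect_right(cumulative, p), last) for p in points]


def first_stage_probabilities(migrant_counts, n):
    """Approximate inclusion probability n * size / total, capped at 1."""
    sizes = [m * m for m in migrant_counts]
    total = sum(sizes)
    return [min(1.0, n * s / total) for s in sizes]


if __name__ == "__main__":
    print(pps_systematic([1, 2, 3, 4], n=2, start=5.0))
    print(first_stage_probabilities([1, 2, 3, 4], n=2))
```

Squaring the migrant count sharpens the preference for high-migration EAs compared with ordinary PPS on the raw count, which is the stated aim of the design.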
Using these lists, survey households were selected by stratifying at the EA level to ensure a substantial surveyed population of households without permanent registration. In each EA, 12 households with temporary registration status and 8 households with permanent registration status were randomly selected. In EAs with fewer than 12 temporary-registration households, all of the temporary-registration households were selected, and additional permanent-registration households were selected so that each EA had 20 survey households. Sampling weights were calculated taking into account the selection rules for the first and second stages of the survey.
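The second-stage rule and a two-stage weight can be sketched as follows. The selection logic mirrors the text (up to 12 temporary-registration households, topped up with permanent-registration households to 20); the weight formula is an assumption, namely the usual inverse of the product of the first-stage and within-stratum selection probabilities, since the exact formula is not given here. Function names are hypothetical.

```python
import random


def select_households(temporary, permanent, n_temporary=12, n_total=20, rng=random):
    """Within-EA selection: up to 12 temporary-registration households,
    topped up with permanent-registration households to reach 20."""
    take_temp = min(n_temporary, len(temporary))
    take_perm = min(n_total - take_temp, len(permanent))
    return rng.sample(temporary, take_temp), rng.sample(permanent, take_perm)


def design_weight(p_ea, n_selected, n_listed):
    """Assumed weight: 1 / (P(EA selected) * P(household selected within its stratum))."""
    return 1.0 / (p_ea * n_selected / n_listed)


if __name__ == "__main__":
    temp = [f"t{i}" for i in range(5)]   # only 5 temporary households listed
    perm = [f"p{i}" for i in range(30)]
    chosen_t, chosen_p = select_households(temp, perm, rng=random.Random(0))
    print(len(chosen_t), len(chosen_p))
    print(design_weight(p_ea=0.5, n_selected=8, n_listed=40))
```

Because temporary-registration households are deliberately oversampled, their weights are typically smaller than those of permanent-registration households in the same EA.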
Computer Assisted Personal Interview [capi]
The questionnaire was adapted mostly from the Vietnam Household Living Standard Survey (VHLSS) and the Urban Poverty Survey (UPS), with appropriate adjustments and a number of supplementary questions to follow closely the objectives of this survey. The household questionnaire covers the following topics:
• Demographic characteristics of household members, with emphasis on their residence status in terms of both administrative management (permanent/temporary residence book) and actual residential situation.
• Education of household members. Besides education level, respondents are asked whether a household member attends school as “trai-tuyen”, how much they pay in “trai-tuyen” or enrolment fees, and what difficulties they face in attending school without permanent residence status.
• Health and health care, covering the medical status and health insurance cards of household members.
• Labour and employment, covering household members’ employment status in the last 30 days, their most and second-most time-consuming jobs during the last 30 days, and whether they had been asked about their residence status when looking for a job.
• Assets and housing conditions. This section collects information on the household’s living conditions, such as assets, housing type and area, electricity, water and energy.
• Income and expenditure of households.
• Social inclusion and protection. Respondents are asked whether their household members participate in social organizations, activities, services and contributions; whether they benefit from any social project or policy; whether they had any loans within the last 12 months; and to provide information about five of their friends in their residential area.
• Knowledge of the Law on Residence and current regulations on conditions for obtaining permanent residence, experience dealing with residence issues, and respondents’ opinions on the current household registration system.
Managing and Cleaning the Data
Data were managed and cleaned daily as soon as they were received, in parallel with the fieldwork. At the end of each workday, survey teams were required to review all of the interviews conducted and transfer the collected data to the server. The data received by the main server were downloaded and monitored by MDRI staff.
At this stage, MDRI assigned a technical team to work on the data. First, the team listened to interview recordings and used an application to detect enumerators’ errors, which allowed MDRI to identify and correct interviewer mistakes quickly. The technical team then cleaned the data questionnaire by questionnaire, based on the following quantity and quality checking criteria.
• Quantity checking criteria: The number of questionnaires must match the number of completed interviews and the questionnaires assigned to each individual in the field. According to the plan, each survey team completed 20 household questionnaires in each village. All questionnaires were checked to ensure that they contained all essential information, and duplicate entries were eliminated.
• Quality checking criteria: Staff thoroughly examined the plausibility and logical consistency of the data. If any information was suspicious or inconsistent, the data management team re-listened to the recordings or contacted the respondents and survey teams by phone for clarification, and the necessary revisions were then made.
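The quantity checks (removing duplicate entries, flagging questionnaires missing essential information, and confirming 20 households per EA) can be sketched as below. This is an illustration only: the record layout and the field names `ea_id`, `hh_id` and `head_name` are hypothetical, and the actual MDRI tooling is not described here.

```python
from collections import Counter

# Hypothetical "essential" fields; the real questionnaire defines its own.
REQUIRED_FIELDS = ("ea_id", "hh_id", "head_name")


def quantity_check(records, expected_per_ea=20):
    """Return (unique, duplicates, incomplete, wrong_count) for a list of dict records."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = (record.get("ea_id"), record.get("hh_id"))
        if key in seen:
            duplicates.append(record)  # same household entered twice
        else:
            seen.add(key)
            unique.append(record)
    incomplete = [r for r in unique
                  if any(r.get(field) in (None, "") for field in REQUIRED_FIELDS)]
    per_ea = Counter(r.get("ea_id") for r in unique)
    wrong_count = {ea: n for ea, n in per_ea.items() if n != expected_per_ea}
    return unique, duplicates, incomplete, wrong_count


if __name__ == "__main__":
    data = [{"ea_id": "A", "hh_id": i, "head_name": "x"} for i in range(20)]
    data.append({"ea_id": "A", "hh_id": 0, "head_name": "x"})  # duplicate
    data += [{"ea_id": "B", "hh_id": i, "head_name": "x"} for i in range(19)]
    _, dupes, missing, counts = quantity_check(data)
    print(len(dupes), len(missing), counts)
```

Flagged EAs and incomplete records would then go to the manual follow-up described above (re-listening to recordings or phoning respondents).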
Data cleaning was carried out in the following stages:
1. Identification of illogical values;
2. Software-based detection of errors for clarification and revision;
3. Re-checking of information with respondents and/or enumerators by phone or by listening to the recordings;
4. Development and implementation of error-correction algorithms.
The list of detected and adjusted errors is attached in Annex 6.
Outlier detection methods
The data team applied a popular non-parametric method for outlier detection, which follows this procedure:
1. Identify the first quartile Q1 (the 25th percentile).
2. Identify the third quartile Q3 (the 75th percentile).
3. Compute the inter-quartile range: IQR = Q3 - Q1.
4. Calculate the lower limit (L) and upper limit (U):
o L = Q1 - 1.5*IQR
o U = Q3 + 1.5*IQR
5. Detect outliers: an observation is an outlier if it lies below the lower limit or above the upper limit (i.e. less than L or greater than U).
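The five steps translate directly into code. One caveat: quartiles can be computed in several ways, and the survey documentation does not say which convention was used; this sketch assumes linear interpolation (`method="inclusive"` in Python's `statistics.quantiles`, equivalent to R's default and NumPy's default), so borderline observations may be classified differently under another convention.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # steps 1-2
    iqr = q3 - q1                                            # step 3
    lower, upper = q1 - k * iqr, q3 + k * iqr                # step 4
    outliers = [v for v in values if v < lower or v > upper]  # step 5
    return outliers, (lower, upper)


if __name__ == "__main__":
    flagged, limits = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
    print(flagged, limits)  # Q1 = 3.25, Q3 = 7.75, so limits are -3.5 and 14.5
```

Because the fences depend only on quartiles, a few extreme values (such as 100 above) cannot pull the limits outward the way a mean-and-standard-deviation rule would.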
Data Structure
The completed dataset for the “Household registration survey 2015” includes 9 files in Stata format (.dta):
• hrs_maindata: Information on the households, including assets, housing, income, expenditures, social inclusion and social protection issues, and household registration procedures
• hrs_muc1: Basic information on the
Annual descriptive price statistics for each calendar year 2005 – 2023 for 11 Local Government Districts in Northern Ireland. The statistics include:
• Minimum sale price
• Lower quartile sale price
• Median sale price
• Simple mean sale price
• Upper quartile sale price
• Maximum sale price
• Number of verified sales
Prices are available where at least 30 sales that could be included in the regression model were recorded in the area within the calendar year, i.e. the following sales are excluded:
• Non-arms-length sales
• Sales of properties where the habitable space is less than 30m² or greater than 1000m²
• Sales less than £20,000.
Annual median or simple mean prices should not be used to calculate property price change over time. The quality (where quality refers to the combination of all characteristics of a residential property, both physical and locational) of the properties that are sold may differ from one time period to another. For example, sales in one quarter could be disproportionately skewed towards low-quality properties, producing a biased estimate of average price. The median and simple mean prices are not ‘standardised’, so the varying mix of properties sold in each quarter could give a false impression of the actual change in prices. To calculate pure property price change over time it is necessary to compare like with like, which can only be achieved if the ‘characteristics-mix’ of properties traded is standardised. To calculate pure property price change over time, please use the standardised prices in the NI House Price Index Detailed Statistics file.
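How the exclusion rules and the 30-sale threshold combine can be shown in a short sketch. The field names (`price`, `floor_area_m2`, `arms_length`) are hypothetical, only the three listed exclusions are applied (the regression model may have further inclusion criteria), and the quartile convention is an assumption since the publication does not state one.

```python
from statistics import fmean, median, quantiles

MIN_SALES = 30  # statistics are only published with at least 30 qualifying sales


def is_qualifying(sale):
    """Apply the three listed exclusions to one sale (hypothetical field names)."""
    return (sale["arms_length"]
            and 30 <= sale["floor_area_m2"] <= 1000
            and sale["price"] >= 20_000)


def descriptive_stats(sales):
    """Return the published set of statistics, or None below the 30-sale threshold."""
    prices = sorted(s["price"] for s in sales if is_qualifying(s))
    if len(prices) < MIN_SALES:
        return None
    # Quartile convention assumed: linear interpolation ("inclusive").
    q1, _, q3 = quantiles(prices, n=4, method="inclusive")
    return {"min": prices[0], "lower_quartile": q1, "median": median(prices),
            "mean": fmean(prices), "upper_quartile": q3, "max": prices[-1],
            "sales": len(prices)}


if __name__ == "__main__":
    sample = [{"price": 20_000 + 1_000 * i, "floor_area_m2": 80, "arms_length": True}
              for i in range(30)]
    print(descriptive_stats(sample))
```

As the text stresses, these unstandardised figures describe each year's sales mix; they are not a substitute for the standardised index when measuring price change.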
By Health Data New York [source]
This dataset contains New York State county-level data on obesity- and diabetes-related indicators from 2008–2012. It includes information on counties' population health status, such as the number of events, percentage/rate, 95% confidence interval, measure units and more. Analyzing these data provides insight into how communities across New York State are affected by these diseases and how we can work together to create healthier living environments for everyone. This dataset is released under a Terms of Service license agreement; make sure to read and understand the details if you plan to use it in any research or commercial application.
This dataset contains county-level data on obesity- and diabetes-related indicators in New York State. As such, it can be used to research indicators related to general health in the state's counties.
To use this dataset effectively, first become familiar with the columns and their meanings:
- County Name: The name of the county. (String)
- County Code: The code of the county. (Integer)
- Region Name: The name of the region. (String)
- Indicator Number: The number of the indicator. (Integer)
- Total Event Counts: The total number of events related to the indicator. (Integer)
- Denominator: The denominator used to calculate the percentage/rate. (Integer)
- Denominator Note: Any additional notes related to the denominator. (String)
- Measure Unit: The unit of measure used for this rate/percentage. (String)
- Percentage/Rate: The percentage/rate calculated from the observed event count and the denominator. (Float)
- 95% CI: The 95% confidence interval associated with the rate or percentage. (Float)
- Data Comments: Any additional comments relevant to this data source or indicator. (String)
- Data Years: The years covered by this indicator observation. (String)
- Data Sources: The sources from which the data were drawn for indicators involving counties from different regions. (String)
- Quartile: Quartiles are derived by ranking all geographic entities on a specific metric score and cutting the ranking into quartiles: 0 = bottom quarter; 1 = middle two quarters combined; 2 = top quarter. (Integer)
- Mapping Distribution: Mapping details describing how indicators relating to disease rates or characteristics are distributed across the state, regions and counties, including any trends and other pertinent mapping information such as health resource availability. (Presented as a plot where available; otherwise as an informational string.)
- Location: The area the record refers to, i.e. a point feature with a single location ID retrieved from a GeoPlanet proxy service. (String)

Using these columns, you can find out health information about your chosen county, such as its obesity rate and diabetes incidence, enabling you to better understand its overall health situation. The dataset also provides comparison features such as quartile rankings.
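The relationships among Total Event Counts, Denominator, Percentage/Rate, 95% CI and Quartile can be illustrated with a small sketch. The dataset does not document its CI method or the direction of its ranking, so this assumes a plain percentage (some indicators use other Measure Units, such as rates per 10,000), a normal-approximation (Wald) interval, and ascending ranking by score; ties are not treated specially. Function names are hypothetical.

```python
from math import sqrt


def percentage(events, denominator):
    """Events as a percentage of the denominator population."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * events / denominator


def wald_ci_95(events, denominator):
    """Assumed method: normal-approximation 95% CI for the percentage, clipped to [0, 100]."""
    p = events / denominator
    half_width = 1.96 * sqrt(p * (1 - p) / denominator)
    return 100.0 * max(0.0, p - half_width), 100.0 * min(1.0, p + half_width)


def quartile_codes(scores):
    """Rank counties by ascending score and code them 0 (bottom quarter),
    1 (middle two quarters combined) or 2 (top quarter)."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    codes = {}
    for position, county in enumerate(ranked):
        if position < n / 4:
            codes[county] = 0
        elif position >= 3 * n / 4:
            codes[county] = 2
        else:
            codes[county] = 1
    return codes


if __name__ == "__main__":
    print(percentage(50, 100), wald_ci_95(50, 100))
    print(quartile_codes({"Albany": 31.2, "Bronx": 24.8, "Erie": 28.0, "Kings": 26.5}))
```

Collapsing the middle two quarters into a single code, as the Quartile column does, highlights only the counties at either extreme of the ranking.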
Analysing the geographic distribution of obesity and diabetes related indicators by county in New York State, in order to identify areas which may require greater levels of intervention and preventative health measures.
Evaluating trends over time for different counties to assess whether policies or programs have had an impact on indicators relating to obesity and diabetes within the given area.
Using machine learning techniques, such as clustering analysis or predictive modelling, to identify patterns in the data that can better inform preventative health interventions across New York State.
If you use this dataset in your research, please credit the original authors (Health Data New York).
See the dataset description for more information.
File: community-health-obesity-and-diabetes-related-indicators-2008-2012-1.csv

| Column name | Description |
|:-------------------------|:-----------------------------------------------------------------------------------------|
| **Count...
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Affordability ratios are calculated by dividing house prices by gross annual residence-based earnings, based on the median and lower quartiles of both house prices and earnings in England and Wales.
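The two ratios described above (median price over median earnings, and lower-quartile price over lower-quartile earnings) can be sketched as follows. In practice the ratios are computed from published aggregate figures rather than raw lists; the function name and the quartile convention (linear interpolation) here are assumptions for illustration.

```python
from statistics import median, quantiles


def affordability_ratios(house_prices, annual_earnings):
    """Median and lower-quartile ratios of house price to gross annual earnings."""
    price_lq = quantiles(house_prices, n=4, method="inclusive")[0]
    earnings_lq = quantiles(annual_earnings, n=4, method="inclusive")[0]
    return {
        "median": median(house_prices) / median(annual_earnings),
        "lower_quartile": price_lq / earnings_lq,
    }


if __name__ == "__main__":
    print(affordability_ratios([100_000, 200_000, 300_000], [10_000, 20_000, 30_000]))
```

Note that the ratio of two medians is not the median of individual price-to-earnings ratios; each quantity is computed on its own distribution before dividing.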
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Covariates comprising