Round 1 of the Afrobarometer survey was conducted from July 1999 through June 2001 in 12 African countries to solicit public opinion on democracy, governance, markets, and national identity. The full 12-country dataset was pieced together from different projects: Round 1 of the Afrobarometer survey, the old Southern African Democracy Barometer, and similar surveys done in West and East Africa.
The 7-country dataset is a subset of the Round 1 survey dataset. It combines the data for the 7 Southern African countries surveyed in Round 1 in 1999-2000 (Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe). It is a useful dataset because, in contrast to the full 12-country Round 1 dataset, all countries in this dataset were surveyed with an identical questionnaire.
Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia, Zimbabwe
Basic units of analysis that the study investigates include: individuals and groups
Sample survey data [ssd]
A new sample has to be drawn for each round of Afrobarometer surveys. Whereas the standard sample size for Round 3 surveys will be 1200 cases, a larger sample size will be required in societies that are extremely heterogeneous (such as South Africa and Nigeria), where the sample size will be increased to 2400. Other adaptations may be necessary within some countries to account for the varying quality of the census data or the availability of census maps.
The sample is designed as a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of selection for interview. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible. A randomly selected sample of 1200 cases allows inferences to national adult populations with a margin of sampling error of no more than plus or minus 2.8 percent with a confidence level of 95 percent. If the sample size is increased to 2400, the margin of error shrinks to plus or minus 2 percent.
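The margins of error quoted for these sample sizes can be checked with the textbook formula for a proportion. The sketch below assumes simple random sampling, the worst-case proportion p = 0.5, and z = 1.96 for 95 percent confidence; it ignores any design effect from clustering and stratification, so it is an illustration and not the project's official calculation. The function name is hypothetical.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion,
    assuming simple random sampling (no design effect)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1200, 2400):
    # Expressed in percentage points, rounded to one decimal place.
    print(n, round(100 * margin_of_error(n), 1))
# prints:
# 1200 2.8
# 2400 2.0
```

Doubling the sample size shrinks the margin only by a factor of about 1.4 (the square root of 2), which is why the larger sample is reserved for very heterogeneous societies.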
Sample Universe
The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.
What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.
Sample Design
The sample design is a clustered, stratified, multi-stage, area probability sample.
To repeat the main sampling principle, the objective of the design is to give every sample element (i.e. adult citizen) an equal and known chance of being chosen for inclusion in the sample. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible.
In a series of stages, geographically defined sampling units of decreasing size are selected. To ensure that the sample is representative, the probability of selection at various stages is adjusted as follows:
The sample is stratified by key social characteristics in the population such as sub-national area (e.g. region/province) and residential locality (urban or rural). The area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample, and the urban/rural stratification is a means to make sure that these localities are represented in their correct proportions. Wherever possible, and always in the first stage of sampling, random sampling is conducted with probability proportionate to population size (PPPS). The purpose is to guarantee that larger (i.e., more populated) geographical units have a proportionally greater probability of being chosen into the sample. The sampling design has four stages:
A first stage to stratify and randomly select primary sampling units;
A second stage to randomly select sampling start-points;
A third stage to randomly choose households;
A final stage involving the random selection of individual respondents.
We shall deal with each of these stages in turn.
STAGE ONE: Selection of Primary Sampling Units (PSUs)
The primary sampling units (PSUs) are the smallest, well-defined geographic units for which reliable population data are available. In most countries, these will be Census Enumeration Areas (or EAs). Most national census data and maps are broken down to the EA level. In the text that follows we will use the acronyms PSU and EA interchangeably because, when census data are employed, they refer to the same unit.
We strongly recommend that National Investigators (NIs) use official national census data as the sampling frame for Afrobarometer surveys. Where recent or reliable census data are not available, NIs are asked to inform the relevant Core Partner before they substitute any other demographic data. Where the census is out of date, NIs should consult a demographer to obtain the best possible estimates of population growth rates. These should be applied to the outdated census data in order to make projections of population figures for the year of the survey. It is important to bear in mind that population growth rates vary by area (region) and (especially) between rural and urban localities. Therefore, any projected census data should include adjustments to take such variations into account.
Indeed, we urge NIs to establish collegial working relationships with professionals in the national census bureau, not only to obtain the most recent census data, projections, and maps, but also to gain access to sampling expertise. NIs may even commission a census statistician to draw the sample to Afrobarometer specifications, provided that provision for this service has been made in the survey budget.
Regardless of who draws the sample, the NIs should thoroughly acquaint themselves with the strengths and weaknesses of the available census data and the availability and quality of EA maps. The country and methodology reports should cite the exact census data used, its known shortcomings, if any, and any projections made from the data. At minimum, the NI must know the size of the population and the urban/rural population divide in each region in order to specify how to distribute population and PSUs in the first stage of sampling. NIs should obtain this data in writing before they attempt to stratify the sample.
Once this data is obtained, the sample population (either 1200 or 2400) should be stratified, first by area (region/province) and then by residential locality (urban or rural). In each case, the proportion of the sample in each locality in each region should be the same as its proportion in the national population as indicated by the updated census figures.
Having stratified the sample, it is then possible to determine how many PSUs should be selected for the country as a whole, for each region, and for each urban or rural locality.
The total number of PSUs to be selected for the whole country is determined by calculating the maximum degree of clustering of interviews one can accept in any PSU. Because PSUs (which are usually geographically small EAs) tend to be socially homogeneous, we do not want to select too many people in any one place. Thus, the Afrobarometer has established a standard of no more than 8 interviews per PSU. For a sample size of 1200, the sample must therefore contain 150 PSUs/EAs (1200 divided by 8). For a sample size of 2400, there must be 300 PSUs/EAs.
These PSUs should then be allocated proportionally to the urban and rural localities within each regional stratum of the sample. Let's take a couple of examples from a country with a sample size of 1200. If the urban locality of Region X in this country constitutes 10 percent of the current national population, then the sample for this stratum should be 15 PSUs (calculated as 10 percent of 150 PSUs). If the rural population of Region Y constitutes 4 percent of the current national population, then the sample for this stratum should be 6 PSUs.
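The arithmetic in the two examples above can be reproduced with a short sketch. The function names are hypothetical, and simple rounding is an assumption made here for illustration: with a full set of strata, the rounded allocations may not sum exactly to the national total and would need a manual adjustment.

```python
def psus_needed(sample_size, interviews_per_psu=8):
    """Total PSUs required when at most 8 interviews are held per PSU."""
    return sample_size // interviews_per_psu

def allocate_psus(sample_size, stratum_shares, interviews_per_psu=8):
    """Allocate PSUs to strata in proportion to their population share.
    stratum_shares maps a stratum name to its share of the national population."""
    total = psus_needed(sample_size, interviews_per_psu)
    return {name: round(share * total) for name, share in stratum_shares.items()}

# The two strata used as examples in the text (shares of national population).
shares = {"Region X urban": 0.10, "Region Y rural": 0.04}

print(psus_needed(1200))            # 150
print(psus_needed(2400))            # 300
print(allocate_psus(1200, shares))  # {'Region X urban': 15, 'Region Y rural': 6}
```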
The next step is to select particular PSUs/EAs using random methods. Using the above example of the rural localities in Region Y, let us say that you need to pick 6 sample EAs out of a census list that contains a total of 240 rural EAs in Region Y. But which 6? If the EAs created by the national census bureau are of equal or roughly equal population size, then selection is relatively straightforward. Just number all EAs consecutively, then make six selections using a table of random numbers. This procedure, known as simple random sampling (SRS), will
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Population density per pixel at 100 metre resolution. WorldPop provides estimates of the number of people residing in each 100x100m grid cell for every low and middle income country. By integrating census, survey, satellite and GIS datasets in a flexible machine-learning framework, high resolution maps of population counts and densities for 2000-2020 are produced, along with accompanying metadata.
DATASET: Alpha version 2010 and 2015 estimates of numbers of people per grid square, available both with national totals adjusted to match UN Population Division estimates (http://esa.un.org/wpp/) and unadjusted.
REGION: Africa
SPATIAL RESOLUTION: 0.000833333 decimal degrees (approx 100m at the equator)
PROJECTION: Geographic, WGS84
UNITS: Estimated persons per grid square
MAPPING APPROACH: Land cover based, as described in: Linard, C., Gilbert, M., Snow, R.W., Noor, A.M. and Tatem, A.J., 2012, Population distribution, settlement patterns and accessibility across Africa in 2010, PLoS ONE, 7(2): e31743.
FORMAT: Geotiff (zipped using 7-zip (open access tool): www.7-zip.org)
FILENAMES: Example - AGO10adjv4.tif = Angola (AGO) population count map for 2010 (10) adjusted to match UN national estimates (adj), version 4 (v4). Population maps are updated to new versions when improved census or other input data become available.
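The filename convention can be decoded programmatically. The sketch below is based only on the single example given (AGO10adjv4.tif); it assumes that unadjusted files simply omit the "adj" part and that two-digit years refer to 20xx, neither of which is confirmed by the description. The function name is hypothetical.

```python
import re

# Pattern inferred from the one documented example, AGO10adjv4.tif.
FILENAME_PATTERN = re.compile(r"^([A-Z]{3})(\d{2})(adj)?v(\d+)\.tif$")

def parse_worldpop_filename(filename):
    """Split a filename into country code, year, adjustment flag and version.
    Returns None if the name does not follow the assumed convention."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    country, year, adjusted, version = match.groups()
    return {
        "country": country,             # three-letter country code, e.g. AGO
        "year": 2000 + int(year),       # assumption: two-digit years are 20xx
        "adjusted": adjusted is not None,
        "version": int(version),
    }

print(parse_worldpop_filename("AGO10adjv4.tif"))
# {'country': 'AGO', 'year': 2010, 'adjusted': True, 'version': 4}
```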
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The refugee location data (Geo-Refugee) provides information on the geographical locations, population sizes and accommodation types of refugees and people in refugee-like situations throughout Africa. Based on the United Nations High Commissioner for Refugees' Location and Demographic Composition data as well as information contained in supplemental UNHCR resources, Geo-Refugee assigns administrative unit names and geographic coordinates to refugee camps/centers and to locations hosting dispersed (self-settled) refugees. Geo-Refugee was collected for the purpose of investigating the relationship between refugees and armed conflict, but can be used for a number of refugee-related studies. The original data for the category "refugees and people in a refugee-like situation" by accommodation type and location name comes directly from the UNHCR. The category "refugees" includes: "individuals recognized under the 1951 Convention relating to the Status of Refugees and its 1967 Protocol; the 1969 OAU Convention Governing the Specific Aspects of Refugee Problems in Africa; those recognized in accordance with the UNHCR statute; individuals granted complementary forms of protection and those enjoying temporary protection." The category of people in a refugee-like situation "is descriptive in nature and includes groups of people who are outside their country of origin and who face protection risks similar to those of refugees, but for whom refugee status has, for practical or other reasons, not been ascertained" (UNHCR http://www.unhcr.org/45c06c662.html). The unit of the data is the first-level administrative unit (province, region or state). A refugee location is defined as a unit with a known refugee population, as established by UNHCR country offices. The locations data was compiled using statistics provided by the UNHCR Division of Programme Support and Management.
Several of the refugee sites in the original UNHCR data are camp names or other locations which are not immediately traceable to a particular location using even the most established geographical databases, like that of the National Geospatial-Intelligence Agency (NGA). Thus, unit-level location of refugees was established and confirmed using supplementary resources including reports, maps, and policy documents compiled by the UNHCR and contained in the Refworld database (see http://www.unhcr.org/cgi-bin/texis/vtx/refworld/rwmain). Refworld was the primary database used for this project. Geographic coordinates were assigned using the database of the National Geospatial-Intelligence Agency. See https://www1.nga.mil/Pages/default.aspx for more information. All attempts were made to find precise coordinates, including cross-referencing with Google Maps. The current version of the data covers 43 African countries and encompasses the period 2000 to 2010. The UNHCR began systematically collecting information on the locations and demographic compositions of refugee populations in 2000.
The Afrobarometer project assesses attitudes and public opinion on democracy, markets, and civil society in several sub-Saharan African countries. This dataset was compiled from the studies in Round 2 of the Afrobarometer, conducted from 2002 to 2004 in 16 countries: Botswana, Cape Verde, Ghana, Kenya, Lesotho, Malawi, Mali, Mozambique, Namibia, Nigeria, Senegal, South Africa, Tanzania, Uganda, Zambia, and Zimbabwe.
The Round 2 Afrobarometer surveys have national coverage for the following countries: Botswana, Ghana, Kenya, Lesotho, Malawi, Mali, Mozambique, Namibia, Nigeria, Republic of Cabo Verde, Senegal, South Africa, Tanzania, Uganda, Zambia, Zimbabwe.
Individuals
The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.
What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.
Sample survey data [ssd]
Afrobarometer uses national probability samples designed to be a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of being selected for an interview. This is achieved by:
• using random selection methods at every stage of sampling;
• sampling at all stages with probability proportionate to population size wherever possible, to ensure that larger (i.e., more populated) geographic units have a proportionally greater probability of being chosen into the sample.
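One common way to draw units with probability proportionate to population size is systematic selection along the cumulative population total. The sketch below illustrates that technique under stated assumptions: the description here does not say which PPS procedure Afrobarometer uses, and the function name, the list of 240 enumeration areas and their populations are all hypothetical.

```python
import random

def pps_systematic_sample(units, n, rng):
    """Systematic PPS selection.
    units: list of (name, population) pairs in frame order.
    Returns n names; a unit larger than the sampling interval can appear twice."""
    total = sum(population for _, population in units)
    interval = total / n
    start = rng.random() * interval          # random start within the first interval
    targets = [start + i * interval for i in range(n)]
    selected = []
    cumulative = 0
    next_target = 0
    for name, population in units:
        cumulative += population
        # Select this unit for every target that falls inside its population range.
        while next_target < n and targets[next_target] < cumulative:
            selected.append(name)
            next_target += 1
    return selected

# Hypothetical frame: 240 enumeration areas with made-up populations of 400 to 600.
frame = [("EA-%03d" % i, 400 + 25 * (i % 9)) for i in range(1, 241)]
sample = pps_systematic_sample(frame, 6, random.Random(2024))
print(sample)
```

Because each unit occupies a stretch of the cumulative total equal to its population, a more populated unit is proportionally more likely to contain one of the selection points.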
The sampling universe normally includes all citizens age 18 and older. As a standard practice, we exclude people living in institutionalized settings, such as students in dormitories, patients in hospitals, and persons in prisons or nursing homes. Occasionally, we must also exclude people living in areas determined to be inaccessible due to conflict or insecurity. Any such exclusion is noted in the technical information report (TIR) that accompanies each data set.
Sample size and design
Samples usually include either 1,200 or 2,400 cases. A randomly selected sample of n=1200 cases allows inferences to national adult populations with a margin of sampling error of no more than +/-2.8% with a confidence level of 95 percent. With a sample size of n=2400, the margin of error decreases to +/-2.0% at 95 percent confidence level.
The sample design is a clustered, stratified, multi-stage, area probability sample. Specifically, we first stratify the sample according to the main sub-national unit of government (state, province, region, etc.) and by urban or rural location.
Area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. Afrobarometer occasionally purposely oversamples certain populations that are politically significant within a country to ensure that the size of the sub-sample is large enough to be analysed. Any oversample is noted in the TIR.
Sample stages
Samples are drawn in either four or five stages:
Stage 1: In rural areas only, the first stage is to draw secondary sampling units (SSUs). SSUs are not used in urban areas, and in some countries they are not used in rural areas. See the TIR that accompanies each data set for specific details on the sample in any given country.
Stage 2: We randomly select primary sampling units (PSUs).
Stage 3: We then randomly select sampling start points.
Stage 4: Interviewers then randomly select households.
Stage 5: Within the household, the interviewer randomly selects an individual respondent. Each interviewer alternates in each household between interviewing a man and interviewing a woman to ensure gender balance in the sample.
To keep the costs and logistics of fieldwork within manageable limits, eight interviews are clustered within each selected PSU.
Data weights
For some national surveys, data are weighted to correct for over- or under-sampling or for household size. "Withinwt" should be turned on for all national-level descriptive statistics in countries that contain this weighting variable. It is included as the last variable in the data set, with details described in the codebook. For merged data sets, "Combinwt" should be turned on for cross-national comparisons of descriptive statistics. Note: this weighting variable standardizes each national sample as if it were equal in size.
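As a rough illustration of what "turning on" a weight means for a descriptive statistic, the sketch below computes a weighted share of respondents giving a particular answer. Only the variable name "Withinwt" comes from the description above; the function name, the "support" question and the three records are made up for the example.

```python
def weighted_share(rows, answer_key, target, weight_key="Withinwt"):
    """Share of respondents whose answer equals target, weighted by weight_key."""
    total_weight = sum(row[weight_key] for row in rows)
    matching_weight = sum(row[weight_key] for row in rows if row[answer_key] == target)
    return matching_weight / total_weight

# Hypothetical respondent records; real files carry many more variables.
rows = [
    {"support": "yes", "Withinwt": 2.0},
    {"support": "no", "Withinwt": 1.0},
    {"support": "yes", "Withinwt": 1.0},
]

print(weighted_share(rows, "support", "yes"))  # 0.75
```

Without weights the same three records would give 2 out of 3, so the weighted and unweighted figures can differ noticeably when some groups were over- or under-sampled.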
Further information on sampling protocols, including full details of the methodologies used for each stage of sample selection, can be found at https://afrobarometer.org/surveys-and-methods/sampling-principles
Face-to-face [f2f]
Certain questions in the questionnaires for the Afrobarometer 2 survey addressed country-specific issues, but many of the same questions were asked across surveys. Citizens of the 16 countries were asked questions about their economic and social situations, and their opinions were elicited on recent political and economic changes within their country.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the African Human Facial Images Dataset, meticulously curated to enhance face recognition models and support the development of advanced biometric identification systems, KYC models, and other facial recognition technologies.
This dataset comprises over 2,000 African individual facial image sets, with each set including:
The dataset includes contributions from a diverse network of individuals across African countries.
To ensure high utility and robustness, all images are captured under varying conditions:
Each facial image set is accompanied by detailed metadata for each participant, including:
This metadata is essential for training models that can accurately recognize and identify faces across different demographics and conditions.
This facial image dataset is ideal for various applications in the field of computer vision, including but not limited to:
We understand the evolving nature of AI and machine learning requirements. Therefore, we continuously add more assets with diverse conditions to this off-the-shelf facial image dataset.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the African Facial Images from Past Dataset, meticulously curated to enhance face recognition models and support the development of advanced biometric identification systems, KYC models, and other facial recognition technologies.
This dataset comprises over 10,000 images, divided into participant-wise sets with each set including:
The dataset includes contributions from a diverse network of individuals across African countries:
To ensure high utility and robustness, all images are captured under varying conditions:
Each image set is accompanied by detailed metadata for each participant, including:
This metadata is essential for training models that can accurately recognize and identify African faces across different demographics and conditions.
This facial image dataset is ideal for various applications in the field of computer vision, including but not limited to:
The health and demography of the South African population has been undergoing substantial changes as a result of the rapidly progressing HIV epidemic. Researchers at the University of KwaZulu-Natal and the South African Medical Research Council established The Africa Health Research Studies in 1997 funded by a core grant from The Wellcome Trust, UK. Given the urgent need for high quality longitudinal data with which to monitor these changes, and with which to evaluate interventions to mitigate impact, a demographic surveillance system (DSS) was established in a rural South African population facing a rapid and severe HIV epidemic. The DSS, referred to as the Africa Health Research Institute Demographic Information System (ACDIS), started in 2000.
ACDIS was established to ‘describe the demographic, social and health impact of the HIV epidemic in a population going through the health transition’ and to monitor the impact of intervention strategies on the epidemic. South Africa’s political and economic history has resulted in highly mobile urban and rural populations, coupled with complex, fluid households. In order to successfully monitor the epidemic, it was necessary to collect longitudinal demographic data (e.g. mortality, fertility, migration) on the population and to mirror this complex social reality within the design of the demographic information system. To this end, three primary subjects are observed longitudinally in ACDIS: physical structures (e.g. homesteads, clinics and schools), households and individuals. The information about these subjects, and all related information, is stored in a single MSSQL Server database, in a truly longitudinal way—i.e. not as a series of cross-sections.
The surveillance area is located near the market town of Mtubatuba in the uMkhanyakude district of KwaZulu-Natal. The area is 438 square kilometers in size and includes a population of approximately 85 000 people who are members of approximately 11 000 households. The population is almost exclusively Zulu-speaking. The area is typical of many rural areas of South Africa in that while predominantly rural, it contains an urban township and informal peri-urban settlements. The area is characterized by large variations in population densities (20–3000 people/km²). In the rural areas, homesteads are scattered rather than grouped. Most households are multi-generational, with an average size of 7.9 (SD: 4.7) members. Despite this being a predominantly rural area, the principal source of income for most households is waged employment and state pensions rather than agriculture. In 2006, approximately 77% of households in the surveillance area had access to piped water and toilet facilities.
To fulfil the eligibility criteria for the ACDIS cohort, individuals must be a member of a household within the surveillance area but not necessarily resident within it. Crucially, this means that ACDIS collects information on resident and non-resident members of households and makes a distinction between membership (self-defined on the basis of links to other household members) and residency (residing at a physical structure within the surveillance area at a particular point in time). Individuals can be members of more than one household at any point in time (e.g. polygamously married men whose wives maintain separate households). As of June 2006, there were 85 855 people under surveillance of whom 33% were not resident within the surveillance area. Obtaining information on non-resident members is vital for a number of reasons. Most importantly, understanding patterns of HIV transmission within rural areas requires knowledge about patterns of circulation and about sexual contacts between residents and their non-resident partners. To be consistent with similar datasets from other INDEPTH Member centres, this data set contains data from resident members only.
During data collection, households are visited by fieldworkers and information is supplied by a single key informant. All births, deaths and migrations of household members are recorded. If household members have moved internally within the surveillance area, such moves are reconciled and the internal migrant retains the original identifier associated with him/her.
The demographic surveillance area is situated in the south-east portion of the uMkhanyakude district of KwaZulu-Natal province, near the town of Mtubatuba. It is bounded on the west by the Umfolozi-Hluhluwe nature reserve, on the south by the Umfolozi river, on the east by the N2 highway (except for portions where the Kwamsane township straddles the highway) and on the north by the Inyalazi river for portions of the boundary. The area is 438 square kilometers.
Individual
Resident members of households resident within the demographic surveillance area. In-migrants are defined by intention to become resident, but actual residence episodes of less than 180 days are censored. Out-migrants are defined by intention to become resident elsewhere, but actual periods of non-residence of less than 180 days are censored. Children born to resident women are considered resident by default, irrespective of actual place of birth. The dataset contains the events of all individuals ever resident during the study period (1 Jan 2000 to 31 Dec 2015).
Event history data
This dataset contains rounds 1 to 37 of demographic surveillance data covering the period from 1 Jan 2000 to 31 December 2015. Two rounds of data collection took place annually except in 2002 when three surveillance rounds were conducted. From 1 Jan 2015 onwards there are three surveillance rounds per annum.
This dataset is not based on a sample but contains information from the complete demographic surveillance area.
Response units (households) by year:
Year Households
2000 11856
2001 12321
2002 12981
2003 12165
2004 11841
2005 11312
2006 12065
2007 12165
2008 11790
2009 12145
2010 12485
2011 12455
2012 12087
2013 11988
2014 11778
2015 11938
In 2006 the number of response units increased due to the addition of a new village into the demographic surveillance area.
None
Proxy Respondent [proxy]
Bounded structure registration (BSR) or update (BSU) form:
- Used to register characteristics of the BS
- Updates characteristics of the BS
- Information as at previous round is preprinted
Household registration (HHR) or update (HHU) form:
- Used to register characteristics of the HH
- Used to update information about the composition of the household
- Information preprinted of composition and all registered households as at previous
Household Membership Registration (HMR) or update (HMU):
- Used to link individuals to households
- Used to update information about the household memberships and member status observations
- Information preprinted of member status observations as at previous
Individual registration form (IDR):
- Used to uniquely identify each individual
- Mainly to ensure members with multiple household memberships are appropriately captured
Migration notification form (MGN):
- Used to record change in the BS of residency of individuals or households
- Migrants are tracked and updated in the database
Pregnancy history form (PGH) & pregnancy outcome notification form (PON):
- Records details of pregnancies and their outcomes
- Only if woman is a new member
- Only if woman has never completed WHL or WGH
Death notification form (DTN):
- Records all deaths that have recently occurred
- Includes information about time, place, circumstances and possible cause of death
On data entry, data consistency and plausibility were checked by 455 data validation rules at database level. If a data validation failure was due to a data collection error, the questionnaire was referred back to the field for revisit and correction. If the error was due to data inconsistencies that could not be directly traced to a data collection error, the record was referred to the data quality team under the supervision of the senior database scientist. The team could request further field-level investigation by a team of trackers or could correct the inconsistency directly at database level.
No imputations were done on the resulting micro data set, except for:
a. If an out-migration (OMG) event is followed by a homestead entry event (ENT) and the gap between OMG event and ENT event is greater than 180 days, the ENT event was changed to an in-migration event (IMG).
b. If an out-migration (OMG) event is followed by a homestead entry event (ENT) and the gap between OMG event and ENT event is less than 180 days, the OMG event was changed to a homestead exit event (EXT) and the ENT event date changed to the day following the original OMG event.
c. If a homestead exit event (EXT) is followed by an in-migration event (IMG) and the gap between the EXT event and the IMG event is greater than 180 days, the EXT event was changed to an out-migration event (OMG).
d. If a homestead exit event (EXT) is followed by an in-migration event (IMG) and the gap between the EXT event and the IMG event is less than 180 days, the IMG event was changed to a homestead entry event (ENT) with a date equal to the day following the EXT event.
e. If the last recorded event for an individual is homestead exit (EXT) and this event is more than 180 days prior to the end of the surveillance period, then the EXT event is changed to an
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Swahili is spoken by 100-150 million people across East Africa. In Tanzania, it is one of two national languages (the other is English) and it is the official language of instruction in all schools. News in Swahili is an important part of the media sphere in Tanzania.
News contributes to education, technology, and the economic growth of a country, and news in local languages plays an important cultural role in many African countries. In the modern age, African languages in news and other spheres are at risk of being lost as English becomes the dominant language in online spaces.
The Swahili news dataset was created to narrow the gap in the use of the Swahili language for NLP technologies, and to help AI practitioners in Tanzania and across the African continent practise their NLP skills on problems in organizations or societies that involve the Swahili language. The Swahili news articles were collected from different websites that provide news in the Swahili language. I was able to find some websites that provide news in Swahili only and others in different languages including Swahili.
The dataset was created for the specific task of text classification: each news item can be categorized into one of six topics (Local News, International News, Finance News, Health News, Sports News, and Entertainment News). The dataset comes with a specified train/test split. The train set contains 75% of the dataset.
Acknowledgment: This project was supported by the AI4D language dataset fellowship through K4All and Zindi Africa.
The ‘South African Population Research Infrastructure Network’ (SAPRIN) is a national research infrastructure funded through the Department of Science and Innovation and hosted by the South African Medical Research Council. One of SAPRIN’s initial goals has been to harmonise and share the longitudinal data from the three current Health and Demographic Surveillance System (HDSS) Nodes. These long-standing nodes are the MRC/Wits University Agincourt HDSS in Bushbuckridge District, Mpumalanga, established in 1993, with a population of 113 113 people; the University of Limpopo DIMAMO HDSS in the Capricorn District of Limpopo, established in 1996, with a current population of 38 479; and the Africa Health Research Institute (AHRI) HDSS in uMkhanyakude District, KwaZulu-Natal, established in 2000, with a current population of 139 250.
This dataset represents a snapshot of the continually evolving data in the underlying longitudinal databases maintained by the SAPRIN nodes. In these databases the rightmost extent of the individual's surveillance episode is indicated by the data collection date of the last time the individual's membership of a household under surveillance was confirmed. Each dataset has a right censor date (31 December 2017 for the current version of the dataset) and individual surveillance episodes are terminated at that point if the individual is still under surveillance beyond the cut-off date.
Each individual surveillance episode is associated with a physical location. For internal residency episodes it is the actual place of residence of the individual; for external residence episodes (periods of temporary migration) it is the place of residence of the individual's household. If an individual changes their place of residency from one location within the surveillance area to another location still within the surveillance area, the episode at the original location is terminated with a location exit event, and a new episode starts with a location entry event at the destination location. It is also possible for the household the individual is a member of to change its place of residency in the surveillance area whilst the individual is externally resident (is a temporary migrant), in which case the individual's external residence episode will also be split with a location exit-entry pair of events.
At every household visit written consent is obtained from the household respondent for continued participation in the surveillance, and such consent can be withdrawn. When this happens all household members' surveillance episodes are terminated with a refusal event. It is possible for households to again provide consent to participate in the surveillance after some time; in such cases surveillance episodes are restarted with a permission event.
As mentioned previously, surveillance episodes are continually extended by the last data collection event if the individual remains under surveillance. In certain cases, individuals may be lost to follow-up: surveillance episodes where the date of last data collection is more than one year prior to the right censor date are terminated as lost to follow-up at that last data collection date. Individuals with data collection dates within a year of the right censor date are considered still to be under surveillance up to this last data collection date.
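The termination logic described above can be illustrated with a short sketch. It is not SAPRIN's implementation: the function name `episode_end`, the status strings, and the reading of "one year" as 365 days are all assumptions made for illustration.

```python
from datetime import date

RIGHT_CENSOR = date(2017, 12, 31)  # right censor date stated for this dataset version


def episode_end(last_collection, right_censor=RIGHT_CENSOR, ltf_days=365):
    """Return (end_date, status) for an individual's open surveillance episode.

    ltf_days=365 is an assumed reading of "more than one year".
    """
    if last_collection > right_censor:
        # Still observed after the cut-off: truncate at the right censor date.
        return right_censor, "under surveillance (right censored)"
    if (right_censor - last_collection).days > ltf_days:
        # Not seen for more than a year before the cut-off.
        return last_collection, "lost to follow-up"
    # Seen within a year of the cut-off: episode runs to the last collection date.
    return last_collection, "under surveillance"


# Toy dates, for illustration only.
stale = episode_end(date(2016, 6, 1))
recent = episode_end(date(2017, 9, 1))
beyond = episode_end(date(2018, 3, 1))
```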
Each surveillance episode contains the identifier of the household the individual is a member of during that episode. Under relatively rare circumstances it is possible for an individual to change household membership whilst still resident at the same location, or to change membership whilst externally resident; in these cases the surveillance episode will be split with a pair of membership end and membership start events. More commonly, membership start and end events coincide with location exit and entry events or in- and out-migration events. Memberships also start at birth or enumeration and end at death, refusal to participate or loss to follow-up.
In about half of the cases, individuals have a single episode running from first enumeration, birth or in-migration to their eventual death or out-migration, or to the present if they are still under surveillance. In the remaining cases, individuals transition from internal residency to external residency via out-migration, or from one location to another via internal migration with a location exit and entry event, or through some other, rarer form of transition involving membership change, refusal or loss to follow-up. Usually these series of surveillance episodes are continuous in time, with no gaps between episodes, but gaps can form, e.g. when an individual out-migrates and ends membership with the household and so is no longer under surveillance, only to return via in-migration at some future date and take up membership with the same or a different household.
The SAPRIN Individual Surveillance Episodes 2020 Datasets consist of three types of demographic surveillance dataset:
1. SAPRIN Individual Surveillance Episodes 2020: Basic Dataset. This dataset contains only the internal and external residency episodes for an individual.
2. SAPRIN Individual Surveillance Episodes 2020: Age-Year-Delivery Dataset. This dataset splits the basic surveillance episodes at calendar year end and at the date when the age in years of an individual changes (the birthday). In the case of women who have given birth, episodes are split at the time of delivery as well.
3. SAPRIN Individual Surveillance Episodes 2020: Detailed Dataset. This dataset adds to dataset 2 time-varying attributes such as education, employment, marital status and socio-economic status.
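The splitting used for the Age-Year-Delivery dataset can be sketched as below. This is an illustration under stated assumptions, not the code used to build the dataset: the function names are hypothetical, episodes are treated as inclusive (start, end) date pairs, a 29 February birthday is placed on 1 March in non-leap years, and the additional split at delivery is omitted.

```python
from datetime import date, timedelta


def birthday_in(year, birth):
    """Birthday in a given year (29 February falls back to 1 March; an assumption)."""
    try:
        return date(year, birth.month, birth.day)
    except ValueError:
        return date(year, 3, 1)


def split_episode(start, end, birth):
    """Split an inclusive [start, end] episode at each 1 January and each birthday."""
    cuts = set()
    for year in range(start.year, end.year + 1):
        for cut in (date(year, 1, 1), birthday_in(year, birth)):
            if start < cut <= end:  # a cut date is the first day of a new piece
                cuts.add(cut)
    bounds = [start] + sorted(cuts)
    pieces = []
    for i, piece_start in enumerate(bounds):
        if i + 1 < len(bounds):
            piece_end = bounds[i + 1] - timedelta(days=1)
        else:
            piece_end = end
        pieces.append((piece_start, piece_end))
    return pieces


# Toy episode: crosses one year end and one birthday, giving three pieces.
pieces = split_episode(date(2015, 11, 1), date(2016, 8, 15), date(1990, 6, 10))
```

Each resulting piece lies within a single calendar year and a single age in years, so calendar year and age can be attached to it as fixed attributes.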
The South African Population Research Infrastructure Network (SAPRIN) currently represents a network of three Health and Demographic Surveillance System (HDSS) nodes located in rural South Africa, namely: 1) the MRC/Wits University Agincourt HDSS in Bushbuckridge District, Mpumalanga, which has collected data since 1993. The nodal website is: http://www.agincourt.co.za; 2) the University of Limpopo DIMAMO HDSS in the Capricorn District of Limpopo, which has collected data since 1996. The nodal website is: N/A; and 3) the Africa Health Research Institute (AHRI) HDSS in uMkhanyakude District, KwaZulu-Natal, which has collected data since 2000. The nodal website is: http://www.ahri.org.
The Agincourt HDSS covers a surveillance area of approximately 420 square kilometres and is located in the Bushbuckridge District, Mpumalanga, in the rural northeast of South Africa close to the Mozambique border. At baseline in 1992, 57 600 people were recorded in 8900 households in 20 villages; by 2006, the population had increased to about 70 000 people in 11 700 households. As of December 2017, there were 113 113 people under surveillance, of whom 28% were not resident within the surveillance area, with a total of about 2 million person-years of observation. 33% of the population is under 15 years old. The population is almost exclusively Shangaan-speaking. The Agincourt HDSS has a population density of over 200 persons per square kilometre. The Agincourt HDSS extends between latitudes 24°50´ and 24°56´S and longitudes 31°08´ and 31°25´E. The altitude is about 400-600 m above sea level.
DIMAMO is located in the Capricorn district, Limpopo Province, approximately 40 kilometres from Polokwane, the capital city of Limpopo Province, and 15-50 kilometres from the University of Limpopo. The site covers an area of approximately 400 square kilometres. The initial total population observed was about 8 000 but the field site was expanded in 2010. As of December 2017, there were 38 479 people under surveillance, of whom 22% were not resident within the surveillance area, with about 400 000 person-years of observation. 30% of the population is under 15 years old. The population is predominantly Sotho-speaking. Most households have electricity. Some households have piped water either inside the house or in their yards, but most fetch water from taps situated at strategic points in the villages. Most households have a pit latrine in their yards. The area lies between latitudes 23°65´ and 23°90´S and longitudes 29°65´ and 29°85´E. The HDSS is located on a high plateau area (approximately 1250 m above sea level) where communities typically consist of households clustered in villages, with access to local land for small-scale food production.
Africa Health Research Institute (AHRI) is situated in the south-east portion of the Umkhanyakude district of KwaZulu-Natal province near the town of Mtubatuba. It is bounded on the west by the Umfolozi-Hluhluwe nature reserve, on the south by the Umfolozi river, on the east by the N2 highway (except for portions where the Kwamsane township straddles the highway) and in the north by the Inyalazi river for portions of the boundary. The surveillance area is approximately 850 square kilometres. As of December 2017, there were 139 250 people under surveillance, of whom 28% were not resident within the surveillance area, with about 1.7 million person-years of observation. 32% of the population is under 15 years old. The population is almost exclusively Zulu-speaking. The surveillance area is typical of many rural areas of South Africa in that, while predominantly rural, it contains an urban township and informal peri-urban settlements. The area is characterized by large variations in population densities (20-3000 people per square kilometre). The area lies between latitudes -28°24' and 28°20'N and longitudes 32°10' and 31°58'E.
Households and individuals
Households resident in dwellings within the study area will be eligible for inclusion in the household component of SAPRIN. All individuals identified by the household proxy informant as a member of
The Afrobarometer is a comparative series of public attitude surveys that assess African citizens' attitudes to democracy and governance, markets, and civil society, among other topics. The surveys have been undertaken at periodic intervals since 1999. The Afrobarometer's coverage has increased over time. Round 1 (1999-2001) initially covered 7 countries and was later extended to 12 countries. Round 2 (2002-2004) surveyed citizens in 16 countries, Round 3 (2005-2006) in 18 countries, and Round 4 (2008) in 20 countries. The survey covered 34 countries in Round 5 (2011-2013), 36 countries in Round 6 (2014-2015), and 34 countries in Round 7 (2016-2018). The 34 countries covered in Round 8 (2019-2021) are:
Angola, Benin, Botswana, Burkina Faso, Cabo Verde, Cameroon, Côte d'Ivoire, eSwatini, Ethiopia, Gabon, Gambia, Ghana, Guinea, Kenya, Lesotho, Liberia, Malawi, Mali, Mauritius, Morocco, Mozambique, Namibia, Niger, Nigeria, Senegal, Sierra Leone, South Africa, Sudan, Tanzania, Togo, Tunisia, Uganda, Zambia and Zimbabwe.
The survey has national coverage in the following 34 African countries: Angola, Benin, Botswana, Burkina Faso, Cabo Verde, Cameroon, Côte d'Ivoire, eSwatini, Ethiopia, Gabon, Gambia, Ghana, Guinea, Kenya, Lesotho, Liberia, Malawi, Mali, Mauritius, Morocco, Mozambique, Namibia, Niger, Nigeria, Senegal, Sierra Leone, South Africa, Sudan, Tanzania, Togo, Tunisia, Uganda, Zambia and Zimbabwe.
Households and individuals
The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.
Sample survey data
Afrobarometer uses national probability samples designed to meet the following criteria. Samples are designed to be a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of being selected for an interview. This is achieved by:
• using random selection methods at every stage of sampling;
• sampling at all stages with probability proportionate to population size wherever possible to ensure that larger (i.e., more populated) geographic units have a proportionally greater probability of being chosen into the sample.
The sampling universe normally includes all citizens age 18 and older. As a standard practice, we exclude people living in institutionalised settings, such as students in dormitories, patients in hospitals, and persons in prisons or nursing homes. Occasionally, we must also exclude people living in areas determined to be inaccessible due to conflict or insecurity. Any such exclusion is noted in the technical information report (TIR) that accompanies each data set.
Sample size and design
Samples usually include either 1,200 or 2,400 cases. A randomly selected sample of n=1200 cases allows inferences to national adult populations with a margin of sampling error of no more than +/-2.8% with a confidence level of 95 percent. With a sample size of n=2400, the margin of error decreases to +/-2.0% at 95 percent confidence level.
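The quoted margins can be reproduced with the standard formula for a proportion under simple random sampling, z * sqrt(p(1-p)/n), evaluated at the worst case p = 0.5 and z = 1.96 for 95 percent confidence. The sketch below is an illustration only: the function name is hypothetical, and the formula ignores any design effect from the clustering and stratification described in the sample design, so it is not necessarily how Afrobarometer derived its figures.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the 95% confidence interval for a proportion (simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)


moe_1200 = round(100 * margin_of_error(1200), 1)  # in percentage points
moe_2400 = round(100 * margin_of_error(2400), 1)
print(moe_1200, moe_2400)  # prints: 2.8 2.0
```

Because the margin shrinks with the square root of n, doubling the sample from 1200 to 2400 cuts it by a factor of about 1.41 rather than halving it.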
The sample design is a clustered, stratified, multi-stage, area probability sample. Specifically, we first stratify the sample according to the main sub-national unit of government (state, province, region, etc.) and by urban or rural location.
Area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. Afrobarometer occasionally purposely oversamples certain populations that are politically significant within a country to ensure that the size of the sub-sample is large enough to be analysed. Any oversample is noted in the TIR.
Sample stages
Samples are drawn in either four or five stages:
Stage 1: In rural areas only, the first stage is to draw secondary sampling units (SSUs). SSUs are not used in urban areas, and in some countries they are not used in rural areas. See the TIR that accompanies each data set for specific details on the sample in any given country.
Stage 2: We randomly select primary sampling units (PSUs).
Stage 3: We then randomly select sampling start points.
Stage 4: Interviewers then randomly select households.
Stage 5: Within the household, the interviewer randomly selects an individual respondent. Each interviewer alternates in each household between interviewing a man and interviewing a woman to ensure gender balance in the sample.
To keep the costs and logistics of fieldwork within manageable limits, eight interviews are clustered within each selected PSU.
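With eight interviews per PSU, a sample of 1,200 cases implies 150 PSUs. One common way to draw PSUs with probability proportionate to population size is systematic selection along the cumulative population total, sketched below. This is an illustration, not Afrobarometer's documented procedure: the function name, the made-up PSU populations, and the choice of the systematic method are all assumptions.

```python
import random
from itertools import accumulate

INTERVIEWS_PER_PSU = 8  # from the text above


def pps_systematic(sizes, n_select, rng):
    """Systematic PPS selection; returns the indices of the selected units.

    A unit larger than the sampling interval can be selected more than once.
    """
    step = sum(sizes) / n_select      # sampling interval
    start = rng.random() * step       # random start in [0, step)
    cum = list(accumulate(sizes))     # cumulative population totals
    chosen, unit = [], 0
    for k in range(n_select):
        point = start + k * step
        # Advance to the unit whose cumulative range contains this point.
        while unit < len(cum) - 1 and cum[unit] <= point:
            unit += 1
        chosen.append(unit)
    return chosen


rng = random.Random(0)
psu_sizes = [rng.randint(200, 2000) for _ in range(1000)]  # made-up PSU populations
n_psus = 1200 // INTERVIEWS_PER_PSU                         # 150 PSUs
selected = pps_systematic(psu_sizes, n_psus, rng)
```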
Data weights
For some national surveys, data are weighted to correct for over- or under-sampling or for household size. "Withinwt" should be turned on for all national-level descriptive statistics in countries that contain this weighting variable. It is included as the last variable in the data set, with details described in the codebook. For merged data sets, "Combinwt" should be turned on for cross-national comparisons of descriptive statistics. Note: this weighting variable standardizes each national sample as if it were equal in size.
Further information on sampling protocols, including full details of the methodologies used for each stage of sample selection, can be found in Section 5 of the Afrobarometer Round 5 Survey Manual.
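Outside a statistics package, "turning on" a weight for a descriptive statistic amounts to summing weights instead of counting cases. The sketch below shows this for response shares. The weight variable name "Withinwt" comes from the text above; the function name, the question key "q" and the toy rows are hypothetical.

```python
def weighted_share(rows, answer_key, weight_key="Withinwt"):
    """Share of each response category, weighting each case by weight_key.

    Cases without the weight variable count with weight 1.0.
    """
    totals = {}
    for row in rows:
        weight = float(row.get(weight_key, 1.0))
        totals[row[answer_key]] = totals.get(row[answer_key], 0.0) + weight
    grand_total = sum(totals.values())
    return {answer: weight / grand_total for answer, weight in totals.items()}


# Toy data, for illustration only: one heavily weighted "yes", two "no".
toy_rows = [
    {"q": "yes", "Withinwt": 2.0},
    {"q": "no", "Withinwt": 1.0},
    {"q": "no", "Withinwt": 1.0},
]
shares = weighted_share(toy_rows, "q")
```

Unweighted, the toy data would show one third "yes"; with the weights applied, both categories come out at one half. For cross-national comparisons on a merged data set, the same function could be called with `weight_key="Combinwt"`.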
Face-to-face
The questionnaire for Round 3 addressed country-specific issues, but many of the same questions were asked across surveys. The survey instruments were not standardized across all countries and the following features should be noted:
• In the seven countries that originally formed the Southern Africa Barometer (SAB) - Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe - a standardized questionnaire was used, so question wording and response categories are generally the same for all of these countries. The questionnaires in Mali and Tanzania were also essentially identical (in the original English version). Ghana, Uganda and Nigeria each had distinct questionnaires.
• This merged dataset combines, into a single variable, responses from across these different countries where either identical or very similar questions were used, or where conceptually equivalent questions can be found in at least nine of the different countries. For each variable, the exact question text from each of the countries or groups of countries ("SAB" refers to the Southern Africa Barometer countries) is listed.
• Response options also varied on some questions, and where applicable, these differences are also noted.
The Afrobarometer is a comparative series of public attitude surveys that assess African citizens' attitudes to democracy and governance, markets, and civil society, among other topics.
The 12 country dataset is a combined dataset for the 12 African countries surveyed during Round 1 of the survey, conducted between 1999 and 2000 (Botswana, Ghana, Lesotho, Mali, Malawi, Namibia, Nigeria, South Africa, Tanzania, Uganda, Zambia and Zimbabwe), plus data from the old Southern African Democracy Barometer, and similar surveys done in West and East Africa.
The Round 1 Afrobarometer surveys have national coverage for the following countries: Botswana, Ghana, Lesotho, Malawi, Mali, Namibia, Nigeria, South Africa, Tanzania, Uganda, Zambia, Zimbabwe.
Individuals
The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.
What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.
Sample survey data [ssd]
Afrobarometer uses national probability samples designed to meet the following criteria. Samples are designed to be a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of being selected for an interview. This is achieved by:
• using random selection methods at every stage of sampling;
• sampling at all stages with probability proportionate to population size wherever possible to ensure that larger (i.e., more populated) geographic units have a proportionally greater probability of being chosen into the sample.
The sampling universe normally includes all citizens age 18 and older. As a standard practice, we exclude people living in institutionalized settings, such as students in dormitories, patients in hospitals, and persons in prisons or nursing homes. Occasionally, we must also exclude people living in areas determined to be inaccessible due to conflict or insecurity. Any such exclusion is noted in the technical information report (TIR) that accompanies each data set.
Sample size and design
Samples usually include either 1,200 or 2,400 cases. A randomly selected sample of n=1200 cases allows inferences to national adult populations with a margin of sampling error of no more than +/-2.8% with a confidence level of 95 percent. With a sample size of n=2400, the margin of error decreases to +/-2.0% at 95 percent confidence level.
The sample design is a clustered, stratified, multi-stage, area probability sample. Specifically, we first stratify the sample according to the main sub-national unit of government (state, province, region, etc.) and by urban or rural location.
Area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. Afrobarometer occasionally purposely oversamples certain populations that are politically significant within a country to ensure that the size of the sub-sample is large enough to be analysed. Any oversample is noted in the TIR.
Sample stages
Samples are drawn in either four or five stages:
Stage 1: In rural areas only, the first stage is to draw secondary sampling units (SSUs). SSUs are not used in urban areas, and in some countries they are not used in rural areas. See the TIR that accompanies each data set for specific details on the sample in any given country.
Stage 2: We randomly select primary sampling units (PSUs).
Stage 3: We then randomly select sampling start points.
Stage 4: Interviewers then randomly select households.
Stage 5: Within the household, the interviewer randomly selects an individual respondent. Each interviewer alternates in each household between interviewing a man and interviewing a woman to ensure gender balance in the sample.
To keep the costs and logistics of fieldwork within manageable limits, eight interviews are clustered within each selected PSU.
Data weights
For some national surveys, data are weighted to correct for over- or under-sampling or for household size. "Withinwt" should be turned on for all national-level descriptive statistics in countries that contain this weighting variable. It is included as the last variable in the data set, with details described in the codebook. For merged data sets, "Combinwt" should be turned on for cross-national comparisons of descriptive statistics. Note: this weighting variable standardizes each national sample as if it were equal in size.
Further information on sampling protocols, including full details of the methodologies used for each stage of sample selection, can be found at https://afrobarometer.org/surveys-and-methods/sampling-principles
Face-to-face [f2f]
Because Afrobarometer Round 1 emerged out of several different survey research efforts, survey instruments were not standardized across all countries. A number of features of the questionnaires should be noted, as follows:
• In most cases, the data set only includes those questions/variables that were asked in nine or more countries. Complete Round 1 data sets for each individual country have already been released, and are available from ICPSR or from the Afrobarometer website at www.afrobarometer.org.
• In the seven countries that originally formed the Southern Africa Barometer (SAB) - Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe - a standardized questionnaire was used, so question wording and response categories are generally the same for all of these countries. The questionnaires in Mali and Tanzania were also essentially identical (in the original English version). Ghana, Uganda and Nigeria each had distinct questionnaires.
• This merged dataset combines, into a single variable, responses from across these different countries where either identical or very similar questions were used, or where conceptually equivalent questions can be found in at least nine of the different countries. For each variable, the exact question text from each of the countries or groups of countries ("SAB" refers to the Southern Africa Barometer countries) is listed.
• Response options also varied on some questions, and where applicable, these differences are also noted.
This data shows the number of people who have access to water in rural areas. The coverage of this data is Tanzania mainland.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This Swahili News Classification Dataset offers critical insights into media streams across East Africa, allowing for tailored insights related to racial tensions and social shifts. By utilizing the columns of text, label and content, this dataset allows researchers and data scientists to track classified news content from different countries in the region. From political unrest to gender-based violence, this dataset offers a comprehensive portrait of the various news stories from East African nations with practical applications for understanding how culture shapes press reporting and how media outlets portray world events. Alongside direct text information about individual stories, it is important that we study classifications like category and label in order to draw important conclusions about our society; by addressing these research questions with precise categorizations at hand we can ensure alignment between collected data points while also recognizing the unique nuances that characterize each country's media stream. This comprehensive dataset is essential for any project related to understanding communication processes between societies or tracking information flows within an interconnected global system
How to use the dataset
This dataset is perfect for anyone looking to build a machine learning model to classify news content across East Africa. With this dataset, you can create a classifier that can automatically identify and categorize news stories into topics such as politics, economics, health, sports, environment and entertainment. This dataset contains labeled text data for training a model to learn how to classify the content of news articles written in Swahili.
Step 1: Understand the Dataset
The first step towards building your classifier is getting familiar with the dataset provided. The list below outlines each column in the dataset:
text: The text of the news article
label: The category or topic assigned to the article
content: The text content of the news article
category: The category or topic assigned to the article
This dataset contains all you need to create your classification model: pre-labeled articles with topics assigned by human annotators. There are no date values in any of the columns listed, and we won't need them, since all articles have already been labeled.
We also need to know which language the texts are written in, and here it is Swahili throughout. With that settled, we can move on to selecting an appropriate algorithm for the task, i.e. a supervised machine learning algorithm trained on the labeled articles. Traditional text classification models such as Naive Bayes Classifiers (NBCs) and Maximum Entropy (MaxEnt) models are one option, although they may not be robust or accurate enough when predicting unseen texts. Deep learning techniques, such as multi-layer perceptrons (MLPs), may give better performance, but at a higher computational cost because of the many layers involved. This tutorial does not cover these topics in detail, as they are beyond its scope.
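As an illustration of the simplest option mentioned above, here is a minimal multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. It is a sketch rather than a recommended pipeline: the class name, the whitespace tokenisation and the four toy documents are made up for illustration and are not taken from the dataset.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, docs, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for word in doc.lower().split():
                if word in self.vocab:        # words never seen in training are skipped
                    score += math.log(
                        (self.word_counts[label][word] + 1) / (total + len(self.vocab))
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, for illustration only.
toy_docs = ["timu ya mpira imeshinda", "benki na fedha", "mpira wa miguu", "fedha za benki kuu"]
toy_labels = ["Sports", "Finance", "Sports", "Finance"]
model = NaiveBayes().fit(toy_docs, toy_labels)
```

On the real data, the same `fit` and `predict` calls would take the article text and label columns from the training file in place of the toy lists.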
Step 2: Preprocess Text Data
Once you understand what each column represents, we can start preparing our data by preprocessing it so that it is ready to be used by whichever algorithm is chosen.
Research Ideas
Predicting trend topics of news coverage across East Africa by identifying news categories with the highest frequency of occurrences over given time periods.
Identifying and flagging potential bias in news coverage across East Africa by analyzing the prevalence of certain labels or topics to discover potential trends in reporting style.
Developing a predictive model to determine which topic or category will have higher visibility based on the amount of related content that is published in each region around East Africa.
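A preprocessing step of this kind might look like the sketch below. The specific choices are assumptions, not steps prescribed by the dataset: text is lowercased, punctuation is replaced by spaces, and the straight apostrophe is kept because it occurs inside Swahili words such as "ng'ombe".

```python
import re


def preprocess(text):
    """Lowercase, drop punctuation (except the straight apostrophe) and tokenise."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # replace other punctuation with spaces
    return text.split()


tokens = preprocess("Ng'ombe, wa LEO!")
```

Further steps such as stop-word removal or stemming would need Swahili-specific resources and are left out here.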
Columns
File: train_v0.2.csv
text: The full article content of each news item. (String)
label: Labels that define what subject matter each article covers. (String)
File: train.csv
content: The full article content of each news item. (Text)
category: Labels that define what subject matter each article covers. (Categorical)
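Since the two files use different column names for what the descriptions suggest are the same fields, it may be convenient to read both into a common schema. The sketch below does this with the standard csv module; the mapping of content to text and category to label is an assumption based on the matching descriptions, the function name is hypothetical, and the in-memory sample row is made up.

```python
import csv
import io

# Assumed equivalence between the train.csv and train_v0.2.csv column names.
COLUMN_MAP = {"content": "text", "category": "label"}


def read_harmonised(handle):
    """Read a CSV file object and rename columns to the text/label schema."""
    reader = csv.DictReader(handle)
    return [{COLUMN_MAP.get(key, key): value for key, value in row.items()} for row in reader]


# Toy stand-in for train.csv, for illustration only.
sample = io.StringIO("content,category\nhabari za michezo,Sports\n")
rows = read_harmonised(sample)
```

With real files, `read_harmonised(open("train.csv", newline="", encoding="utf-8"))` would be the equivalent call, assuming the files are UTF-8 encoded.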
CC0
Original Data Source: East African News Classification
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The High Resolution Settlement Layer (HRSL) provides estimates of human population distribution at a resolution of 1 arc-second (approximately 30m) for the year 2015. The population estimates are based on recent census data and high-resolution (0.5m) satellite imagery from DigitalGlobe. The population grids provide detailed delineation of settlements in both urban and rural areas, which is useful for many research areas—from disaster response and humanitarian planning to the development of communications infrastructure. The settlement extent data were developed by the Connectivity Lab at Facebook using computer vision techniques to classify blocks of optical satellite data as settled (containing buildings) or not. Center for International Earth Science Information Networks (CIESIN) at Earth Institute Columbia University used proportional allocation to distribute population data from subnational census data to the settlement extents. The data-sets contain the population surfaces, metadata, and data quality layers. The population data surfaces are stored as GeoTIFF files for use in remote sensing or geographic information system (GIS) software. The data can also be explored via an interactive map - http://columbia.maps.arcgis.com/apps/View/index.html?appid=ce441db6aa54494cbc6c6cee11b95917 Citation: Facebook Connectivity Lab and Center for International Earth Science Information Network - CIESIN - Columbia University. 2016. High Resolution Settlement Layer (HRSL). Source imagery for HRSL © 2016 DigitalGlobe.
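The idea of proportional allocation can be illustrated with a toy grid: the census total for one subnational unit is spread over the cells classified as settled, and unsettled cells receive zero. The sketch below uses the simplest possible form, an equal share per settled cell. That choice, the function name and the toy numbers are assumptions; the actual HRSL procedure may weight cells differently.

```python
def allocate_population(census_total, settled):
    """Spread a census total equally over settled cells of a 2D grid.

    `settled` is a list of rows; truthy values mark settled cells.
    """
    n_settled = sum(1 for row in settled for cell in row if cell)
    if n_settled == 0:
        return [[0.0 for _ in row] for row in settled]
    share = census_total / n_settled
    return [[share if cell else 0.0 for cell in row] for row in settled]


# Toy unit: 300 people, three of four cells classified as settled.
grid = allocate_population(300, [[1, 0], [1, 1]])
```

The cell values always sum back to the census total for the unit, which is the property that makes the gridded estimates consistent with the census.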
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This zip file contains 28 cloud-optimized TIFF files that cover the continent of Africa. Each of the 28 files represents a region or area; the files are not divided by country.
Notes:
This cumulative dataset contains statistics on mortality and causes of death in South Africa covering the period 1997-2017. The mortality and causes of death dataset is part of a regular series published by Stats SA, based on data collected through the civil registration system. This dataset is the most recent cumulative round in the series which began with the separately available dataset Recorded Deaths 1996.
The main objective of this dataset is to outline emerging trends and differentials in mortality by selected socio-demographic and geographic characteristics for deaths that occurred in the registered year and over time. Reliable mortality statistics are the cornerstone of national health information systems, and are necessary for population health assessment, health policy and service planning, and programme evaluation. They are essential for studying the occurrence and distribution of health-related events, their determinants, and the management of related health problems. These data are particularly critical for monitoring the Sustainable Development Goals (SDGs) and Agenda 2063, which share the same goal of a high standard of living and quality of life, sound health, and well-being for all at all ages. Mortality statistics are also required for assessing the impact of non-communicable diseases (NCDs), emerging infectious diseases, injuries and natural disasters.
National coverage
Individuals
This dataset is based on information on mortality and causes of death from the South African civil registration system. It covers all death notification forms from the Department of Home Affairs for deaths that occurred in 1997-2017 and that reached Stats SA during the 2018/2019 processing phase.
Administrative records data [adm]
Other [oth]
The registration of deaths is captured using two instruments: form BI-1663 and form DHA-1663 (Notification/Register of death/stillbirth).
This cumulative dataset is part of a regular series published by Stats SA and includes all previous rounds in the series (excluding Recorded Deaths 1996). Stats SA only includes one variable to classify the occupation group of the deceased (OccupationGrp) in the current round (1997-2017). Prior to 2016, Stats SA included both occupation group (OccupationGrp) and industry classification (Industry) in all previous rounds. Therefore, DataFirst has made the 1997-2015 cumulative round available as a separately downloadable dataset which includes both occupation group and industry classification of the deceased spanning the years 1997-2015.
As the novel COVID-19 continues to spread into more countries, it is important to keep records of all available information on it. This dataset was therefore built to cover the updates from Africa.
The dataset contains the dates on which cases were recorded across Africa, detailing the confirmed cases, deaths, and recoveries in each country.
Acknowledgements: Ethical AI Club, Johns Hopkins University, Runmila Institute, WHO, CDC, Ghana Health Service
We hope to see contributors answering questions about how Africa should prepare and put the right measures in place to contain the spread, and to gain a better understanding from data scientists.
The West Africa Coastal Vulnerability Mapping: Population Projections, 2030 and 2050 data set is based on an unreleased working version of the Gridded Population of the World (GPW), Version 4, year 2010 population count raster but at a coarser 5 arc-minute resolution. Bryan Jones of Baruch College produced country-level projections based on the Shared Socioeconomic Pathway 4 (SSP4). SSP4 reflects a divided world where cities that have relatively high standards of living are attractive to internal and international migrants. In low-income countries, rapidly growing rural populations live on shrinking areas of arable land due to both high population pressure and expansion of large-scale mechanized farming by international agricultural firms. This pressure induces large migration flows to the cities, contributing to fast urbanization, although urban areas do not provide many opportunities for the poor and there is a massive expansion of slums and squatter settlements. This scenario may not be the most likely for the West Africa region, but it has internal coherence and is at least plausible.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Population density per pixel at 100 metre resolution. WorldPop provides estimates of numbers of people residing in each 100x100m grid cell for every low and middle income country. Through integrating census, survey, satellite and GIS datasets in a flexible machine-learning framework, high resolution maps of population counts and densities for 2000-2020 are produced, along with accompanying metadata.
DATASET: Alpha version 2010 and 2015 estimates of numbers of people per grid square, with national totals adjusted to match UN population division estimates (http://esa.un.org/wpp/) and remaining unadjusted.
REGION: Africa
SPATIAL RESOLUTION: 0.000833333 decimal degrees (approx 100m at the equator)
PROJECTION: Geographic, WGS84
UNITS: Estimated persons per grid square
MAPPING APPROACH: Land cover based, as described in: Linard, C., Gilbert, M., Snow, R.W., Noor, A.M. and Tatem, A.J., 2012, Population distribution, settlement patterns and accessibility across Africa in 2010, PLoS ONE, 7(2): e31743.
FORMAT: Geotiff (zipped using 7-zip (open access tool): www.7-zip.org)
FILENAMES: Example - AGO10adjv4.tif = Angola (AGO) population count map for 2010 (10) adjusted to match UN national estimates (adj), version 4 (v4).
Population maps are updated to new versions when improved census or other input data become available. Rwanda data are available from WorldPop.
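The file naming convention in the example above can be decoded mechanically. The following is a minimal sketch, not part of the WorldPop distribution: the function name `parse_worldpop_name` is hypothetical, and the pattern rests on two assumptions that go beyond the single documented example, namely that unadjusted files simply omit the `adj` token and that the two-digit year always refers to 20xx.

```python
import re

# Pattern inferred from the single documented example "AGO10adjv4.tif".
# Assumption: unadjusted files drop the "adj" token but are otherwise identical.
_NAME_PATTERN = re.compile(
    r"^(?P<iso>[A-Z]{3})(?P<yy>\d{2})(?P<adj>adj)?v(?P<version>\d+)\.tif$"
)


def parse_worldpop_name(filename):
    """Split a file name such as 'AGO10adjv4.tif' into its components."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognised file name: {filename!r}")
    return {
        "country": match["iso"],            # three-letter country code, e.g. AGO
        "year": 2000 + int(match["yy"]),    # assumption: two-digit year means 20xx
        "adjusted": match["adj"] is not None,
        "version": int(match["version"]),
    }


if __name__ == "__main__":
    print(parse_worldpop_name("AGO10adjv4.tif"))
```

Names that do not fit the assumed pattern raise a `ValueError` rather than being guessed at, which seems the safer choice when the convention is known from one example only.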
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the median household incomes over the past decade across various racial categories identified by the U.S. Census Bureau in South Gorin. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. It also showcases the annual income trends between 2011 and 2021, providing insights into the economic shifts within diverse racial communities. The dataset can be utilized to gain insights into income disparities and variations across racial categories, aiding in data analysis and decision-making.
Key observations
Figure: South Gorin, MO median household income trends across races (2011-2021, in 2022 inflation-adjusted dollars)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for South Gorin median household income by race.
Round 1 of the Afrobarometer survey was conducted from July 1999 through June 2001 in 12 African countries, to solicit public opinion on democracy, governance, markets, and national identity. The full 12-country dataset released was pieced together out of different projects: Round 1 of the Afrobarometer survey, the old Southern African Democracy Barometer, and similar surveys done in West and East Africa.
The 7-country dataset is a subset of the Round 1 survey dataset, and consists of a combined dataset for the 7 Southern African countries surveyed with other African countries in Round 1, 1999-2000 (Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe). It is a useful dataset because, in contrast to the full 12-country Round 1 dataset, all countries in this dataset were surveyed with the identical questionnaire.
Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia, Zimbabwe
Basic units of analysis that the study investigates include: individuals and groups
Sample survey data [ssd]
A new sample has to be drawn for each round of Afrobarometer surveys. Whereas the standard sample size for Round 3 surveys will be 1200 cases, a larger sample size will be required in societies that are extremely heterogeneous (such as South Africa and Nigeria), where the sample size will be increased to 2400. Other adaptations may be necessary within some countries to account for the varying quality of the census data or the availability of census maps.
The sample is designed as a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of selection for interview. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible. A randomly selected sample of 1200 cases allows inferences to national adult populations with a margin of sampling error of no more than plus or minus 2.5 percent with a confidence level of 95 percent. If the sample size is increased to 2400, the margin of sampling error shrinks to plus or minus 2 percent.
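For orientation, the quoted margins can be compared with the textbook formula for a simple random sample, z * sqrt(p(1-p)/n), evaluated at the worst case p = 0.5 and z = 1.96 for 95 percent confidence. This is only a rough sketch under those assumptions: it ignores the clustering and stratification of the actual design, and the function name `margin_of_error` is ours. For 2400 cases it gives about 2.0 percent, in line with the figure above; for 1200 cases it gives about 2.8 percent, somewhat above the 2.5 percent quoted.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion.

    Assumes simple random sampling; p=0.5 is the worst case and
    z=1.96 is the normal quantile for 95 percent confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (1200, 2400):
        print(f"n={n}: +/- {100 * margin_of_error(n):.2f} percentage points")
```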
Sample Universe
The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.
What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.
Sample Design
The sample design is a clustered, stratified, multi-stage, area probability sample.
To repeat the main sampling principle, the objective of the design is to give every sample element (i.e. adult citizen) an equal and known chance of being chosen for inclusion in the sample. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible.
In a series of stages, geographically defined sampling units of decreasing size are selected. To ensure that the sample is representative, the probability of selection at various stages is adjusted as follows:
The sample is stratified by key social characteristics in the population such as sub-national area (e.g. region/province) and residential locality (urban or rural). The area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. And the urban/rural stratification is a means to make sure that these localities are represented in their correct proportions.
Wherever possible, and always in the first stage of sampling, random sampling is conducted with probability proportionate to population size (PPPS). The purpose is to guarantee that larger (i.e., more populated) geographical units have a proportionally greater probability of being chosen into the sample.
The sampling design has four stages:
A first stage to stratify and randomly select primary sampling units;
A second stage to randomly select sampling start-points;
A third stage to randomly choose households;
A final stage involving the random selection of individual respondents.
We shall deal with each of these stages in turn.
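The PPPS principle mentioned above can be illustrated with a short sketch. The text does not say which selection procedure is used, so the systematic method below (cumulative population totals, a fixed sampling interval, and a random start) is an assumption chosen because it is a common way to implement PPPS; the function name and the population figures are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic(populations, n, rng):
    """Pick n unit indices with probability proportionate to population size.

    Systematic variant: walk along the cumulative population totals in
    equal steps from a random start. A unit larger than one step can be
    picked more than once.
    """
    total = sum(populations)
    interval = total / n                      # sampling interval
    start = rng.random() * interval           # random start within first interval
    cumulative = list(accumulate(populations))
    picks = []
    for k in range(n):
        point = start + k * interval
        # Index of the unit whose cumulative range contains the point.
        index = bisect_right(cumulative, point)
        picks.append(min(index, len(populations) - 1))  # guard float rounding
    return picks


if __name__ == "__main__":
    # Hypothetical populations of six enumeration areas.
    ea_populations = [500, 1500, 1000, 3000, 2000, 2000]
    print(pps_systematic(ea_populations, 3, random.Random(42)))
```

Passing the random number generator in explicitly keeps a draw reproducible, which helps when the selection has to be documented afterwards.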
STAGE ONE: Selection of Primary Sampling Units (PSUs)
The primary sampling units (PSUs) are the smallest, well-defined geographic units for which reliable population data are available. In most countries, these will be Census Enumeration Areas (or EAs). Most national census data and maps are broken down to the EA level. In the text that follows we will use the acronyms PSU and EA interchangeably because, when census data are employed, they refer to the same unit.
We strongly recommend that National Investigators (NIs) use official national census data as the sampling frame for Afrobarometer surveys. Where recent or reliable census data are not available, NIs are asked to inform the relevant Core Partner before they substitute any other demographic data. Where the census is out of date, NIs should consult a demographer to obtain the best possible estimates of population growth rates. These should be applied to the outdated census data in order to make projections of population figures for the year of the survey. It is important to bear in mind that population growth rates vary by area (region) and (especially) between rural and urban localities. Therefore, any projected census data should include adjustments to take such variations into account.
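As a rough illustration of projecting outdated census figures with locality-specific growth rates, the sketch below applies compound annual growth separately to urban and rural counts. The compound-growth formula, the function name `project_population`, and all figures are assumptions for illustration only; a demographer may well recommend a different method.

```python
def project_population(census_count, annual_growth_rate, years):
    """Project a census count forward assuming constant compound annual growth."""
    return census_count * (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    years_since_census = 5  # hypothetical gap between census and survey year

    # Hypothetical census counts and growth rates for one region.
    strata = {
        "urban": {"count": 200_000, "rate": 0.04},
        "rural": {"count": 800_000, "rate": 0.02},
    }

    projected = {
        name: project_population(s["count"], s["rate"], years_since_census)
        for name, s in strata.items()
    }
    total = sum(projected.values())
    for name, value in projected.items():
        print(f"{name}: {value:,.0f} ({100 * value / total:.1f}% of region)")
```

Because the urban rate is higher in this made-up example, the urban share of the region rises after projection, which is exactly the kind of shift the stratification needs to reflect.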
Indeed, we urge NIs to establish collegial working relationships with professionals in the national census bureau, not only to obtain the most recent census data, projections, and maps, but to gain access to sampling expertise. NIs may even commission a census statistician to draw the sample to Afrobarometer specifications, provided that provision for this service has been made in the survey budget.
Regardless of who draws the sample, the NIs should thoroughly acquaint themselves with the strengths and weaknesses of the available census data and the availability and quality of EA maps. The country and methodology reports should cite the exact census data used, its known shortcomings, if any, and any projections made from the data. At minimum, the NI must know the size of the population and the urban/rural population divide in each region in order to specify how to distribute population and PSUs in the first stage of sampling. National investigators should obtain this written data before they attempt to stratify the sample.
Once this data is obtained, the sample population (either 1200 or 2400) should be stratified, first by area (region/province) and then by residential locality (urban or rural). In each case, the proportion of the sample in each locality in each region should be the same as its proportion in the national population as indicated by the updated census figures.
Having stratified the sample, it is then possible to determine how many PSUs should be selected for the country as a whole, for each region, and for each urban or rural locality.
The total number of PSUs to be selected for the whole country is determined by calculating the maximum degree of clustering of interviews one can accept in any PSU. Because PSUs (which are usually geographically small EAs) tend to be socially homogeneous, we do not want to select too many people in any one place. Thus, the Afrobarometer has established a standard of no more than 8 interviews per PSU. For a sample size of 1200, the sample must therefore contain 150 PSUs/EAs (1200 divided by 8). For a sample size of 2400, there must be 300 PSUs/EAs.
These PSUs should then be allocated proportionally to the urban and rural localities within each regional stratum of the sample. Let's take a couple of examples from a country with a sample size of 1200. If the urban locality of Region X in this country constitutes 10 percent of the current national population, then the sample for this stratum should be 15 PSUs (calculated as 10 percent of 150 PSUs). If the rural population of Region Y constitutes 4 percent of the current national population, then the sample for this stratum should be 6 PSUs.
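The arithmetic in the two preceding paragraphs can be written out as a small sketch. The function names are ours, and rounding to the nearest whole PSU is an assumption: the text does not say how fractional allocations are handled, and rounded stratum counts may need a manual adjustment so that they add up to the national total.

```python
MAX_INTERVIEWS_PER_PSU = 8  # Afrobarometer standard stated in the text


def total_psus(sample_size, per_psu=MAX_INTERVIEWS_PER_PSU):
    """Number of PSUs needed for a given national sample size."""
    return sample_size // per_psu


def psus_for_stratum(population_share, n_psus):
    """PSUs allocated to a stratum in proportion to its population share.

    Rounding to the nearest whole PSU is an assumption of this sketch.
    """
    return round(population_share * n_psus)


if __name__ == "__main__":
    n_psus = total_psus(1200)
    print("Total PSUs:", n_psus)
    print("Region X urban (10%):", psus_for_stratum(0.10, n_psus))
    print("Region Y rural (4%):", psus_for_stratum(0.04, n_psus))
```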
The next step is to select particular PSUs/EAs using random methods. Using the above example of the rural localities in Region Y, let us say that you need to pick 6 sample EAs out of a census list that contains a total of 240 rural EAs in Region Y. But which 6? If the EAs created by the national census bureau are of equal or roughly equal population size, then selection is relatively straightforward. Just number all EAs consecutively, then make six selections using a table of random numbers. This procedure, known as simple random sampling (SRS), will