This cumulative dataset contains statistics on mortality and causes of death in South Africa covering the period 1997-2017. The mortality and causes of death dataset is part of a regular series published by Stats SA, based on data collected through the civil registration system. This dataset is the most recent cumulative round in the series which began with the separately available dataset Recorded Deaths 1996.
The main objective of this dataset is to outline emerging trends and differentials in mortality by selected socio-demographic and geographic characteristics for deaths that occurred in the registered year and over time. Reliable mortality statistics are the cornerstone of national health information systems, and are necessary for population health assessment, health policy and service planning, and programme evaluation. They are essential for studying the occurrence and distribution of health-related events, their determinants and the management of related health problems. These data are particularly critical for monitoring the Sustainable Development Goals (SDGs) and Agenda 2063, which share the same goal of a high standard of living and quality of life, sound health and well-being for all and at all ages. Mortality statistics are also required for assessing the impact of non-communicable diseases (NCDs), emerging infectious diseases, injuries and natural disasters.
National coverage
Individuals
This dataset is based on information on mortality and causes of death from the South African civil registration system. It covers all death notification forms from the Department of Home Affairs for deaths that occurred in 1997-2017 and that reached Stats SA during the 2018/2019 processing phase.
Administrative records data [adm]
Other [oth]
The registration of deaths is captured using two instruments: form BI-1663 and form DHA-1663 (Notification/Register of death/stillbirth).
This cumulative dataset is part of a regular series published by Stats SA and includes all previous rounds in the series (excluding Recorded Deaths 1996). Stats SA only includes one variable to classify the occupation group of the deceased (OccupationGrp) in the current round (1997-2017). Prior to 2016, Stats SA included both occupation group (OccupationGrp) and industry classification (Industry) in all previous rounds. Therefore, DataFirst has made the 1997-2015 cumulative round available as a separately downloadable dataset which includes both occupation group and industry classification of the deceased spanning the years 1997-2015.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Suburb-based crime statistics for crimes against the person and crimes against property. The Crime statistics datasets contain all offences against the person and property that were reported to police in that respective financial year. The Family and Domestic Abuse-related offences datasets are a subset of this, in that a separate file is presented for these offences that were flagged as being of a family and domestic abuse nature for that financial year. Consequently the two files for the same financial year must not be added together. Data is point in time.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Effect of suicide rates on life expectancy dataset
Abstract: In 2015, approximately 55 million people died worldwide, of which 8 million committed suicide. In the USA, suicide is one of the main causes of death; this experiment therefore asks how much suicide rates affect the statistics of average life expectancy. The experiment takes two datasets, one with the number of suicides and the other with life expectancy, and combines them into one dataset. Subsequently, I try to find patterns and correlations among the variables and perform a statistical test using simple regression to confirm my assumptions.
Data
The experiment uses two datasets, WHO Suicide Statistics[1] and WHO Life Expectancy[2], which were first preprocessed. The final merged dataset used in the experiment has 13 variables, with country and year used as the index: Country; Year; Suicides number; Life expectancy; Adult Mortality (probability of dying between 15 and 60 years per 1000 population); Infant deaths (number of infant deaths per 1000 population); Alcohol (recorded per capita (15+) alcohol consumption); Under-five deaths (number of under-five deaths per 1000 population); HIV/AIDS (deaths per 1,000 live births from HIV/AIDS); GDP (Gross Domestic Product per capita); Population; Income composition of resources (Human Development Index in terms of income composition of resources); and Schooling (number of years of schooling).
LICENSE
The experiment uses two datasets, WHO Suicide Statistics and WHO Life Expectancy, which were collected from the WHO and United Nations websites. Therefore, all datasets are under the license Attribution-NonCommercial-ShareAlike 3.0 IGO (https://creativecommons.org/licenses/by-nc-sa/3.0/igo/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data relating to child deaths in South Australia, as reported in the Child Death and Serious Injury Review Committee's Annual Report 2021-22.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset contains multi-modal data from over 70,000 open access and de-identified case reports, including metadata, clinical cases, image captions and more than 130,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset makes it easy to map images to their corresponding article metadata, clinical case, captions and image labels. Details of the data structure can be found in the file data_dictionary.csv.
More than 90,000 patients and 280,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.
Refer to the examples showcased in this GitHub repository to understand how to optimize the use of this dataset. The license of the dataset as a whole is CC BY-NC-SA. However, its individual contents may have less restrictive license types (CC BY, CC BY-NC, CC0). For instance, regarding image files, 66K of them are CC BY, 32K are CC BY-NC-SA, 32K are CC BY-NC, and 20 of them are CC0.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. the United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The South Australian injury surveillance data was collected from the casualty services of several metropolitan public hospitals. The exercise was conducted from 1986 through June 2013. The objective was to identify the circumstances associated with hospital-treated injury, in order to better target opportunities for prevention. A number of important safety measures resulted directly from this work. NB. The historical success in capturing cases varied by year.
Round 1 of the Afrobarometer survey was conducted from July 1999 through June 2001 in 12 African countries, to solicit public opinion on democracy, governance, markets, and national identity. The full 12 country dataset released was pieced together out of different projects: Round 1 of the Afrobarometer survey, the old Southern African Democracy Barometer, and similar surveys done in West and East Africa.
The 7 country dataset is a subset of the Round 1 survey dataset, and consists of a combined dataset for the 7 Southern African countries surveyed with other African countries in Round 1, 1999-2000 (Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe). It is a useful dataset because, in contrast to the full 12 country Round 1 dataset, all countries in this dataset were surveyed with the identical questionnaire.
Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia, Zimbabwe
Basic units of analysis that the study investigates include: individuals and groups
Sample survey data [ssd]
A new sample has to be drawn for each round of Afrobarometer surveys. Whereas the standard sample size for Round 3 surveys will be 1200 cases, a larger sample size will be required in societies that are extremely heterogeneous (such as South Africa and Nigeria), where the sample size will be increased to 2400. Other adaptations may be necessary within some countries to account for the varying quality of the census data or the availability of census maps.
The sample is designed as a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of selection for interview. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible. A randomly selected sample of 1200 cases allows inferences to national adult populations with a margin of sampling error of no more than plus or minus 2.5 percent with a confidence level of 95 percent. If the sample size is increased to 2400, the confidence interval shrinks to plus or minus 2 percent.
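As a rough check on these figures, the textbook margin-of-error formula for a simple random sample, z * sqrt(p(1 - p) / n), can be evaluated at the worst case p = 0.5. This is only an approximation under stated assumptions: it ignores the design effects of the clustered, stratified design described in this document, and the function name is hypothetical. Under these assumptions the formula gives about 2.8 percentage points for n = 1200 (slightly above the 2.5 quoted above) and about 2.0 for n = 2400.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion,
    assuming simple random sampling (no design effect)."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (1200, 2400):
    # Expressed in percentage points, rounded to one decimal place.
    print(n, round(100 * margin_of_error(n), 1))
```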
Sample Universe
The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.
What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.
Sample Design
The sample design is a clustered, stratified, multi-stage, area probability sample.
To repeat the main sampling principle, the objective of the design is to give every sample element (i.e. adult citizen) an equal and known chance of being chosen for inclusion in the sample. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible.
In a series of stages, geographically defined sampling units of decreasing size are selected. To ensure that the sample is representative, the probability of selection at various stages is adjusted as follows:
The sample is stratified by key social characteristics in the population such as sub-national area (e.g. region/province) and residential locality (urban or rural). The area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. And the urban/rural stratification is a means to make sure that these localities are represented in their correct proportions. Wherever possible, and always in the first stage of sampling, random sampling is conducted with probability proportionate to population size (PPPS). The purpose is to guarantee that larger (i.e., more populated) geographical units have a proportionally greater probability of being chosen into the sample. The sampling design has four stages:
A first stage to stratify and randomly select primary sampling units;
A second stage to randomly select sampling start-points;
A third stage to randomly choose households;
A final stage involving the random selection of individual respondents.
We shall deal with each of these stages in turn.
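The PPPS principle mentioned above can be illustrated with systematic sampling along the cumulative population total, which is one common way of implementing selection with probability proportionate to size. This is a sketch, not necessarily the procedure Afrobarometer teams follow; the function name, unit names and populations are invented for illustration.

```python
import random


def pps_systematic(units, n, rng):
    """Select n units with probability proportional to size.

    units: list of (name, population) pairs (hypothetical data).
    A unit whose population exceeds the sampling step can be hit
    more than once.
    """
    total = sum(pop for _, pop in units)
    step = total / n
    start = rng.random() * step          # random start in [0, step)
    targets = [start + i * step for i in range(n)]

    chosen, cumulative, t = [], 0, 0
    for name, pop in units:
        cumulative += pop
        # Every target falling inside this unit's population range selects it.
        while t < n and targets[t] < cumulative:
            chosen.append(name)
            t += 1
    return chosen


# Invented enumeration areas and populations.
units = [("EA-1", 500), ("EA-2", 1500), ("EA-3", 800),
         ("EA-4", 1200), ("EA-5", 1000)]
chosen = pps_systematic(units, 2, random.Random(0))
print(chosen)
```

Because each unit occupies a stretch of the cumulative total equal to its population, a unit with twice the population is twice as likely to contain a selection point.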
STAGE ONE: Selection of Primary Sampling Units (PSUs)
The primary sampling units (PSUs) are the smallest, well-defined geographic units for which reliable population data are available. In most countries, these will be Census Enumeration Areas (or EAs). Most national census data and maps are broken down to the EA level. In the text that follows we will use the acronyms PSU and EA interchangeably because, when census data are employed, they refer to the same unit.
We strongly recommend that NIs use official national census data as the sampling frame for Afrobarometer surveys. Where recent or reliable census data are not available, NIs are asked to inform the relevant Core Partner before they substitute any other demographic data. Where the census is out of date, NIs should consult a demographer to obtain the best possible estimates of population growth rates. These should be applied to the outdated census data in order to make projections of population figures for the year of the survey. It is important to bear in mind that population growth rates vary by area (region) and (especially) between rural and urban localities. Therefore, any projected census data should include adjustments to take such variations into account.
Indeed, we urge NIs to establish collegial working relationships with professionals in the national census bureau, not only to obtain the most recent census data, projections, and maps, but to gain access to sampling expertise. NIs may even commission a census statistician to draw the sample to Afrobarometer specifications, provided that provision for this service has been made in the survey budget.
Regardless of who draws the sample, the NIs should thoroughly acquaint themselves with the strengths and weaknesses of the available census data and the availability and quality of EA maps. The country and methodology reports should cite the exact census data used, its known shortcomings, if any, and any projections made from the data. At minimum, the NI must know the size of the population and the urban/rural population divide in each region in order to specify how to distribute population and PSUs in the first stage of sampling. National investigators should obtain this written data before they attempt to stratify the sample.
Once this data is obtained, the sample population (either 1200 or 2400) should be stratified, first by area (region/province) and then by residential locality (urban or rural). In each case, the proportion of the sample in each locality in each region should be the same as its proportion in the national population as indicated by the updated census figures.
Having stratified the sample, it is then possible to determine how many PSUs should be selected for the country as a whole, for each region, and for each urban or rural locality.
The total number of PSUs to be selected for the whole country is determined by calculating the maximum degree of clustering of interviews one can accept in any PSU. Because PSUs (which are usually geographically small EAs) tend to be socially homogenous, we do not want to select too many people in any one place. Thus, the Afrobarometer has established a standard of no more than 8 interviews per PSU. For a sample size of 1200, the sample must therefore contain 150 PSUs/EAs (1200 divided by 8). For a sample size of 2400, there must be 300 PSUs/EAs.
These PSUs should then be allocated proportionally to the urban and rural localities within each regional stratum of the sample. Let's take a couple of examples from a country with a sample size of 1200. If the urban locality of Region X in this country constitutes 10 percent of the current national population, then the sample for this stratum should be 15 PSUs (calculated as 10 percent of 150 PSUs). If the rural population of Region Y constitutes 4 percent of the current national population, then the sample for this stratum should be 6 PSUs.
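The arithmetic in the two worked examples above can be reproduced with a short sketch. The function and stratum names are hypothetical; only the figures (8 interviews per PSU, the 10 percent and 4 percent population shares) come from the text.

```python
MAX_INTERVIEWS_PER_PSU = 8  # Afrobarometer standard stated in the text


def total_psus(sample_size):
    """Number of PSUs needed for a given national sample size."""
    return sample_size // MAX_INTERVIEWS_PER_PSU


def allocate_psus(sample_size, population_shares):
    """Allocate PSUs to strata in proportion to their population share.

    population_shares maps a stratum label to its share of the national
    population (0.10 means 10 percent). Shares are rounded to whole PSUs,
    so a full allocation may need a manual adjustment to hit the total.
    """
    n = total_psus(sample_size)
    return {stratum: round(n * share)
            for stratum, share in population_shares.items()}


print(total_psus(1200), total_psus(2400))
print(allocate_psus(1200, {"Region X urban": 0.10, "Region Y rural": 0.04}))
```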
The next step is to select particular PSUs/EAs using random methods. Using the above example of the rural localities in Region Y, let us say that you need to pick 6 sample EAs out of a census list that contains a total of 240 rural EAs in Region Y. But which 6? If the EAs created by the national census bureau are of equal or roughly equal population size, then selection is relatively straightforward. Just number all EAs consecutively, then make six selections using a table of random numbers. This procedure, known as simple random sampling (SRS), will
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about countries per year in South Africa. It has 64 rows. It features 4 columns: country, health expenditure, and suicide mortality rate.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. the United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dopanim dataset features about 15,750 animal images of 15 classes, organized into four groups of doppelganger animals and collected together with ground truth labels from iNaturalist. For approximately 10,500 of these images, 20 humans provided over 52,000 annotations with an accuracy of circa 67%.
task_data.json contains data, e.g., the ground truth class labels, for each image classification task. Thereby, each task record is indexed by the iNaturalist observation index. A description of each record's entries is given in the supplementary material of the associated article.
annotation_data.json contains data, e.g., likelihoods per animal class, for each obtained image annotation. Thereby, each annotation record has a unique identifier. A description of each record's entries is given in the supplementary material of the associated article.
annotator_metadata.json contains metadata, e.g., self-assessed levels of knowledge and interest regarding animals, for each annotator. Thereby, each metadata record is indexed by the anonymous identifier of an annotator. A description of each record's entries is given in the supplementary material of the associated article.
train.zip, valid.zip, and test.zip contain the training, validation, and test images organized into directories of the 15 animal classes.
Licensing information for each image is given by license_code and photo_license_code in each record of task_data.json. The links to each image and observation are given for further reference.
We collected annotation_data.json and annotator_metadata.json in an annotation campaign via LabelStudio and distribute them under the license CC-BY-NC 4.0.
This work was funded by the ALDeep and CIL projects at the University of Kassel. Moreover, we thank Franz Götz-Hahn for his insightful comments on improving our annotation campaign. Finally, we thank the iNaturalist community for their many observations that help explore our nature's biodiversity and our annotators for their dedicated efforts in making the annotation campaign via LabelStudio possible.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Two datasets that explore causes of death due to cancer in South Africa, drawing on data from the Revised Burden of Disease estimates for the Comparative Risk Factor Assessment for South Africa, 2000. The number and percentage of deaths due to cancer by cause are ranked for persons, males and females in the tables below. Lung cancer is the leading cause of cancer death in SA, accounting for 17% of all cancer deaths. This is followed by oesophageal cancer, which accounts for 13%, cervical cancer (8%), breast cancer (8%) and liver cancer, which accounts for 6% of all cancer deaths. Many more males than females suffer from lung and oesophageal cancer.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mortality and causes of death from death notification
The Marriages and Divorces (MD) dataset is one of three primary sources of marriage and divorce statistics in South Africa. Unlike the other two sources (population censuses and household sample surveys), the MD dataset is compiled from administrative data and based on continuous recording (i.e. from civil registration systems and administrative records). Statistics South Africa (Stats SA) regularly publishes a series of data on marriages and divorces, with the first dataset in the series beginning in 2006. The most recent dataset in the series is MD 2020.
Marriage data: Data on marriages for citizens and permanent residents are obtained from registered marriage records that are collected through the civil registration systems of the Department of Home Affairs (DHA). South Africa recognises three types of marriages by law: civil marriages, customary marriages and civil unions. Before 2008, marriage data only covered civil marriages. The registration of customary marriages and civil unions began in 2003 and 2007 respectively. However, from 2008 onwards, Stats SA began publishing available data on customary marriages and civil unions.
Divorce data: Data on divorces are obtained from various regional courts that deal with divorce matters. The data are based on successful divorce cases that have been issued with a decree of divorce by the Department of Justice and Constitutional Development (DoJCD). Divorce cases come from marriages that were registered in different years as well as divorce cases that were filed in different years but whose divorce decrees were granted in the relevant year of collection.
NOTE: although both the data on marriages and divorces are collected in the same year, the data sets are not linked to each other.
The data has national coverage.
Individuals
The data covers all civil marriages that were recorded by the Department of Home Affairs and all divorce applications that were granted by the Department of Justice and Constitutional Development in 2021 in South Africa.
Administrative records
Other
Geography is problematic in this dataset as not all the data files have geographic data. The Civil Marriages and Civil Unions data files include a Province of Registration variable but the Customary Marriages data file does not. There is also no geographical data in the Divorces file. As this data file includes divorce data from only a subset of divorce courts, this lack of geographical information compromises its usability.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This is a dataset of blood cell photos, originally open sourced by cosmicad and akshaylambda.
There are 364 images across three classes: WBC (white blood cells), RBC (red blood cells), and Platelets. There are 4888 labels across 3 classes (and 0 null examples).
Here's a class count from Roboflow's Dataset Health Check:
https://i.imgur.com/BVopW9p.png (BCCD health)
And here's an example image:
https://i.imgur.com/QwyX2aD.png (Blood Cell Example)
Fork this dataset (upper right hand corner) to receive the raw images, or (to save space) grab the 500x500 export.
This is a small scale object detection dataset, commonly used to assess model performance. It's a first example of medical imaging capabilities.
We're releasing the data as public domain. Feel free to use it for any purpose.
It's not required to provide attribution, but it'd be nice! :)
The Marriages and Divorces (MD) dataset is one of three primary sources of marriage and divorce statistics in South Africa. Unlike the other two sources (population censuses and household sample surveys), the MD dataset is compiled from administrative data and based on continuous recording (i.e. from civil registration systems and administrative records). Statistics South Africa (Stats SA) regularly publishes a series of data on marriages and divorces, with the first dataset in the series beginning in 2006. The most recent dataset in the series is MD 2023.
Marriage data: Data on marriages for citizens and permanent residents are obtained from registered marriage records that are collected through the civil registration systems of the Department of Home Affairs (DHA). South Africa recognises three types of marriages by law: civil marriages, customary marriages and civil unions. Before 2008, marriage data only covered civil marriages. The registration of customary marriages and civil unions began in 2003 and 2007 respectively. However, from 2008 onwards, Stats SA began publishing available data on customary marriages and civil unions.
Divorce data: Data on divorces are obtained from various regional courts that deal with divorce matters. The data are based on successful divorce cases that have been issued with a decree of divorce by the Department of Justice and Constitutional Development (DoJCD). Divorce cases come from marriages that were registered in different years as well as divorce cases that were filed in different years but whose divorce decrees were granted in the relevant year of collection.
NOTE: although both the data on marriages and divorces are collected in the same year, the data sets are not linked to each other.
The data has national coverage.
Individuals
The data covers all civil marriages, civil unions and customary marriages that were recorded by the Department of Home Affairs and all divorce applications that were granted by the Department of Justice and Constitutional Development in 2023 in South Africa.
Administrative records
Other
This dataset contains statistics on deaths in South Africa in 2012. The registration of deaths in South Africa is regulated by the Births and Deaths Registration Act, 51 of 1992. The South African Department of Home Affairs (DHA) is responsible for the registration of deaths in South Africa. The data is collected with two instruments: the death register and the medical certificate in respect of death. The staff of the DHA Registrar of Deaths section fill in the former, while the medical practitioner attending to the death completes the latter. Causes of death are coded by the Department of Home Affairs according to the tenth revision of the International Classification of Diseases (ICD-10), as required by the World Health Organization for its member countries. The data is used by the Department of Home Affairs to update the Population Register. The forms are sent to Statistics South Africa (Stats SA) for statistical purposes. From the two forms sent to Stats SA, the following data items of the deceased are extracted: place of residence, place of death, date of death, month and year of registration, sex, marital status, occupation, underlying cause of death, whether or not the death was certified by a medical practitioner, and whether or not the deceased died in a health institution or nursing home. From 1991, death notifications do not require data on population group, and therefore this dataset includes death data for all population groups. This dataset excludes 2012 deaths that were not registered, and late registrations which would not have been available to Stats SA in time for the production of the dataset.
National coverage
Individuals
The data covers all deaths that occurred in 2012 and were registered at the Department of Home Affairs in South Africa.
Administrative records data [adm]
Other [oth]
The data is collected with two instruments: the death register and the medical certificate in respect of death.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Description:
The dataset contains pairs of encyclopedic articles in 14 languages. Each pair includes the same article in two levels of readability (easy/hard). The pairs are obtained by matching Wikipedia articles (hard) with the corresponding versions from different simplified or children's encyclopedias (easy).
Dataset Details:
Attribution:
The dataset was compiled from the following sources. The text of the original articles comes from the corresponding language version of Wikipedia. The text of the simplified articles comes from one of the following encyclopedias: Simple English Wikipedia, Vikidia, Klexikon, Txikipedia, or Wikikids.
Below we provide information about the license of the original content as well as the template to generate the link to the original source for a given page (
https://
https://simple.wikipedia.org/wiki/
https://
https://klexikon.zum.de/wiki/
https://eu.wikipedia.org/wiki/Txikipedia:
https://wikikids.nl/
Related paper citation:
@inproceedings{trokhymovych-etal-2024-open,
  title = "An Open Multilingual System for Scoring Readability of {W}ikipedia",
  author = "Trokhymovych, Mykola and Sen, Indira and Gerlach, Martin",
  editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek",
  booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month = aug,
  year = "2024",
  address = "Bangkok, Thailand",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2024.acl-long.342/",
  doi = "10.18653/v1/2024.acl-long.342",
  pages = "6296--6311"
}
This graffiti-centred change detection dataset was developed in the context of INDIGO, a research project focusing on the documentation, analysis and dissemination of graffiti along Vienna's Donaukanal. The dataset aims to support the development and assessment of change detection algorithms.
The dataset was collected from a test site approximately 50 meters in length along Vienna's Donaukanal during 11 days between 2022/10/21 and 2022/12/01. Various cameras with different settings were used, resulting in a total of 29 data collection sessions or "epochs" (see "EpochIDs.jpg" for details). Each epoch contains 17 images generated from 29 distinct 3D models with different textures. In total, the dataset comprises 6,902 unique image pairs, along with corresponding reference change maps. Additionally, exclusion masks are provided to ignore parts of the scene that might be irrelevant, such as the background.
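The pair count quoted above can be reproduced with a short calculation. This is only a sketch under an assumption the description does not state explicitly: that an image pair combines two different epochs of the same one of the 17 images, with order ignored. The constant names are illustrative.

```python
from math import comb

# Figures taken from the description above.
EPOCHS = 29            # data collection sessions
IMAGES_PER_EPOCH = 17  # images generated per epoch

# Assumption: a pair is an unordered combination of two different epochs
# for the same image.
pairs_per_image = comb(EPOCHS, 2)
total_pairs = pairs_per_image * IMAGES_PER_EPOCH

print(pairs_per_image, total_pairs)
```

Under this assumption the result matches the 6,902 unique image pairs stated in the description.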
To summarise, the dataset, labelled as "Data.zip," includes the following:
Image acquisition involved the use of two different camera setups. The first two datasets (ID 1 and 2; cf. "EpochIDs.jpg") were obtained using a Nikon Z 7II camera with a pixel count of 45.4 MP, paired with a Nikon NIKKOR Z 20 mm lens. For the remaining image datasets (ID 3-29), a triple GoPro setup was employed. This triple setup featured three GoPro cameras, comprising two GoPro HERO 10 cameras and one GoPro HERO 11, all securely mounted within a frame. This triple-camera setup was utilised on nine different days with varying camera settings, resulting in the acquisition of 27 image datasets in total (nine days with three datasets each).
The "Data.zip" file contains two subfolders:
A detailed dataset description (including detailed explanations of the data creation) is part of a journal paper currently in preparation. The paper will be linked here for further clarification as soon as it is available.
Due to the nature of the three image types, this dataset comes with two licenses:
Every synthetic image, change map and mask has this licensing information embedded as IPTC photo metadata. In addition, the images' IPTC metadata also provide a short image description, the image creator and the creator's identity (in the form of an ORCiD).
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
If there are any questions, problems or suggestions for the dataset or the description, please do not hesitate to contact the corresponding author, Benjamin Wild.
All the data for this dataset is provided by CARMA (www.carma.org). This dataset provides information about power plant emissions in South Africa. Emissions from all power plants in South Africa were obtained by CARMA for the past (2000 Annual Report), the present (2007 data), and the future. CARMA determines the data presented for the future to reflect planned plant construction, expansion, and retirement. The dataset provides the name, company, parent company, city, state, lat/lon, and plant id for each individual power plant. Only power plants that had a listed longitude and latitude in CARMA's database were mapped. For each of the three time periods, the dataset reports: Intensity: pounds of CO2 emitted per megawatt-hour of electricity produced. Energy: annual megawatt-hours of electricity produced. Carbon: annual carbon dioxide (CO2) emissions, in short (U.S.) tons; multiply by 0.907 to get metric tons. Carbon Monitoring for Action (CARMA) is a massive database containing information on the carbon emissions of over 50,000 power plants and 4,000 power companies worldwide. Power generation accounts for 40% of all carbon emissions in the United States and about one-quarter of global emissions. CARMA is the first global inventory of a major sector of the economy. The objective of CARMA.org is to equip individuals with the information they need to forge a cleaner, low-carbon future. By providing complete information for both clean and dirty power producers, CARMA hopes to influence the opinions and decisions of consumers, investors, shareholders, managers, workers, activists, and policymakers. CARMA builds on experience with public information disclosure techniques that have proven successful in reducing traditional pollutants. Please see carma.org for more information: http://carma.org/region/detail/174
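The unit definitions above can be illustrated with a small conversion sketch. The 0.907 factor comes from the description; the relation between the three measures (carbon = intensity × energy / 2,000 pounds per short ton) is an assumption that follows from the stated units rather than a documented CARMA formula, and the function names and sample values are hypothetical.

```python
import math

LB_PER_SHORT_TON = 2000.0  # one short (U.S.) ton is 2,000 pounds
SHORT_TO_METRIC = 0.907    # conversion factor quoted in the description


def short_to_metric_tons(short_tons: float) -> float:
    """Convert short (U.S.) tons to metric tons."""
    return short_tons * SHORT_TO_METRIC


def carbon_short_tons(intensity_lb_per_mwh: float, energy_mwh: float) -> float:
    """Assumed relation: annual CO2 in short tons from intensity and energy."""
    return intensity_lb_per_mwh * energy_mwh / LB_PER_SHORT_TON


# Made-up plant: 2,000 lb CO2 per MWh and 1,000,000 MWh per year.
carbon = carbon_short_tons(2000.0, 1_000_000.0)
carbon_metric = short_to_metric_tons(carbon)
print(carbon, round(carbon_metric))
```

For this made-up plant, annual emissions come to 1,000,000 short tons, or roughly 907,000 metric tons.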