7 datasets found

Coronavirus disease (COVID-19) case data - South Africa
zenodo.org
data.niaid.nih.gov
bin, csv
Updated Feb 21, 2023
Cite
Vukosi Marivate; Alta de Waal; Herkulaas Combrink; Ofentswe Lebogo; Shivan Moodley; Nompumelelo Mtsweni; Vuthlari Rikhotso (2023). Coronavirus disease (COVID-19) case data - South Africa [Dataset]. http://doi.org/10.5281/zenodo.3724083
Available download formats: bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.3724083
Dataset updated
Feb 21, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Vukosi Marivate; Alta de Waal; Herkulaas Combrink; Ofentswe Lebogo; Shivan Moodley; Nompumelelo Mtsweni; Vuthlari Rikhotso
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
South Africa
Description
COVID-19 data for South Africa, created, maintained and hosted by the DSFSI research group at the University of Pretoria.

Disclaimer: We have worked to keep the data as accurate as possible. We collate the COVID-19 reporting data from the NICD and the South African DoH, and we only update that data once there is an official report or statement. For the other data, we work to keep it as accurate as possible. If you find errors, let us know.

See the original GitHub repo for detailed information: https://github.com/dsfsi/covid19za
COVID-19 Data Repository for South Africa
cloud.csiss.gmu.edu
csv
Updated Oct 29, 2021
Cite
Africa Data Hub (2021). COVID-19 Data Repository for South Africa [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/covid-19-data-repository-for-south-africa
Available download formats: csv
Dataset updated
Oct 29, 2021
Dataset provided by
Africa Data Hub
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
South Africa
Description
Coronavirus COVID-19 (2019-nCoV) Data Repository for South Africa, created, maintained and hosted by the Data Science for Social Impact research group, led by Dr. Vukosi Marivate, at the University of Pretoria.

Disclaimer: The maintainers have worked to keep the data as accurate as possible. The COVID 19 reporting data has been collated from NICD and DoH and is only updated once there is an official report or statement.

If you use this repo for any research/development/innovation, please contact the maintainers of the data.

Please note that these reports are the daily reports as released by the National Department of Health or the NICD. The new cases reported are based on new positive test reports released, and there may be a significant lag from when the patient was tested. As an example, in epidemiological Week 1 of 2021 (3-9 Jan) approximately 33k new cases were reported in the daily announcements. However, the NICD Testing Summary Report for Week 3 of 2021 (which also reports the two previous weeks) shows that the number of positive tests for Week 1 of 2021 was 43,635. The difference is due to the reporting lag: some of the 33k cases reported in the daily announcements were actually from prior weeks, while a large number of people were tested between 3 and 9 January but had their cases reported only from the 10th onwards. Analyses need to take this lag into account.
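The size of the gap described above can be worked out directly from the two quoted figures. The sketch below is only an illustration: the variable names are hypothetical, and because the 33k figure is approximate, so are the results. It gives a net difference only, since some of the announced cases belong to earlier weeks.

```python
# Figures quoted in the dataset description for epidemiological Week 1 of 2021.
announced_cases = 33_000   # approximate total from the daily announcements, 3-9 Jan
positive_tests = 43_635    # NICD Testing Summary Report, positive tests for that week

# Net shortfall of the daily announcements relative to positive tests that week.
shortfall = positive_tests - announced_cases
shortfall_share = shortfall / positive_tests

print(f"Net shortfall: about {shortfall:,} cases")
print(f"Share of the week's positive tests: about {shortfall_share:.0%}")
```

An analysis that aggregates the daily announcements by week would understate Week 1 by roughly this amount, which is why test-date figures are preferable where they are available.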
za-marito-dsac
huggingface.co
ollama.hf-mirror.com
Updated Mar 19, 2025
Cite
Data Science for Social Impact (2025). za-marito-dsac [Dataset]. https://huggingface.co/datasets/dsfsi/za-marito-dsac
Dataset updated
Mar 19, 2025
Dataset authored and provided by
Data Science for Social Impact
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
About

Department of Sports Arts and Culture Multilingual Terminology Lists

Author(s)

Author(s) - Original DSAC Multilingual Terminology Lists Author(s) - Original OERTB + Data Science for Social Impact Team (To be updated)

LICENSE for Data

The files on https://github.com/dsfsi/za-marito/ are licensed under CC-BY-SA-4.0. Any reuse should acknowledge the original Department of Sports Arts and Culture Multilingual Terminology Lists and the Open Database authors list.
PuoBERTa + PuoBERTaJW300 Setswana Language Models
data.niaid.nih.gov
zenodo.org
Updated Dec 4, 2023
Cite
Lastrucci, Richard (2023). PuoBERTa + PuoBERTaJW300 Setswana Language Models [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8434795
Dataset updated
Dec 4, 2023
Dataset provided by
Lastrucci, Richard
Dzingirai, Isheanesu
Mots'Oehli, Moseli
Wagner, Valencia
Marivate, Vukosi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PuoBERTa + PuoBERTaJW300: Setswana Language Models

A RoBERTa-based language model specially designed for Setswana, using the new PuoData dataset (PuoBERTa) and PuoData + JW300 TSN (PuoBERTaJW300).

Cite

@inproceedings{marivate2023puoberta,
title = {PuoBERTa: Training and evaluation of a curated language model for Setswana},
author = {Vukosi Marivate and Moseli Mots'Oehli and Valencia Wagner and Richard Lastrucci and Isheanesu Dzingirai},
year = {2023},
booktitle = {Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science},
url = {https://link.springer.com/chapter/10.1007/978-3-031-49002-6_17},
keywords = {NLP},
preprint_url = {https://arxiv.org/abs/2310.09141},
dataset_url = {https://github.com/dsfsi/PuoBERTa},
software_url = {https://huggingface.co/dsfsi/PuoBERTa}
}

Model Details

Model Description

This is a masked language model trained on Setswana corpora, making it a valuable tool for a range of downstream applications from translation to content creation. It's powered by the PuoData dataset to ensure accuracy and cultural relevance.

Developed by: Vukosi Marivate (@vukosi), Moseli Mots'Oehli (@MoseliMotsoehli), Valencia Wagner, Richard Lastrucci and Isheanesu Dzingirai
Model type: RoBERTa Model
Language(s) (NLP): Setswana
License: CC BY 4.0

Usage

Use this model for filling in masks, or fine-tune it for downstream tasks. Here's a simple example for masked prediction:

    from transformers import RobertaTokenizer, RobertaModel

    # Load model and tokenizer
    model = RobertaModel.from_pretrained('dsfsi/PuoBERTa')
    tokenizer = RobertaTokenizer.from_pretrained('dsfsi/PuoBERTa')

Downstream Use

Downstream Performance

MasakhaPOS

Performance of models on the MasakhaPOS downstream task.

Model	Test Performance
Multilingual Models
AfroLM	83.8
AfriBERTa	82.5
AfroXLMR-base	82.7
AfroXLMR-large	83.0
Monolingual Models
NCHLT TSN RoBERTa	82.3
PuoBERTa	83.4
PuoBERTa+JW300	84.1

MasakhaNER

Performance of models on the MasakhaNER downstream task.

Model	Test Performance (F1 score)
Multilingual Models
AfriBERTa	83.2
AfroXLMR-base	87.7
AfroXLMR-large	89.4
Monolingual Models
NCHLT TSN RoBERTa	74.2
PuoBERTa	78.2
PuoBERTa+JW300	80.2

Dataset

We used the PuoData dataset, a rich source of Setswana text, ensuring that our model is well-trained and culturally attuned.

Contributing

Your contributions are welcome! Feel free to improve the model.

Model Card Authors

Vukosi Marivate

Model Card Contact

For more details, reach out or check our website.
Email: vukosi.marivate@cs.up.ac.za

Enjoy exploring Setswana through AI!
Coronavirus COVID-19 (2019-nCoV) Data Repository for Africa
data.niaid.nih.gov
catalog.midasnetwork.us
Updated Apr 20, 2020
Cite
Africa open COVID-19 data working group (2020). Coronavirus COVID-19 (2019-nCoV) Data Repository for Africa [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3732979
Dataset updated
Apr 20, 2020
Dataset provided by
Marivate, Vukosi
Esube Bekele
Africa open COVID-19 data working group
Nsoesie, Elaine
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Africa
Description
The purpose of this repository is to collate data on the ongoing coronavirus pandemic in Africa. Our goal is to record detailed information on each reported case in every African country. We want to build a line list – a table summarizing information about people who are infected, dead, or recovered. The table for each African country would include demographic, location, and symptom (where available) information for each reported case. The data will be obtained from official sources (e.g., WHO, departments of health, CDC etc.) and unofficial sources (e.g., news). Such a dataset has many uses, including studying the spread of COVID-19 across Africa and assessing similarities and differences to what’s being observed in other regions of the world.

See the repository: https://github.com/dsfsi/covid19africa
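To make the idea of a line list concrete, here is a minimal sketch of what one row might look like. The description only says that each row summarizes demographic, location and symptom information for a reported case, so the class name, field names and sample values below are all hypothetical and do not reflect the repository's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CaseRecord:
    """One row of a hypothetical line list: a single reported case."""
    country: str
    status: str                       # "infected", "dead" or "recovered"
    source: str                       # "official" or "unofficial"
    age: Optional[int] = None         # demographic details, where available
    sex: Optional[str] = None
    location: Optional[str] = None
    symptoms: List[str] = field(default_factory=list)


def count_by_status(records: List[CaseRecord]) -> Dict[str, int]:
    """Count how many records fall under each status."""
    counts: Dict[str, int] = {}
    for record in records:
        counts[record.status] = counts.get(record.status, 0) + 1
    return counts


# Made-up placeholder rows, not real case data.
line_list = [
    CaseRecord("Country A", "infected", "official", age=34, location="City X"),
    CaseRecord("Country A", "recovered", "unofficial"),
    CaseRecord("Country B", "infected", "official", symptoms=["fever"]),
]
summary = count_by_status(line_list)
print(summary)
```

Keeping optional fields as `None` or empty lists reflects the "where available" caveat: many reported cases will lack symptom or demographic detail.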

The South African Gov-ZA multilingual corpus

data.niaid.nih.gov
zenodo.org

Updated Jul 6, 2023

Cite

Marivate, Vukosi (2023). The South African Gov-ZA multilingual corpus [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7635167

Dataset updated

Jul 6, 2023

Dataset provided by

Shingange, Matimba
Rajab, Jenalea
Marivate, Vukosi
Lastrucci, Richard
Dzingirai, Isheanesu

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered

South Africa

Description

The South African Gov-ZA multilingual corpus

Github: https://github.com/dsfsi/gov-za-multilingual Zenodo:

About Dataset

The dataset contains cabinet statements from the South African government. The data was scraped from the government's website: https://www.gov.za/cabinet-statements

The datasets contain government cabinet statements in 11 languages, namely:

Language	Code	Language	Code
English	(eng)	Sepedi	(nso)
Afrikaans	(afr)	Setswana	(tsn)
isiNdebele	(nbl)	Siswati	(ssw)
isiXhosa	(xho)	Tshivenda	(ven)
isiZulu	(zul)	Xitsonga	(tso)
Sesotho	(sot)

The dataset contains the full data in a JSON file (/data/govza-cabinet-statements.json), as well as CSVs split by language, e.g. “govza-cabinet-statements-en.csv” for English. The dataset does not contain special characters like Unicode or ASCII.

Please see the data-statement.md for full dataset information. (TODO)

Number of Aligned Pairs with Cosine Similarity Score >= 0.65

src_lang	trg_lang	num_aligned_pairs
afr	eng	14549
afr	nbl	6621
afr	nso	15388
afr	sot	8834
afr	ssw	15610
afr	tsn	12605
afr	tso	14936
afr	ven	5776
afr	xho	16065
afr	zul	14998
nbl	eng	3616
nbl	nso	6342
nbl	sot	16163
nbl	ssw	4655
nbl	tsn	3369
nbl	tso	4465
nbl	ven	18984
nbl	xho	5213
nbl	zul	3868
nso	eng	15257
nso	ssw	18697
nso	tsn	16179
nso	tso	17617
nso	ven	6367
sot	eng	5212
sot	nso	8077
sot	ssw	5811
sot	tsn	5450
sot	tso	6586
sot	ven	14098
ssw	eng	15721
ssw	tso	17880
ssw	ven	4588
tsn	eng	14544
tsn	ssw	16386
tsn	tso	16681
tsn	ven	3267
tso	eng	16068
ven	eng	3670
ven	tso	4578
xho	eng	16537
xho	nso	18110
xho	sot	7489
xho	ssw	18387
xho	tsn	16571
xho	tso	17954
xho	ven	4559
xho	zul	18145
zul	eng	16149
zul	nso	17630
zul	sot	5975
zul	ssw	18563
zul	tsn	16482
zul	tso	17789
zul	ven	3606
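The table above counts sentence pairs whose cosine similarity score is at least 0.65. As a rough illustration of that kind of filter (not the authors' actual alignment pipeline), the sketch below assumes each sentence already has an embedding vector from some multilingual encoder, which this page does not specify. The function names and the toy vectors are hypothetical.

```python
import math

THRESHOLD = 0.65  # cut-off quoted in the table heading


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_u * norm_v)


def keep_aligned(pairs, threshold=THRESHOLD):
    """Return (src_text, trg_text, score) for pairs scoring >= threshold."""
    kept = []
    for src_text, src_vec, trg_text, trg_vec in pairs:
        score = cosine_similarity(src_vec, trg_vec)
        if score >= threshold:
            kept.append((src_text, trg_text, score))
    return kept


# Toy candidates with made-up 3-dimensional embeddings (hypothetical data).
candidates = [
    ("sentence A (eng)", [1.0, 0.0, 0.0], "sentence A (zul)", [0.9, 0.1, 0.0]),
    ("sentence B (eng)", [1.0, 0.0, 0.0], "sentence B (zul)", [0.0, 1.0, 0.0]),
]
aligned = keep_aligned(candidates)
print(len(aligned), "pair(s) kept")
```

Counting the kept pairs per (src_lang, trg_lang) combination would yield a table of the same shape as the one above.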

Authors

Vukosi Marivate - @vukosi
Matimba Shingange
Richard Lastrucci
Isheanesu Joseph Dzingirai
Jenalea Rajab

Publications

@inproceedings{lastrucci-etal-2023-preparing,
title = "Preparing the Vuk{'}uzenzele and {ZA}-gov-multilingual {S}outh {A}frican multilingual corpora",
author = "Richard Lastrucci and Isheanesu Dzingirai and Jenalea Rajab and Andani Madodonga and Matimba Shingange and Daniel Njini and Vukosi Marivate",
booktitle = "Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.rail-1.3",
pages = "18--25"
}

Embedding Evaluation Data for South African Languages
zenodo.org
data.niaid.nih.gov
Updated Jun 2, 2023
Cite
Vukosi Marivate; Valencia Wagner; Mack Makgatho; Tshephisho Sefara (2023). Embedding Evaluation Data for South African Languages [Dataset]. http://doi.org/10.5281/zenodo.5673974
Unique identifier
https://doi.org/10.5281/zenodo.5673974
Dataset updated
Jun 2, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Vukosi Marivate; Valencia Wagner; Mack Makgatho; Tshephisho Sefara
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
South Africa
Description
WordSim and Simlex Data for South African Languages

Setswana

Sepedi

Embedding Evaluation Data for South African Languages

Dataset Information

The datasets (Simlex and WordSim) contain pairs of Setswana and Sepedi words that have been assigned similarity ratings by humans to measure semantic relatedness. The word pairs were manually translated from English to Setswana and Sepedi. The evaluation task measures the degree of correlation between the scores produced by a model and the human ratings. The model's score for a word pair is the cosine similarity of the two corresponding word vectors.
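The evaluation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' evaluation script: the description does not say which correlation coefficient is used, so the sketch assumes Spearman rank correlation, a common choice for word-similarity benchmarks. The word labels, vectors and ratings are hypothetical placeholders, not values from the dataset.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy embeddings and human ratings (placeholders only).
embeddings = {
    "w1": [1.0, 0.0],
    "w2": [0.9, 0.1],
    "w3": [0.5, 0.5],
    "w4": [0.0, 1.0],
}
rated_pairs = [("w1", "w2", 9.0), ("w1", "w3", 6.0), ("w1", "w4", 1.0)]

model_scores = [cosine_similarity(embeddings[a], embeddings[b])
                for a, b, _ in rated_pairs]
human_scores = [rating for _, _, rating in rated_pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation: {rho:.2f}")
```

A higher correlation means the embedding model orders word pairs more like the human annotators did. In practice, pairs containing words missing from the model's vocabulary have to be skipped or handled explicitly.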

Online Repository link

Zenodo Data Repository - Link to the data repository.

Authors

Vukosi Marivate - @vukosi

Valencia Wagner

Mack Makgatho

Tshephisho Sefara

See also the list of contributors who participated in this project.

Citing the dataset

To appear in conference proceedings

@article{Makgatho_Marivate_Sefara_Wagner_2022,
title={Training Cross-Lingual embeddings for Setswana and Sepedi},
volume={3},
url={https://upjournals.up.ac.za/index.php/dhasa/article/view/3822},
DOI={10.55492/dhasa.v3i03.3822},
number={03},
journal={Journal of the Digital Humanities Association of Southern Africa},
author={Makgatho, Mack and Marivate, Vukosi and Sefara, Tshephisho and Wagner, Valencia},
year={2022},
month={Feb.}
}