7 datasets found
  1. Coronavirus disease (COVID-19) case data - South Africa

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv
    Updated Feb 21, 2023
    Cite
    Vukosi Marivate; Alta de Waal; Herkulaas Combrink; Ofentswe Lebogo; Shivan Moodley; Nompumelelo Mtsweni; Vuthlari Rikhotso (2023). Coronavirus disease (COVID-19) case data - South Africa [Dataset]. http://doi.org/10.5281/zenodo.3724083
    Explore at:
    Available download formats: bin, csv
    Dataset updated
    Feb 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Vukosi Marivate; Alta de Waal; Herkulaas Combrink; Ofentswe Lebogo; Shivan Moodley; Nompumelelo Mtsweni; Vuthlari Rikhotso
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Africa
    Description

    COVID-19 data for South Africa, created, maintained and hosted by the DSFSI research group at the University of Pretoria.

    Disclaimer: We have worked to keep the data as accurate as possible. We collate the COVID-19 reporting data from the NICD and the South African DoH, and we only update that data once there is an official report or statement. For the other data, we work to keep it as accurate as possible. If you find errors, let us know.

    See original GitHub repo for detailed information https://github.com/dsfsi/covid19za

  2. COVID-19 Data Repository for South Africa

    • cloud.csiss.gmu.edu
    csv
    Updated Oct 29, 2021
    Cite
    Africa Data Hub (2021). COVID-19 Data Repository for South Africa [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/covid-19-data-repository-for-south-africa
    Explore at:
    Available download formats: csv
    Dataset updated
    Oct 29, 2021
    Dataset provided by
    Africa Data Hub
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Africa
    Description

    Coronavirus COVID-19 (2019-nCoV) Data Repository for South Africa created, maintained and hosted by Data Science for Social Impact research group, led by Dr. Vukosi Marivate, at the University of Pretoria.

    Disclaimer: The maintainers have worked to keep the data as accurate as possible. The COVID-19 reporting data has been collated from the NICD and the DoH and is only updated once there is an official report or statement.

    If you use this repo for any research/development/innovation, please contact the maintainers of the data.

    Please note that these reports are the daily reports as released by the National Department of Health or the NICD. The new cases reported are based on newly released positive test reports, so there may be a significant lag from when the patient was tested. For example, in epidemiological Week 1 of 2021 (3-9 Jan), approximately 33k new cases were reported in the daily announcements. However, the NICD Testing Summary Report for Week 3 of 2021 (which also covers the two previous weeks) shows 43635 positive tests for Week 1 of 2021. The difference is due to the lag between testing and reporting: some of the 33k cases in the daily announcements were actually from prior weeks, while a large number of people were tested between 3 and 9 January but their cases were only reported from the 10th onwards. Analyses need to take this lag into account.
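As a rough illustration of the size of this lag, the two Week 1 figures quoted above can be compared directly. This is only a sketch: the 33k figure is rounded in the source, the function name `reporting_gap` is hypothetical, and because part of the 33k belongs to earlier weeks the true shortfall for Week 1 is likely larger than the gap computed here.

```python
# Figures quoted in the description for epidemiological Week 1 of 2021 (3-9 Jan).
reported_daily_total = 33_000  # "approximately 33k" from daily announcements (rounded)
positive_tests_nicd = 43_635   # NICD Testing Summary Report, counted by test week


def reporting_gap(announced: int, tested_positive: int) -> tuple[int, float]:
    """Return the absolute gap and the gap as a share of positives by test week."""
    gap = tested_positive - announced
    return gap, gap / tested_positive


gap, share = reporting_gap(reported_daily_total, positive_tests_nicd)
print(f"gap: {gap} cases ({share:.1%} of positives by test week)")
```

A time series built from announcement dates would therefore understate that week by roughly a quarter relative to the test-date counts.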

  3. za-marito-dsac

    • huggingface.co
    • ollama.hf-mirror.com
    Updated Mar 19, 2025
    Cite
    Data Science for Social Impact (2025). za-marito-dsac [Dataset]. https://huggingface.co/datasets/dsfsi/za-marito-dsac
    Explore at:
    Dataset updated
    Mar 19, 2025
    Dataset authored and provided by
    Data Science for Social Impact
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    About

    Department of Sports Arts and Culture Multilingual Terminology Lists

    Author(s)

    • Author(s) - Original DSAC Multilingual Terminology Lists
    • Author(s) - Original OERTB + Data Science for Social Impact Team (To be updated)

    LICENSE for Data

    The files on https://github.com/dsfsi/za-marito/ are under CC-BY-SA-4.0 and should acknowledge the Original Department of Sports Arts and Culture Multilingual Terminology Lists + Open Database authors list.

  4. PuoBERTa + PuoBERTaJW300 Setswana Language Models

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 4, 2023
    Cite
    Lastrucci, Richard (2023). PuoBERTa + PuoBERTaJW300 Setswana Language Models [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8434795
    Explore at:
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    Lastrucci, Richard
    Dzingirai, Isheanesu
    Mots'Oehli, Moseli
    Wagner, Valencia
    Marivate, Vukosi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PuoBERTa + PuoBERTaJW300: Setswana Language Models

    A RoBERTa-based language model designed specifically for Setswana, trained on the new PuoData dataset (PuoBERTa) and on PuoData + JW300 TSN (PuoBERTaJW300).

    Cite

    @inproceedings{marivate2023puoberta,
      title = {PuoBERTa: Training and evaluation of a curated language model for Setswana},
      author = {Vukosi Marivate and Moseli Mots'Oehli and Valencia Wagner and Richard Lastrucci and Isheanesu Dzingirai},
      year = {2023},
      booktitle = {Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science},
      url = {https://link.springer.com/chapter/10.1007/978-3-031-49002-6_17},
      keywords = {NLP},
      preprint_url = {https://arxiv.org/abs/2310.09141},
      dataset_url = {https://github.com/dsfsi/PuoBERTa},
      software_url = {https://huggingface.co/dsfsi/PuoBERTa}
    }

    Model Details

    Model Description

    This is a masked language model trained on Setswana corpora, making it a valuable tool for a range of downstream applications from translation to content creation. It's powered by the PuoData dataset to ensure accuracy and cultural relevance.

    Developed by: Vukosi Marivate (@vukosi), Moseli Mots'Oehli (@MoseliMotsoehli), Valencia Wagner, Richard Lastrucci and Isheanesu Dzingirai
    Model type: RoBERTa Model
    Language(s) (NLP): Setswana
    License: CC BY 4.0

    Usage

    Use this model for filling in masks, or fine-tune it for downstream tasks. Here's a simple example that loads the model and tokenizer:

    from transformers import RobertaTokenizer, RobertaModel

    # Load model and tokenizer
    model = RobertaModel.from_pretrained('dsfsi/PuoBERTa')
    tokenizer = RobertaTokenizer.from_pretrained('dsfsi/PuoBERTa')
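The snippet above only loads the encoder and tokenizer and does not run a prediction. A hedged sketch of an actual masked prediction could use the `fill-mask` pipeline from `transformers`. The helper names `mask_word` and `top_predictions` are hypothetical, and `<mask>` is assumed to be the model's mask token (the RoBERTa default), which the description does not confirm.

```python
def mask_word(sentence: str, word: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `word` in `sentence` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    return sentence.replace(word, mask_token, 1)


def top_predictions(masked_sentence: str, model_name: str = "dsfsi/PuoBERTa", k: int = 5):
    """Return (token, score) pairs for the k most likely fills of the mask.

    Downloads the model on first use, so this call needs network access.
    Expects exactly one mask token in `masked_sentence`.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    fill = pipeline("fill-mask", model=model_name)
    results = fill(masked_sentence, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]
```

To try it, mask one word of any Setswana sentence with `mask_word` and pass the result to `top_predictions`; if the tokenizer uses a different mask token, pass it explicitly as `mask_token`.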

    Downstream Use

    Downstream Performance

    MasakhaPOS

    Performance of models on the MasakhaPOS downstream task.

    Model                  Test Performance
    Multilingual Models
    AfroLM                 83.8
    AfriBERTa              82.5
    AfroXLMR-base          82.7
    AfroXLMR-large         83.0
    Monolingual Models
    NCHLT TSN RoBERTa      82.3
    PuoBERTa               83.4
    PuoBERTa+JW300         84.1

    MasakhaNER

    Performance of models on the MasakhaNER downstream task.

    Model                  Test Performance (F1 score)
    Multilingual Models
    AfriBERTa              83.2
    AfroXLMR-base          87.7
    AfroXLMR-large         89.4
    Monolingual Models
    NCHLT TSN RoBERTa      74.2
    PuoBERTa               78.2
    PuoBERTa+JW300         80.2

    Dataset

    We used the PuoData dataset, a rich source of Setswana text, ensuring that our model is well-trained and culturally attuned.

    Contributing

    Your contributions are welcome! Feel free to improve the model.

    Model Card Authors

    Vukosi Marivate

    Model Card Contact

    For more details, reach out or check our website. Email: vukosi.marivate@cs.up.ac.za

    Enjoy exploring Setswana through AI!

  5. Coronavirus COVID-19 (2019-nCoV) Data Repository for Africa

    • data.niaid.nih.gov
    • catalog.midasnetwork.us
    Updated Apr 20, 2020
    Cite
    Africa open COVID-19 data working group (2020). Coronavirus COVID-19 (2019-nCoV) Data Repository for Africa [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3732979
    Explore at:
    Dataset updated
    Apr 20, 2020
    Dataset provided by
    Marivate, Vukosi
    Esube Bekele
    Africa open COVID-19 data working group
    Nsoesie, Elaine
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Africa
    Description

    The purpose of this repository is to collate data on the ongoing coronavirus pandemic in Africa. Our goal is to record detailed information on each reported case in every African country. We want to build a line list – a table summarizing information about people who are infected, dead, or recovered. The table for each African country would include demographic, location, and symptom (where available) information for each reported case. The data will be obtained from official sources (e.g., WHO, departments of health, CDC etc.) and unofficial sources (e.g., news). Such a dataset has many uses, including studying the spread of COVID-19 across Africa and assessing similarities and differences to what’s being observed in other regions of the world.

    See the repo here https://github.com/dsfsi/covid19africa

  6. The South African Gov-ZA multilingual corpus

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 6, 2023
    Cite
    Marivate, Vukosi (2023). The South African Gov-ZA multilingual corpus [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7635167
    Explore at:
    Dataset updated
    Jul 6, 2023
    Dataset provided by
    Shingange, Matimba
    Rajab, Jenalea
    Marivate, Vukosi
    Lastrucci, Richard
    Dzingirai, Isheanesu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Africa
    Description

    The South African Gov-ZA multilingual corpus

    Github: https://github.com/dsfsi/gov-za-multilingual Zenodo:

    About Dataset

    The dataset contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements

    The datasets contain government cabinet statements in 11 languages, namely:

    Language      Code    Language      Code
    English       eng     Sepedi        nso
    Afrikaans     afr     Setswana      tsn
    isiNdebele    nbl     Siswati       ssw
    isiXhosa      xho     Tshivenda     ven
    isiZulu       zul     Xitsonga      tso
    Sesotho       sot

    The dataset contains the full data in a JSON file (/data/govza-cabinet-statements.json), as well as CSVs split by language, e.g. “govza-cabinet-statements-en.csv” for English. The dataset does not contain special characters like unicode or ascii.

    Please see the data-statement.md for full dataset information. (TODO)

    Number of Aligned Pairs with Cosine Similarity Score >= 0.65

    src_lang    trg_lang    num_aligned_pairs
    afr         eng         14549
    afr         nbl         6621
    afr         nso         15388
    afr         sot         8834
    afr         ssw         15610
    afr         tsn         12605
    afr         tso         14936
    afr         ven         5776
    afr         xho         16065
    afr         zul         14998
    nbl         eng         3616
    nbl         nso         6342
    nbl         sot         16163
    nbl         ssw         4655
    nbl         tsn         3369
    nbl         tso         4465
    nbl         ven         18984
    nbl         xho         5213
    nbl         zul         3868
    nso         eng         15257
    nso         ssw         18697
    nso         tsn         16179
    nso         tso         17617
    nso         ven         6367
    sot         eng         5212
    sot         nso         8077
    sot         ssw         5811
    sot         tsn         5450
    sot         tso         6586
    sot         ven         14098
    ssw         eng         15721
    ssw         tso         17880
    ssw         ven         4588
    tsn         eng         14544
    tsn         ssw         16386
    tsn         tso         16681
    tsn         ven         3267
    tso         eng         16068
    ven         eng         3670
    ven         tso         4578
    xho         eng         16537
    xho         nso         18110
    xho         sot         7489
    xho         ssw         18387
    xho         tsn         16571
    xho         tso         17954
    xho         ven         4559
    xho         zul         18145
    zul         eng         16149
    zul         nso         17630
    zul         sot         5975
    zul         ssw         18563
    zul         tsn         16482
    zul         tso         17789
    zul         ven         3606
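The counts above come from keeping only sentence pairs whose cosine similarity is at least 0.65. A minimal sketch of that kind of threshold filter follows. It assumes each sentence has already been mapped to a vector by some multilingual encoder, which this description does not name; the function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keep_aligned(pairs, threshold=0.65):
    """Keep the (src_vec, trg_vec) pairs whose cosine similarity is >= threshold."""
    return [(s, t) for s, t in pairs if cosine(s, t) >= threshold]


# Toy 2-dimensional vectors standing in for sentence embeddings.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # same direction, similarity 1.0
    ([1.0, 0.0], [1.0, 1.0]),  # 45 degrees apart, similarity about 0.707
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal, similarity 0.0
]
kept = keep_aligned(toy_pairs)
print(len(kept))  # 2
```

Raising the threshold trades fewer aligned pairs for higher-confidence alignments, which is one reason the counts differ so much between language pairs.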

    Authors

    • Vukosi Marivate - @vukosi
    • Matimba Shingange
    • Richard Lastrucci
    • Isheanesu Joseph Dzingirai
    • Jenalea Rajab

    Publications

    @inproceedings{lastrucci-etal-2023-preparing,
      title = "Preparing the Vuk{'}uzenzele and {ZA}-gov-multilingual {S}outh {A}frican multilingual corpora",
      author = "Richard Lastrucci and Isheanesu Dzingirai and Jenalea Rajab and Andani Madodonga and Matimba Shingange and Daniel Njini and Vukosi Marivate",
      booktitle = "Proceedings of the Fourth workshop on Resources for African Indigenous Languages (RAIL 2023)",
      month = may,
      year = "2023",
      address = "Dubrovnik, Croatia",
      publisher = "Association for Computational Linguistics",
      url = "https://aclanthology.org/2023.rail-1.3",
      pages = "18--25"
    }

  7. Embedding Evaluation Data for South African Languages

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jun 2, 2023
    Cite
    Vukosi Marivate; Valencia Wagner; Mack Makgatho; Tshephisho Sefara (2023). Embedding Evaluation Data for South African Languages [Dataset]. http://doi.org/10.5281/zenodo.5673974
    Explore at:
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Vukosi Marivate; Valencia Wagner; Mack Makgatho; Tshephisho Sefara
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Africa
    Description

    WordSim and Simlex Data for South African Languages

    • Setswana
    • Sepedi

    Embedding Evaluation Data for South African Languages

    Dataset Information

    The datasets (Simlex and WordSim) contain pairs of Setswana and Sepedi words that have been assigned similarity ratings by humans to measure semantic relatedness. The word pairs were manually translated from English to Setswana and Sepedi. The evaluation task measures the degree of correlation between the model's scores and the human ratings; the model's score for a word pair is the cosine similarity of the two corresponding word vectors.
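The evaluation described above can be sketched in a few lines. The description does not say which correlation coefficient is used; Spearman rank correlation is assumed here because it is a common choice for WordSim-style benchmarks. All function names are hypothetical, and the placeholder tokens `w1` to `w4` with their vectors are toy values, not entries from the dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(embeddings, rated_pairs):
    """rated_pairs holds (word_a, word_b, human_score) triples."""
    model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in rated_pairs]
    human_scores = [h for _, _, h in rated_pairs]
    return spearman(model_scores, human_scores)


# Toy vectors and ratings for placeholder tokens.
embeddings = {"w1": [1.0, 0.0], "w2": [1.0, 0.1], "w3": [0.5, 1.0], "w4": [0.0, 1.0]}
rated_pairs = [("w1", "w2", 9.0), ("w1", "w3", 4.0), ("w1", "w4", 1.0)]
rho = evaluate(embeddings, rated_pairs)
print(round(rho, 3))
```

A rank correlation only checks that the model orders the word pairs the way the human raters do, so the differing scales of cosine similarity and human ratings do not matter.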

    Online Repository link

    Authors

    • Vukosi Marivate - @vukosi
    • Valencia Wagner
    • Mack Makgatho
    • Tshephisho Sefara

    See also the list of contributors who participated in this project.

    Citing the dataset

    To appear in conference proceedings

    @article{Makgatho_Marivate_Sefara_Wagner_2022, title={Training Cross-Lingual embeddings for Setswana and Sepedi},
    volume={3},
    url={https://upjournals.up.ac.za/index.php/dhasa/article/view/3822},
    DOI={10.55492/dhasa.v3i03.3822},
    number={03},
    journal={Journal of the Digital Humanities Association of Southern Africa },
    author={Makgatho, Mack and Marivate, Vukosi and Sefara, Tshephisho and Wagner, Valencia},
    year={2022},
    month={Feb.}}
