This dataset was obtained from the National Household Travel Survey. Due to the volume of the data, it was divided in two. This dataset shows the time spent travelling to the place of study, and the travel mode used, across the whole country.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The FLOTROP dataset contains numerous plant observations (around 340,000 occurrences) in northern tropical Africa (from the 5th to the 25th parallel north) in open ecosystems (savannah and steppe). They were collected by multiple collectors between 1920 and 2012 and were managed by Philippe Daget. These observations are probably the most important and unique source of plant distribution data for the Sahel area. The data are now available in the Global Biodiversity Information Facility (GBIF), on the Tela Botanica website, and as maps in the African Plant Database. For the overall area involved, this dataset has increased the data available in the GBIF by 40%. For some countries between the 15th and 21st parallel north, the FLOTROP dataset has increased available occurrences 10-fold compared to the data existing in the GBIF.
Tropical northern Africa (herein defined as between the 5th and 25th parallel north) is mostly occupied by open ecosystems, such as steppe and savannah. The vegetation in these ecosystems is consumed by animals, either wildlife or livestock, and is also used by the local communities for food, energy or medicinal purposes. The open ecosystems in tropical northern Africa are of great importance to the economy, food security and human well-being. Plant diversity within these ecosystems is driven by many factors, such as the climate, soil, fire and grazing. Plant diversity in these regions is being greatly impacted by global change. Historical data are needed to understand species and diversity dynamics. The database presented in this work is a collection of numerous datasets gathered over the years. At the outset, the FLOTROP database was intended to store all the data recorded by IEMVT (French institute for tropical livestock production and veterinary medicine, now part of CIRAD) in the 1960s. In 1993, CIRAD and CNRS set up a project to collect as many botanical surveys as possible within these regions. Two software packages were created by the team to manage the database. The first was created under DOS; a second was then started under Windows using the APL DYALOG language. Data were collected and scanned between 1993 and 2016. We extracted the data from the software version. We shared the species occurrences recorded in the database on the Tela Botanica website (http://www.tela-botanica.org/) and on the GBIF database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset includes the images (visible bands for Landsat-8 or NICFI PlanetScope), auxiliary data (infrared, NCEP, forest gain, OpenStreetMap, SRTM, GFW), and data about forest loss (Global Forest Change) used to train, validate and test a model to classify direct deforestation drivers in Cameroon.
Description of the files
‘labels.zip’: CSV files containing the labels for each image in each folder described above (images are identified by folder and coordinates, or ‘path’). The files match the format of the CSV files used as inputs to train, validate and test our classification model.
For ‘labels.zip’, we have subfolders for Landsat and PlanetScope. Then, for each type of imagery, we have subfolders for ‘detailed’, ‘groups’ and ‘time series’ which correspond to the different ‘my_examples’ folders listed above.
In each folder, the subfolders are named after the coordinates of the centre of the images, and each contains:
• A folder ‘images’, with a sub-folder ‘visible’ containing the PNG RGB image; and a sub-folder ‘infrared’ containing the infrared bands in a NPY file.
• A folder ‘auxiliary’ with topographic and forest gain information in a NPY format, OpenStreetMap and peat data in a JSON format, and a sub-folder ‘ncep’ containing all data from NCEP in a NPY format.
• The forest loss pickle file delimiting the area of forest loss.
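The layout described above can be checked with a short script before any training code is run. The following is only a sketch: the function name `index_example` and the returned dictionary keys are hypothetical, and it assumes that files can be told apart by their extensions (`.png`, `.npy`, `.json`, `.pkl`) as the description suggests. It lists files without opening them.

```python
from pathlib import Path


def index_example(example_dir):
    """List the files of one '[coordinates]' sub-folder, grouped by role.

    The grouping follows the folder description; the key names are
    illustrative and not part of the dataset.
    """
    root = Path(example_dir)

    def names(folder, pattern):
        # glob() yields nothing when the folder is missing, so absent
        # parts of an example simply show up as empty lists.
        return sorted(p.name for p in folder.glob(pattern))

    return {
        "visible": names(root / "images" / "visible", "*.png"),
        "infrared": names(root / "images" / "infrared", "*.npy"),
        "auxiliary_npy": names(root / "auxiliary", "*.npy"),
        "auxiliary_json": names(root / "auxiliary", "*.json"),
        "ncep": names(root / "auxiliary" / "ncep", "*.npy"),
        "forest_loss": names(root, "*.pkl"),
    }
```

An example folder with an empty "visible" or "forest_loss" list is probably incomplete and may be worth excluding before training.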
Details about the images
For Landsat-8 data (courtesy of the U.S. Geological Survey), this dataset contains 332 x 332 pixel RGB calibrated top-of-atmosphere (TOA) reflectance images, pan-sharpened to a 15 m resolution (less than 20% cloud cover).
For NICFI PlanetScope data (catalog owner: Planet), this dataset contains 332 x 332 pixel monthly RGB composites with a 4.77 m resolution.
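Because both image types have the same pixel dimensions but different resolutions, they cover different ground areas. A minimal sketch of that arithmetic follows; the function name `footprint_km` is hypothetical and the calculation assumes square pixels at the nominal resolution.

```python
def footprint_km(pixels, resolution_m):
    """Side length in kilometres of a square image of `pixels` pixels per side."""
    return pixels * resolution_m / 1000.0


# Values taken from the description above.
landsat_side_km = footprint_km(332, 15.0)   # pan-sharpened Landsat-8
planet_side_km = footprint_km(332, 4.77)    # NICFI PlanetScope composite
```

Under these assumptions a Landsat-8 image spans roughly 5 km per side and a PlanetScope image roughly 1.6 km per side.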
Details about the auxiliary data
Details about Global Forest Change
For each image, there is a corresponding 'forest_loss_region' .pkl file delimiting a forest loss region polygon from Global Forest Change (GFC). GFC consists of annual maps of forest cover loss with a 30-m resolution.
License
The NICFI PlanetScope images fall under the same license as the NICFI data program license agreement (data in 'my_examples_planet_final.zip', 'my_examples_planet_final_detailed.zip', 'my_examples_planet_detailed_timeseries.zip': subfolders '[coordinates]'>'images'>'visible').
OpenStreetMap® is open data, licensed under the Open Data Commons Open Database License (ODbL) by the OpenStreetMap Foundation (OSMF) (data in all 'my_examples' folders: subfolders '[coordinates]'>'auxiliary'>'closest_city.json'/'closest_street.json'). The documentation is licensed under the Creative Commons Attribution-ShareAlike 2.0 license (CC BY-SA 2.0).
The rest of the data is under a Creative Commons Attribution 4.0 International License. The data has been transformed following the code that can be found via this link: https://github.com/aedebus/Cam-ForestNet (in 'prepare_files').
Dataset Card for allenai/wmt22_african
Dataset Summary
This dataset was created based on metadata for mined bitext released by Meta AI. It contains bitext for 248 pairs for the African languages that are part of the 2022 WMT Shared Task on Large Scale Machine Translation Evaluation for African Languages.
How to use the data
There are two ways to access the data:
Via the Hugging Face Python datasets library
from datasets import load_dataset dataset =… See the full description on the dataset page: https://huggingface.co/datasets/allenai/wmt22_african.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Provided by NASA's Socioeconomic Data and Applications Center (SEDAC). For details, please visit http://sedac.ciesin.columbia.edu/data/set/groads-global-roads-open-access-v1/data-download
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset covers studies that investigated OER in Africa.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Population density per pixel at 100 metre resolution. WorldPop provides estimates of the number of people residing in each 100x100m grid cell for every low and middle income country. Through integrating census, survey, satellite and GIS datasets in a flexible machine-learning framework, high resolution maps of population counts and densities for 2000-2020 are produced, along with accompanying metadata.
DATASET: Alpha version 2010 and 2015 estimates of numbers of people per grid square, with national totals adjusted to match UN population division estimates (http://esa.un.org/wpp/) and remaining unadjusted.
REGION: Africa
SPATIAL RESOLUTION: 0.000833333 decimal degrees (approx 100m at the equator)
PROJECTION: Geographic, WGS84
UNITS: Estimated persons per grid square
MAPPING APPROACH: Land cover based, as described in: Linard, C., Gilbert, M., Snow, R.W., Noor, A.M. and Tatem, A.J., 2012, Population distribution, settlement patterns and accessibility across Africa in 2010, PLoS ONE, 7(2): e31743.
FORMAT: Geotiff (zipped using 7-zip (open access tool): www.7-zip.org)
FILENAMES: Example - AGO10adjv4.tif = Angola (AGO) population count map for 2010 (10) adjusted to match UN national estimates (adj), version 4 (v4). Population maps are updated to new versions when improved census or other input data become available.
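The filename convention described above can be decoded programmatically. The sketch below is inferred from the single example `AGO10adjv4.tif`. The function name `parse_worldpop_name` is hypothetical, and it is an assumption that unadjusted files simply omit the `adj` token and that two-digit years belong to the 2000s.

```python
import re

# Pattern inferred from the one documented example; real files may deviate.
_NAME_PATTERN = re.compile(
    r"^(?P<iso3>[A-Z]{3})"   # three-letter country code, e.g. AGO
    r"(?P<year>\d{2})"       # two-digit year, e.g. 10 for 2010
    r"(?P<adj>adj)?"         # present when totals match UN estimates
    r"v(?P<version>\d+)"     # dataset version, e.g. v4
    r"\.tif$"
)


def parse_worldpop_name(filename):
    """Split a filename such as 'AGO10adjv4.tif' into its components."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognised filename: {filename!r}")
    return {
        "iso3": match.group("iso3"),
        "year": 2000 + int(match.group("year")),  # assumes years 2000-2099
        "adjusted": match.group("adj") is not None,
        "version": int(match.group("version")),
    }
```

Raising on unrecognised names, rather than guessing, keeps files that follow a different convention from being silently mislabelled.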
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data management plan was developed to provide guidance on data management practices and standards for research institutions and teams working on the Africa RISING program. The document is organized as follows:
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The National Disaster Inventory is a record of natural disasters, including floods, thunderstorms, forest fires, mudslides and disease outbreaks. The inventory keeps track of the loss of life, destruction of property and infrastructure, injury and displacement due to these incidents.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
General Description: This dataset contains a collection of African storybooks and folktales 📚, provided in PDF format 📄, collected to generate new stories through fine-tuning foundation models 🤖. The goal is to preserve and promote the richness of African narratives 🌍 while exploring the capabilities of text generation models 📝.
Origin and Context: The stories and folktales come from various oral and literary traditions 📖 across Africa, highlighting the cultural diversity 🌿 and the values they transmit ✨.
Data Structure: The dataset consists exclusively of PDF files 📑, with each file representing a book or a set of stories.
Intended Use: This dataset is ideal for fine-tuning text generation models 🔧, such as GPT or LLaMA, as well as for studying narrative structures in African storytelling 🎭.
Cultural Importance: The dataset helps to highlight African stories 🎨, often underrepresented, and opens up possibilities for creating new stories based on this heritage 🏛️.
Format: All files are in PDF format 📂, ready for use in text processing projects, with text extraction as needed 🛠️.
Point-of-interest (POI) is defined as a physical entity (such as a business) in a geo location (point) which may be (of interest).
We strive to provide the most accurate, complete and up to date point of interest datasets for all countries of the world. The South Africa POI Dataset is one of our worldwide POI datasets with over 98% coverage.
This is our process flow:
Our machine learning systems continuously crawl for new POI data
Our geoparsing and geocoding calculate their geo locations
Our categorization systems clean up and standardize the datasets
Our data pipeline API publishes the datasets on our data store
POI data is in constant flux, especially during times of drastic change such as the Covid-19 pandemic.
Every minute worldwide on an average day over 200 businesses will move, over 600 new businesses will open their doors and over 400 businesses will cease to exist.
In today's interconnected world, of the approximately 200 million POIs worldwide, over 94% have a public online presence. As a new POI comes into existence its information will appear very quickly in location based social networks (LBSNs), other social media, pictures, websites, blogs, press releases. Soon after that, our state-of-the-art POI Information retrieval system will pick it up.
We offer our customers perpetual data licenses for any dataset representing this ever changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service - DaaS Industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one time snapshot, or via a recurring payment plan on our data update pipeline.
The main differentiators between us and the competition are our flexible licensing terms and our data freshness.
The core attribute coverage is as follows:
POI Field            Data Coverage (%)
poi_name             100
brand                8
poi_tel              67
formatted_address    100
main_category        98
latitude             100
longitude            100
neighborhood         1
source_url           43
email                8
opening_hours        47
The data may be visualized on a map at https://store.poidata.xyz/za and a data sample may be downloaded at https://store.poidata.xyz/datafiles/za_sample.csv
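Coverage figures like those listed above can be recomputed for any extract by counting non-empty values per column. The following is a sketch only: the function name `field_coverage` is hypothetical, the two rows in `SAMPLE` are invented placeholders rather than real records, and it assumes the sample file is a comma-separated file with a header row using the field names shown above.

```python
import csv
import io


def field_coverage(csv_text):
    """Return the percentage of rows with a non-empty value, per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    total = 0
    for row in reader:
        total += 1
        for name in counts:
            # Treat missing, empty and whitespace-only cells as not covered.
            if (row.get(name) or "").strip():
                counts[name] += 1
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * hits / total for name, hits in counts.items()}


# Invented placeholder rows, used only to show the call.
SAMPLE = "poi_name,brand,poi_tel\nCafe A,,011 555 0100\nShop B,BrandX,\n"
coverage = field_coverage(SAMPLE)
```

For a downloaded file, the same function can be applied to the text returned by reading it from disk.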
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Summary
This dataset provides the most accurate and comprehensive geospatial information on wind turbines in South Africa as of 2025. It includes precise turbine coordinates, detailed technical attributes, and spatially harmonized metadata across 42 wind farms. The dataset contains 1,487 individual turbine entries with validated information on turbine type, rated capacity, rotor diameter, commissioning year, and administrative regions. It was compiled by integrating OpenStreetMap (OSM) data, satellite imagery from Google and Bing, a RetinaNet-based deep learning model for coordinate correction, and manual verification.
Data Structure
Format: GeoJSON
Coordinate Reference System (CRS): WGS 84 (EPSG:4326)
Number of features: 1,487
Geometry type: Point (turbine locations)
Key attributes:
id: Unique internal identifier
osm_id: Reference ID from OpenStreetMap
gid, country, type1, name1, type2, name2: Administrative region (based on GADM)
farm_name: Name of the wind farm
commissioning_year: Year the turbine was commissioned
number_of_turbines: Total number of turbines at the wind farm
total_farm_capacity: Total installed capacity of the wind farm (MW)
capacity_per_turbine: Rated power per turbine (MW)
turbine_type: Manufacturer and model of the turbine
geometry: Point geometry (longitude, latitude)
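Given the attributes listed above, per-farm capacity can be aggregated with the standard library alone. This is a minimal sketch: the function name `capacity_by_farm` is hypothetical, the two features in `SAMPLE` are invented placeholders, and it assumes the file is a standard GeoJSON FeatureCollection whose `properties` carry numeric `capacity_per_turbine` values.

```python
import json
from collections import defaultdict


def capacity_by_farm(geojson_text):
    """Sum capacity_per_turbine (MW) over all turbines of each wind farm."""
    collection = json.loads(geojson_text)
    totals = defaultdict(float)
    for feature in collection["features"]:
        props = feature["properties"]
        totals[props["farm_name"]] += float(props["capacity_per_turbine"])
    return dict(totals)


# Invented placeholder features, used only to show the expected structure.
SAMPLE = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"farm_name": "Farm A", "capacity_per_turbine": 2.0},
         "geometry": {"type": "Point", "coordinates": [19.0, -33.0]}},
        {"type": "Feature",
         "properties": {"farm_name": "Farm A", "capacity_per_turbine": 2.0},
         "geometry": {"type": "Point", "coordinates": [19.01, -33.0]}},
    ],
})
totals = capacity_by_farm(SAMPLE)
```

Comparing each summed value with the farm's `total_farm_capacity` attribute is one way to check the dataset's internal consistency.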
Publication Abstract
Accurate and detailed spatial data on wind energy infrastructure is essential for renewable energy planning, grid integration, and system analysis. However, publicly available datasets often suffer from limited spatial accuracy, missing attributes, and inconsistent metadata. To address these challenges, this study presents a harmonized and spatially refined dataset of wind turbines in South Africa, combining OpenStreetMap data with high-resolution satellite imagery, deep learning-based coordinate correction, and manual curation. The dataset includes 1,487 turbines across 42 wind farms, representing over 3.9 GW of installed capacity as of 2025. The geo-coordinates were validated and corrected using a RetinaNet-based object detection model applied to both Google and Bing satellite imagery. Instead of relying solely on spatial precision, the curation process emphasized attribute completeness and consistency. Through systematic verification and cross-referencing with multiple public sources, the final dataset achieves a high level of attribute completeness and internal consistency across all turbines, including turbine type, rated capacity, and commissioning year. The resulting dataset is the most accurate and comprehensive publicly available dataset on wind turbines in South Africa to date. It provides a robust foundation for spatial analysis, energy modeling, and policy assessment related to wind energy development.
Citation Notification
If you use this dataset, please cite the following publication (currently in the process of publication):
Kleebauer, M. et al. (2025). A Wind Turbines Dataset for South Africa: OpenStreetMap Data, Deep Learning Based Geo-Coordinate Correction and Capacity Analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Persistent Identifiers (PIDs), particularly Digital Object Identifiers (DOIs), are crucial for establishing a robust and globally accessible research infrastructure. In the Middle East and North Africa (MENA) region, a diverse array of research outputs and resources are produced and published in repositories. However, a significant number of these repositories and outputs remain undiscoverable in global registries and aggregators.
These three datasets provide comprehensive information on the adoption of repositories, Open Access mandates, and DOIs in MENA countries. They include detailed records from different registry sources and repository platforms.
You can read the full report titled 'Mapping Repositories and their Institutional Open Science Policies in MENA' at https://doi.org/10.5281/zenodo.11370031
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Structured interviews were conducted with researchers from the Comprehensive Open Distance eLearning institutions to examine current data curation practices. The study aimed to identify strategies that improve the discoverability and accessibility of research data submitted to the research data repository. The visibility of research output is crucial for academic recognition and the advancement of knowledge, as well as for complying with funder requirements to make provisions for data reuse and enable actionable and socially beneficial open science from publicly funded research projects.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Shapefiles of Morocco's administrative regions, population, infrastructure and in-country water bodies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about companies in South Africa. It has 19,267 rows. It features 30 columns including city, country, employees, and employee type.
Authors:
D. Lee, W. Anderson, X. Chen, F. Davenport, S. Shukla, R. Sahajpal, M. Budde, J. Rowland, J. Verdin, L. You, M. Ahouangbenon, K. Davis, E. Kebede, S. Ehrmann, C. Justice, and C. Meyer
Publication:
Scientific Data (in revision); preprint available at EarthArXiv.
Donghoon Lee
Department of Civil Engineering, University of Manitoba, Winnipeg, Manitoba, Canada
Email: Donghoon.Lee@umanitoba.ca
Weston Anderson
Earth System Science Interdisciplinary Center, University of Maryland, College Park, Maryland, USA
Email: Weston@umd.edu
Xuan Chen
Earth System Science Interdisciplinary Center, University of Maryland, College Park, Maryland, USA
Email: X.Chen@cgiar.org
Frank Davenport
Climate Hazards Center, Depar...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: This dataset has been updated with transmission lines for the MENA region.
This is the most complete and up-to-date open map of Africa's electricity grid network. This dataset serves as an updated and improved replacement for the Africa Infrastructure Country Diagnostic (AICD) data that was published in 2007.
Coverage
This dataset includes planned and existing grid lines for all continental African countries and Madagascar, as well as the Middle East region. The lines range in voltage from sub-kV to 700 kV EHV lines, though there is a very large variation in the completeness of data by country. An interactive tool has been created for exploring this data, the Africa Electricity Grids Explorer.
Sources
The primary sources for this dataset are as follows:
• Africa Infrastructure Country Diagnostic (AICD)
• OSM © OpenStreetMap contributors
• For MENA: Arab Union of Electricity and country utilities
• For West Africa: West African Power Pool (WAPP) GIS database
• World Bank projects archive and IBRD maps
There were many additional sources for specific countries and areas. This information is contained in the files of this dataset, and can also be found by browsing the individual country datasets, which contain more extensive information.
Limitations
Some of the data, notably that from the AICD and from World Bank project archives, may be very out of date. Where possible this has been improved with data from other sources, but in many cases this wasn't possible. This varies significantly from country to country, depending on data availability. Thus, many new lines may exist which aren't shown, and planned lines may have completely changed or already been constructed. The data that comes from World Bank project archives has been digitized from PDF maps. This means that these lines should serve as an indication of extent and general location, but shouldn't be used for precisely locating grid lines.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics of the data described in the Data Descriptor "Landscape Dynamics (landDX) an open-access spatial-temporal database for the Kenya-Tanzania borderlands". Contents:
1. human-readable metadata summary table in CSV format
2. machine-readable metadata file in JSON format
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The political map of Africa shows national boundaries and country names.