Competition happening at DrivenData right NOW!!! but let's train your model here and benefit from Kaggle kernel GPU XD
Data Source: https://www.drivendata.org/competitions/46/box-plots-for-education-reboot/page/86/
Can you predict which water pumps are faulty?
Using data from Taarifa and the Tanzanian Ministry of Water, can you predict which pumps are functional, which need some repairs, and which don't work at all? This is an intermediate-level practice competition. Predict one of these three classes based on a number of variables about what kind of pump is operating, when it was installed, and how it is managed. A smart understanding of which waterpoints will fail can improve maintenance operations and ensure that clean, potable water is available to communities across Tanzania.
The data for this competition comes from the Taarifa waterpoints dashboard, which aggregates data from the Tanzania Ministry of Water.
In their own words:
Taarifa is an open source platform for the crowd sourced reporting and triaging of infrastructure related issues. Think of it as a bug tracker for the real world which helps to engage citizens with their local government. We are currently working on an Innovation Project in Tanzania, with various partners.
Your goal is to predict the operating condition of a waterpoint for each record in the dataset. You are provided the following set of information about the waterpoints:
amount_tsh - Total static head (amount of water available to waterpoint)
date_recorded - The date the row was entered
funder - Who funded the well
gps_height - Altitude of the well
installer - Organization that installed the well
longitude - GPS coordinate
latitude - GPS coordinate
wpt_name - Name of the waterpoint if there is one
num_private - Is it private
basin - Geographic water basin
subvillage - Geographic location
region - Geographic location
region_code - Geographic location (coded)
district_code - Geographic location (coded)
lga - Geographic location
ward - Geographic location
population - Population around the well
public_meeting - True/False
recorded_by - Group entering this row of data
scheme_management - Who operates the waterpoint
scheme_name - Who operates the waterpoint
permit - If the waterpoint is permitted
construction_year - Year the waterpoint was constructed
extraction_type - The kind of extraction the waterpoint uses
extraction_type_group - The kind of extraction the waterpoint uses
extraction_type_class - The kind of extraction the waterpoint uses
management - How the waterpoint is managed
management_group - How the waterpoint is managed
payment - What the water costs
payment_type - What the water costs
water_quality - The quality of the water
quality_group - The quality of the water
quantity - The quantity of water
quantity_group - The quantity of water
source - The source of the water
source_type - The source of the water
source_class - The source of the water
waterpoint_type - The kind of waterpoint
waterpoint_type_group - The kind of waterpoint
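The three-class prediction task and two of the columns listed above (date_recorded and construction_year) can be illustrated with a minimal sketch. This is not the competition's reference code: the label strings, the ISO date format, the treatment of a construction year of 0 as missing, and the toy rows are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Assumed label strings for the three classes described in the text.
LABELS = ("functional", "functional needs repair", "non functional")


def pump_age_years(date_recorded, construction_year):
    """Years between construction and the recording date.

    Assumes date_recorded is an ISO string (YYYY-MM-DD) and that a
    construction_year of 0 or None means the year is unknown.
    """
    if not construction_year:
        return None
    return date.fromisoformat(date_recorded).year - construction_year


def majority_baseline(train_labels):
    """Return the most frequent label, a trivial baseline predictor."""
    return Counter(train_labels).most_common(1)[0][0]


def classification_rate(y_true, y_pred):
    """Fraction of records whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy rows invented for illustration; they are not taken from the dataset.
rows = [
    {"date_recorded": "2013-03-14", "construction_year": 1999, "label": LABELS[0]},
    {"date_recorded": "2011-07-01", "construction_year": 0, "label": LABELS[2]},
    {"date_recorded": "2012-10-20", "construction_year": 2008, "label": LABELS[0]},
    {"date_recorded": "2013-01-05", "construction_year": 1985, "label": LABELS[1]},
]

ages = [pump_age_years(r["date_recorded"], r["construction_year"]) for r in rows]
truth = [r["label"] for r in rows]
guess = majority_baseline(truth)
score = classification_rate(truth, [guess] * len(truth))
print(ages, guess, score)
```

Any real model should beat the majority-class score; the derived age is one example of a feature that can be built from the raw columns before training.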
This data is taken from the DrivenData website. Link to the competition: https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/page/25/
Can you build a model that helps predict pump breakdowns and also adds meaning to someone's life?
This dataset was developed as part of a challenge to segment building footprints from aerial imagery. The goal of the challenge was to accelerate the development of more accurate, relevant, and usable open-source AI models to support mapping for disaster risk management in African cities. The data consists of drone imagery from 10 different cities and regions across Africa.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is extracted from a data science contest held on DrivenData.org.
The dataset contains information about water pumps located in Tanzania: geography, operating state, installation method, funding, ...
Can you predict which water pumps are faulty?
Using data from Taarifa and the Tanzanian Ministry of Water, can you predict which pumps are functional, which need some repairs, and which don't work at all? This is an intermediate-level practice competition. Predict one of these three classes based on a number of variables about what kind of pump is operating, when it was installed, and how it is managed. A smart understanding of which waterpoints will fail can improve maintenance operations and ensure that clean, potable water is available to communities across Tanzania.
Hosted By NGA https://www.drivendata.org/competitions/78/overhead-geopose-challenge/data/
In many uses of multispectral satellite imagery, clouds obscure what we really care about - for example, tracking wildfires, mapping deforestation, or monitoring crop health. Being able to more accurately remove clouds from satellite images filters out interference, unlocking the potential of a vast range of use cases. With this goal in mind, this training dataset was generated as part of a crowdsourcing competition and was later validated by a team of expert annotators. The dataset consists of Sentinel-2 satellite imagery and corresponding cloud labels stored as GeoTiffs. There are 22,728 chips in the training data, collected between 2018 and 2020.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Minimum Speed
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This MATLAB code is part of the study titled "Joint Image Processing with Learning-Driven Data Representation and Model Behavior for Non-Intrusive Anemia Diagnosis in Pediatric Patients", which has been accepted for publication in the Journal of Imaging (MDPI). The code supports image processing, feature extraction, and deep learning model training (including LSTM and RexNet) to classify pediatric patients as anemic or non-anemic based on palm, conjunctival, and fingernail images. Full study details are available in this paper:
Berghout T. Joint Image Processing with Learning-Driven Data Representation and Model Behavior for Non-Intrusive Anemia Diagnosis in Pediatric Patients. Journal of Imaging. 2024; 10(10):245. https://doi.org/10.3390/jimaging10100245
The datasets used in this work are:
Asare, J. W., Appiahene, P. & Donkoh, E. (2022). Anemia Detection using Palpable Palm Image Datasets from Ghana. Mendeley Data. https://doi.org/10.17632/ccr8cm22vz.1
Asare, J. W., Appiahene, P. & Donkoh, E. (2023). CP-AnemiC (A Conjunctival Pallor) Dataset from Ghana. Mendeley Data. https://doi.org/10.17632/m53vz6b7fx.1
Asare, J. W., Appiahene, P. & Donkoh, E. (2020). Detection of Anemia using Colour of the Fingernails Image Datasets from Ghana. Mendeley Data. https://doi.org/10.17632/2xx4j3kjg2.1
Basic information is lacking about many pastoralist areas in the world. As a result, many services, programmes and policies do not effectively address the needs of pastoralist communities. The Government Cooperative Programme (GCP) project GCP/GLO/779/IF “Pastoralists-driven Data Management System”, was based on the idea that pastoralist associations could themselves collect, manage and share data from among their communities. This information could then be used to advocate for better targeted and pastoralist-friendly policies at local, national and international level. The project aimed at strengthening the capacities of pastoral organizations in data collection, analysis and information management, in order to facilitate evidence-based policy decision-making. It was implemented in Argentina, Chad and Mongolia, managed by the Pastoralist Knowledge Hub (PKH), and supported by the Agricultural Research Centre for International Development (Centre de coopération internationale en recherche agronomique pour le développement - CIRAD).
In Chad, the project was implemented by the Billital Maroobe Network (Réseau Billital Maroobé - RBM). An innovative approach for collecting data was developed through close partnership among the stakeholders involved, and was adopted during two successive surveys. The two questionnaires for collecting data on pastoralism were discussed and adapted to the national context, through the contribution of the participants and their deep knowledge of the field. This was one of the most innovative and successful aspects of the project, i.e. the pertinence of the method, as a result of the proactive involvement of the beneficiaries. The first survey, which aimed to identify and describe the pastoralist population, gathered information on 8,938 households. The second survey, which was more in-depth and aimed to assess the pastoralist economy and its contribution to the national economy, was conducted on a sample (based on the results of the first survey) of 1,010 households. As well as demonstrating that pastoralist organizations had the potential to successfully manage data, the surveys revealed the actual contribution of pastoralism to the economy of the country. In particular, they showed that pastoralism contributed to the national economy more than studies usually indicated, as, owing to specific characteristics, such as high levels of self-consumption, pastoralists' contribution to Gross Domestic Product (GDP) was often underestimated. During the project, it emerged that pastoralism could contribute up to 27 percent to the GDP of Chad.
National coverage
Households
Pastoralist Households
Sample survey data [ssd]
The first survey, which aimed to identify and describe the pastoralist population, gathered information on 8,938 pastoralist households in Chad. The second survey, which was more in-depth and aimed to assess the pastoralist economy and its contribution to the national economy, was conducted on a sample (based on the results of the first survey) of 1,010 pastoralist households.
The target regions for the second survey were originally 15, out of a total of 23 regions. However, owing to unforeseen constraints, only 10 regions were covered.
Computer Assisted Personal Interview [capi]
The survey was conducted in 2 rounds. For the first round, a short questionnaire was submitted to a representative of each household, addressing the following topics: i) households' socio-demographic characteristics; ii) livestock numbers and ownership; iii) land tenure and access; and iv) water access and use.
For the second round, the questionnaire focussed on the economic activity of pastoralists and their contribution to the national GDP. It covers the following topics: i) household identification; ii) socio-demographic characteristics; iii) livestock herd composition; iv) products and final destination; v) agricultural production, fishing and hunting activity; vi) income and sales; vii) household expenses; and viii) shock and adaptation strategies.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by JonathanWhitaker
Released under CC0: Public Domain
Basic information is lacking about many pastoralist areas in the world. As a result, many services, programmes and policies do not effectively address the needs of pastoralist communities. The Government Cooperative Programme (GCP) project GCP/GLO/779/IF “Pastoralists-driven Data Management System”, was based on the idea that pastoralist associations could themselves collect, manage and share data from among their communities. This information could then be used to advocate for better targeted and pastoralist-friendly policies at local, national and international level. The project aimed at strengthening the capacities of pastoral organizations in data collection, analysis and information management, in order to facilitate evidence-based policy decision-making. It was implemented in Argentina, Chad and Mongolia, managed by the Pastoralist Knowledge Hub (PKH), and supported by the Centre de coopération internationale en recherche agronomique pour le développement (Agricultural Research Centre for International Development [CIRAD]).
In Mongolia, the project was implemented by the National Federation of Pastoralist User Groups. An innovative approach for collecting data was developed through close partnership among the stakeholders involved, and was adopted during two successive surveys. The two questionnaires for collecting data on pastoralism were discussed and adapted to the national context, through the contribution of the participants and their deep knowledge of the field. This was one of the most innovative and successful aspects of the project, i.e. the pertinence of the method, as a result of the proactive involvement of the beneficiaries. The first survey, which aimed to identify and describe the pastoralist population, gathered information on 112,957 households. The second survey, which was more in-depth and aimed to assess the pastoralist economy and its contribution to the national economy, was conducted on a sample (based on the results of the first survey) of 1,938 households. As well as demonstrating that pastoralist organizations had the potential to successfully manage data, the surveys revealed the actual contribution of pastoralism to the economy of the country. In particular, they showed that pastoralism contributed to the national economy more than studies usually indicated, as, owing to specific characteristics, such as high levels of self-consumption, pastoralists' contribution to Gross Domestic Product (GDP) was often underestimated. During the project, it emerged that pastoralism could contribute up to 12 percent to the GDP of Mongolia.
National coverage
Households
Pastoralist Households.
Sample survey data [ssd]
The first survey, which aimed to identify and describe the pastoralist population, gathered information on 112,957 households in Mongolia, from different aimags.
With regard to the second survey, 1,938 pastoralist households from the 18 aimags were targeted, based on statistical requirements, as advised by CIRAD. To select the sample households, the NFPUG used maps created from the Global Positioning System (GPS) data collected through the first survey. The sample was made up of four different groups/types of households, based on their animal numbers. This survey involved fewer collectors: only the aimag and sum leaders took part. The former gave paper-based questionnaires to the latter, then gathered the data after the interviews were completed and entered them into the Open Foris Collect server. Each collector interviewed 10-15 households, and no more than one per day in areas such as the Gobi Desert, where households lived far apart.
For the first survey, out of the 159,219 targeted households at the beginning, 112,957 interviews were completed.
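As a quick worked check of the figures above, the completion rate of the first survey follows directly from the two counts. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Counts stated in the survey description for Mongolia's first round.
targeted_households = 159_219
completed_interviews = 112_957

# Share of targeted households for which an interview was completed.
completion_rate = completed_interviews / targeted_households
print(f"Completion rate: {completion_rate:.1%}")  # about 70.9%
```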
Face-to-face [f2f]
The survey was conducted in 2 rounds. For the first round, a short questionnaire was submitted to a representative of each household, addressing the following topics: i) households' socio-demographic characteristics; ii) livestock numbers and ownership; iii) land tenure and access; and iv) water access and use.
For the second round, the questionnaire focussed on the economic activity of pastoralists and their contribution to the national GDP. It covers the following topics: i) household identification; ii) socio-demographic characteristics; iii) livestock herd composition; iv) products and final destination; v) agricultural production, fishing and hunting activity; vi) income and sales; vii) household expenses; and viii) shock and adaptation strategies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for research title: Integration of NLP, AI-Driven Data Analysis, Risk Assessment, and Electronic Whistle-Blowing Systems in Fraud Detection
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In data-centric industries, such as finance, healthcare, and cybersecurity, maintaining the integrity and accuracy of data lineage is crucial due to compliance requirements. Current methods for verifying data lineage often struggle with the dynamic and multi-sourced nature of datasets, as well as their scale, which leads to subpar performance in detecting anomalies or validating lineage. This study introduces a new, AI-based framework designed to enhance the verification of data lineage through time-centred analysis.
As part of the global initiative to support developing countries in their quest for greater information sharing about development cooperation flows, the Global Partnership for Effective Development Cooperation (GPEDC) was established at the Fourth High-Level Forum on Aid Effectiveness in Busan in 2011. In the 2014 GPEDC progress report, eleven partner countries reported on Chinese financial flows for the first time, a significant increase from previous years. These countries include Cambodia, Democratic Republic of Congo (DRC), Madagascar, Mali, Moldova, Nepal, Philippines, Samoa, Senegal, Tajikistan, and Togo. Furthermore, the report provides information on these countries' public financial management systems, and the extent of their respective mutual accountability frameworks.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Visual datasets taken from event cameras on the iCub, together with data from any other relevant sensors (e.g. encoder values). Each dataset contains a readme describing the files and their use. The datasets are:
1. EDPR_DVS_BALLTRACKING - 2 small ball tracking datasets
2. EDPR_DVS_CORNERDETECTION - several datasets for evaluating corner detection in structured and unstructured scenes
3. VVV18-EVENTDRIVEN-DATASET - three datasets with ATIS camera, frame-based camera and robot encoder positions. The robot's commanded positions as used by yarpmotorgui are also included.
Basic information is lacking about many pastoralist areas in the world. As a result, many services, programmes and policies do not effectively address the needs of pastoralist communities. The Government Cooperative Programme (GCP) project GCP/GLO/779/IF “Pastoralists-driven Data Management System”, was based on the idea that pastoralist associations could themselves collect, manage and share data from among their communities. This information could then be used to advocate for better targeted and pastoralist-friendly policies at local, national and international level. The project aimed at strengthening the capacities of pastoral organizations in data collection, analysis and information management, in order to facilitate evidence-based policy decision-making. It was implemented in Argentina, Chad and Mongolia, managed by the Pastoralist Knowledge Hub (PKH), and supported by the Centre de coopération internationale en recherche agronomique pour le développement (Agricultural Research Centre for International Development [CIRAD]).
In Argentina, the project was implemented by the Gran Chaco Foundation (Fundación Gran Chaco). An innovative approach for collecting data was developed through close partnership among the stakeholders involved, and was adopted during two successive surveys. The two questionnaires for collecting data on pastoralism were discussed and adapted to the national context, through the contribution of the participants and their deep knowledge of the field. This was one of the most innovative and successful aspects of the project, i.e. the pertinence of the method, as a result of the proactive involvement of the beneficiaries. The first survey, which aimed to identify and describe the pastoralist population, gathered information on 7,151 households. The second survey, which was more in-depth and aimed to assess the pastoralist economy and its contribution to the national economy, was conducted on a sample (based on the results of the first survey) of 1,195 households. As well as demonstrating that pastoralist organizations had the potential to successfully manage data, the surveys revealed the actual contribution of pastoralism to the economy of the country. In particular, they showed that pastoralism contributed to the national economy more than studies usually indicated, as, owing to specific characteristics, such as high levels of self-consumption, pastoralists' contribution to Gross Domestic Product (GDP) was often underestimated. During the project, it emerged that pastoralism could contribute up to 12 percent to the GDP of Argentina.
National coverage
Households
Pastoralist Households
Sample survey data [ssd]
The first survey, which aimed to identify and describe the pastoralist population, gathered information on 6,532 households in Argentina. It was based on primary data collection in the field by volunteers and on existing smaller databases, which had previously been created by grassroots organizations on the occasion of extraordinary events, such as droughts. Around 35 collectors took part, using the Open Foris Collect mobile application on tablets or smartphones to interview the households. Each volunteer conducted a varying number of interviews, from 30 to 200, depending on their availability, which resulted in a total of 6,532 interviewed households.
The second survey, which was more in-depth and aimed to assess the pastoralist economy and its contribution to the national economies, was conducted on a sample (based on the results of the first survey) of 1,198 households in Argentina.
Computer Assisted Personal Interview [capi]
The survey was conducted in 2 rounds. For the first round, a short questionnaire was submitted to a representative of each household, addressing the following topics: i) households' socio-demographic characteristics; ii) livestock numbers and ownership; iii) land tenure and access; and iv) water access and use.
For the second round, the questionnaire focussed on the economic activity of pastoralists and their contribution to the national GDP. It covers the following topics: i) household identification; ii) socio-demographic characteristics; iii) livestock herd composition; iv) products and final destination; v) agricultural production, fishing and hunting activity; vi) income and sales; vii) household expenses; and viii) shock and adaptation strategies.