Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Project Climate Change, Health, and Artificial Intelligence (Project CCHAIN) dataset is a validated, open-sourced linked dataset containing 20 years (2003-2022) of climate, environmental, socioeconomic, and health dimensions at the barangay (village) level across twelve Philippine cities (Dagupan, Palayan, Navotas, Mandaluyong, Muntinlupa, Legazpi, Iloilo, Mandaue, Tacloban, Zamboanga, Cagayan de Oro, Davao). The full documentation can be accessed here.
The tables are designed in a way that users can choose variables that are most relevant to their focus city and use case, and link these variables to form a single dataset by merging using standard geography codes and calendar dates. This can be done using the provided linking notebook, or offline using the user's own code.
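For users who prefer to link tables with their own code, the merge on geography codes and calendar dates can be sketched as below. This is only an illustration, not the project's linking notebook: the column names `adm4_pcode`, `date`, `tave`, and `case_total`, and all sample values, are assumptions made for the example.

```python
from datetime import date

# Hypothetical rows from two tables; column names and values are illustrative only.
climate_rows = [
    {"adm4_pcode": "PH000000001", "date": date(2020, 1, 1), "tave": 27.1},
    {"adm4_pcode": "PH000000001", "date": date(2020, 1, 2), "tave": 27.4},
    {"adm4_pcode": "PH000000002", "date": date(2020, 1, 1), "tave": 26.8},
]
health_rows = [
    {"adm4_pcode": "PH000000001", "date": date(2020, 1, 1), "case_total": 3},
    {"adm4_pcode": "PH000000002", "date": date(2020, 1, 1), "case_total": 1},
]


def link_tables(left, right, keys=("adm4_pcode", "date")):
    """Left-join two lists of dict rows on the given key columns."""
    # Index the right-hand table by its key tuple for constant-time lookup.
    index = {tuple(row[k] for k in keys): row for row in right}
    linked = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        merged = dict(row)
        # Copy over only the non-key columns of the matching row, if any.
        for col, value in match.items():
            if col not in keys:
                merged[col] = value
        linked.append(merged)
    return linked


linked = link_tables(climate_rows, health_rows)
```

A left join keeps every climate row even when no health record exists for that place and date, which makes gaps in data availability visible rather than silently dropping rows.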
Here are some tips on how to make the most of this dataset:
- Focus on one location. Starting with a detailed analysis of one location allows for a better understanding of the local dynamics, which may differ across locations.
- Choose one health data source. Pick either a central or a local data source. Using two different health data sources is not advised because it will lead to double counting of disease cases.
- Do not use all variables at once. Do a literature review first to identify possible key variables. More often than not, using all variables is unnecessary and may even yield subpar results.
- Decide whether to use regular or downscaled climate data. Our downscaled climate data provides nuanced insights on spatial patterns of a few climate variables. Kindly read the documentation before deciding to use this data. If you are uncertain, consider using only the climate_atmosphere table instead.
- Check data availability on your focus location and make sure they fit the requirements of your study.
This dataset also includes household survey tables (see schema here and here) from surveys conducted in partner informal settlement communities in the cities of Muntinlupa, Davao, Iloilo, and Mandaue and administered on various dates from 2001 to 2024. Due to the sensitive nature of the surveys and the vulnerability of the subjects involved, requests for access must be submitted for review and approval by the Philippine Action for Community-Led Shelter Initiatives, Inc. (PACSII). To submit a request, please use this form.
The Project CCHAIN dataset adopts the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This allows anyone to share (copy and redistribute) and adapt (remix, transform, and build upon) a work, as long as they give appropriate credit to the original creator.
One exception is the tm_open_buildings table, which follows the Open Database License (ODbL) as directed by its source, OpenStreetMap. Under the ODbL, users are free to use, modify, and distribute the database, but on top of CC BY 4.0's attribution requirement, this license requires users to share any modifications they make under the same ODbL license.
Website link to get more datasets: https://power.larc.nasa.gov/
The NASA POWER Project provides a wealth of data to support various applications related to energy, climate, and agriculture. One of the key datasets provided by the project is temperature data, which offers valuable insights into regional and global temperature patterns and trends. The temperature datasets are generated using advanced satellite remote sensing technologies and cover a wide range of spatial and temporal scales, from daily to monthly, and from local to global.
The temperature data sets provided by the POWER Project have a number of uses. For example, they can be used to monitor and analyze the impacts of climate change on the planet, and to understand how changes in temperature are affecting ecosystems and the distribution of plant and animal species. They can also be used to inform energy planning and management decisions, such as the design and operation of renewable energy systems and building energy efficiency measures. The temperature data sets are also useful for agricultural planning and management, providing critical information on crop growth, water usage, and other factors that impact food production and food security.
The temperature datasets from the NASA POWER Project are freely available to researchers, policymakers, and the general public, making them an important resource for anyone interested in the impacts of climate change and the use of renewable energy. Whether you're looking to understand the changing climate of our planet, plan and manage sustainable energy systems, or to ensure food security, the temperature datasets from the POWER Project are a valuable resource that can help you make informed decisions.
Historical compilation of Earth surface temperatures. Source: https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data
Data compiled by the Berkeley Earth project, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.
In this dataset, we have included several files:
Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):
**Other files include:**
The raw data comes from the Berkeley Earth data page.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a long-term record of global forest fires between 1881 and 2025. It includes both wildfire information and related climate variables, making it useful for:
- Exploratory Data Analysis (EDA)
- Machine Learning projects
- Time-series analysis of wildfire frequency
- Studying the relationship between climate change and wildfires
🌍 Dataset Features:
- Covers multiple countries and regions worldwide
- Includes historical and recent fire events
- Captures environmental factors influencing fire behavior
Researchers, data scientists, and students can use this dataset to analyze wildfire patterns, predict future risks, and explore the impact of climate change on global fire activity.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Overview: This dataset offers a comprehensive collection of daily weather readings from major cities around the world. The first release included only capitals; the dataset now also covers other major cities worldwide, for a total of about 1,250 cities, and adds hourly data. Some locations provide historical data going back to January 2, 1833, giving users a deep dive into long-term weather patterns and their evolution.
Data License and Updates: This dataset is updated every Sunday using data from Meteostat API, ensuring access to the latest week's data without overburdening the data source.
cities.csv: This dataframe offers details about individual cities and weather stations.
- Columns:
- station_id: Unique ID for the weather station.
- city_name: Name of the city.
- country: The country where the city is located.
- state: The state or province within the country.
- iso2: The two-letter country code.
- iso3: The three-letter country code.
- latitude: Latitude coordinate of the city.
- longitude: Longitude coordinate of the city.
countries.csv: This dataframe contains information about different countries, providing insights into their geographic and demographic characteristics.
- Columns:
- iso3: The three-letter code representing the country.
- country: The English name of the country.
- native_name: The native name of the country.
- iso2: The two-letter code representing the country.
- population: The population of the country.
- area: The total land area of the country in square kilometers.
- capital: The name of the capital city.
- capital_lat: The latitude coordinate of the capital city.
- capital_lng: The longitude coordinate of the capital city.
- region: The specific region within the continent where the country is located.
- continent: The continent to which the country belongs.
- hemisphere: The hemisphere in which the country is located (e.g., Northern, Southern).
daily_weather.parquet: This dataframe provides weather data on a daily basis.
- Columns:
- station_id: Unique ID for the weather station.
- city_name: Name of the city where the station is located.
- date: Date of the weather record.
- season: Season corresponding to the date (e.g., summer, winter).
- avg_temp_c: Average temperature in Celsius.
- min_temp_c: Minimum temperature in Celsius.
- max_temp_c: Maximum temperature in Celsius.
- precipitation_mm: Precipitation in millimeters.
- snow_depth_mm: Snow depth in millimeters.
- avg_wind_dir_deg: Average wind direction in degrees.
- avg_wind_speed_kmh: Average wind speed in kilometers per hour.
- peak_wind_gust_kmh: Peak wind gust in kilometers per hour.
- avg_sea_level_pres_hpa: Average sea-level pressure in hectopascals.
- sunshine_total_min: Total sunshine duration in minutes.
These dataframes can be utilized for various analyses such as weather trend prediction, climate studies, geographic analysis, demographic insights, and more.
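As one possible starting point for such analyses, the station-level weather records can be joined to the city table through `station_id`. The sketch below uses the documented column names, but the rows are made-up sample values, and it works on plain dictionaries rather than on the actual files.

```python
from collections import defaultdict
from statistics import mean

# Illustrative records using the documented column names; the values are made up.
cities = [
    {"station_id": "S001", "city_name": "Alpha", "country": "Aland"},
    {"station_id": "S002", "city_name": "Beta", "country": "Bland"},
]
daily_weather = [
    {"station_id": "S001", "date": "2023-01-01", "avg_temp_c": 4.0},
    {"station_id": "S001", "date": "2023-01-02", "avg_temp_c": 6.0},
    {"station_id": "S002", "date": "2023-01-01", "avg_temp_c": None},
    {"station_id": "S002", "date": "2023-01-02", "avg_temp_c": 20.0},
]


def mean_temp_by_country(cities, daily_weather):
    """Average avg_temp_c per country, skipping missing readings."""
    country_of = {c["station_id"]: c["country"] for c in cities}
    temps = defaultdict(list)
    for rec in daily_weather:
        temp = rec["avg_temp_c"]
        country = country_of.get(rec["station_id"])
        # Ignore missing temperatures and stations absent from the city table.
        if temp is not None and country is not None:
            temps[country].append(temp)
    return {country: mean(values) for country, values in temps.items()}


result = mean_temp_by_country(cities, daily_weather)
```

Missing readings are skipped rather than treated as zero, since long historical records are likely to contain gaps.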
Dataset Image Source: Photo credits to 越过山丘. View the original image here.
Credit to the original author: The dataset was originally published here
Hands-on teaching of modern machine learning and deep learning techniques relies heavily on well-suited datasets. The "weather prediction dataset" is a novel tabular dataset that was specifically created for teaching machine learning and deep learning to an academic audience. The dataset contains intuitively accessible weather observations from 18 locations in Europe. It was designed to suit a wide variety of training goals, many of which do not easily yield unrealistically high prediction accuracy. Teachers or instructors can thus choose the difficulty of the training goals and match it to the respective learner audience or lesson objective. The compact size and complexity of the dataset make it possible to quickly train common machine learning and deep learning models on a standard laptop so that they can be used in live hands-on sessions.
The dataset can be found in the `\dataset` folder and can be downloaded from Zenodo: https://doi.org/10.5281/zenodo.4980359
If you make use of this dataset, in particular if this is in form of an academic contribution, then please cite the following two references:
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides valuable insights into agricultural productivity across various states in India from 1990 to 2024. It includes data on crop yield, fertilizer consumption, annual rainfall, irrigation area, cropping intensity, agricultural credit, maximum temperature, and gross sown area over a span of 31 years. This information is useful for researchers, policymakers, and agricultural analysts aiming to understand factors affecting crop yields and make informed decisions.
| Column Name | Description |
|---|---|
| Year | The year of the observation (e.g., 1990, 1991). |
| State | The state where the data was collected (e.g., Andhra, Karnataka). |
| Yield_per_hectare | The yield per hectare categorized into ranges (e.g., 503.35 - 1103.52). |
| Fertilizer_consp | The fertilizer consumption categorized into ranges (e.g., 0.60 - 50.00). |
| AnnualRainfall | The annual rainfall categorized into ranges (e.g., 201.51 - 553.25). |
| Gross_irrigated_area | The gross irrigated area categorized into ranges (e.g., 38.00 - 1876.60). |
| Cropping_intensity | The cropping intensity categorized into ranges (e.g., 100.00 - 121.29). |
| Agri_credit | The agricultural credit categorized into ranges (e.g., 0.00 - 198.05). |
| MaxTemp | The maximum temperature categorized into ranges (e.g., -3.30 - -0.15). |
| Gross_sown_area | The gross sown area categorized into ranges (e.g., 187.00 - 3729.80). |
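
Because the table above describes most columns as ranges written like `503.35 - 1103.52`, a small parser can turn them into numeric bounds before modelling. The sketch below assumes the ranges are stored as text in exactly that "low - high" form, which the examples suggest but the source does not confirm; the function names are hypothetical.

```python
import re

# Two signed decimal numbers separated by " - ", e.g. "-3.30 - -0.15".
RANGE_PATTERN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+-\s+(-?\d+(?:\.\d+)?)\s*$")


def parse_range(text):
    """Split a 'low - high' string into a (low, high) tuple of floats."""
    match = RANGE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a range: {text!r}")
    return float(match.group(1)), float(match.group(2))


def midpoint(text):
    """Use the centre of the range as a single numeric stand-in."""
    low, high = parse_range(text)
    return (low + high) / 2
```

Requiring whitespace around the separator is what lets the parser tell the range dash apart from a minus sign, as in the `MaxTemp` example `-3.30 - -0.15`.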
This dataset can be utilized for:
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is used for research in climate projects, with variables that support the development of machine learning models by highlighting environmental risks and mitigation strategies.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 20,000 dummy daily weather records. Each entry includes the date, temperature in three units (Celsius, Kelvin, and Fahrenheit), precipitation in millimeters, and wind speed in kilometers per hour. The data can be used to study weather trends, analyze temperature conversions, or build predictive models for rainfall or wind conditions. It's suitable for climate analysis, time-series forecasting, and educational projects focused on meteorological data.
https://creativecommons.org/publicdomain/zero/1.0/
http://opendatacommons.org/licenses/dbcl/1.0/
As we know, the impact of the global climate crisis is felt more strongly every day. Understanding the parameters shaped by the climate crisis will help in deciding which measures to take against it.
In this dataset, you will see the natural disasters of all countries.
EOSDIS SYSTEM
https://creativecommons.org/publicdomain/zero/1.0/
Global Surface Summary of the Day is derived from The Integrated Surface Hourly (ISH) dataset. The ISH dataset includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The latest daily summary data are normally available 1-2 days after the date-time of the observations used in the daily summaries.
Over 9000 stations' data are typically available.
The daily elements included in the dataset (as available from each station) are:
- Mean temperature (.1 Fahrenheit)
- Mean dew point (.1 Fahrenheit)
- Mean sea level pressure (.1 mb)
- Mean station pressure (.1 mb)
- Mean visibility (.1 miles)
- Mean wind speed (.1 knots)
- Maximum sustained wind speed (.1 knots)
- Maximum wind gust (.1 knots)
- Maximum temperature (.1 Fahrenheit)
- Minimum temperature (.1 Fahrenheit)
- Precipitation amount (.01 inches)
- Snow depth (.1 inches)
- Indicator for occurrence of: Fog, Rain or Drizzle, Snow or Ice Pellets, Hail, Thunder, Tornado/Funnel
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.noaa_gsod.[TABLENAME]. Fork this kernel to get started and learn how to safely manage the analysis of large BigQuery datasets.
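A query against this dataset with the BigQuery Python client might look like the sketch below. It rests on several assumptions that are not stated in the text above: that the GSOD tables live under `bigquery-public-data.noaa_gsod` with one table per year (`gsod2017`, and so on), that the columns are named `stn`, `year`, `mo`, `da`, and `temp`, and that `9999.9` marks a missing mean temperature. Check the table schema before relying on any of these.

```python
def build_gsod_query(year, limit=10):
    """Return SQL for the highest mean-temperature readings in one yearly table."""
    # Coerce to int so that only numbers are ever interpolated into the SQL.
    year = int(year)
    limit = int(limit)
    table = f"bigquery-public-data.noaa_gsod.gsod{year}"  # assumed table naming
    return (
        "SELECT stn, year, mo, da, temp "
        f"FROM `{table}` "
        "WHERE temp < 9999.9 "  # assumed sentinel for a missing mean temperature
        "ORDER BY temp DESC "
        f"LIMIT {limit}"
    )


def run_query(sql):
    """Execute the SQL and return the rows as plain dictionaries."""
    # Imported here so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]
```

Keeping a `LIMIT` and selecting only the needed columns is one simple way to keep the amount of data scanned and returned small when exploring large tables.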
This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes global data obtained from the USAF Climatology Center. This dataset covers GSOD data between 1929 and present, collected from over 9000 stations. Dataset Source: NOAA
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Photo by Allan Nygren on Unsplash
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Where should we live in the next 10 years? Where should we settle down without relying on public transport? Which city should we move to without fearing losing our homes?
As weather patterns become more unpredictable and temperature changes more aggressive, I collected the data below to see whether any city could help answer the questions above. I am curious to see whether cities that typically have great infrastructure for walking, biking, or public transit will be better prepared than those that are more car-centric. Whichever you prefer, the data can give a sense of where people might be migrating, and to which areas.
Here's how the data was collected:
The columns have different rating systems. The counties have all major climate risks expected in the future, while corresponding cities in each county have walking, transit and biking scores to assess livability without cars.
Understanding County Climate Risks
The counties are rated on a 0-10 scale (0 = lowest, 10 = highest), based on RCP 8.5 levels. The ratings are explained below.
1) Heat: Heat is one of the largest drivers changing the niche of human habitability. Rhodium Group researchers estimate that, between 2040 and 2060, many counties will face extremely high temperatures for half the year. The measure shows how many weeks per year temperatures are expected to rise above 95 degrees Fahrenheit (0 = 0 weeks, 10 = 26 weeks).
2) Wet Bulb: Wet bulb temperatures occur when heat meets excessive humidity. This is commonplace in cities with urban heat island effects (dense concentration of pavement, less nature, higher chances of absorbing heat). That combination creates wet bulb temperatures, where 82 degrees can feel like southern Alabama on its hottest day, making it dangerous to work outdoors and for children to play school sports. As wet bulb temperatures increase further, so will the risk of heat stroke, and even death. The measure shows how many days per year a county will experience high wet bulb temperatures from 2040 to 2060 (0 = 0 days, 10 = 70 days).
3) Farm Crop Yield: With rising temperatures, it will become more difficult to grow food. Corn and soy are the most prevalent crops in the U.S. and the basis for livestock feed and other staple foods, and they have critical economic significance. Because of their broad regional spread, they offer the best proxy for predicting how farming will be affected by rising temperatures and changing water supplies. Because corn and soy production is more sensitive to heat than to drought, the U.S. will see a sharp continental divide: cooler counties will gain capacity to produce, while counties that are currently warmer will lose the ability to produce basic crops. The measure shows the expected percent decline in yields from 2040 to 2060 (0 = -20.5% decline, 10 = 92% decline).
4) Sea Level Rise: As sea levels rise, the share of property submerged by high tides increases dramatically, affecting a small sliver of the nation's land but a disproportionate share of its population. The rating measures what share of property in the county will fall below high tide from 2040 to 2060 (0 = 0%, 10 = 25%).
5) Very Large Fires: With heat and ever more prevalent drought, the likelihood that very large wildfires (ones that burn over 12,000 acres) will affect U.S. regions increases substantially, particularly in the West, Northwest and the Rocky Mountains. The rating gives the average number of very large fires expected per year from 2040 to 2071 (0 = N/A, 10 = 2.45).
6) Economic Damages: Rising energy costs, lower labor productivity, poor crop yields and increasing cr...
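To get a rough feel for what a given rating means in physical terms, the endpoints quoted above can be interpolated. The sketch below assumes the ratings scale linearly between their stated endpoints, which the description does not say; treat the results as an approximation only. The measure names are hypothetical labels, and only measures with two numeric endpoints are included.

```python
# (value at rating 0, value at rating 10), taken from the descriptions above.
SCALE_ENDPOINTS = {
    "heat_weeks_above_95f": (0.0, 26.0),
    "wet_bulb_days": (0.0, 70.0),
    "crop_yield_decline_pct": (-20.5, 92.0),
    "property_below_high_tide_pct": (0.0, 25.0),
}


def score_to_value(measure, score):
    """Map a 0-10 rating to its physical value, assuming linear scaling."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    low, high = SCALE_ENDPOINTS[measure]
    return low + (high - low) * score / 10
```

Under this assumption, a heat rating of 5 would correspond to about 13 weeks per year above 95 degrees.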
The Environmental Assessment Service (SEA, its acronym in Spanish) is the institution responsible for authorizing the operation of projects in Chile that could have impacts on public health or the environment. When a company wants to carry out a project of relatively large magnitude, it must submit an application to the SEA, which evaluates whether the project can operate correctly and safely. If a project is found to be harmful to the environment or the population, the service can deny the environmental permit and thereby prevent the project from starting. Since the SEA began operating in 1997, it has evaluated more than 15 thousand projects, so its database contains a large number of records that can be useful to analyze.
The data presented on this page was scraped from the SEA website, taking advantage of the fact that this is public information. The script used to get the data collects general information about the projects evaluated by the SEA.
The data set consists of a data frame with the projects presented to the SEA. The columns of this data frame cover the following fields:
- name: name of the project
- type: type of evaluation process. Projects can present an environmental impact statement (DIA in Spanish) or an environmental impact study (EIA). This depends on the magnitude of the potential impacts on the environment or public health. A DIA is a simple evaluation of impacts, while an EIA is a more complex assessment
- region: region where the project is carried out
- typology: kind of project based on its sector
- typology_descr: a description of the typology
- investment: investment amount in USD
- entry_date: date when the project entered the SEA process
- state: the current state of the project's evaluation
- qualification_date: date when the final SEA resolution was issued. This resolution can be, for example, approved or denied. Not all projects are qualified, because several are withdrawn earlier
- id_project: id of each project in SEA
- latitude: latitude in degrees using datum WGS84. These coordinates are validated by SEA
- longitude: longitude in degrees using datum WGS84. These coordinates are validated by SEA
- n_docs: number of documents available through the evaluation process
- n_addendum: number of addenda made in the evaluation process of a project
- n_participatory_act: number of participatory activities carried out in the evaluation process of a project
- description: general description of the project
- main_url: URL of the evaluation process of a project
For the moment, the content of the data frame is in Spanish, but it will be translated to English in the future. The data can contain mistakes due to different factors. Users are encouraged to report any problems they detect in the discussion section. Some mistakes have been detected, especially in the description field; however, these mistakes come from the web page itself.
Scraping was done on 15 March 2021, and the script used is available on GitHub.
Different analyses can be done with this data. You can find out which kinds of projects have been approved or denied, and see that some productive sectors are more strongly represented in the country than others. You can also analyze projects by investment amount or by location within the country. In addition, you can observe that some places are saturated by certain productive sectors, which can affect the health of the population or the environment because of the number of projects concentrated in specific regions. For instance, salmon farming is known to have caused several impacts in the south of Chile, while air pollution problems have been documented in Quintero, where repeated SO2 peaks have affected people's health. In both cases, a cluster of industries is associated with these issues.
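The kinds of analysis mentioned above can be sketched with the documented column names. In the example below the rows are invented, and the state labels `Aprobado` and `Rechazado` are hypothetical stand-ins; the actual values in the `state` column should be checked in the data, which is in Spanish.

```python
from collections import Counter, defaultdict

# Made-up rows using the documented column names; state labels are hypothetical.
projects = [
    {"type": "DIA", "region": "Valparaíso", "state": "Aprobado", "investment": 2.5},
    {"type": "DIA", "region": "Los Lagos", "state": "Rechazado", "investment": 1.0},
    {"type": "EIA", "region": "Valparaíso", "state": "Aprobado", "investment": 120.0},
    {"type": "DIA", "region": "Los Lagos", "state": "Aprobado", "investment": 0.5},
]


def approval_share_by_type(rows, approved_label="Aprobado"):
    """Fraction of projects with the approved label, per evaluation type."""
    totals = Counter(row["type"] for row in rows)
    approved = Counter(row["type"] for row in rows if row["state"] == approved_label)
    return {t: approved[t] / totals[t] for t in totals}


def investment_by_region(rows):
    """Total investment amount per region."""
    sums = defaultdict(float)
    for row in rows:
        sums[row["region"]] += row["investment"]
    return dict(sums)
```

The same grouping pattern extends to `typology`, which would show how strongly each productive sector is concentrated in a region.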
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Description: Environmental Sensor Readings from Mars Rover Prototype
Research Hypothesis
A scaled-down Mars Rover prototype can effectively collect temperature and humidity data, demonstrating how real-time environmental monitoring can be used for autonomous navigation, climate analysis, and anomaly detection.
By analyzing the collected data, we aim to identify trends, evaluate sensor accuracy, and explore potential improvements in robotic exploration. This includes assessing response time, consistency, and anomalies caused by external factors like human interference or sudden environmental changes.
What the Data Shows
This dataset contains timestamped temperature and humidity readings collected at regular time intervals by the rover’s onboard DHT22 sensor. The data highlights:
- Gradual fluctuations in environmental conditions.
- Notable temperature spikes (~10°C) introduced using a lighter to test sensor response.
- Stable humidity levels with minor deviations due to air circulation or sensor drift.
Notable Findings
- Controlled Temperature Spikes: Short bursts of heat resulted in clear temperature increases (~10°C), demonstrating the sensor's ability to detect and log transient changes.
- Humidity Stability: Humidity levels remained within a narrow range, confirming minimal impact from applied temperature fluctuations.
- Gradual Environmental Variations: Small temperature and humidity shifts were observed, likely due to ambient conditions and ventilation effects.
How the Data Was Gathered
- Sensor Used: DHT22 (for temperature & humidity).
- Data Collection Frequency: Logged every few seconds.
- Controlled Testing: Heat spikes added using a lighter to simulate external interference.
- Data Transmission: Logged in real-time via wireless communication to a laptop.
How to Interpret and Use the Data
- Identify Trends: Observe temperature and humidity variations over time.
- Detect Anomalies: Locate sharp temperature spikes (~10°C increases) caused by external heating.
- Compare Sensor Performance: Evaluate how quickly temperature normalizes after a spike.
- Develop Predictive Models: Train machine learning models to predict environmental changes.
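The anomaly-detection step above can be sketched with a rolling baseline: a reading counts as a spike when it exceeds the median of the preceding readings by some margin. The window size, the 8°C threshold, and the sample readings below are assumptions chosen for illustration, not values taken from the dataset.

```python
from statistics import median


def find_spikes(temps_c, window=5, threshold_c=8.0):
    """Indices of readings that exceed the median of the previous `window` readings by `threshold_c`."""
    spikes = []
    for i in range(window, len(temps_c)):
        # The median is robust to an earlier spike inside the window.
        baseline = median(temps_c[i - window:i])
        if temps_c[i] - baseline >= threshold_c:
            spikes.append(i)
    return spikes


# Made-up temperature log with one heating event around index 6.
readings = [24.1, 24.0, 24.2, 24.1, 24.3, 24.2, 34.5, 35.0, 29.0, 25.0, 24.4, 24.3]
spikes = find_spikes(readings)  # → [6, 7]
```

Using the median rather than the mean keeps a single hot reading from raising the baseline, so consecutive readings within the same heating event are still flagged.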
Potential Applications
- Autonomous Environment Monitoring: Detecting and responding to environmental anomalies.
- Sensor Calibration & Validation: Testing DHT22 sensor accuracy under different conditions.
- Climate Simulation & Research: Indoor climate modeling & environmental trend analysis.
- Robotics & AI: Training AI for automated responses to climate fluctuations.
You can find related information on the GitHub repository of the project.
Climate Watch is an online platform designed to empower policymakers, researchers, media and other stakeholders with the open climate data, visualizations and resources they need to gather insights on national and global progress on climate change.
Climate Watch is managed by World Resources Institute. It is a contribution to the NDC Partnership.
Climate Watch encompasses data on:
- Historical emissions by country, region, industry, and gas, by year (1850-2018)
- Nationally Determined Contributions (NDCs)
- Linkages between Nationally Determined Contributions (NDCs) and the Sustainable Development Goals (SDGs)
- Emissions scenario pathways for major emitting countries, derived from a growing library of models
The CDP Kaggle project prompted me to seek out datasets on climate change, and I stumbled upon this one in my research. Climate Watch is a great platform, and its website offers many opportunities to explore and visualize the data.
I encourage everyone to check out their website: https://www.climatewatchdata.org/
https://creativecommons.org/publicdomain/zero/1.0/
The International Comprehensive Ocean-Atmosphere Data Set (ICOADS) is a global ocean marine meteorological and surface ocean dataset. It is formed by merging many national and international data sources that contain measurements and visual observations from ships (merchant, navy, research), moored and drifting buoys, coastal stations, and other marine and near-surface ocean platforms. Each marine report contains individual observations of meteorological and oceanographic variables, such as sea surface and air temperatures, wind, pressure, humidity, and cloudiness. The coverage is global and sampling density varies depending on date and geographic position relative to shipping routes and ocean observing systems.
The ICOADS dataset contains global marine data from ships (merchant, navy, research) and buoys, each capturing details of the current weather or ocean conditions (wave height, sea temperature, wind speed, and so on). Each record contains the exact location of the observation, which is great for visualizations. The historical depth of the data is quite comprehensive: there are records going back to 1662!
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.noaa_icoads.[TABLENAME]. Fork this kernel to get started and learn how to safely manage the analysis of large BigQuery datasets.
Dataset Source: NOAA Category: Meteorological, Climate, Transportation
Citation: National Centers for Environmental Information/NESDIS/NOAA/U.S. Department of Commerce, Research Data Archive/Computational and Information Systems Laboratory/National Center for Atmospheric Research/University Corporation for Atmospheric Research, Earth System Research Laboratory/NOAA/U.S. Department of Commerce, Cooperative Institute for Research in Environmental Sciences/University of Colorado, National Oceanography Centre/Natural Environment Research Council/United Kingdom, Met Office/Ministry of Defence/United Kingdom, Deutscher Wetterdienst (German Meteorological Service)/Germany, Department of Atmospheric Science/University of Washington, and Center for Ocean-Atmospheric Prediction Studies/Florida State University. 2016, updated monthly. International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Release 3, Individual Observations. Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory: https://doi.org/10.5065/D6ZS2TR3. Accessed 01 04 2017.
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Photo by Gleb Kozenko on Unsplash
https://creativecommons.org/publicdomain/zero/1.0/
This Dataset is part of a basic DIY Machine Learning project offered by my college, Indian Institute of Technology, Guwahati (IIT G). The main aim of this project was to get familiar with the workflow and various techniques involved in a Machine Learning project.
The dataset is fairly simple and contains various features regarding precipitation:
- PRCP = Precipitation (tenths of mm)
- TMAX = Maximum temperature (tenths of degrees C)
- TMIN = Minimum temperature (tenths of degrees C)
- PGTM = Peak gust time (hours and minutes, i.e., HHMM)
- AWND = Average daily wind speed (tenths of meters per second)
- TAVG = Average temperature (tenths of degrees C)
- WDFx = Direction of fastest x-minute wind (degrees)
- WSFx = Fastest x-minute wind speed (tenths of meters per second)
- WT = Weather Type
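Several of the fields listed above are described as tenths of a unit, so a conversion step may be needed before analysis. The sketch below assumes the stored values really are in tenths, as the field descriptions state; this is worth verifying against the file, since some exports are already converted. The sample record is made up, and the `WSFx` fields are left out because their exact column names depend on the value of x.

```python
# Fields described above as being stored in tenths of a unit.
TENTHS_FIELDS = {"PRCP", "TMAX", "TMIN", "TAVG", "AWND"}


def to_standard_units(record):
    """Divide tenths-of-a-unit fields by 10, leaving other fields and missing values as-is."""
    converted = {}
    for field, value in record.items():
        if field in TENTHS_FIELDS and value is not None:
            converted[field] = value / 10
        else:
            converted[field] = value
    return converted


# Made-up record: 12.5 mm of rain, 31.2 °C maximum, 24.8 °C minimum.
raw = {"PRCP": 125, "TMAX": 312, "TMIN": 248, "AWND": None, "PGTM": "1430"}
clean = to_standard_units(raw)
```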
All Credits go to the Coding Club of Indian Institute of Technology, Guwahati (IIT Guwahati). Instagram: https://www.instagram.com/codingclubiitg/ LinkedIn : https://www.linkedin.com/company/coding-club-iitg/
I hope that this dataset and my notebook (https://www.kaggle.com/varunnagpalspyz/precipitation-prediction/notebook) help all beginners like me.
https://creativecommons.org/publicdomain/zero/1.0/
📊 About the Dataset
This dataset collection combines crop yields, soil composition, and weather data across Indian states, providing a comprehensive view of agriculture between 1997 and 2020. It includes:
Crop Yield Data: Crop-wise area, production, fertilizer/pesticide use, and yield trends.
Soil Data: State-level soil nutrients (N, P, K) and pH values.
Weather Data: Annual averages of temperature, rainfall, and humidity.
By integrating these datasets, users can explore how soil health, climatic conditions, and farm inputs interact to influence agricultural productivity.
🎯 Use Cases
Analyze crop yield trends over time and across states.
Study the impact of soil nutrients and pH on productivity.
Assess climate effects (rainfall, temperature, humidity) on crop yields.
Build machine learning models for yield prediction and crop recommendation.
Support climate-smart agriculture and policy planning.
This dataset is ideal for data science, machine learning, and academic projects focusing on agriculture, sustainability, and climate change.
https://creativecommons.org/publicdomain/zero/1.0/
Overview
This dataset presents the findings of a research study conducted to explore the impact of comprehensive approaches on sustainable energy projects in the Ladakh region of Northern India. The study focuses on the integration of sustainable energy initiatives with climate change adaptation and poverty reduction. Data was collected from various sustainable energy projects implemented in the region, and key outcomes were analyzed to understand the effectiveness and viability of different technology transfer approaches and project management strategies.
Context
Ladakh, known for its remote and challenging terrain, has been actively pursuing sustainable energy initiatives to combat climate change and lift local communities out of poverty. This dataset delves into the diverse technology transfer approaches, financial support mechanisms, policy frameworks, community engagement, and primary barriers encountered during the implementation of sustainable energy projects.
Content
The dataset comprises responses from 200 respondents, including project stakeholders, beneficiaries, government officials, and community members, who were involved in different sustainable energy initiatives. The data covers a wide range of aspects, including respondent demographics (age, gender, education level, and annual income), technology transfer approaches, financial support levels, policy frameworks, community engagement levels, primary barriers faced, the impact of barriers, partnership types, and the effectiveness of projects in achieving climate change adaptation and poverty reduction goals.
Methodology
The research methodology involved data collection through surveys, interviews, and project documentation analysis. Data analysis techniques, including descriptive statistics, frequency distributions, and thematic analysis, were employed to derive insights from the dataset. The study aimed to highlight the significance of comprehensive approaches to technology transfer and project management in driving sustainable energy projects' success and long-term viability.
Potential Uses
Researchers, policymakers, and sustainable energy enthusiasts can use this dataset to gain insights into the factors influencing the effectiveness of sustainable energy projects in Ladakh. The dataset provides information on the impact of different technology transfer approaches and project management practices on project outcomes and long-term viability. It can be used to identify successful strategies for climate change adaptation, poverty reduction, and community development through sustainable energy initiatives.
Acknowledgments
The dataset was collected and compiled by a team of researchers dedicated to promoting sustainable energy solutions and climate resilience in Ladakh. We extend our gratitude to all the project stakeholders, participants, and organizations involved in supporting this research.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Project Climate Change, Health, and Artificial Intelligence (Project CCHAIN) dataset is a validated, open-sourced linked dataset containing 20 years (2003-2022) of climate, environmental, socioeconomic, and health dimensions at the barangay (village) level across twelve Philippine cities (Dagupan, Palayan, Navotas, Mandaluyong, Muntinlupa, Legazpi, Iloilo, Mandaue, Tacloban, Zamboanga, Cagayan de Oro, Davao). The full documentation can be accessed here.
The tables are designed in a way that users can choose variables that are most relevant to their focus city and use case, and link these variables to form a single dataset by merging using standard geography codes and calendar dates. This can be done using the provided linking notebook, or offline using the user's own code.
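For users linking tables with their own code, the merge described above can be sketched as a left join on a geography code plus a calendar date. This is only an illustration: the column names `adm4_pcode` and `date`, the variable names `tave` and `case_total`, the sample rows, and the `link_tables` helper are all assumptions made for this sketch, so check the documentation for the actual table schemas.

```python
def link_tables(left, right, keys=("adm4_pcode", "date")):
    """Left-join two lists of dict records on the shared key columns.

    Assumes each key combination appears at most once in `right`.
    Where both tables share a non-key column, the value from `left` wins.
    """
    index = {tuple(row[k] for k in keys): row for row in right}
    linked = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        linked.append({**match, **row})
    return linked


# Made-up rows standing in for one climate table and one health table.
climate = [
    {"adm4_pcode": "PH0000001", "date": "2020-01-01", "tave": 27.4},
    {"adm4_pcode": "PH0000001", "date": "2020-01-02", "tave": 27.9},
]
health = [
    {"adm4_pcode": "PH0000001", "date": "2020-01-01", "case_total": 3},
]

linked = link_tables(climate, health)
```

Rows without a match on the right-hand table are kept with only their original columns, which makes gaps in data availability easy to spot before analysis.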
Here are some tips on how to make the most of this dataset:
- Focus on one location. Starting with a detailed analysis of one location allows for a better understanding of the local dynamics, which may differ across locations.
- Choose one health data source. Pick either a central or a local data source. Using two different health data sources is not advised because it will lead to double counting of disease cases.
- Do not use all variables at once. Do a literature review first to identify possible key variables. More often than not, using all variables is not necessary and may even yield subpar results.
- Decide whether to use regular or downscaled climate data. Our downscaled climate data provides nuanced insights on spatial patterns of a few climate variables. Kindly read the documentation before deciding to use this data. If you are uncertain, consider using only the climate_atmosphere table instead.
- Check data availability for your focus location and make sure it fits the requirements of your study.
This dataset also includes household survey tables (see schema here and here) from surveys conducted in partner informal settlement communities in the cities of Muntinlupa, Davao, Iloilo, and Mandaue and administered on various dates from 2001 to 2024. Due to the sensitive nature of the surveys and the vulnerability of the subjects involved, requests for access must be submitted for review and approval by the Philippine Action for Community-Led Shelter Initiatives, Inc. (PACSII). To submit a request, please use this form.
The Project CCHAIN dataset adopts the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This allows anyone to share (copy and redistribute) and adapt (remix, transform, and build upon) a work, as long as they give appropriate credit to the original creator.
One exception, the tm_open_buildings table, follows the Open Database License (ODbL) as directed by its source, OpenStreetMap. Under the ODbL, users are free to use, modify, and distribute the database, but in addition to an attribution requirement like that of CC BY 4.0, this license requires users to share any modifications they make under the same ODbL license.