License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset provides a detailed exploration of global warming and climate change trends across 195 countries from 1900 to 2023. It includes 100,000 rows and 26 columns, capturing environmental, economic, and societal factors impacting global warming. Key indicators such as temperature anomalies, CO2 emissions, deforestation rates, sea-level rise, and renewable energy usage are included, making this dataset suitable for climate change prediction and analysis.
Whether you're a beginner exploring trends or an advanced data scientist building models, this dataset is an excellent resource for learning, experimentation, and insights into one of the most pressing challenges of our time.
Insights to Explore:
For Beginners:
Trend Analysis:
Track how global temperature anomalies have changed over the decades. Identify countries with the highest and lowest CO2 emissions. Explore population growth trends and their correlation with CO2 emissions.
Visualization Practice:
Create line charts showing changes in renewable energy usage over time. Develop bar charts comparing extreme weather events between countries.
For Intermediate Users:
Correlation Analysis:
Analyze relationships between deforestation rates and temperature anomalies. Explore how GDP and fossil fuel usage correlate with CO2 emissions.
Feature Engineering:
Create new features like Per Capita CO2 Emissions or Energy Efficiency Score to enhance predictive modeling.
Clustering:
Group countries based on their environmental policies and renewable energy usage.
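The correlation and feature-engineering ideas above can be sketched with pandas. The column names are hypothetical because the description does not list the 26 real columns. The rows are invented for illustration.

```python
import pandas as pd

# Hypothetical column names: rename these to match the actual CSV before use.
def add_per_capita_co2(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with CO2 emissions divided by population."""
    out = df.copy()
    out["co2_per_capita"] = out["co2_emissions"] / out["population"]
    return out

def correlations_with(df: pd.DataFrame, target: str, features: list) -> pd.Series:
    """Pearson correlation of each feature column with the target column."""
    return df[features].corrwith(df[target])

# Invented rows for illustration only; not taken from the dataset.
demo = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2010, 2000, 2010],
    "co2_emissions": [100.0, 150.0, 40.0, 60.0],
    "population": [10.0, 12.0, 8.0, 8.0],
    "deforestation_rate": [0.5, 0.7, 0.2, 0.3],
    "temperature_anomaly": [0.3, 0.6, 0.1, 0.25],
})
demo = add_per_capita_co2(demo)
corr = correlations_with(demo, "temperature_anomaly", ["deforestation_rate", "co2_per_capita"])
print(corr)
```

Pooled correlations mix differences between countries with changes over time. A positive value here therefore says nothing about cause.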
For Advanced Users:
Predictive Modeling:
Build time-series models to forecast future temperature anomalies or sea-level rise. Develop machine learning models to predict CO2 emissions based on socioeconomic factors.
Anomaly Detection:
Detect outliers in extreme weather events or CO2 emissions.
Deep Learning Applications:
Train deep learning models to predict Arctic ice extent using multi-year trends.
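For the anomaly-detection idea, one simple and robust option is the modified z-score, which uses the median and the median absolute deviation (MAD). It is sketched below in NumPy. The method, the 3.5 cutoff (a commonly used value) and the sample counts are illustrative assumptions; the dataset description does not prescribe a technique.

```python
import numpy as np

def robust_outliers(values, threshold=3.5):
    """Flag outliers by modified z-score: 0.6745 * (x - median) / MAD."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # median absolute deviation
    if mad == 0:
        # No spread to measure against: flag nothing.
        return np.zeros(x.shape, dtype=bool)
    modified_z = 0.6745 * (x - median) / mad
    return np.abs(modified_z) > threshold

# Invented yearly counts of extreme weather events for one country.
events = [3, 4, 2, 5, 3, 4, 30, 3]
flags = robust_outliers(events)
print(np.flatnonzero(flags))  # -> [6]
```

The median and MAD are barely moved by the spike itself, so the spike stands out. A mean and standard deviation z-score would be inflated by the same spike.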
License: Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Temperature, Emissions & Environmental Trends (2000-2024)
This dataset provides a comprehensive overview of key environmental indicators collected over a span of 24 years (2000–2024) across multiple countries. It is designed to support analyses that explore the interplay between climate variables, human activities, and environmental changes. The dataset is particularly useful for researchers, data scientists, and policy analysts interested in climate change, sustainability, and environmental impact studies.
The dataset is curated with high-quality environmental metrics and is referenced from OpenML, ensuring a robust foundation for academic research and policy analysis.
License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The monthly mean temperature data in this dataset was obtained from the Climate Prediction Center (CPC) Global Land Surface Air Temperature Analysis and loaded into Python using xarray. The data was then filtered to the latitude and longitude coordinates corresponding to each city in the dataset. Each city was matched to its nearest grid point with xarray's sel method (method="nearest"), so the temperature data may not be exactly at the city location. The data is on a 0.5° x 0.5° grid covering the globe.
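The nearest-point lookup can be sketched in plain NumPy. It mirrors the idea behind xarray's sel(..., method="nearest"), but unlike a plain sel call it also wraps longitudes across the 0°/360° seam. The grid below is an assumption: cell centres at .25/.75 degrees, latitudes running north to south, longitudes from 0 to 360. Check it against the coordinate values in the CPC file.

```python
import numpy as np

# Assumed 0.5-degree grid of cell centres; verify against the real file.
GRID_LATS = np.arange(89.75, -90.0, -0.5)  # 360 values, north to south
GRID_LONS = np.arange(0.25, 360.0, 0.5)    # 720 values, 0-360 convention

def nearest_grid_point(lat, lon, grid_lats=GRID_LATS, grid_lons=GRID_LONS):
    """Return (lat, lon) of the grid cell centre closest to a city."""
    lon360 = lon % 360.0  # convert -180..180 longitudes to 0..360
    i = int(np.argmin(np.abs(grid_lats - lat)))
    # Longitude distance wraps around the 0/360 seam.
    dlon = np.abs(grid_lons - lon360)
    dlon = np.minimum(dlon, 360.0 - dlon)
    j = int(np.argmin(dlon))
    return float(grid_lats[i]), float(grid_lons[j])

print(nearest_grid_point(48.86, 2.35))  # Paris -> (48.75, 2.25)
```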
The temperature data provides a valuable resource for time series analysis, and if you are interested in obtaining temperature data for additional cities, please let me know. I will also be sharing the source code on GitHub for anyone who would like to reproduce the data or analysis.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
This file is based on the new high-resolution Berkeley Earth global temperature data set. It expands upon the previous Berkeley Earth temperature data set by including predictive structures based on historical weather patterns and increasing the underlying resolution to 0.25° x 0.25° latitude-longitude.
Files based on this new data set are being provided as part of an early preview to aid in the identification of any remaining bugs or errors. While we believe the current data set to be accurate and useful, it is still in development, and substantial revisions remain possible if significant issues are identified.
This file contains a detailed summary of the changes in Earth's global average surface temperature estimated by combining the new high-resolution Berkeley Earth land-surface temperature field with a reinterpolated version of the HadSST4 ocean temperature field.
As a preliminary data product, no citation for this work currently exists.
This global data product merges land-surface air temperatures with ocean sea surface water temperatures. For most of the ocean, sea surface temperatures are similar to near-surface air temperatures; however, air temperatures above sea ice can differ substantially from the water below the sea ice. In sea ice regions, temperature anomalies are extrapolated from the land-surface air temperatures when ice is present, and from the ocean temperatures when ice is absent.
The percent coverage of sea ice was taken from the HadISST v2 dataset and varies by month and location. In the typical month, between 3.5% and 5.5% of the Earth's surface is covered with sea ice. For more information on the processing and use of HadISST and HadSST refer to the description file for the combined gridded data product.
Temperature data contributing to this analysis include (but are not limited to):
GHCN-Monthly v4, Menne et al. 2018, https://doi.org/10.1175/JCLI-D-18-0094.1
Global Summary of the Day, https://www.ncei.noaa.gov/products/global-summary-day
MET-Reader, Scientific Committee on Antarctic Research, British Antarctic Survey, https://legacy.bas.ac.uk/met/READER/
HadSST4, Kennedy et al. 2019, https://doi.org/10.1029/2018JD029867
Ice mask data comes from:
HadISST2, Titchner and Rayner 2014, https://doi.org/10.1002/2013JD020316
Sea Ice Index, NSIDC, https://nsidc.org/data/g02135/versions/3
High-resolution downscaling algorithms were trained using high-resolution data, though none of this data is used directly in the reconstruction. High-resolution datasets used in training include:
ERA5 from the Copernicus Climate Change Service, Hersbach et al. (2018), http://doi.org/10.24381/cds.adbb2d47 https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels
The above list of data sources is only a partial list. For a more complete set of references please refer to Berkeley Earth's previous description papers.
Temperatures are in Celsius and reported as anomalies relative to the Jan 1951-Dec 1980 average. Uncertainties represent the 95% confidence interval for statistical and spatial undersampling effects as well as ocean biases.
The land analysis was run on 06-Mar-2023 02:09:12.
The ocean analysis was published on 13-Mar-2023 02:52:51.
The land component is based on 50498 time series with 21081445 monthly data points
The ocean component is based on 456950592 instantaneous water temperature observations
Estimated Jan 1951-Dec 1980 global mean temperature (°C): 14.148 +/- 0.019
As Earth's land is not distributed symmetrically about the equator, there exists a mean seasonality to the global average temperature.
Estimated Jan 1951-Dec 1980 monthly absolute temperature (°C):
Month:  Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
Mean:   12.31  12.52  13.15  14.07  15.01  15.68  15.91  15.72  15.17  14.30  13.33  12.59
+/-:    0.03   0.03   0.03   0.03   0.03   0.03   0.02   0.02   0.03   0.02   0.03   0.03
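The values are anomalies relative to Jan 1951-Dec 1980, so adding the matching month's climatology from the table gives an absolute temperature. This is a minimal sketch; it ignores the uncertainties, which would need more care to combine.

```python
# Jan 1951-Dec 1980 monthly absolute global mean temperatures (deg C), from the table.
CLIMATOLOGY_1951_1980 = {
    1: 12.31, 2: 12.52, 3: 13.15, 4: 14.07, 5: 15.01, 6: 15.68,
    7: 15.91, 8: 15.72, 9: 15.17, 10: 14.30, 11: 13.33, 12: 12.59,
}

def absolute_temperature(month: int, anomaly_c: float) -> float:
    """Convert a monthly anomaly (vs. Jan 1951-Dec 1980) to an absolute value in deg C."""
    return CLIMATOLOGY_1951_1980[month] + anomaly_c

print(round(absolute_temperature(7, 1.0), 2))  # a +1.00 July anomaly -> 16.91
```

As a consistency check, the twelve monthly values average to about 14.15 °C. That agrees with the 14.148 ± 0.019 °C annual estimate above.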
For each month, we report the estimated global surface temperature anomaly for that month and its uncertainty. We also report the corresponding values for 12-month, five-year, ten-year, and twenty-year moving averages CENTERED about that month (rounding down if the center is in between months). For example, the annual average from January to December 1950 is reported at June 1950.
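The centring rule can be made concrete. A window of w months starting at index s is reported at index s + (w - 1) // 2, which places the January-December 1950 average at June 1950. The NumPy sketch below applies that rule. How the real product treats missing months is not described, so here NaNs simply propagate.

```python
import numpy as np

def centered_moving_average(values, window):
    """Moving average reported at each window's centre, rounding down between months.

    Returns an array as long as `values`, with NaN where no full window exists.
    """
    x = np.asarray(values, dtype=float)
    out = np.full(x.shape, np.nan)
    if window > len(x):
        return out
    # Mean of every complete window x[s:s + window].
    means = np.convolve(x, np.ones(window) / window, mode="valid")
    offset = (window - 1) // 2  # 5 for a 12-month window, i.e. June
    out[offset:offset + len(means)] = means
    return out

monthly = np.arange(24, dtype=float)  # stand-in for 24 monthly anomalies
annual = centered_moving_average(monthly, 12)
```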
License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
Who among us doesn't talk a little about the weather now and then? Will it rain tomorrow? Will it get cold enough to make your teeth chatter, or will the sun be scorching? Does global warming exist?
With this dataset, you can apply machine learning tools to predict the average temperature of the city of Detroit based on about 5 years of historical data.
The dataset was produced from Historical Hourly Weather Data [https://www.kaggle.com/selfishgene/historical-hourly-weather-data], which consists of about 5 years of hourly measurements of various weather attributes (e.g., temperature, humidity, air pressure) from 30 US and Canadian cities.
From this rich database, a subset was extracted by selecting only the city of Detroit (USA) and only the temperature. The temperature was converted to degrees Celsius, and one value was kept for each date: the average daytime temperature, from 9 am to 5 pm.
In addition, the temperature values were artificially and gradually increased by a few degrees Celsius over the available period. This simulates a small global warming (or is it local?)...
In summary, the available dataset contains the average daily temperatures (collected during the day), artificially increased by a certain value, for the city of Detroit from October 2012 to November 2017.
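The two preprocessing steps described above, daytime averaging and gradual artificial warming, can be sketched with pandas. This is not the author's code, and it rests on three assumptions. The hourly source values are taken to be in Kelvin. "9 am to 5 pm" is read as the hours 09:00 through 17:00 inclusive. The "gradual" increase is modelled as a linear ramp, and its total size (3 °C in the demo) is hypothetical.

```python
import numpy as np
import pandas as pd

def daytime_daily_mean_c(hourly_kelvin: pd.Series, start_hour: int = 9, end_hour: int = 17) -> pd.Series:
    """Per-day mean of hourly values between start_hour and end_hour (inclusive), in deg C.

    Assumes a DatetimeIndex and Kelvin input (an assumption about the source files).
    """
    hours = hourly_kelvin.index.hour
    daytime = hourly_kelvin[(hours >= start_hour) & (hours <= end_hour)]
    celsius = daytime - 273.15
    return celsius.groupby(celsius.index.normalize()).mean()

def add_linear_warming(daily: pd.Series, total_increase_c: float) -> pd.Series:
    """Add an offset growing linearly from 0 on the first day to total_increase_c on the last."""
    ramp = np.linspace(0.0, total_increase_c, num=len(daily))
    return daily + ramp

# Invented hourly data: 48 hours whose value (in deg C) equals the hour of day.
idx = pd.date_range("2012-10-01", periods=48, freq=pd.offsets.Hour())
hourly = pd.Series(273.15 + (np.arange(48) % 24), index=idx)
daily = daytime_daily_mean_c(hourly)                      # about 13.0 on both days
warmed = add_linear_warming(daily, total_increase_c=3.0)  # 3 deg C is hypothetical
```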
The purpose of this dataset is to apply forecasting models in order to predict the value of the artificially warmed average daily temperature of Detroit.
See the plot at https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3089313%2Faf9614514242dfb6164a08c013bf6e35%2Fplot-ts2.png?generation=1567827710930876&alt=media: the black dots are the actual data and the blue line is the predictive model (including a confidence area).
This dataset wouldn't be possible without the previous work in Historical Hourly Weather Data.
What are the best forecasting models to address this particular problem? TBATS, ARIMA, Prophet? You tell me!
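Before trying TBATS, ARIMA or Prophet, a simple reference model helps judge whether they add anything. The sketch below is an assumption rather than part of the dataset. It fits, by ordinary least squares, a linear trend (which should absorb the artificial warming) plus one annual sine/cosine harmonic. Time is measured in days.

```python
import numpy as np

def design_matrix(t_days, period=365.25):
    """Columns: intercept, linear trend, annual sine, annual cosine."""
    t = np.asarray(t_days, dtype=float)
    w = 2 * np.pi * t / period
    return np.column_stack([np.ones_like(t), t, np.sin(w), np.cos(w)])

def fit_baseline(t_days, temps, period=365.25):
    """Least-squares coefficients for the trend-plus-seasonality baseline."""
    X = design_matrix(t_days, period)
    coef, *_ = np.linalg.lstsq(X, np.asarray(temps, dtype=float), rcond=None)
    return coef

def forecast_baseline(coef, t_days, period=365.25):
    return design_matrix(t_days, period) @ coef

# Synthetic check: a known trend plus annual cycle should be recovered exactly.
t = np.arange(1500)
temps = 10 + 0.002 * t + 12 * np.sin(2 * np.pi * t / 365.25)
coef = fit_baseline(t, temps)
next_30_days = forecast_baseline(coef, np.arange(1500, 1530))
```

Score any candidate model against this baseline on a held-out final period, such as the last few months, rather than on the training data.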
License: Community Data License Agreement, Sharing, Version 1.0 (https://cdla.io/sharing-1-0/)
Dataset Title: Global Climate Change Indicators: A Comprehensive Dataset (2000-2024)
Subtitle: Tracking Temperature, Emissions, Sea Level Rise, and Environmental Trends Across Countries
Description: This dataset provides a comprehensive overview of key climate change indicators collected across different countries from the year 2000 to 2024. It includes 1000 data points capturing various environmental and socio-economic factors that reflect the global impact of climate change. The dataset focuses on average temperature, CO2 emissions, sea-level rise, rainfall patterns, and more, enabling users to analyze trends, correlations, and anomalies.
Fields Explanation: Year: The year in which the data was recorded, ranging from 2000 to 2024. It helps track historical trends in climate change and related variables over time.
Country: The country or region where the climate data was collected. The dataset includes a diverse set of countries from across the globe, representing different geographic regions and climates.
Average Temperature (°C): The average annual temperature recorded in each country, measured in degrees Celsius. This field allows for comparisons of temperature changes across regions and time.
CO2 Emissions (Metric Tons per Capita): The average amount of CO2 emissions per capita in metric tons, reflecting the country's contribution to greenhouse gases. This field is useful for analyzing the link between human activity and environmental changes.
Sea Level Rise (mm): The recorded annual sea-level rise in millimeters for coastal regions. This indicator reflects the global warming effect on melting glaciers and thermal expansion of seawater, critical for studying impacts on coastal populations.
Rainfall (mm): The total annual rainfall recorded in millimeters. This field highlights changing precipitation patterns, essential for understanding droughts, floods, and water resource management.
Population: The population of the country in the given year. Population data is important to normalize emissions or other per-capita analyses and understand human impact on the environment.
Renewable Energy (%): The percentage of total energy consumption in a country that comes from renewable energy sources (solar, wind, hydro, etc.). This metric is vital for assessing the progress made toward sustainable energy and reducing reliance on fossil fuels.
Extreme Weather Events: The number of extreme weather events recorded in each country, such as hurricanes, floods, wildfires, and droughts. Tracking these events helps relate climate change to the frequency of natural disasters.
Forest Area (%): The percentage of the total land area of a country covered by forests. Forest cover is a critical indicator of biodiversity and carbon sequestration, with reductions often linked to deforestation and habitat loss.
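CO2 is reported per capita, so the Population field lets you recover total national emissions when you need them. The pandas sketch below assumes the CSV headers match the field names above; check them against the actual file. The rows are invented.

```python
import pandas as pd

CO2_PC = "CO2 Emissions (Metric Tons per Capita)"  # assumed header; verify in the CSV

def add_total_co2(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with total emissions = per-capita emissions x population."""
    out = df.copy()
    out["Total CO2 (Metric Tons)"] = out[CO2_PC] * out["Population"]
    return out

# Invented rows for illustration only.
demo = pd.DataFrame({
    "Year": [2000, 2024, 2000, 2024],
    "Country": ["X", "X", "Y", "Y"],
    CO2_PC: [2.0, 3.0, 10.0, 8.0],
    "Population": [1_000_000, 1_500_000, 200_000, 250_000],
})
demo = add_total_co2(demo)
print(demo[["Country", "Year", "Total CO2 (Metric Tons)"]])
```

In the invented rows, countries X and Y emit the same total in 2000 despite a fivefold difference in per-capita emissions. This is why the choice between per-capita and total figures matters when comparing countries.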
Applications: Climate Research: This dataset is invaluable for researchers and analysts studying global climate change trends. By focusing on multiple indicators, users can assess the relationships between temperature changes, emissions, deforestation, and extreme weather patterns.
Environmental Policy Making: Governments and policy analysts can use this dataset to develop more effective climate policies based on historical and regional data. For example, countries can use emissions data to set realistic reduction goals in line with international agreements.
Renewable Energy Studies: Renewable energy data provides insights into how different regions are transitioning toward greener energy sources, offering a comparison between high-emission and low-emission countries.
Predictive Modeling: The data can be used for machine learning models to predict future climate scenarios, especially in relation to global temperature rise, sea-level changes, and extreme weather events.
Public Awareness & Education: This dataset is a useful educational tool for raising awareness about the impacts of climate change. Students and the general public can use it to explore real-world data and learn about the importance of sustainable development.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Overview: This dataset offers a comprehensive collection of daily weather readings from major cities around the world. The first release included only capitals. The dataset now also covers major cities worldwide and adds hourly data, bringing the total to about 1,250 cities. Some locations provide historical data going back to January 2, 1833, giving users a deep dive into long-term weather patterns and their evolution.
Data License and Updates: This dataset is updated every Sunday using data from Meteostat API, ensuring access to the latest week's data without overburdening the data source.
cities.csv: This dataframe offers details about individual cities and weather stations.
- Columns:
- station_id: Unique ID for the weather station.
- city_name: Name of the city.
- country: The country where the city is located.
- state: The state or province within the country.
- iso2: The two-letter country code.
- iso3: The three-letter country code.
- latitude: Latitude coordinate of the city.
- longitude: Longitude coordinate of the city.
countries.csv: This dataframe contains information about different countries, providing insights into their geographic and demographic characteristics.
- Columns:
- iso3: The three-letter code representing the country.
- country: The English name of the country.
- native_name: The native name of the country.
- iso2: The two-letter code representing the country.
- population: The population of the country.
- area: The total land area of the country in square kilometers.
- capital: The name of the capital city.
- capital_lat: The latitude coordinate of the capital city.
- capital_lng: The longitude coordinate of the capital city.
- region: The specific region within the continent where the country is located.
- continent: The continent to which the country belongs.
- hemisphere: The hemisphere in which the country is located (e.g., Northern, Southern).
daily_weather.parquet: This dataframe provides weather data on a daily basis.
- Columns:
- station_id: Unique ID for the weather station.
- city_name: Name of the city where the station is located.
- date: Date of the weather record.
- season: Season corresponding to the date (e.g., summer, winter).
- avg_temp_c: Average temperature in Celsius.
- min_temp_c: Minimum temperature in Celsius.
- max_temp_c: Maximum temperature in Celsius.
- precipitation_mm: Precipitation in millimeters.
- snow_depth_mm: Snow depth in millimeters.
- avg_wind_dir_deg: Average wind direction in degrees.
- avg_wind_speed_kmh: Average wind speed in kilometers per hour.
- peak_wind_gust_kmh: Peak wind gust in kilometers per hour.
- avg_sea_level_pres_hpa: Average sea-level pressure in hectopascals.
- sunshine_total_min: Total sunshine duration in minutes.
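The daily and city tables share station_id, so daily readings can be joined to city metadata. The sketch below computes the yearly mean avg_temp_c per station, using the column names listed above. Reading the Parquet file requires a Parquet engine such as pyarrow. The demo rows are invented.

```python
import pandas as pd

def annual_mean_temperature(daily: pd.DataFrame, cities: pd.DataFrame) -> pd.DataFrame:
    """Yearly mean of avg_temp_c per station, with country and coordinates attached."""
    d = daily.copy()
    d["year"] = pd.to_datetime(d["date"]).dt.year
    yearly = d.groupby(["station_id", "city_name", "year"], as_index=False)["avg_temp_c"].mean()
    return yearly.merge(
        cities[["station_id", "country", "latitude", "longitude"]],
        on="station_id",
        how="left",
    )

# Real usage:
# daily = pd.read_parquet("daily_weather.parquet")
# cities = pd.read_csv("cities.csv")
daily = pd.DataFrame({
    "station_id": ["S1"] * 4,
    "city_name": ["Town"] * 4,
    "date": ["2020-01-01", "2020-07-01", "2021-01-01", "2021-07-01"],
    "avg_temp_c": [0.0, 20.0, 2.0, 22.0],
})
cities = pd.DataFrame({
    "station_id": ["S1"], "city_name": ["Town"], "country": ["Nowhere"],
    "latitude": [50.0], "longitude": [10.0],
})
result = annual_mean_temperature(daily, cities)
```

Years with many missing days will bias a simple mean. Counting valid days per year before comparing years is a sensible precaution.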
These dataframes can be utilized for various analyses such as weather trend prediction, climate studies, geographic analysis, demographic insights, and more.
Dataset image source: photo credits to 越过山丘.
License: Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
As climate change accelerates, the world's oceans are experiencing significant transformations. This dataset compiles synthetic-yet-realistic measurements of sea surface temperature (SST), pH levels, coral bleaching severity, and species observations from ecologically critical marine zones. It spans from 2015 to 2023 and simulates how marine environments are responding to global warming, acidification, and heatwaves.
The goal of this dataset is to support machine learning, climate analysis, and ecological modeling focused on:
- Predicting coral bleaching events
- Monitoring marine heatwaves
- Understanding species biodiversity shifts
- Correlating temperature changes with ocean acidification
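For the last goal, correlating temperature with acidification, a per-site Pearson correlation is a reasonable first look. The column names site, sst_c and ph are hypothetical because the description does not list the columns. The demo values are invented.

```python
import pandas as pd

def sst_ph_correlation(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation between sea surface temperature and pH within each site."""
    return df.groupby("site")[["sst_c", "ph"]].apply(lambda g: g["sst_c"].corr(g["ph"]))

# Invented measurements for two sites.
obs = pd.DataFrame({
    "site": ["A"] * 4 + ["B"] * 4,
    "sst_c": [27.0, 28.0, 29.0, 30.0, 26.0, 26.5, 27.0, 27.5],
    "ph": [8.10, 8.08, 8.05, 8.03, 8.05, 8.05, 8.04, 8.02],
})
result = sst_ph_correlation(obs)
print(result)
```

A within-site correlation avoids mixing sites that differ in baseline temperature and chemistry. It is still only an association.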
Global warming is the ongoing rise of the average temperature of the Earth's climate system and has been demonstrated by direct temperature measurements and by measurements of various effects of the warming. (Wikipedia)
A dataset of temperatures in the world's major cities can therefore help analyze it. Weather information is also useful for many data science tasks, such as sales forecasting and logistics.
Thanks to the University of Dayton, the data is available as a separate text file for each city. It is available for research and non-commercial purposes only; refer to the source page for the license terms.
Daily average temperature values are in the city_temperature.csv file.
Acknowledgements: the University of Dayton, for making this dataset available in the first place!
Photo credits: James Day on Unsplash
Some ideas are:
1. How is the average temperature of the world changing over time?
2. Is the temperature information helpful for other forecasting tasks?
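Here is a sketch for the first idea. It assumes three things about city_temperature.csv: it has Year and AvgTemperature columns, temperatures are in degrees Fahrenheit, and -99 marks missing values. Verify all three against the file before relying on the result.

```python
import numpy as np
import pandas as pd

MISSING = -99  # assumed sentinel for missing readings

def yearly_mean_celsius(df: pd.DataFrame) -> pd.Series:
    """Mean daily temperature per year across all rows, converted from deg F to deg C."""
    temps_f = df["AvgTemperature"].replace(MISSING, np.nan)
    temps_c = (temps_f - 32.0) * 5.0 / 9.0
    return temps_c.groupby(df["Year"]).mean()  # NaNs are skipped by mean()

# Usage: yearly = yearly_mean_celsius(pd.read_csv("city_temperature.csv"))
demo = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001],
    "AvgTemperature": [50.0, -99.0, 59.0, 68.0],
})
yearly = yearly_mean_celsius(demo)
```

Averaging over cities does not give a true world average, because cities are unevenly distributed. The result is better read as a trend across the sampled cities.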
License: Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Climate change has a profound impact on global agriculture, affecting crop yields, soil health, and farming sustainability. This synthetic dataset is designed to simulate real-world agricultural data, enabling researchers, data scientists, and policymakers to explore how climate variations influence food production across different regions.
🔍 Key Features:
✔️ Climate Variables – Simulated data on temperature changes, precipitation levels, and extreme weather events
✔️ Crop Productivity – Modeled impact of climate shifts on yields of key crops like wheat, rice, and corn
✔️ Regional Insights – Includes various geographic regions to analyze diverse climate-agriculture interactions
✔️ Ideal for Predictive Modeling – Supports climate risk assessment, food security studies, and sustainability research
📊 Dataset Overview: This dataset has been synthetically generated and does not contain real-world agricultural records. It is intended for academic learning, climate impact analysis, and machine learning applications in environmental studies.
📖 Columns Description:
Region – Simulated geographic region
Year – Modeled year of data collection
Average_Temperature – Simulated temperature levels (°C)
Precipitation – Modeled annual rainfall (mm)
Crop_Yield – Synthetic yield data for selected crops (tons/hectare)
Extreme_Weather_Events – Number of modeled extreme weather occurrences per year
⚠️ Disclaimer: This dataset is completely synthetic and should not be used for real-world climate policy decisions or agricultural forecasting. It is meant for educational purposes, research, and data science applications.
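Using the column names above, a linear regression of Crop_Yield on Average_Temperature and Precipitation is a natural first model. The sketch uses NumPy least squares on invented rows. Because the dataset is synthetic, the fitted coefficients describe the data generator, not real agronomy.

```python
import numpy as np
import pandas as pd

def fit_yield_model(df: pd.DataFrame) -> dict:
    """OLS fit: Crop_Yield ~ intercept + Average_Temperature + Precipitation."""
    X = np.column_stack([
        np.ones(len(df)),
        df["Average_Temperature"].to_numpy(dtype=float),
        df["Precipitation"].to_numpy(dtype=float),
    ])
    y = df["Crop_Yield"].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "per_deg_c", "per_mm"], coef))

# Invented rows generated from yield = 5 - 0.1 * T + 0.001 * P.
temps = np.array([15.0, 20.0, 25.0, 18.0, 22.0])
precip = np.array([500.0, 800.0, 600.0, 1000.0, 700.0])
demo = pd.DataFrame({
    "Average_Temperature": temps,
    "Precipitation": precip,
    "Crop_Yield": 5 - 0.1 * temps + 0.001 * precip,
})
model = fit_yield_model(demo)
```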
🔹 Use this dataset to analyze climate trends, build predictive models, and explore solutions for sustainable agriculture! 🌱📊
License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains detailed daily climate information for more than 600 evenly distributed coordinates across Sri Lanka from November 11, 2022, to November 11, 2024. The data was collected using the Open-Meteo API and includes maximum and minimum temperatures, total daily precipitation, and other variables.
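The description says the data came from the Open-Meteo API but does not give the exact request. The sketch below builds a per-coordinate request URL. It assumes Open-Meteo's historical archive endpoint and its daily variable names (temperature_2m_max, temperature_2m_min, precipitation_sum); check these against the current Open-Meteo documentation. The timezone choice is also an assumption.

```python
from urllib.parse import urlencode

ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"  # assumed endpoint

def open_meteo_daily_url(lat, lon, start="2022-11-11", end="2024-11-11"):
    """Build a daily-data request URL for one coordinate (no network call)."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start,
        "end_date": end,
        "daily": "temperature_2m_max,temperature_2m_min,precipitation_sum",
        "timezone": "Asia/Colombo",  # assumed; local time for Sri Lanka
    }
    return ARCHIVE_URL + "?" + urlencode(params, safe=",")

url = open_meteo_daily_url(6.93, 79.85)
# To fetch (requires network access):
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
```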
Context and Inspiration: Understanding climate patterns is essential for agriculture, urban planning, and disaster preparedness. This dataset can aid researchers, data scientists, and policymakers in studying trends and impacts of climate change in Sri Lanka.
Usage Ideas:
- Identify climate zones.
- Predict seasonal trends in temperature and rainfall.
- Analyze relationships between geographical location and climate.
- Assist in agricultural forecasting models or climate risk assessments.
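To identify climate zones, one option is k-means clustering on per-location summaries such as mean temperature and annual rainfall. The NumPy sketch below standardises the features so that rainfall (in mm) does not swamp temperature (in °C) purely because of its units. It then runs Lloyd's algorithm with a deterministic farthest-point initialisation (not k-means++). The number of clusters and the summary values are illustrative assumptions.

```python
import numpy as np

def standardize(features):
    """Scale each column to zero mean and unit variance."""
    f = np.asarray(features, dtype=float)
    return (f - f.mean(axis=0)) / f.std(axis=0)

def kmeans(x, k, n_iter=100):
    """Lloyd's k-means with farthest-point initialisation; returns (labels, centroids)."""
    x = np.asarray(x, dtype=float)
    centroids = [x[0]]
    for _ in range(1, k):
        # Next centroid: the point farthest from all centroids chosen so far.
        dist = np.min([np.linalg.norm(x - c, axis=1) for c in centroids], axis=0)
        centroids.append(x[dist.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Invented per-location summaries: [mean temperature (deg C), annual rainfall (mm)].
summaries = np.array([
    [27.0, 2500.0], [26.5, 2400.0], [27.2, 2600.0],
    [28.0, 1000.0], [28.5, 1100.0], [27.8, 950.0],
])
labels, _ = kmeans(standardize(summaries), k=2)
```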
License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains historical records of land surface temperatures collected from various global regions. The data can be used to analyze climate trends, monitor temperature variations over time, and support research in climate change, environmental science, and data analytics.
It includes date-wise temperature readings, along with geographical references such as location names or coordinates where available. The dataset is suitable for time series analysis, trend forecasting, and building machine learning models for climate prediction.
Key Features:
🌡️ Land surface temperature measurements
📅 Time-series data (monthly/yearly format)
🧪 Ideal for climate pattern analysis, forecasting, and environmental studies
Potential Use Cases:
- Climate change research
- Seasonal temperature trend analysis
- Data visualization dashboards
- Machine learning models for temperature prediction
- Academic or environmental research projects
Tools Recommended: Excel, Python (Pandas, Matplotlib, Seaborn), Power BI, Tableau
License: Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset contains information about heatwave anomalies, which are periods of unusually hot weather that deviate significantly from historical averages. It could encompass various aspects of heatwaves, potentially including:
Data Purpose:
This dataset could be valuable for several purposes:
Heatwave Monitoring and Prediction: Analysing historical data to identify patterns and develop models for predicting future heatwave events.
Climate Change Impact Assessment: Studying how heatwave frequency, intensity, and duration are changing over time, potentially linked to climate change.
Risk Assessment and Mitigation: Identifying areas at high risk of heatwaves and developing strategies to mitigate their impact on infrastructure, public health, and agriculture.
Research on Heatwave Dynamics: Understanding the factors that contribute to heatwave formation, persistence, and movement.
Data Analysis Potential:
The dataset can be used for various data analysis techniques, including:
Descriptive Statistics: Summarizing key features of heatwaves (average duration, temperature increase, etc.).
Exploratory Data Analysis (EDA): Visualizing trends, identifying outliers, and exploring relationships between variables.
Spatial Analysis: Mapping heatwave occurrences and intensity across geographical regions.
Time Series Analysis: Studying how heatwave characteristics change over time.
Machine Learning and Statistical Modelling: Developing models to predict heatwaves, assess risks, or classify their severity.
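The dataset's own heatwave definition is not described. A common operational definition is a run of consecutive days above a temperature threshold. The sketch below finds such runs and reports each one's start, duration and peak excess, which feed directly into the descriptive statistics listed above. The threshold and the 3-day minimum are assumptions.

```python
import numpy as np

def find_heatwaves(temps, threshold, min_days=3):
    """Return (start_index, duration, max_excess) for each run of at least
    min_days consecutive values strictly above threshold."""
    t = np.asarray(temps, dtype=float)
    hot = t > threshold
    events = []
    start = None
    # Appending False guarantees that a run reaching the end is closed.
    for i, flag in enumerate(np.append(hot, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            length = i - start
            if length >= min_days:
                events.append((start, length, float((t[start:i] - threshold).max())))
            start = None
    return events

# Invented daily maxima (deg C) with an assumed 35 deg C threshold.
daily_max = [30, 31, 36, 37, 38, 32, 36, 37, 30, 35, 36, 37, 39]
events = find_heatwaves(daily_max, threshold=35.0)
mean_duration = np.mean([d for _, d, _ in events])
```

In this example, the two-day run at indices 6-7 is too short to count, and 35 °C itself is not above the threshold.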
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This document provides a detailed summary of the country_weather_data.csv dataset, which contains daily weather observations from different countries spanning over two decades. The dataset is ideal for climate analytics, environmental modeling, and time series forecasting.
- Country: Country name
- Date: Date of observation (DD-MM-YYYY)
- Temp_Max: Maximum temperature (°C)
- Temp_Min: Minimum temperature (°C)
- Temp_Mean: Mean temperature (°C)
- Precipitation_Sum: Total daily precipitation (mm)
- Windspeed_Max: Maximum wind speed (km/h)
- Windgusts_Max: Maximum wind gusts (km/h)
- Sunshine_Duration: Total sunshine duration (seconds)
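Dates use the DD-MM-YYYY format, so parsing with an explicit format avoids day/month ambiguity: 03-04-2010 is 3 April, not 4 March. The pandas sketch below uses the column names above. It also converts Sunshine_Duration from seconds to hours. The demo rows are invented.

```python
import pandas as pd

def prepare_country_weather(df: pd.DataFrame) -> pd.DataFrame:
    """Parse DD-MM-YYYY dates, add sunshine in hours, and sort by country and date."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"], format="%d-%m-%Y")
    out["Sunshine_Hours"] = out["Sunshine_Duration"] / 3600.0
    return out.sort_values(["Country", "Date"]).reset_index(drop=True)

# Usage with the real file:
# weather = prepare_country_weather(pd.read_csv("country_weather_data.csv"))
demo = pd.DataFrame({
    "Country": ["B", "A"],
    "Date": ["03-04-2010", "01-01-2010"],
    "Sunshine_Duration": [36000.0, 7200.0],
})
weather = prepare_country_weather(demo)
```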
License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset describes climate change in every country from 1961 to 2022. Its purpose is to provide information about climatic conditions worldwide and about the factors that change the climate over long periods. By exploring it, we can identify which factors play an important role in climate change around the world, and whether their impact is positive or negative.
Goal: Predict future temperature change (e.g., for 2030 or 2050). Techniques:
Goal: Detect years or countries with abnormal temperature spikes. Techniques:
Goal: Categorize countries into climate impact levels (e.g., low, medium, high warming). Techniques:
Goal: Find which decades or years contribute most to long-term warming. Techniques:
Goal: Combine this dataset with CO₂ emissions, deforestation, or industrial data to predict temperature change. Techniques:
Goal: Build a dynamic dashboard showing warming trends per country. Tech stack: Python (Plotly, Dash, or Streamlit) Use: For interactive exploration by students, researchers, or policymakers.
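The Techniques lists above are empty in the source. As one illustration for the forecasting and categorisation goals, the sketch fits a straight-line trend per country, extrapolates it to 2030, and bins the trend into low, medium or high. It assumes a long-format table with hypothetical columns Country, Year and TempChange; reshape first if the file stores years as columns. The 0.2 and 0.4 °C-per-decade cutoffs are arbitrary.

```python
import numpy as np
import pandas as pd

def warming_summary(df: pd.DataFrame, low=0.2, high=0.4, target_year=2030) -> pd.DataFrame:
    """Per-country linear trend (deg C per decade), naive extrapolation, and category."""
    rows = []
    for country, g in df.groupby("Country"):
        slope, intercept = np.polyfit(g["Year"].to_numpy(float), g["TempChange"].to_numpy(float), deg=1)
        per_decade = slope * 10
        level = "low" if per_decade < low else ("medium" if per_decade < high else "high")
        rows.append({
            "Country": country,
            "trend_per_decade": per_decade,
            f"pred_{target_year}": slope * target_year + intercept,
            "level": level,
        })
    return pd.DataFrame(rows)

# Invented series: country P warms 0.03 deg C/yr, country Q 0.01 deg C/yr.
years = np.arange(1961, 2023)
demo = pd.DataFrame({
    "Country": ["P"] * len(years) + ["Q"] * len(years),
    "Year": np.concatenate([years, years]),
    "TempChange": np.concatenate([0.03 * (years - 1961), 0.01 * (years - 1961)]),
})
summary = warming_summary(demo)
```

Straight-line extrapolation ignores acceleration and year-to-year variability, so treat it as a baseline.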
The dataset was created with the community's need for historical weather data in mind. It covers the top 8 Indian cities by population.
The data was collected with the worldweatheronline.com API and the wwo_hist package. The datasets contain hourly weather data from 01-01-2009 to 01-01-2020, i.e., more than 10 years for each city. This data can be used to visualize changes in the weather due to global warming, or to predict the weather for upcoming days, weeks, months, seasons, etc. Note: the data was extracted with the worldweatheronline.com API, and I can't guarantee its accuracy.
The data is owned by worldweatheronline.com and is extracted with the help of their API.
The main goal of this dataset is to support weather prediction for the next day or week, using the large amount of data it provides. The data can also be used to build visualizations that show the impact of global warming on various aspects of the weather, such as precipitation, humidity, and temperature.
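For next-day or next-week prediction, the hourly records usually need to be aggregated to daily values first. The pandas sketch below does this. The column names date_time, tempC, precipMM and humidity are assumptions about the files' layout, and the demo rows are invented.

```python
import numpy as np
import pandas as pd

def hourly_to_daily(df: pd.DataFrame, time_col: str = "date_time") -> pd.DataFrame:
    """Aggregate hourly rows to daily min/mean/max temperature, total rain, mean humidity."""
    hourly = df.set_index(pd.to_datetime(df[time_col])).sort_index()
    daily = hourly.resample("D").agg({
        "tempC": ["min", "mean", "max"],
        "precipMM": ["sum"],
        "humidity": ["mean"],
    })
    daily.columns = ["_".join(col) for col in daily.columns]  # e.g. "tempC_mean"
    return daily

# Invented hourly rows covering two days.
idx = pd.date_range("2009-01-01", periods=48, freq=pd.offsets.Hour())
demo = pd.DataFrame({
    "date_time": idx.astype(str),
    "tempC": np.tile(np.arange(24), 2),
    "precipMM": 0.5,
    "humidity": 80,
})
daily = hourly_to_daily(demo)
```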
License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
Reliable forecasting of air temperature at 2 m above the land surface plays a significant role when preparing for potential weather-related disasters, such as heat waves (i.e., maximum daytime air temperature) and cold spells (i.e., minimum nighttime air temperature).
In particular, the increasing intensity, frequency and duration of extreme air temperatures during the summer season (Perkins et al., 2012), and the fact that more than half of the Earth's population now lives in cities (Schulze & Langenberg, 2014) suggest that accurate air temperature forecasting is essential for urban areas.
Forecasts of maximum and minimum air temperatures are essential to mitigate the damage of extreme weather events such as heat waves and tropical nights. The Numerical Weather Prediction (NWP) model has been widely used for forecasting air temperature, but generally it has a systematic bias due to its coarse grid resolution and lack of parametrizations.
Your task is to devise a machine learning model that predicts extreme-weather temperatures (the next-day maximum air temperature, Next_Tmax) from the features provided.
Dataset Description:
- Present_Tmax: Maximum air temperature between 0 and 21 h on the present day (°C): 20 to 37.6
- Present_Tmin: Minimum air temperature between 0 and 21 h on the present day (°C): 11.3 to 29.9
- LDAPS_RHmin: LDAPS model forecast of next-day minimum relative humidity (%): 19.8 to 98.5
- LDAPS_RHmax: LDAPS model forecast of next-day maximum relative humidity (%): 58.9 to 100
- LDAPS_Tmax_lapse: LDAPS model forecast of next-day maximum air temperature applied lapse rate (°C): 17.6 to 38.5
- LDAPS_Tmin_lapse: LDAPS model forecast of next-day minimum air temperature applied lapse rate (°C): 14.3 to 29.6
- LDAPS_WS: LDAPS model forecast of next-day average wind speed (m/s): 2.9 to 21.9
- LDAPS_LH: LDAPS model forecast of next-day average latent heat flux (W/m²): -13.6 to 213.4
- LDAPS_CC1: LDAPS model forecast of next-day 1st 6-hour split average cloud cover (0-5 h) (%): 0 to 0.97
- LDAPS_CC2: LDAPS model forecast of next-day 2nd 6-hour split average cloud cover (6-11 h) (%): 0 to 0.97
- LDAPS_CC3: LDAPS model forecast of next-day 3rd 6-hour split average cloud cover (12-17 h) (%): 0 to 0.98
- LDAPS_CC4: LDAPS model forecast of next-day 4th 6-hour split average cloud cover (18-23 h) (%): 0 to 0.97
- LDAPS_PPT1: LDAPS model forecast of next-day 1st 6-hour split average precipitation (0-5 h) (%): 0 to 23.7
- LDAPS_PPT2: LDAPS model forecast of next-day 2nd 6-hour split average precipitation (6-11 h) (%): 0 to 21.6
- LDAPS_PPT3: LDAPS model forecast of next-day 3rd 6-hour split average precipitation (12-17 h) (%): 0 to 15.8
- LDAPS_PPT4: LDAPS model forecast of next-day 4th 6-hour split average precipitation (18-23 h) (%): 0 to 16.7
- lat: Latitude (°): 37.456 to 37.645
- lon: Longitude (°): 126.826 to 127.135
- DEM: Elevation (m): 12.4 to 212.3
- Slope: Slope (°): 0.1 to 5.2
- Solar radiation: Daily incoming solar radiation (Wh/m²): 4329.5 to 5992.9
- Next_Tmax: The next-day maximum air temperature (°C): 17.4 to 38.9
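The raw NWP (LDAPS) forecast is described above as systematically biased. A useful first step is therefore to measure that bias and fit a simple linear correction before trying more complex models. The sketch uses columns from the list above, and the demo rows are invented. It evaluates in-sample only, so use a time-based hold-out for an honest score, and drop rows with missing values first if there are any.

```python
import numpy as np
import pandas as pd

def ldaps_bias_report(df: pd.DataFrame) -> dict:
    """Mean bias of LDAPS_Tmax_lapse vs Next_Tmax, and RMSE before/after a linear correction."""
    y = df["Next_Tmax"].to_numpy(dtype=float)
    raw = df["LDAPS_Tmax_lapse"].to_numpy(dtype=float)
    error = raw - y
    # Correction model: Next_Tmax ~ a + b * LDAPS_Tmax_lapse + c * Present_Tmax
    X = np.column_stack([np.ones(len(df)), raw, df["Present_Tmax"].to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    corrected = X @ coef
    return {
        "mean_bias": float(error.mean()),
        "raw_rmse": float(np.sqrt(np.mean(error ** 2))),
        "corrected_rmse": float(np.sqrt(np.mean((corrected - y) ** 2))),
        "coef": coef,
    }

# Invented rows in which the forecast runs exactly 1 deg C warm.
demo = pd.DataFrame({
    "LDAPS_Tmax_lapse": [30.0, 32.0, 28.0, 35.0, 33.0],
    "Present_Tmax": [29.0, 31.0, 27.0, 33.0, 32.0],
    "Next_Tmax": [29.0, 31.0, 27.0, 34.0, 32.0],
})
report = ldaps_bias_report(demo)
```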
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset contains Global Sea Surface Temperature (SST) data from NASA's Group for High Resolution Sea Surface Temperature (GHRSST) product. SST data exhibits complex, multiscale features of turbulent flows with intermittent events and quasi-periodic behavior, making it a challenging benchmark for forecasting, reconstruction, and prediction tasks. Unlike the synthetic KS and Lorenz datasets, this represents real-world geophysical observations, providing a critical testbed for evaluating data-driven methods on actual scientific data.
The dataset is derived from the Naval Oceanographic Office (NAVO) GHRSST Level 4 K10_SST version 1.0 product, which provides:
The objective of GHRSST is to provide the best quality SST data for applications across short, medium, and decadal/climate time scales through international collaboration and scientific innovation. All GHRSST data products are publicly available through NASA's Physical Oceanography Distributed Active Archive Center (PO.DAAC).
For this CTF benchmark, undisclosed spatial-temporal patches have been extracted from the public GHRSST data:
Note: The spatial dimension listed in the YAML (90,601) represents a flattened spatial grid that includes additional processing or masking of the 200×200 grid.
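The note does not say how the 90,601 flattened points map back onto a 2-D grid. One arithmetic observation is that 90,601 = 301 × 301, so *if* the points were flattened row-major from a square grid, a single snapshot could be reshaped for plotting as sketched below. This is a hypothetical helper (`unflatten_square` is not part of the dataset or the CTF tooling); it does not reconcile the 301 × 301 size with the 200×200 grid mentioned in the note, and if masking removed points, the true layout requires the dataset's own mask or coordinates.

```python
import math

import numpy as np


def unflatten_square(snapshot):
    """Reshape a flattened 1-D snapshot into a square 2-D array.

    Hypothetical assumption: the points were flattened in row-major (C) order
    from a square n x n grid. Raises ValueError if that cannot be the case.
    """
    snapshot = np.asarray(snapshot)
    if snapshot.ndim != 1:
        raise ValueError("expected a 1-D snapshot")
    n = math.isqrt(snapshot.size)
    if n * n != snapshot.size:
        raise ValueError(f"{snapshot.size} points do not form a square grid")
    return snapshot.reshape(n, n)  # NumPy reshapes in C (row-major) order by default


print(math.isqrt(90601) ** 2 == 90601)  # True, since 90,601 = 301 x 301
grid = unflatten_square(np.arange(9.0))
print(grid.shape)  # (3, 3)
```

With the real data, `unflatten_square(X1_train[0])` would give one 2-D frame under the stated assumption; check it visually before relying on it.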
This dataset is part of the Common Task Framework (CTF) for Science, designed to provide standardized, rigorous benchmarks for evaluating machine learning algorithms on real-world scientific problems. The SST dataset addresses key challenges including:
import numpy as np
# Load training data
X1_train = np.load('SST/npy/train/X1train.npy')
print(f"Shape: {X1_train.shape}") # (800, 90601)
print(f"Time steps: {X1_train.shape[0]}")
print(f"Spatial points: {X1_train.shape[1]}")
print(f"Temperature range: [{X1_train.min():.2f}, {X1_train.max():.2f}]")
# Memory-mapped loading for large files (doesn't load full array into RAM)
X1_train_mmap = np.load('SST/npy/train/X1train.npy', mmap_mode='r')
# Note: Test data files are not included in the public dataset
# Generate your predictions and submit to the CTF4Science platform
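The memory-mapped array in the snippet above is loaded but not used further. As a hedged sketch of why `mmap_mode='r'` helps, the function below computes the spatial mean of every time step while pulling only a block of rows into RAM at a time. `chunked_time_mean` and `chunk_rows` are hypothetical names, and the demo writes a tiny stand-in file instead of reading the real `X1train.npy`.

```python
import os
import tempfile

import numpy as np


def chunked_time_mean(path, chunk_rows=100):
    """Spatial mean of each time step, read through a memory map in row chunks.

    Only `chunk_rows` rows are copied into memory at once; the rest of the
    file stays on disk until it is sliced.
    """
    data = np.load(path, mmap_mode="r")  # shape: (time steps, spatial points)
    means = np.empty(data.shape[0], dtype=np.float64)
    for start in range(0, data.shape[0], chunk_rows):
        stop = min(start + chunk_rows, data.shape[0])
        block = np.asarray(data[start:stop], dtype=np.float64)  # copies this chunk only
        means[start:stop] = block.mean(axis=1)
    return means


# Demo on a small temporary file standing in for X1train.npy.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npy")
    np.save(demo_path, np.arange(12, dtype=np.float32).reshape(4, 3))
    demo_means = chunked_time_mean(demo_path, chunk_rows=3)

print(demo_means)  # [ 1.  4.  7. 10.]
```

Accumulating in float64 avoids precision loss if the stored array uses a smaller float type.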
import numpy as np
import pandas as pd
# Load training data from CSV
X1_train = np.loadtxt('SST/csv/train/X1train.csv', delimiter=',')
print(f"Shape: {X1_train.shape}") # (800, 90601)
# Load timesteps
timesteps = np.loadtxt('SST/csv/train/X1train_timesteps.csv')
print(f"Time range: [{timesteps[0]:.1f}, {timesteps[-1]:.1f}]")
# Load with pandas for easier handling
df = pd.read_csv('SST/csv/train/X1train.csv', header=None)
...
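A persistence forecast, which simply repeats the last observed snapshot, is a common reference point for forecasting tasks like the ones this benchmark targets. The sketch below computes its RMSE on a matrix shaped like `X1_train` (rows are time steps, columns are spatial points) and checks whether the loaded `timesteps` are evenly spaced. It is an illustrative baseline under those assumptions, not the CTF's official scoring metric; `persistence_rmse`, `uniform_step`, and the toy array are hypothetical.

```python
import numpy as np


def persistence_rmse(X, horizon=1):
    """RMSE of a persistence forecast that predicts X[t + horizon] = X[t].

    X has shape (time steps, spatial points), like X1_train.
    """
    X = np.asarray(X, dtype=np.float64)
    if not 0 < horizon < X.shape[0]:
        raise ValueError("horizon must be between 1 and (number of time steps - 1)")
    predicted = X[:-horizon]  # forecasts issued at t = 0 .. T-1-horizon
    observed = X[horizon:]    # values actually seen at t + horizon
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def uniform_step(timesteps):
    """Return the time step if the spacing is uniform, otherwise None."""
    dt = np.diff(np.asarray(timesteps, dtype=np.float64))
    return float(dt[0]) if dt.size and np.allclose(dt, dt[0]) else None


# Toy stand-in: 3 time steps, 2 spatial points.
toy = np.array([[0.0, 1.0], [1.0, 1.0], [3.0, 1.0]])
print(round(persistence_rmse(toy), 4))  # 1.118
print(uniform_step([0.0, 0.5, 1.0]))    # 0.5
```

With the real data, `persistence_rmse(X1_train)` and `uniform_step(timesteps)` give a floor that a learned forecaster should beat, assuming the snapshots are in time order.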
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
The context of this dataset is the COVID-19 forecasting competition. By enriching the original competition data with meteorological features, I hope to help other researchers in their work.
In this dataset I have added weather information, such as temperature and precipitation, to the training set of the COVID-19 forecasting competition, in order to determine whether there is any correlation with the growth of confirmed cases. Weather data is imported from the NOAA GSOD dataset, which is continuously updated to include recent measurements.
My gratitude goes to everyone who worked on the NOAA GSOD dataset and the COVID-19 forecasting competition.
This dataset was built to investigate possible relationships between the spread and resistance of COVID-19 and climatic features such as temperature and humidity.
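The enrichment described here amounts to joining case counts and weather records on location and date, then testing for a relationship with case growth. Below is a minimal pandas sketch. It assumes hypothetical column names (`region`, `date`, `confirmed`, `temp`) rather than the dataset's actual headers. It also uses the day-over-day log growth of cumulative confirmed cases and a Pearson correlation, which measures only linear association.

```python
import numpy as np
import pandas as pd


def growth_vs_temperature(cases, weather):
    """Join cumulative cases with weather and correlate daily log growth with temperature.

    cases:   columns ["region", "date", "confirmed"] (cumulative counts)
    weather: columns ["region", "date", "temp"]
    All column names are hypothetical placeholders.
    """
    df = cases.merge(weather, on=["region", "date"], how="inner")
    df = df.sort_values(["region", "date"])
    prev = df.groupby("region")["confirmed"].shift(1)  # previous day's count per region
    # Keep rows with a previous day and positive counts, so the log is defined.
    valid = (prev > 0) & (df["confirmed"] > 0)
    growth = np.log(df.loc[valid, "confirmed"] / prev[valid])
    temps = df.loc[valid, "temp"]
    return float(np.corrcoef(temps.to_numpy(), growth.to_numpy())[0, 1])


# Toy inputs for one region, purely illustrative.
dates = pd.date_range("2020-03-01", periods=4)
demo_cases = pd.DataFrame({"region": ["A"] * 4, "date": dates, "confirmed": [10, 20, 30, 36]})
demo_weather = pd.DataFrame({"region": ["A"] * 4, "date": dates, "temp": [2.0, 5.0, 10.0, 15.0]})
r = growth_vs_temperature(demo_cases, demo_weather)
print(f"Pearson r = {r:.3f}")
```

Grouping by region before shifting prevents one region's last day from being treated as the previous day of the next region.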
Dataset about soil moisture, temperature, and nutrient use, for crop prediction based on these factors.