Global Surface Summary of the Day is derived from the Integrated Surface Hourly (ISH) dataset. The ISH dataset includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The latest daily summary data are normally available 1-2 days after the date-time of the observations used in the daily summaries. The online data files begin with 1929 and are, at the time of this writing, at the Version 8 software level. Data from over 9000 stations are typically available.
The daily elements included in the dataset (as available from each station) are:
- Mean temperature (.1 Fahrenheit)
- Mean dew point (.1 Fahrenheit)
- Mean sea level pressure (.1 mb)
- Mean station pressure (.1 mb)
- Mean visibility (.1 miles)
- Mean wind speed (.1 knots)
- Maximum sustained wind speed (.1 knots)
- Maximum wind gust (.1 knots)
- Maximum temperature (.1 Fahrenheit)
- Minimum temperature (.1 Fahrenheit)
- Precipitation amount (.01 inches)
- Snow depth (.1 inches)
- Indicator for occurrence of: Fog, Rain or Drizzle, Snow or Ice Pellets, Hail, Thunder, Tornado/Funnel Cloud
Global summary of day data for 18 surface meteorological elements are derived from the synoptic/hourly observations contained in USAF DATSAV3 Surface data and Federal Climate Complex Integrated Surface Hourly (ISH). Historical data are generally available for 1929 to the present, with data from 1973 to the present being the most complete. For some periods, one or more countries' data may not be available due to data restrictions or communications problems. In deriving the summary of day data, a minimum of 4 observations for the day must be present (this allows for stations which report 4 synoptic observations/day). Since the data are converted to constant units (e.g., knots), slight rounding error from the originally reported values may occur (e.g., 9.9 instead of 10.0). The mean daily values described below are based on the hours of operation for the station.
For some stations/countries, the visibility will sometimes 'cluster' around a value (such as 10 miles) due to the practice of not reporting visibilities greater than certain distances. The daily extremes and totals--maximum wind gust, precipitation amount, and snow depth--will only appear if the station reports the data sufficiently to provide a valid value. Therefore, these three elements will appear less frequently than other values. Also, these elements are derived from the stations' reports during the day, and may comprise a 24-hour period which includes a portion of the previous day. The data are reported and summarized based on Greenwich Mean Time (GMT, 0000Z - 2359Z) since the original synoptic/hourly data are reported and based on GMT.
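The minimum-observation rule described above can be illustrated with a small pandas sketch. The hourly values and the series name `temp_f` are made up for illustration, and this is not the actual summary-of-day software; it only shows one way to keep a daily mean when at least 4 observations fall within a GMT day.

```python
import pandas as pd

MIN_OBS_PER_DAY = 4  # threshold stated in the description above

# Hypothetical hourly temperatures (Fahrenheit), indexed by UTC timestamps.
obs = pd.Series(
    [50.0, 52.0, 54.0, 56.0, 58.0, 40.0, 42.0],
    index=pd.to_datetime(
        [
            "2020-01-01 00:00", "2020-01-01 06:00", "2020-01-01 12:00",
            "2020-01-01 18:00", "2020-01-01 21:00",
            "2020-01-02 00:00", "2020-01-02 12:00",
        ],
        utc=True,
    ),
    name="temp_f",
)

# Daily mean and number of observations per UTC day.
daily = obs.resample("D").agg(["mean", "count"])

# Keep a daily mean only when enough observations were reported that day.
daily_mean = daily["mean"].where(daily["count"] >= MIN_OBS_PER_DAY).round(1)
```

In this made-up sample the first day has five observations and keeps its mean, while the second day has only two and is left missing.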
There are nearly 2,200 interagency Remote Automatic Weather Stations (RAWS) strategically located throughout the United States. RAWS are self-contained, portable, and permanent, solar powered weather stations that provide timely local weather data used primarily in fire management. These stations monitor the weather and provide weather data that assists land management agencies with a variety of projects such as monitoring air quality, rating fire danger, and providing information for research applications.
Most of the stations owned by the wildland fire agencies are placed in locations where they can monitor fire danger. RAWS units collect, store, and forward data to a computer system at the National Interagency Fire Center (NIFC) in Boise, Idaho, via the Geostationary Operational Environmental Satellite (GOES). The GOES is operated by the National Oceanic and Atmospheric Administration (NOAA). The data is automatically forwarded to several other computer systems including the Weather Information Management System (WIMS) and the Western Regional Climate Center (WRCC) in Reno, Nevada.
Fire managers use this data to predict fire behavior and monitor fuels; resource managers use the data to monitor environmental conditions. Locations of RAWS stations can be searched online courtesy of the Western Regional Climate Center.
Weather Data collected by CIMIS automatic weather stations. The data is available in CSV format. Station data include measured parameters such as solar radiation, air temperature, soil temperature, relative humidity, precipitation, wind speed and wind direction as well as derived parameters such as vapor pressure, dew point temperature, and grass reference evapotranspiration (ETo).
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Danish Khan
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Climate change, which is a threat to ecosystems and the livelihoods of those that depend on them, is increasingly manifesting as an increased frequency and intensity of severe weather events such as droughts and floods (Déqué et al., 2017). Climate change has created an urgent need for early warning aids or models to enhance the ability of sub-Saharan African health systems to prepare for, and cope with, escalations in treatment needs for climate-sensitive diseases (Nhamo & Muchuru, 2019). This dataset was created from the health and weather data of nine purposively selected study districts in Uganda, whose health and weather data were available for the development of an early warning health model (https://github.com/CHAIUGA/chasa-model) and an accompanying prediction web app (https://github.com/CHAIUGA/chasa-webapp). The districts were selected based on the following criteria: (a) they were experiencing climate change and variability, (b) they represented different climatologic and agro-ecological zones, and (c) climate information and health information were available from a health facility within a 40-kilometre radius of a functional weather station. Historical weather data was retrieved from the Uganda National Meteorological Authority databases, as monthly averages. The weather variables in this data included: atmospheric pressure, rainfall, solar radiation, humidity, temperature (maximum, minimum and mean), and wind (gusts and average wind speed). The monthly aggregated health data for the period from September 2018 to December 2019 was retrieved from the National Health Repository (DHIS2) for referral hospitals within the selected districts. Only data for a selection of climate-sensitive disease aggregates was obtained. The dataset contains 436 complete matched disease and weather records. Ethical issues: Both the de-identified aggregate monthly disease diagnosis count data and the weather data in this dataset are from national data available to the public on request.
The Netatmo V1 dataset contains observations from all Public Weather Stations (PWS) contributing to the Netatmo database within Europe. Netatmo is a company that designs and manufactures a range of smart weather station instruments for the home. The dataset is for a single year (2020), made available for use within the EUMETNET Sandbox project. EUMETNET (a grouping of 31 European National Meteorological Services) instigated the EUMETNET Sandbox project to bring novel observations and observations from technology trials and field campaigns to the research community to enable R&D activities.
The data are not quality controlled and are presented in the format provided by Netatmo. The data are provided in a single file per month per country*. The data were extracted from the Netatmo database country by country by operators of the Netatmo system. The meteorological values are unchanged from those extracted from the Netatmo archive: no quality control, no calibration of the instruments, and no unit conversions have been applied. The data have not been manipulated to meet any international data format standards.
For each station there is always a metadata file 'n'.metadata.json. There are up to 4 data files associated with each station represented by a metadata file. In some cases, all 4 data files are present for the station. In other cases, only one data file is present. The 'n' in the file name allows the metadata file to be associated with the meteorological data files:
1. n.pressure.historic.csv - surface pressure for station n
2. n.outdoor.historic.csv - air temperature and humidity for station n
3. n.wind.historic.csv - wind and gust data for station n
4. n.rain.historic.csv - rainfall data for station n
The data files are semi-colon separated and use UNIX epoch time.
*Countries present in the Netatmo dataset: Austria, Spain, Iceland, Norway, Belgium, Finland, Italy, Poland, Switzerland, France, Luxembourg, Portugal, Cyprus, United Kingdom, Latvia, Serbia, Czech Republic, Greece, Montenegro, Sweden, Germany, Croatia, North Macedonia, Slovenia, Denmark, Hungary, Malta, Slovakia, Estonia, Ireland and the Netherlands
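As a rough illustration of reading a semi-colon separated file with UNIX epoch timestamps, the sketch below parses a small in-memory sample with pandas. The column names (`timestamp`, `temperature`, `humidity`) and the assumption that the epoch values are in seconds are hypothetical; the real headers and units should be checked against the files and metadata.

```python
import io

import pandas as pd

# Made-up stand-in for a file such as n.outdoor.historic.csv; the real
# header names may differ and should be checked against the data.
sample = io.StringIO(
    "timestamp;temperature;humidity\n"
    "1577836800;3.4;87\n"
    "1577837100;3.3;88\n"
)

df = pd.read_csv(sample, sep=";")

# Assumes epoch values are in seconds; use unit="ms" if they are milliseconds.
df["time_utc"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)
```

Replacing `sample` with a path to one of the station files should work the same way, provided the column names are adjusted.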
Saudi Arabia hourly climate integrated surface data with the following observations: Wind, Sky condition, Visibility, Air temperature, Dew, Sea level pressure. Note: The dataset contains the last 5 years of hourly data; check the attachments section of this dataset if you need historical data.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides weekly average temperature data for all U.S. counties from 2013 to 2023. Each row in the dataset represents a specific county, and the columns correspond to the weekly average temperatures over the ten-year period. The dataset is structured to facilitate time series analysis, climate trend studies, and machine learning applications related to environmental and climate change research.
Key Features:
- County-Level Data: Temperature data is provided for each county in the United States, allowing for detailed, localized climate analysis.
- Weekly Time Intervals: The data is aggregated on a weekly basis, offering a finer temporal resolution that captures seasonal and short-term temperature fluctuations.
- 10-Year Span: Covers a significant period from 2013 to 2023, enabling long-term trend analysis and comparison across different periods.
- Temperature Units: All temperature values are presented in Kelvin (K).
Potential Uses:
- Climate Research: Investigate climate change impacts at the county level, identify trends, and assess regional climate variability.
- Geospatial Analysis: Integrate with other spatial datasets for comprehensive environmental and geographical studies.
- Machine Learning: Suitable for training models on temporal climate data, predictive analytics, and anomaly detection.
- Public Policy and Planning: Useful for policymakers to study historical climate trends and support decision-making in areas such as agriculture, disaster management, and urban planning.
This dataset is ideal for researchers, data scientists, and analysts interested in exploring U.S. climate data at a granular level.
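Because the data is described as one row per county with one column per week, in Kelvin, a common first step for time series work is reshaping to a long table and converting units. The sketch below uses a made-up two-county sample; the column names (`county` and the week labels) are assumptions, not the dataset's actual headers.

```python
import pandas as pd

# Made-up sample in the wide layout described above: one row per county,
# one column per week. Real column names may differ.
wide = pd.DataFrame(
    {
        "county": ["County A", "County B"],
        "2013-01-07": [273.15, 283.15],
        "2013-01-14": [275.15, 285.15],
    }
)

# Reshape to one row per (county, week) pair.
long_df = wide.melt(id_vars="county", var_name="week", value_name="temp_k")
long_df["week"] = pd.to_datetime(long_df["week"])

# Convert Kelvin to degrees Celsius.
long_df["temp_c"] = (long_df["temp_k"] - 273.15).round(2)
```

The long layout is usually easier to feed into grouping, plotting, and modelling tools than the wide one.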
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data from the outdoor modules of Netatmo personal weather stations whose owners have chosen to share their data publicly, obtained by querying the Netatmo public API every 15 minutes. There is one CSV (comma-separated values) file per queried time, with one weather station per line. The attributes available are: ID (MAC address of station), longitude, latitude, altitude, temperature, humidity and pressure. For those weather stations that have a rain gauge, there is also rainfall data available.
Hourly Precipitation Data (HPD) is digital data set DSI-3240, archived at the National Climatic Data Center (NCDC). The primary source of data for this file is approximately 5,500 US National Weather Service (NWS), Federal Aviation Administration (FAA), and cooperative observer stations in the United States of America, Puerto Rico, the US Virgin Islands, and various Pacific Islands. The earliest data dates vary considerably by state and region: Maine, Pennsylvania, and Texas have data since 1900. The western Pacific region that includes Guam, American Samoa, Marshall Islands, Micronesia, and Palau have data since 1978. Other states and regions have earliest dates between those extremes. The latest data in all states and regions is from the present day. The major parameter in DSI-3240 is precipitation amounts, which are measurements of hourly or daily precipitation accumulation. Accumulation was for longer periods of time if for any reason the rain gauge was out of service or no observer was present. DSI 3240_01 contains data grouped by state; DSI 3240_02 contains data grouped by year.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The LTAR network maintains stations for standard meteorological measurements including, generally, air temperature and humidity, shortwave (solar) irradiance, longwave (thermal) radiation, wind speed and direction, barometric pressure, and precipitation. Many sites also have extensive comparable legacy datasets. The LTAR scientific community decided that these needed to be made available to the public using a single web source in a consistent manner. To that purpose, each site sent data on a regular schedule, as frequently as hourly, to the National Agricultural Library, which has developed a web service to provide the data to the public in tabular or graphical form. This archive of the LTAR legacy database exports contains meteorological data through April 30, 2021. For current meteorological data, visit the GeoEvent Meteorology Resources page, which provides tools and dashboards to view and access data from the 18 LTAR sites across the United States.
Resources in this dataset:
Resource Title: Meteorological data.
File Name: ltar_archive_DB.zip
Resource Description: This is an export of the meteorological data collected by LTAR sites and ingested by the NAL LTAR application. This export consists of an SQL schema definition file for creating database tables and the data itself. The data is provided in two formats: SQL insert statements (.sql) and CSV files (.csv). Please use the format most convenient for you. Note that the SQL insert statements take much longer to run since each row is an individual insert.
Description of zip files
The ltararchive*.zip files contain database exports. The schema is a .sql file; the data is exported as both SQL inserts and CSV for convenience. There is a README in markdown and PDF in the zips.
Contains the database export of the schema and data for the site, site_station, and met tables as SQL insert statements:
- ltar_archive_db_sql_export_20201231.zip --> has data until 2020-12-31
- ltar_archive_db_sql_export_20210430.zip --> has data until 2021-04-30
Contains the database export of the schema and data for the site, site_station, and met tables as CSV:
- ltar_archive_db_csv_export_20201231.zip --> has data until 2020-12-31
- ltar_archive_db_csv_export_20210430.zip --> has data until 2021-04-30
Contains the raw CSV files that were sent to NAL from the LTAR sites/stations:
- ltar_rawcsv_archive.zip --> has data until 2021-04-30
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Tidal and Storm Surge Forecast Service Forecast Periods 4-9 Forecasts. Published by Office of Public Works. Available under the license Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (cc-by-nc-nd).
Abstract: This dataset provides the forecast output csv files for the relevant forecast location points for Periods 4 to 9 of the Tidal and Storm Surge Forecasting Service for the Coast of Ireland. They do not cover all points for which a visual forecast was produced, and the exact points vary from period to period to reflect development of the system. It is important to emphasise that for Period 10, the forecast point naming was changed, and the reference used in Periods 4 to 9 may differ from that used in Period 10 for the same point. The csv files were produced daily for each forecast point in the am and extend for a period of 72 hours from their start point (00:00 UTC). The files contain forecasts of surge in m, total water level in m OD Malin and astronomic tide in m (relative to Chart Datum) at 15 minute intervals. They also contain details of the forecast point location and the date and time of forecast generation. The forecast data was produced from a hydrodynamic model using best available bathymetry, output from a tidal model as a boundary condition and operational weather forecast data as a forcing condition. Hence the forecast output will be affected by deviations between the weather forecast and actual weather. The Tidal and Storm Surge Forecasting Service was intended to provide information for Local Authorities, and other relevant stakeholders, to enable them to make informed decisions on the management of coastal flood risk. Three forecasts were produced daily, two morning forecasts looking 72 and 144 hours ahead and an evening forecast looking 72 hours ahead.
Lineage: This dataset provides the forecast output csv files for the relevant forecast location points for Periods 1 to 9 of the Tidal and Storm Surge Forecasting Service for the Coast of Ireland. The csv files were produced directly from the forecast model runs and are based on the available weather forecast data at time of run. It should be noted that as each forecast is 72 hours long and new csv files were produced every 24 hours, any single period will be covered by 3 forecast files. It would be expected that forecasts produced closer to the time of interest will be more accurate to reflect improved weather forecasts.
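The overlap arithmetic in the lineage note (a 72-hour forecast issued every 24 hours means 72 / 24 = 3 files cover any given time) can be checked with a short sketch. The function name, start date, and target time below are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def covering_forecasts(target, first_start, interval_h, horizon_h, n_runs):
    """Return start times of forecast runs whose window contains `target`."""
    starts = [first_start + timedelta(hours=i * interval_h) for i in range(n_runs)]
    horizon = timedelta(hours=horizon_h)
    return [s for s in starts if s <= target < s + horizon]


# Hypothetical dates: runs start at 00:00 each day from 1 January 2020.
first = datetime(2020, 1, 1, 0, 0)
target = datetime(2020, 1, 5, 6, 0)

runs_24h = covering_forecasts(target, first, interval_h=24, horizon_h=72, n_runs=10)
runs_12h = covering_forecasts(target, first, interval_h=12, horizon_h=72, n_runs=10)
```

With a 24-hour interval, three runs cover the target time; with a 12-hour interval and the same 72-hour horizon, six do. When several files cover the same time, the most recently issued one would normally be preferred, as the note on accuracy suggests.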
Purpose: The Tidal and Storm Surge Forecasting Service for the Coast of Ireland was intended to provide information for Local Authorities and other relevant stakeholders to enable them to make informed decisions on the management of coastal flood risk. Three forecasts were produced daily, two morning forecasts looking 72 and 144 hours ahead and an evening forecast looking 72 hours ahead. Forecasts were provided at up to 57 forecasting points, 15 National Points around the coast of Ireland and five more detailed local forecasting areas at Dundalk Bay, Wexford Harbour, Cork Harbour, Shannon Estuary and Galway Bay, with the exact numbers at any time depending on the level of model development....
Climate data in India is crucial given the country's diverse topography and climatic zones, which range from the arid deserts of Rajasthan to some of the wettest places on earth in Meghalaya. The India Meteorological Department (IMD), under the Ministry of Earth Sciences, is the primary agency responsible for meteorological observations, weather forecasting, and seismology. It collects, analyzes, and disseminates climate data, which includes parameters like temperature, rainfall, humidity, and wind patterns. This data is vital for sectors like agriculture, water resources, and disaster management in India. Additionally, with growing concerns about climate change impacting monsoon patterns and causing extreme weather events, the continuous monitoring and analysis of climate data becomes essential for policy formulation, research, and public awareness. This collection contains climate-related datasets, such as rainfall and temperature, from various sources.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Tidal and Storm Surge Forecast Service Forecast Period 10 Forecasts. Published by Office of Public Works. Available under the license Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (cc-by-nc-nd).
Abstract: This dataset provides the forecast output csv files for the 60 forecast location points for Period 10 of the Tidal and Storm Surge Forecasting Service for the Coast of Ireland. The csv files were produced twice daily for each forecast point in the am and pm and extend for a period of 72 hours from their start point (00:00 UTC or 12:00 UTC). The files contain forecasts of surge in m, total water level in m OD Malin and astronomic tide in m (relative to Chart Datum) at 15 minute intervals along with error bands. They also contain details of the forecast point location and the date and time of forecast generation. The forecast data was produced from a hydrodynamic model using best available bathymetry, output from a tidal model as a boundary condition and operational weather forecast data as a forcing condition. Hence the forecast output will be affected by deviations between the weather forecast and actual weather. The Tidal and Storm Surge Forecasting Service was intended to provide information for Local Authorities, and other relevant stakeholders, to enable them to make informed decisions on the management of coastal flood risk. Three forecasts were produced daily, two morning forecasts looking 72 and 144 hours ahead and an evening forecast looking 72 hours ahead, with these being provided graphically on a website.
Lineage: This dataset provides the forecast output csv files for the 60 forecast location points for Period 10 of the Tidal and Storm Surge Forecasting Service for the Coast of Ireland. The csv files were produced directly from the forecast model runs and are based on the available weather forecast data at time of run. It should be noted that as each forecast is 72 hours long and new forecasts were produced every 12 hours, any single period will be covered by 6 forecast files. It would be expected that forecasts produced closer to the time of interest will be more accurate to reflect improved weather forecasts.
Purpose: The Tidal and Storm Surge Forecasting Service for the Coast of Ireland was intended to provide information for Local Authorities and other relevant stakeholders to enable them to make informed decisions on the management of coastal flood risk. Three forecasts were produced daily, two morning forecasts looking 72 and 144 hours ahead and an evening forecast looking 72 hours ahead. Forecasts were provided at 60 forecasting points, 18 National Points around the coast of Ireland and five more detailed local forecasting areas at Dundalk Bay, Wexford Harbour, Cork Harbour, Shannon Estuary and Galway Bay....
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
MiBici (translated as MyBike in English) is a public bike service used in my home city of Guadalajara (GDL), Jalisco, Mexico. This service is used in Guadalajara's Metropolitan Area, which has a population of 5,268,642 (as of 2020) distributed in eight main municipalities and a total area of 2,543.13 square km (981.91 square mi).
This service has established stations where users can take and return a public bike; all they need to do is sign up through MiBici's platform and pay a charge for 1, 3, 7 days, or annual use. After signing up, users get a transport card that can be used at any station in Guadalajara's Metropolitan Area.
MiBici makes their data public and has published it every month since December 2014.
A GitHub repository is also available with the R scripts used to merge, transform, and clean data from 4,496,890 bike trips registered in 2024 and hourly weather data obtained through the Open-meteo API, as well as the combined bikeshare + weather data and standalone hourly weather data available as csv files.
Data was obtained directly from MiBici's public data website. On this site, data is published in CSV files corresponding to individual months from December 2014 to February 2025. Data from 2024 was downloaded, cleaned, and transformed into an hourly format, as well as merged with hourly weather data.
variable | description | units |
---|---|---|
date | date in yyyy-mm-dd format (e.g. '2024-01-01') | date |
month | month of year (1 = jan, 2 = feb, ... , 12 = dec) | month |
day | day of month | day |
hour | hour of the day in 24 h format starting at 0 | hour |
trip_count | count of hourly bike trips | count |
is_weekend | is the day a weekend? i.e. saturday/sunday (1 = yes, 0 = no) | binary |
is_holiday | is the day a federal holiday in Mexico? (1 = yes, 0 = no) | binary |
apparent_temperature | perceived temperature combining wind chill factor, relative humidity and solar radiation | °C |
wind_speed | wind speed at 10 meters above ground | km/h |
is_day | 1 if the current time has daylight, 0 at night | binary |
temperature | air temperature at 2 meters above ground | °C |
relative_humidity | relative humidity at 2 meters above ground | % |
precipitation | total precipitation (rain, showers, snow) sum of the preceding hour | mm |
weather_code | weather condition as a numeric code; follows WMO weather interpretation codes (see below) | WMO code |
season | season of the year (winter, spring, summer, fall) | category |
code | description |
---|---|
0 | clear sky |
1 | mainly clear |
2 | partly cloudy |
3 | overcast |
45 | fog |
61 | rain: slight |
63 | rain: moderate |
80 | rain showers: slight |
81 | rain showers: moderate |
95 | thunderstorm |
96 | thunderstorm with slight hail |
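As a small usage sketch, the code table above can be turned into a lookup and attached to the hourly data with pandas. The three data rows are made up; only the column names (`hour`, `trip_count`, `weather_code`) and the code descriptions come from the tables above.

```python
import pandas as pd

# Code descriptions copied from the table above.
WEATHER_CODES = {
    0: "clear sky",
    1: "mainly clear",
    2: "partly cloudy",
    3: "overcast",
    45: "fog",
    61: "rain: slight",
    63: "rain: moderate",
    80: "rain showers: slight",
    81: "rain showers: moderate",
    95: "thunderstorm",
    96: "thunderstorm with slight hail",
}

# Made-up rows using column names from the variable table.
df = pd.DataFrame(
    {"hour": [7, 8, 9], "trip_count": [120, 340, 95], "weather_code": [0, 3, 61]}
)

# Attach a readable label, then summarise trips per weather condition.
df["weather"] = df["weather_code"].map(WEATHER_CODES)
mean_trips = df.groupby("weather")["trip_count"].mean()
```

Codes that are missing from the lookup would map to NaN, which makes unexpected values easy to spot.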
I have also cleaned, transformed, and combined all bikeshare data from Dec 2014 to Mar 2024 and published the final dataset (2.51 GB) on Kaggle.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MOL-TR data set
- format_dataframe.csv: Full traffic count data set of all vehicle types and all stations.
- holidays.csv: Public holidays during the period
- weather.csv: Weather data for each day
- minute_full.csv: Parsed traffic data
- minute_full_station_merged.csv: Parsed and merged traffic data
- locations.csv: Locations of measuring stations. Note that this information has not been thoroughly checked, and some of it is missing.
The traffic, holiday, and weather data sets can be loaded with:
import pandas as pd
traffic = pd.read_csv('format_dataframe.csv')
holidays = pd.read_csv('holidays.csv')
weather = pd.read_csv('weather.csv')
Canadian hourly climate data are available for public access from the ECCC/MSC's National Climate Archive. These are surface weather stations that produce hourly meteorological observations, taken each hour of the day. Only a subset of the total stations found on Environment and Climate Change Canada’s Historical Climate Data Page is shown due to size limitations. The priorities for inclusion are as follows: stations in cities with populations of 10,000+, stations that have Regional Basic Climatological Network status, and stations with 30+ years of data.
https://spdx.org/licenses/CC0-1.0.html
Low-lying coastal highways are susceptible to flooding as the sea level rises. Flooding events already impact some highways, like Highway 37 which runs across the lowlands at the northern end of San Francisco Bay and is crossed by several creeks/rivers. Short-term operational forecasts are required to enable planning for traffic disruption, evacuation, and protection of property and infrastructure. Traditional physically based numerical models have great predictive capability but require extensive datasets and are computationally expensive which limits their ability to do short-term forecasting. Here we develop a data-driven, site-specific method that can be implemented at multiple vulnerable sites throughout San Francisco Bay and other low-lying coastal areas across the State of California. This method is based on direct observations of the water level at the site and is independent of large computer simulations. For this study, we use a relatively simple statistical model (multiple-linear regression) combined with a forecast error correction inspired by an autoregressive moving average method (ARMA) commonly used in time-series forecasting. The model is then used to produce a 4-day water level forecast at 3 stations near HWY 37, Sonoma/Marin County, California.
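To make the idea of a regression forecast with an error correction more concrete, here is a minimal sketch on synthetic data. It fits a multiple linear regression by ordinary least squares and then carries the last observed residual forward with a geometric decay. This decay is a simplified stand-in chosen for illustration, not the study's actual ARMA-inspired correction; the predictors, coefficients, and the damping factor `phi` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic predictors standing in for wind, pressure, and river flow.
X = rng.normal(size=(200, 3))
true_coef = np.array([0.5, -0.2, 0.1])
y = X @ true_coef + 0.3 + rng.normal(scale=0.01, size=200)

# Multiple linear regression by ordinary least squares (with intercept),
# trained on the first 150 samples.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A[:150], y[:150], rcond=None)

# Raw regression forecast for the held-out period.
raw = A[150:] @ coef

# Simple error correction: carry the last observed residual forward,
# damped by a factor phi per step (phi is an assumed tuning value).
phi = 0.8
last_error = y[149] - A[149] @ coef
steps = np.arange(1, len(raw) + 1)
corrected = raw + last_error * phi**steps
```

The correction matters most for the first few forecast steps and fades as the lead time grows, which is the general behaviour one would expect from an autoregressive adjustment.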
Methods
The input files for the model are grouped into three different datasets: a training dataset, a water level observations dataset, and a weather forecast dataset. All data within those files are sourced from public data servers.
Training Dataset
Description: This dataset contains the time series of the four parameters that are used to train the model. It consists of hourly observed meteorological data such as wind, atmospheric pressure, and flow for the period of 2019-01-01 to 2022-09-27. The dataset consists of 4 fields: Ocean Wind, Local Wind, Atmospheric Pressure and River flow. The raw data was collected from publicly available sources. The data was downloaded and resampled to hourly time intervals. Small data gaps were filled by linear interpolation. The wind data was transformed from a polar coordinate system of wind speed and direction to principal component x-y vectors. The principal components were oriented so that the alongshore (y-component) is oriented at 60 degrees North for the wind at Gnoss Field and 100 degrees north for the wind at the NDBC buoy. The listed onshore wind is the shorenormal (x-component) for the 2 locations.
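The transformation from wind speed and direction to shore-oriented components can be sketched as a plain axis rotation. The sketch assumes the meteorological convention (direction is where the wind blows from, in degrees clockwise from north) and that the shore-normal axis lies 90 degrees clockwise from the alongshore axis; the dataset's actual sign conventions and any principal component step are not reproduced here, and the function name is hypothetical.

```python
import numpy as np


def wind_components(speed, direction_deg, alongshore_deg):
    """Split wind into (shore_normal, alongshore) components.

    Assumes `direction_deg` is the direction the wind blows FROM, in degrees
    clockwise from north, and that the shore-normal axis lies 90 degrees
    clockwise from the alongshore axis.
    """
    d = np.deg2rad(direction_deg)
    a = np.deg2rad(alongshore_deg)
    u = -speed * np.sin(d)  # eastward component of the wind vector
    v = -speed * np.cos(d)  # northward component of the wind vector
    alongshore = u * np.sin(a) + v * np.cos(a)
    shore_normal = u * np.cos(a) - v * np.sin(a)
    return shore_normal, alongshore


# Wind of 10 m/s from 240 degrees blows toward 60 degrees, i.e. exactly
# along an alongshore axis oriented at 60 degrees.
x, y = wind_components(10.0, 240.0, alongshore_deg=60.0)
```

In this example the wind is entirely alongshore, so the shore-normal component is zero. The function also accepts NumPy arrays, so a whole time series can be converted in one call.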
Source:
Column Name | Location | Data Type, Unit | Agency Source | Web link to raw data |
---|---|---|---|---|
AtmPres | Buoy 46026 | Atmospheric Pressure, mBar | NOAA NDBC | https://www.ndbc.noaa.gov/station_page.php?station=46026 |
Gnoss_onshorewind | Gnoss Field Airport | Shore-normal component of the wind, m/s | Sonoma County | https://sonoma.onerain.com/site/?site_id=155&site=b4e33d63-e909-4ecd-bb2b-1ee2c587bb00 |
napa_flow_cfs | Napa River | River flow, cfs | USGS NWIS | https://waterdata.usgs.gov/ca/nwis/uv?site_no=11458000 |
ocean_onshorewind | Buoy 46026 | Shore-normal component of the wind, m/s | NOAA NDBC | https://www.ndbc.noaa.gov/station_page.php?station=46026 |
Water Level Datasets
This dataset consists of three individual files, each with 3 fields. The stage_m field is the raw data collected from the water level gauge station, the predicted_m field is the predicted tide as calculated below, and the residual_m field is the difference between the two.
Description: The raw water level data were collected from 3 stage stations for the period of 2019-01-01 to 2022-09-27 when available.
Field stage_m: The data was downloaded, detrended by removing the mean value, and resampled to hourly time intervals. Small data gaps were filled by linear interpolation.
Field predicted_m: The predicted tide was calculated using a publicly available Python routine based on a well-documented Matlab routine called Utide (http://www.po.gso.uri.edu/~codiga/utide/utide.htm).
Field residual_m: The residual is the stage minus the predicted tide. It represents the variation of the water level due to non-tidal forcing.
Source: The stage data was downloaded from the following sources:
File Name | Location | Data Type, Unit | Agency Source | Web link to raw data |
---|---|---|---|---|
novato_wl_1hr_up.csv | Mouth of Novato Creek | Stage, m | Marin Co | https://marin.onerain.com/site/?site_id=16808&site=a88e57c5-06b1-4855-a65c-92ef0063e6bb |
rowland_wl_1hr.csv | Novato Creek at Rowland Bridge | Stage, m | Marin Co | https://marin.onerain.com/site/?site_id=16809&site=82b05ca8-3c86-49cc-9660-63ca3abd3e35 |
petaluma_wl_1hr.csv | Petaluma River at Horse Ranch | Stage, m | UC Davis, BML | https://coastalocean.ucdavis.edu/ocean-observing/hwy37 |
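The processing described for the three water level fields (hourly resampling, filling small gaps, removing the mean, and subtracting the predicted tide) might look roughly like the pandas sketch below. The stage record is synthetic, the gap limit of 3 hours is an assumed value, and the predicted tide is a placeholder sine wave rather than UTide output.

```python
import numpy as np
import pandas as pd

# Synthetic 15-minute stage record with a short gap (values in metres).
idx = pd.date_range("2019-01-01", periods=96, freq="15min")
stage = pd.Series(1.0 + 0.5 * np.sin(np.arange(96) * 2 * np.pi / 48), index=idx)
stage.iloc[40:44] = np.nan  # one full hour of missing data

# Resample to hourly means, then fill small gaps by linear interpolation.
hourly = stage.resample("60min").mean()
hourly = hourly.interpolate(method="linear", limit=3)  # assumed gap limit

# Detrend by removing the mean value.
stage_m = hourly - hourly.mean()

# Placeholder for the predicted tide; in the study this comes from a
# UTide-based routine, which is not reproduced here.
predicted_m = pd.Series(
    0.5 * np.sin(np.arange(24) * 2 * np.pi / 12), index=stage_m.index
)

# Residual: stage minus predicted tide (non-tidal water level variation).
residual_m = stage_m - predicted_m
```

With real data, `predicted_m` would be replaced by the tide reconstructed from a harmonic analysis of the same record.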
Weather Forecast Datasets
This dataset is the weather forecast for the 4 parameters used by the model.
Description: This dataset contains forecasted meteorological data as obtained from NOAA data servers. The atmospheric pressure forecast was obtained from openweathermap, an open-source weather forecast app.
Source:
Column Name | Location | Data Type, Unit | Agency Source | Web link to raw data
AtmPres | Buoy 46026 | Atmospheric Pressure, mBar | - | -
Gnoss_onshorewind | Gnoss Field Airport | Shore-normal component of the wind, m/s | NOAA NWS | https://www.weather.gov/documentation/services-web-api
napa_flow_cfs | Napa River | River flow, cfs | NOAA AHPS | https://water.weather.gov/ahps2/hydrograph.php?gage=apcc1&wfo=mtr
ocean_onshorewind | Buoy 46026 | Shore-normal component of the wind, m/s | NOAA NWS | -
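The two onshore-wind columns are described as the shore-normal component of the wind. One way such a component could be derived from a forecast wind speed and direction is sketched below. The shore-normal bearing and the sign convention (positive means wind blowing toward the shore) are assumptions made for illustration; the source does not state either.

```python
import math


def onshore_wind(speed_ms, wind_from_deg, shore_normal_deg):
    """Shore-normal wind component in m/s (positive = blowing onshore).

    wind_from_deg: meteorological direction the wind comes from,
        in degrees clockwise from north.
    shore_normal_deg: hypothetical compass bearing of the seaward-pointing
        normal to the coastline. Wind arriving from exactly this bearing
        is fully onshore.
    """
    angle = math.radians(wind_from_deg - shore_normal_deg)
    return speed_ms * math.cos(angle)


# Illustrative values only: a coast facing due west (normal bearing 270).
full_onshore = onshore_wind(10.0, 270.0, 270.0)
oblique = onshore_wind(10.0, 330.0, 270.0)
offshore = onshore_wind(10.0, 90.0, 270.0)
```

With this convention a wind 60 degrees off the normal contributes half its speed, and a wind from the landward side gives a negative value.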
The NOAA Cooperative Observer Program (COOP) 15-Minute Precipitation Data consists of quality-controlled precipitation amounts, measured as 15-minute accumulations of precipitation (including rain and snow), for approximately 2,000 observing stations across the United States and several U.S. territories in the Caribbean and Pacific that are operated or managed by the NOAA National Weather Service (NWS). Stations are primary, secondary, or cooperative observer sites that have the capability to measure precipitation at 15-minute intervals. This dataset contains 15-minute precipitation data (reported 4 times per hour, if precipitation occurred) for U.S. stations along with selected non-U.S. stations in U.S. territories and associated nations. It includes major city locations and many small town locations. Daily total precipitation is also included as part of the data record. The dataset period of record is from May 1970 to December 2013. The dataset is archived by the NOAA National Centers for Environmental Information (NCEI).
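Because the record carries both 15-minute amounts and a daily total, the two can be cross-checked by summing the 15-minute values per calendar day. The sketch below assumes the records have already been parsed into (timestamp, amount) pairs; the actual file layout, units, and missing-value flags of the COOP product are not described here and are not modeled.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(records):
    """Sum 15-minute precipitation amounts by calendar day.

    records: iterable of (datetime, amount) pairs. Intervals with no
    precipitation are simply absent, as in the dataset description.
    """
    totals = defaultdict(float)
    for timestamp, amount in records:
        totals[timestamp.date()] += amount
    return dict(totals)


# Illustrative values only, not taken from the dataset.
sample = [
    (datetime(1970, 5, 1, 14, 15), 0.25),
    (datetime(1970, 5, 1, 14, 30), 0.5),
    (datetime(1970, 5, 2, 3, 0), 0.125),
]
totals = daily_totals(sample)
```

A day's computed sum can then be compared with the daily total stored in the record to flag days where the two disagree.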
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains messages published and replies received by government weather and climate authorities on the X (formerly Twitter) social media platform. The data cover the government weather and climate authorities of the Brazilian cities of São Paulo, Rio de Janeiro, Belo Horizonte, Porto Alegre, and Belém. Government weather and climate authorities are city hall departments or sectors responsible for informing the population and keeping it updated about weather events.
Publications made by the authority and replies published by citizens to these publications are observed. This data supports the study on the interaction dynamics between the climate authority and citizens over time.
Data Structure
Two files are available: publications.csv and replies.csv.
Each line in the publications file (publications.csv) refers to one authority publication (tweet). For each publication, the file stores the public authority's unique Twitter identifier (AUTHORITY_ID), the tweet's unique identifier (TWEET_ID), the Unix timestamp indicating when it was published (TIMESTAMP), and the text of the publication (TEXT).
Each line in the replies file (replies.csv) is a reply from a citizen to an authority. For each reply, the file stores the authority's unique Twitter identifier (AUTHORITY_ID), the unique identifier of the authority's tweet being replied to (TWEET_ID), the replier's masked unique Twitter identifier (AUTHOR_ID), and the Unix timestamp indicating when the reply was published (TIMESTAMP).
All data were collected through X's application programming interface (API) for scientific researchers. Publications and replies were posted with public visibility by their users (authorities and citizens).
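Since replies.csv refers back to publications.csv through TWEET_ID, the two files can be linked to study how citizens respond to each publication. The sketch below counts replies per publication and measures the delay to the first reply. It assumes comma-delimited files with a header row using the column names listed above and timestamps in seconds; the tweet identifiers, author identifiers, timestamps, and texts in the sample are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample rows in the assumed layout (header row, comma-delimited).
PUBLICATIONS_CSV = """AUTHORITY_ID,TWEET_ID,TIMESTAMP,TEXT
268407434,1001,1626480000,Example alert A
87487749,1002,1626483600,Example alert B
"""

REPLIES_CSV = """AUTHORITY_ID,TWEET_ID,AUTHOR_ID,TIMESTAMP
268407434,1001,u1,1626480600
268407434,1001,u2,1626481200
87487749,1002,u1,1626487200
"""


def load(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def replies_per_publication(publications, replies):
    """Number of replies for each publication, including zero counts."""
    counts = Counter(r["TWEET_ID"] for r in replies)
    return {p["TWEET_ID"]: counts.get(p["TWEET_ID"], 0) for p in publications}


def first_reply_delay_seconds(publications, replies):
    """Seconds between each publication and its earliest reply."""
    published = {p["TWEET_ID"]: int(p["TIMESTAMP"]) for p in publications}
    delays = {}
    for r in replies:
        tweet_id = r["TWEET_ID"]
        if tweet_id not in published:
            continue  # reply to a tweet outside the publications file
        delay = int(r["TIMESTAMP"]) - published[tweet_id]
        if tweet_id not in delays or delay < delays[tweet_id]:
            delays[tweet_id] = delay
    return delays


publications = load(PUBLICATIONS_CSV)
replies = load(REPLIES_CSV)
counts = replies_per_publication(publications, replies)
delays = first_reply_delay_seconds(publications, replies)
```

Identifiers are kept as strings, which avoids any loss of precision if the files are later loaded by tools that treat long numeric IDs as floating-point values.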
Data Content
The dataset covers a 1-year observation period, starting on July 17, 2021, and ending on June 16, 2022. It contains a total of 10,229 publications and 5,471 replies. The observed authorities are as follows:
City | Authority name | X handle | AUTHORITY_ID
São Paulo | Centro de Gerenciamento de Emergências Climáticas da Prefeitura de SP | @cge_sp | 268407434
Rio de Janeiro | Sistema de Alerta localizado no Centro de Operações do Rio (COR) | @alertario | 87487749
Belo Horizonte | Defesa Civil de Belo Horizonte | @defesacivilbh | 837731966
Porto Alegre | Defesa Civil Porto Alegre | @defesacivilpoa | 1037420896473022466
Belém | Defesa Civil de Belém | @defesacivilbel | 1346501728632500225
As weather and climate authorities are government bodies, the whole content of their publications is of public interest according to Brazilian law. Thus, the text messages in their publications on social media are in the public domain and are stored in this dataset. As the data structure describes, the text messages of citizens' replies are not stored. According to the terms of use of the X platform, citizen text messages cannot be publicly stored outside the X platform. Such text messages are public on that platform and, for reproducibility, can be collected again through the platform's web page or API by supplying the TWEET_ID stored in this dataset.