ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture...
researchdata.tuwien.ac.at
zip
Updated Jun 6, 2025
Cite
Wolfgang Preimesberger; Pietro Stradiotti; Wouter Arnoud Dorigo (2025). ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture from merged multi-satellite observations [Dataset]. http://doi.org/10.48436/3fcxr-cde10
Available download formats: zip
Unique identifier
https://doi.org/10.48436/3fcxr-cde10
Dataset updated
Jun 6, 2025
Dataset provided by
TU Wien
Authors
Wolfgang Preimesberger; Pietro Stradiotti; Wouter Arnoud Dorigo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was produced with funding from the European Space Agency (ESA) Climate Change Initiative (CCI) Plus Soil Moisture Project (CCN 3 to ESRIN Contract No: 4000126684/19/I-NB "ESA CCI+ Phase 1 New R&D on CCI ECVS Soil Moisture"). Project website: https://climate.esa.int/en/projects/soil-moisture/

This dataset contains information on the Surface Soil Moisture (SM) content derived from satellite observations in the microwave domain.

Dataset paper (public preprint)

A description of this dataset, including the methodology and validation results, is available at:

Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: An independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2024-610, in review, 2025.

Abstract

ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations coming from 19 satellites (as of v09.1) operating in the microwave domain. The wealth of satellite information, particularly over the last decade, facilitates the creation of a data record with the highest possible data consistency and coverage.
However, data gaps are still found in the record. This is particularly notable in earlier periods when a limited number of satellites were in operation, but can also arise from various retrieval issues, such as frozen soils, dense vegetation, and radio frequency interference (RFI). These data gaps present a challenge for many users, as they have the potential to obscure relevant events within a study area or are incompatible with (machine learning) software that often relies on gap-free inputs.
Since the requirement of a gap-free ESA CCI SM product was identified, various studies have demonstrated the suitability of different statistical methods to achieve this goal. A fundamental feature of such gap-filling methods is that they rely only on the original observational record, without the need for ancillary variables or model-based information. Due to the intrinsic challenge, no global, long-term univariate gap-filled product has been available until now. In this version of the record, data gaps due to missing satellite overpasses and invalid measurements are filled using the Discrete Cosine Transform (DCT) Penalized Least Squares (PLS) algorithm (Garcia, 2010). A linear interpolation is applied over periods of (potentially) frozen soils with little to no variability in (frozen) soil moisture content. Uncertainty estimates are based on models calibrated in experiments to fill satellite-like gaps introduced to GLDAS Noah reanalysis soil moisture (Rodell et al., 2004), and consider the gap size and local vegetation conditions as parameters that affect the gapfilling performance.

Summary

Gap-filled global estimates of volumetric surface soil moisture from 1991-2023 at 0.25° sampling

Fields of application (partial): climate variability and change, land-atmosphere interactions, global biogeochemical cycles and ecology, hydrological and land surface modelling, drought applications, and meteorology

Method: Modified version of DCT-PLS (Garcia, 2010) interpolation/smoothing algorithm, linear interpolation over periods of frozen soils. Uncertainty estimates are provided for all data points.

More information: See Preimesberger et al. (2025) and the ESA CCI SM Algorithm Theoretical Baseline Document [Chapter 7.2.9] (Dorigo et al., 2023), https://doi.org/10.5281/zenodo.8320869
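
The linear interpolation mentioned in the method summary can be pictured with a small sketch. This is not the authors' implementation and it does not cover the DCT-PLS step; it only illustrates, under the assumption of a one-dimensional daily series with `None` marking missing days, how values between two valid observations could be filled linearly. The function name and sample values are hypothetical.

```python
from typing import List, Optional


def fill_linear(series: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate None entries that lie between two valid values.

    Leading and trailing gaps are left as None because there is no
    second anchor point to interpolate towards.
    """
    filled = list(series)
    valid = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(valid, valid[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


# Hypothetical daily soil moisture values (m3/m3) with gaps
daily_sm = [0.20, None, None, 0.26, None, 0.30, None]
filled_sm = fill_linear(daily_sm)
print(filled_sm)
```

The trailing gap stays unfilled in this sketch; how the actual product treats the edges of the record is described in the dataset paper, not here.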

Programmatic Download

You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following script downloads and extracts the complete data set to the local directory ~/Downloads on Linux or macOS systems.

#!/bin/bash

# Set download directory and create it if it does not exist yet
DOWNLOAD_DIR=~/Downloads
mkdir -p "$DOWNLOAD_DIR"

base_url="https://researchdata.tuwien.at/records/3fcxr-cde10/files"

# Loop through years 1991 to 2023 and download & extract data
for year in {1991..2023}; do
    echo "Downloading $year.zip..."
    # Skip extraction and cleanup for this year if the download fails
    wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip" || continue
    unzip -o "$DOWNLOAD_DIR/$year.zip" -d "$DOWNLOAD_DIR"
    rm "$DOWNLOAD_DIR/$year.zip"
done

Data details

The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY), each subdirectory containing one netCDF image file for a specific day (DD), month (MM) in a 2-dimensional (longitude, latitude) grid system (CRS: WGS84). The file name has the following convention:

ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-YYYYMMDD000000-fv09.1r1.nc
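
As a small illustration of the naming convention above, the following sketch builds the expected relative path of a daily file from a calendar date. It assumes that the yearly subdirectory is named with the four-digit year; the function name `daily_file_path` is hypothetical.

```python
from datetime import date

# File name pattern taken from the convention above; {stamp} is YYYYMMDD
FILE_TEMPLATE = (
    "ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-"
    "{stamp}000000-fv09.1r1.nc"
)


def daily_file_path(day: date) -> str:
    """Return '<YYYY>/<file name>' for the given day (assumed layout)."""
    stamp = day.strftime("%Y%m%d")
    return f"{day.year:04d}/" + FILE_TEMPLATE.format(stamp=stamp)


example_path = daily_file_path(date(2003, 7, 15))
print(example_path)
```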

Data Variables

Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:

sm: (float) The Soil Moisture variable reflects estimates of daily average volumetric soil moisture content (m3/m3) in the soil surface layer (~0-5 cm) over a whole grid cell (0.25 degree).

sm_uncertainty: (float) The Soil Moisture Uncertainty variable reflects the uncertainty (random error) of the original satellite observations and of the predictions used to fill observation data gaps.

sm_anomaly: Soil moisture anomalies (reference period 1991-2020) derived from the gap-filled values (`sm`)

sm_smoothed: Contains DCT-PLS predictions used to fill data gaps in the original soil moisture field. These values are also provided for cases where an observation was initially available (compare `gapmask`). In this case, they provide a smoothed version of the original data.

gapmask: (0 | 1) Indicates grid cells where a satellite observation is available (1), and where the interpolated (smoothed) values are used instead (0) in the 'sm' field.

frozenmask: (0 | 1) Indicates grid cells where ERA5 soil temperature is <0 °C. In this case, a linear interpolation over time is applied.

Additional information for each variable is given in the netCDF attributes.
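
To show how `gapmask` relates to `sm`, here is a minimal sketch that uses plain nested lists as a stand-in for one daily grid. In practice the arrays would be read from the netCDF file with one of the libraries listed further down; the helper names and sample values are hypothetical, and missing values over oceans are not handled.

```python
from typing import List, Optional

Grid = List[List[float]]
Mask = List[List[int]]


def observed_only(sm: Grid, gapmask: Mask) -> List[List[Optional[float]]]:
    """Keep sm where gapmask == 1 (satellite observation), else None."""
    return [
        [value if flag == 1 else None for value, flag in zip(sm_row, mask_row)]
        for sm_row, mask_row in zip(sm, gapmask)
    ]


def gap_filled_fraction(gapmask: Mask) -> float:
    """Share of grid cells whose sm value is an interpolated one (flag 0)."""
    cells = [flag for row in gapmask for flag in row]
    return cells.count(0) / len(cells)


# Hypothetical 2 x 2 excerpt of one daily image
sm = [[0.21, 0.25], [0.30, 0.18]]
gapmask = [[1, 0], [0, 1]]

observations = observed_only(sm, gapmask)
filled_share = gap_filled_fraction(gapmask)
print(observations, filled_share)
```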

Version Changelog

Changes in v09.1r1 (previous version was v09.1):

This version uses a novel uncertainty estimation scheme as described in Preimesberger et al. (2025).

Software to open netCDF files

These data can be read by any software that supports netCDF files following the Climate and Forecast (CF) metadata conventions, such as:

Xarray (python): https://github.com/pydata/xarray

netCDF4 (python): https://unidata.github.io/netcdf4-python/

esa_cci_sm (python): https://github.com/TUW-GEO/esa_cci_sm

Similar tools exist for other programming languages (Matlab, R, etc.)

Software packages and GIS tools can open netCDF files, e.g. CDO, NCO, QGIS, ArcGIS

You can also use the GUI software Panoply to view the contents of each file

References

Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: An independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2024-610, in review, 2025.

Dorigo, W., Preimesberger, W., Stradiotti, P., Kidd, R., van der Schalie, R., van der Vliet, M., Rodriguez-Fernandez, N., Madelon, R., & Baghdadi, N. (2023). ESA Climate Change Initiative Plus - Soil Moisture Algorithm Theoretical Baseline Document (ATBD) Supporting Product Version 08.1 (version 1.1). Zenodo. https://doi.org/10.5281/zenodo.8320869

Garcia, D., 2010. Robust smoothing of gridded data in one and higher dimensions with missing values. Computational Statistics & Data Analysis, 54(4), pp.1167-1178. Available at: https://doi.org/10.1016/j.csda.2009.09.020

Rodell, M., Houser, P. R., Jambor, U., Gottschalck, J., Mitchell, K., Meng, C.-J., Arsenault, K., Cosgrove, B., Radakovich, J., Bosilovich, M., Entin, J. K., Walker, J. P., Lohmann, D., and Toll, D.: The Global Land Data Assimilation System, Bulletin of the American Meteorological Society, 85, 381 – 394, https://doi.org/10.1175/BAMS-85-3-381, 2004.

Related Records

The following records are all part of the Soil Moisture Climate Data Records from satellites community

1. ESA CCI SM MODELFREE Surface Soil Moisture Record: https://doi.org/10.48436/svr1r-27j77
National Weather Service Coded Surface Bulletins, 2003- (netCDF format)
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Biard, James C (2020). National Weather Service Coded Surface Bulletins, 2003- (netCDF format) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2651360
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Biard, James C
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset contains the Coded Surface Bulletin (CSB) dataset reformatted as netCDF-4 files. The CSB dataset is a collection of ASCII files containing the locations of weather fronts, troughs, high pressure centers, and low pressure centers as determined by National Weather Service meteorologists at the Weather Prediction Center (WPC) during the surface analysis they do every three hours. Each bulletin is broadcast on the NOAAPort service, and has been available since 2003.

Each netCDF file contains one year of CSB fronts data represented as spatial map data grids. The times and geospatial locations for the data grid cells are also included. The front data is stored in a netCDF variable with dimensions (time, front type, y, x), where x and y are geospatial dimensions. There is a 2D geospatial data grid for each time step for each of the 4 front types—cold, warm, stationary, and occluded. The front polylines from the CSB dataset are rasterized into the appropriate data grids. Each file conforms to the Climate and Forecast Metadata Conventions.

There are two large groupings of the CSB netCDF files. One group uses a data grid based on the North American Regional Reanalysis (NARR) grid, which is a Lambert Conformal Conic projection coordinate reference system (CRS) centered over North America. The NARR grid is quite close to the spatial range of data displayed on the WPC workstations used to perform surface analysis and identify front locations. The native NARR grid has grid cells which are 32 km on each side. Our grid covers the same extents with cells that are 96 km on each side.

The other group uses a 1° latitude/longitude data grid centered over North America with extents 171W – 31W / 10N – 77N. The files in this group are identified by the name MERRA2, because they were used with data from the NASA MERRA-2 dataset, which uses a latitude/longitude data grid.

There are a number of files within each group. The files all follow the naming convention codsus_[masked]_.nc, where [masked] indicates that the presence of the word masked is optional and is either merra2-1deg or narr-96km. The element is either the word mask or the sequence wide_, where is the front width and is the year for the data stored in the file.

The codsus_mask.nc file contains a single data grid that delineates the envelope of the geospatial region where there are, on average, 40 or more front crossings of any type per year. The WPC meteorologists don't attempt to provide equal levels of attention to every grid cell displayed on their workstations. The files of the form codsus_masked_wide_.nc have all had the mask described above applied to exclude parts of fronts that extend past the envelope. The files of the form codsus_wide_.nc have no masking applied.

The wide portion of the file names takes two forms: 1wide and 3wide. The fronts in the 1wide files were rasterized by drawing the front polylines with a width of one grid cell. The fronts in the 3wide files were rasterized by drawing the front polylines with a width of 3 grid cells.

Within each grid group, there are five subsets of files:

codsus_masked_1wide_.nc

codsus_masked_3wide_.nc

codsus_1wide_.nc

codsus_3wide_.nc

codsus_mask.nc

The primary source for this dataset is an internal archive maintained by personnel at the WPC and provided to the author. It is also provided at DOI 10.5281/zenodo.2642801. Some bulletins missing from the WPC archive were filled in with data acquired from the Iowa Environmental Mesonet.
NOAA Global Forecast System (GFS) netCDF Formatted Data
registry.opendata.aws
Updated Mar 5, 2025
Cite
NOAA (2025). NOAA Global Forecast System (GFS) netCDF Formatted Data [Dataset]. https://registry.opendata.aws/noaa-oar-arl-nacc-pds/
Dataset updated
Mar 5, 2025
Dataset provided by
National Oceanic and Atmospheric Administration: http://www.noaa.gov/
Description
The Global Forecast System (GFS) is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). Dozens of atmospheric and land-soil variables are available through this dataset, from temperatures, winds, and precipitation to soil moisture and atmospheric ozone concentration. The GFS data files stored here can be immediately used for OAR/ARL’s NOAA-EPA Atmosphere-Chemistry Coupler Cloud (NACC-Cloud) tool, and are in Network Common Data Form (netCDF), which is a very common format used across the scientific community. These particular GFS files contain a comprehensive set of global atmosphere/land variables at a relatively high spatiotemporal resolution (approximately 13x13 km horizontal, vertical resolution of 127 levels, and hourly). They are not only necessary for the NACC-Cloud tool to adequately drive community air quality applications (e.g., U.S. EPA’s Community Multiscale Air Quality model; https://www.epa.gov/cmaq), but can also be very useful for a myriad of other applications in the Earth system modeling communities (e.g., atmosphere, hydrosphere, pedosphere, etc.). While many other data file and record formats are indeed available for Earth system and climate research (e.g., GRIB, HDF, GeoTIFF), the netCDF files here are advantageous to the larger community because of the comprehensive, high spatiotemporal information they contain, and because they are more scalable, appendable, shareable, self-describing, and community-friendly (i.e., many tools available to the community of users). Out of the four operational GFS forecast cycles per day (at 00Z, 06Z, 12Z and 18Z), this particular netCDF dataset is updated daily (/inputs/yyyymmdd/) for the 12Z cycle and includes 24-hr output for both 2D (gfs.t12z.sfcf$0hh.nc) and 3D variables (gfs.t12z.atmf$0hh.nc).
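
The path pattern described above can be turned into a list of object keys with a short sketch. Two points are assumptions rather than facts from the description: `$0hh` is read as a zero-padded three-digit forecast hour (as in `sfcf000`), and the default range of forecast hours 0 to 24 is a guess at what "24-hr output" covers. The function name `gfs_keys` is hypothetical.

```python
from datetime import date
from typing import List


def gfs_keys(cycle_day: date, kind: str = "sfc",
             hours: range = range(0, 25)) -> List[str]:
    """Build keys such as 'inputs/20250305/gfs.t12z.sfcf000.nc'.

    kind is 'sfc' for the 2D files or 'atm' for the 3D files.
    The hour range is an assumption; adjust it to the files actually present.
    """
    if kind not in ("sfc", "atm"):
        raise ValueError("kind must be 'sfc' or 'atm'")
    prefix = f"inputs/{cycle_day:%Y%m%d}/"
    return [f"{prefix}gfs.t12z.{kind}f{hour:03d}.nc" for hour in hours]


keys = gfs_keys(date(2025, 3, 5), "sfc", range(0, 3))
print(keys)
```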

Also available are netCDF formatted Global Land Surface Datasets (GLSDs) developed by Hung et al. (2024). The GLSDs are based on numerous satellite products, and have been gridded to match the GFS spatial resolution (~13x13 km). These GLSDs contain vegetation canopy data (e.g., land surface type, vegetation clumping index, leaf area index, vegetative canopy height, and green vegetation fraction) that are supplemental to and can be combined with the GFS meteorological netCDF data for various applications, including NOAA-ARL's canopy-app. The canopy data variables are climatological, based on satellite data from the year 2020, combined with GFS meteorology for the year 2022, and are created at a daily temporal resolution (/inputs/geo-files/gfs.canopy.t12z.2022mmdd.sfcf000.global.nc)
GRACE MONTHLY LAND WATER MASS GRIDS NETCDF RELEASE 5.0
podaac.jpl.nasa.gov
data.globalchange.gov
html
Updated Aug 23, 2024
Cite
PO.DAAC (2024). GRACE MONTHLY LAND WATER MASS GRIDS NETCDF RELEASE 5.0 [Dataset]. http://doi.org/10.5067/TELND-NC005
Available download formats: html
Unique identifier
https://doi.org/10.5067/TELND-NC005
Dataset updated
Aug 23, 2024
Dataset provided by
PO.DAAC
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 1, 2002 - Present
Variables measured
GRAVITY
Description
The twin satellites of the Gravity Recovery and Climate Experiment (GRACE), launched in March of 2002, are making detailed monthly measurements of Earth's gravity field changes. These observations can detect regional mass changes of Earth's water reservoirs over land, ice and oceans. GRACE measures gravity variations by relating them to the distance variations between the two satellites, which fly in the same orbit, separated by about 240 km at an altitude of ~450 km. The monthly land mass grids contain terrestrial water storage anomalies (in aquifers, river basins, etc.) from GRACE time-variable gravity data relative to a time-mean. The storage anomalies are given in 'equivalent water thickness' (in NetCDF format). The time coverage for the monthly grids is determined by GRACE months. For the list of GRACE month dates visit http://grace.jpl.nasa.gov/data/grace-months/ . For more information please visit http://grace.jpl.nasa.gov/data/get-data/monthly-mass-grids-land/ .
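
The idea of an anomaly "relative to a time-mean" can be sketched for a single grid cell as follows. The baseline here is simply the mean of the whole sample series, which is an assumption for illustration; the product's actual baseline period is defined in its documentation. The values are hypothetical.

```python
from statistics import fmean
from typing import List


def anomalies(series: List[float]) -> List[float]:
    """Subtract the series mean so values become deviations from it."""
    baseline = fmean(series)
    return [value - baseline for value in series]


# Hypothetical monthly water storage for one cell, in cm of water
storage_cm = [12.0, 15.0, 9.0, 12.0]
storage_anomaly_cm = anomalies(storage_cm)
print(storage_anomaly_cm)
```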
TIGER/Line Shapefile, 2022, County, Robeson County, NC, Feature Names...
catalog.data.gov
datasets.ai
Updated Jan 28, 2024
Cite
U.S. Department of Commerce, U.S. Census Bureau, Geography Division, Spatial Data Collection and Products Branch (Point of Contact) (2024). TIGER/Line Shapefile, 2022, County, Robeson County, NC, Feature Names Relationship File [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-2022-county-robeson-county-nc-feature-names-relationship-file
Dataset updated
Jan 28, 2024
Dataset provided by
United States Census Bureau: http://census.gov/
Area covered
Robeson County, North Carolina
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Feature Names Relationship File (FEATNAMES.dbf) contains a record for each feature name and any attributes associated with it. Each feature name can be linked to the corresponding edges that make up that feature in the All Lines Shapefile (EDGES.shp), where applicable to the corresponding address range or ranges in the Address Ranges Relationship File (ADDR.dbf), or to both files. Although this file includes feature names for all linear features, not just road features, the primary purpose of this relationship file is to identify all street names associated with each address range. An edge can have several feature names; an address range located on an edge can be associated with one or any combination of the available feature names (an address range can be linked to multiple feature names). The address range is identified by the address range identifier (ARID) attribute, which can be used to link to the Address Ranges Relationship File (ADDR.dbf). The linear feature is identified by the linear feature identifier (LINEARID) attribute, which can be used to relate the address range back to the name attributes of the feature in the Feature Names Relationship File or to the feature record in the Primary Roads, Primary and Secondary Roads, or All Roads Shapefiles. The edge to which a feature name applies can be determined by linking the feature name record to the All Lines Shapefile (EDGES.shp) using the permanent edge identifier (TLID) attribute. 
The address range identifier(s) (ARID) for a specific linear feature can be found by using the linear feature identifier (LINEARID) from the Feature Names Relationship File (FEATNAMES.dbf) through the Address Range / Feature Name Relationship File (ADDRFN.dbf).
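
The linking described above can be sketched with plain dictionaries standing in for rows of FEATNAMES.dbf and ADDRFN.dbf. The identifiers TLID, LINEARID and ARID come from the description; the FULLNAME field, the helper names and all sample values are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical rows standing in for FEATNAMES.dbf and ADDRFN.dbf
featnames = [
    {"TLID": 101, "LINEARID": "L1", "FULLNAME": "Main St"},
    {"TLID": 102, "LINEARID": "L1", "FULLNAME": "Main St"},
    {"TLID": 103, "LINEARID": "L2", "FULLNAME": "Oak Ave"},
]
addrfn = [
    {"ARID": "A1", "LINEARID": "L1"},
    {"ARID": "A2", "LINEARID": "L1"},
    {"ARID": "A3", "LINEARID": "L2"},
]


def arids_by_name(featnames: List[dict], addrfn: List[dict]) -> Dict[str, Set[str]]:
    """Map each feature name to its address range identifiers via LINEARID."""
    names_by_linearid = {row["LINEARID"]: row["FULLNAME"] for row in featnames}
    result: Dict[str, Set[str]] = defaultdict(set)
    for row in addrfn:
        name = names_by_linearid.get(row["LINEARID"])
        if name is not None:
            result[name].add(row["ARID"])
    return dict(result)


def edges_for_name(featnames: List[dict], name: str) -> List[int]:
    """List the edge identifiers (TLID) that carry the given feature name."""
    return sorted(row["TLID"] for row in featnames if row["FULLNAME"] == name)


ranges = arids_by_name(featnames, addrfn)
main_st_edges = edges_for_name(featnames, "Main St")
print(ranges, main_st_edges)
```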
CMAQ Grid Mask Files for 12km CONUS - US States and NOAA Climate Regions
dataverse-staging.rdmc.unc.edu
datasearch.gesis.org
Updated Dec 12, 2019
Cite
UNC Dataverse (2019). CMAQ Grid Mask Files for 12km CONUS - US States and NOAA Climate Regions [Dataset]. http://doi.org/10.15139/S3/XDYYB9
Unique identifier
https://doi.org/10.15139/S3/XDYYB9
Dataset updated
Dec 12, 2019
Dataset provided by
UNC Dataverse
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
United States
Description
Data Summary: US states grid mask file and NOAA climate regions grid mask file, both compatible with the 12US1 modeling grid domain.

Note: The datasets are on a Google Drive. The metadata associated with this DOI contain the link to the Google Drive folder and instructions for downloading the data.

These files can be used with CMAQ-ISAMv5.3 to track state- or region-specific emissions. See Chapter 11 and Appendix B.4 in the CMAQ User's Guide for further information on how to use the ISAM control file with GRIDMASK files. The files can also be used for state- or region-specific scaling of emissions using the CMAQv5.3 DESID module. See the DESID Tutorial and Appendix B.4 in the CMAQ User's Guide for further information on how to use the Emission Control File to scale emissions in predetermined geographical areas.

File Location and Download Instructions: Link to GRIDMASK files; link to README text file with information on how these files were created

File Format: The grid masks are stored as netCDF-formatted files using I/O API data structures (https://www.cmascenter.org/ioapi/). Information on the model projection and grid structure is contained in the header information of the netCDF file. The output files can be opened and manipulated using I/O API utilities (e.g. M3XTRACT, M3WNDW) or other software programs that can read and write netCDF-formatted files (e.g. Fortran, R, Python).

File descriptions: These GRIDMASK files can be used with the 12US1 modeling grid domain (grid origin x = -2556000 m, y = -1728000 m; N columns = 459, N rows = 299).

GRIDMASK_STATES_12US1.nc: This file contains 49 variables for the 48 states in the conterminous U.S. plus DC. Each state variable (e.g., AL, AZ, AR, etc.) is a 2D array (299 x 459) providing the fractional area of each grid cell that falls within that state.

GRIDMASK_CLIMATE_REGIONS_12US1.nc: This file contains 9 variables for 9 NOAA climate regions based on the Karl and Koss (1984) definition of climate regions. Each climate region variable (e.g., CLIMATE_REGION_1, CLIMATE_REGION_2, etc.) is a 2D array (299 x 459) providing the fractional area of each grid cell that falls within that climate region.

NOAA Climate regions:

CLIMATE_REGION_1: Northwest (OR, WA, ID)
CLIMATE_REGION_2: West (CA, NV)
CLIMATE_REGION_3: West North Central (MT, WY, ND, SD, NE)
CLIMATE_REGION_4: Southwest (UT, AZ, NM, CO)
CLIMATE_REGION_5: South (KS, OK, TX, LA, AR, MS)
CLIMATE_REGION_6: Central (MO, IL, IN, KY, TN, OH, WV)
CLIMATE_REGION_7: East North Central (MN, IA, WI, MI)
CLIMATE_REGION_8: Northeast (MD, DE, NJ, PA, NY, CT, RI, MA, VT, NH, ME) + Washington, D.C.*
CLIMATE_REGION_9: Southeast (VA, NC, SC, GA, AL, FL)

*Note that Washington, D.C. is not included in any of the climate regions on the website but was included with the “Northeast” region for the generation of this GRIDMASK file.
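
As a rough illustration of how such fractional masks might be used outside CMAQ, the sketch below derives the grid extent from the origin and dimensions given above and sums a gridded field weighted by a mask. The 12 km cell size is assumed from the "12km CONUS" title, and the emission values, mask fractions and function names are hypothetical; this is not how ISAM or DESID apply the masks internally.

```python
from typing import List, Tuple

# Grid definition from the description; cell size assumed from "12km CONUS"
XORIG_M, YORIG_M = -2_556_000.0, -1_728_000.0
NCOLS, NROWS = 459, 299
CELL_M = 12_000.0


def grid_extent() -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) of the grid in projection metres."""
    return (XORIG_M, YORIG_M, XORIG_M + NCOLS * CELL_M, YORIG_M + NROWS * CELL_M)


def masked_total(field: List[List[float]], fraction: List[List[float]]) -> float:
    """Sum a gridded field weighted by the fractional area in the region."""
    return sum(
        value * frac
        for field_row, frac_row in zip(field, fraction)
        for value, frac in zip(field_row, frac_row)
    )


# Hypothetical 2 x 2 excerpt: emissions per cell and one state's mask
emissions = [[10.0, 20.0], [30.0, 40.0]]
state_fraction = [[1.0, 0.5], [0.0, 0.25]]

extent = grid_extent()
state_total = masked_total(emissions, state_fraction)
print(extent, state_total)
```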
TIGER/Line Shapefile, 2022, County, Wake County, NC, Feature Names...
datasets.ai
s.cnmilf.com
Updated Jan 27, 2024
Cite
U.S. Census Bureau, Department of Commerce (2024). TIGER/Line Shapefile, 2022, County, Wake County, NC, Feature Names Relationship File [Dataset]. https://datasets.ai/datasets/tiger-line-shapefile-2022-county-wake-county-nc-feature-names-relationship-file
Available download formats: 55, 57
Dataset updated
Jan 27, 2024
Dataset provided by
United States Census Bureau: http://census.gov/
Authors
U.S. Census Bureau, Department of Commerce
Area covered
North Carolina, Wake County
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Feature Names Relationship File (FEATNAMES.dbf) contains a record for each feature name and any attributes associated with it. Each feature name can be linked to the corresponding edges that make up that feature in the All Lines Shapefile (EDGES.shp), where applicable to the corresponding address range or ranges in the Address Ranges Relationship File (ADDR.dbf), or to both files. Although this file includes feature names for all linear features, not just road features, the primary purpose of this relationship file is to identify all street names associated with each address range. An edge can have several feature names; an address range located on an edge can be associated with one or any combination of the available feature names (an address range can be linked to multiple feature names). The address range is identified by the address range identifier (ARID) attribute, which can be used to link to the Address Ranges Relationship File (ADDR.dbf). The linear feature is identified by the linear feature identifier (LINEARID) attribute, which can be used to relate the address range back to the name attributes of the feature in the Feature Names Relationship File or to the feature record in the Primary Roads, Primary and Secondary Roads, or All Roads Shapefiles. The edge to which a feature name applies can be determined by linking the feature name record to the All Lines Shapefile (EDGES.shp) using the permanent edge identifier (TLID) attribute. 
The address range identifier(s) (ARID) for a specific linear feature can be found by using the linear feature identifier (LINEARID) from the Feature Names Relationship File (FEATNAMES.dbf) through the Address Range / Feature Name Relationship File (ADDRFN.dbf).
Outputs from a Regional Ocean Modeling System (ROMS) data assimilative...
seanoe.org
nc
Updated Nov 30, 2021
Cite
John Wilkin; Julia Levin (2021). Outputs from a Regional Ocean Modeling System (ROMS) data assimilative reanalysis (version DopAnV2R3-ini2007) of ocean circulation in the Mid-Atlantic Bight and Gulf of Maine for 2007-2020 [Dataset]. http://doi.org/10.17882/86286
Available download formats: nc
Unique identifier
https://doi.org/10.17882/86286
Dataset updated
Nov 30, 2021
Dataset provided by
SEANOE
Authors
John Wilkin; Julia Levin
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Time period covered
Jan 1, 2007 - Aug 30, 2021
Description
A hindcast reanalysis of ocean circulation in the Mid-Atlantic Bight and Gulf of Maine has been computed using the Regional Ocean Modeling System (ROMS) with 4-dimensional variational (4D-Var) assimilation of data from satellites, land-based ocean surface current measuring radar, and all available in situ observations from the MARACOOS (maracoos.org) and NERACOOS (neracoos.org) regional associations of the U.S. Integrated Ocean Observing System (IOOS). This reanalysis is version DopAnV2R3-ini2007 (version 2, release 3, initialized January 2007).

The analysis covers the period 2-Jan-2007 to 30-Aug-2021 on a 7-km horizontal grid with 40 vertical terrain-following s-coordinate levels. Ocean state variables computed are sea level, velocity, temperature, and salinity. Air-sea fluxes of heat and momentum, and surface and bottom stresses, are included.

Results are provided on the ROMS model native 3-dimensional grid as (i) 1-hourly interval snapshots (ROMS “history” files), (ii) 1-day averages, (iii) monthly averages, (iv) yearly averages, and (v) ensemble monthly averages (i.e., the mean of all days in the same month from all years). The output files are in netCDF format and data and metadata follow CF-1.4 conventions for the description of coordinates and variables.

The files uploaded here are examples of one time record from each of these 5 collections. Outputs for the full reanalysis, which comprises 6.8 terabytes of data, are made available for download via a THREDDS (Thematic Real-time Environmental Distributed Data Services) web service to facilitate user geospatial or temporal sub-setting.

The THREDDS catalog URLs and example filenames available here, for the respective collections, are:

1-hourly history snapshots 2007-01-02 01:00 through 2021-08-31 00:00: https://tds.marine.rutgers.edu/thredds/roms/doppio/catalog.html?dataset=dopanv2r3-ini2007_da_history example file uploaded here is his_dopanv2r3_20140516t0100.nc for 2014-05-06 01:00

24-hour averages 2007-01-02 12:00 through 2021-08-30 12:00: https://tds.marine.rutgers.edu/thredds/roms/doppio/catalog.html?dataset=dopanv2r3-ini2007_da_average example file uploaded here is avg_dopanv2r3_20140516t1200.nc for 2014-05-06

Monthly averages 2007-01-17 through 2020-12-16: https://tds.marine.rutgers.edu/thredds/roms/doppio/catalog.html?dataset=dopanv2r3-ini2007_da_monthly_averages example file uploaded here is mon_dopanv2r3_201405.nc for 2014-05

Yearly averages 2007 through 2020: https://tds.marine.rutgers.edu/thredds/roms/doppio/catalog.html?dataset=dopanv2r3-ini2007_da_yearly_averages example file uploaded here is year_dopanv2r3_2014.nc for 2014

Monthly ensemble averages: https://tds.marine.rutgers.edu/thredds/roms/doppio/catalog.html?dataset=dopanv2r3-ini2007_da_monthly_ensemble_means example file uploaded here is ensmon_dopanv2r3_05.nc for May

The underlying ocean circulation model configuration is described by López et al. (2020). The observations that are assimilated and the error hypotheses and other aspects of the 4D-Var assimilation implementation are described by Levin et al. (2020; 2021).

López, A. G., J. L. Wilkin and J. C. Levin, (2020), Doppio – a ROMS (v3.6)-based circulation model for the Mid-Atlantic Bight and Gulf of Maine: configuration and comparison to integrated coastal observing network observations, Geosci. Model Dev., 13, 3709–3729, doi: 10.5194/gmd-13-3709-2020

Levin, J., H. Arango, B. Laughlin, E. Hunter, J. Wilkin and A. Moore, (2020), Observation impacts on the Mid-Atlantic Bight front and cross-shelf transport in 4D-Var ocean state estimates, Part I – Multiplatform analysis, Ocean Modelling, 156, 101721, doi: 10.1016/j.ocemod.2020.101721

Levin, J., H. G. Arango, B. Laughlin, J. Wilkin and A. M. Moore, (2021), The impact of remote sensing observations on cross-shelf transport estimates from 4D-Var analyses of the Mid-Atlantic Bight, Advances in Space Research, 68, 553-570, doi: 10.1016/j.asr.2019.09.012
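
The "ensemble monthly averages" mentioned in this description (the mean of all days in the same month from all years) can be sketched for a single time series as below. This is only an illustration of the definition, not the processing used to produce the ROMS files; the function name and the sample temperatures are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean
from typing import Dict, List


def ensemble_monthly_means(daily: Dict[date, float]) -> Dict[int, float]:
    """Average all daily values that fall in the same calendar month,
    pooling every year together (1 = January, ..., 12 = December)."""
    by_month: Dict[int, List[float]] = defaultdict(list)
    for day, value in daily.items():
        by_month[day.month].append(value)
    return {month: fmean(values) for month, values in sorted(by_month.items())}


# Hypothetical daily-mean temperatures (deg C) from two different years
daily_temperature = {
    date(2007, 5, 1): 10.0,
    date(2007, 5, 2): 12.0,
    date(2008, 5, 1): 14.0,
    date(2008, 6, 1): 18.0,
}
monthly_climatology = ensemble_monthly_means(daily_temperature)
print(monthly_climatology)
```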
NORTH CAROLINA
redivis.com
Updated May 8, 2024
Cite
UCLA Library (2024). NORTH CAROLINA [Dataset]. https://redivis.com/datasets/ey62-9t0gpyvbg/usage
Dataset updated
May 8, 2024
Dataset authored and provided by
UCLA Library
Description
The table NORTH CAROLINA is part of the dataset L2 Voter File, available at https://redivis.com/datasets/ey62-9t0gpyvbg. It contains 169832561 rows across 38 variables.
2023 Cartographic Boundary File (SHP), Place for North Carolina, 1:500,000
catalog.data.gov
datasets.ai
Updated May 16, 2024
Cite
U.S. Department of Commerce, U.S. Census Bureau, Geography Division (Point of Contact) (2024). 2023 Cartographic Boundary File (SHP), Place for North Carolina, 1:500,000 [Dataset]. https://catalog.data.gov/dataset/2023-cartographic-boundary-file-shp-place-for-north-carolina-1-500000
Dataset updated
May 16, 2024
Dataset provided by
United States Census Bureau (http://census.gov/)
Area covered
North Carolina
Description
The 2023 cartographic boundary shapefiles are simplified representations of selected geographic areas from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). These boundary files are specifically designed for small-scale thematic mapping. When possible, generalization is performed with the intent to maintain the hierarchical relationships among geographies and to maintain the alignment of geographies within a file set for a given year. Geographic areas may not align with the same areas from another year. Some geographies are available as nation-based files while others are available only as state-based files. The cartographic boundary files include both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name, but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. 
CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population. The generalized boundaries of most incorporated places in this file are based on those as of January 1, 2023, as reported through the Census Bureau's Boundary and Annexation Survey (BAS). The generalized boundaries of all CDPs are based on those delineated or updated as part of the 2023 BAS or the Census Bureau's Participant Statistical Areas Program (PSAP) for the 2020 Census.
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Cite
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Campos Arias (repository creator)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for more than 1,000 Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
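Because the Rotten Tomatoes file is ordered by class, a reproducible shuffle is a natural first step. The following is a minimal sketch using only the standard library. The in-memory rows stand in for the contents of data_rt.csv, and the fixed seed is an arbitrary choice for illustration.

```python
import random

def shuffle_rows(rows, seed=0):
    """Return a shuffled copy of rows; the input list is left unchanged."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return shuffled

# Toy stand-in for (review, label) rows: negatives first, positives last.
rows = [("bad review", 0)] * 3 + [("good review", 1)] * 3
mixed = shuffle_rows(rows)
print([label for _, label in mixed])
```

Shuffling before any train/test split avoids ending up with a split that contains a single class.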
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data of Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, in Unix time), reviewTime (time of the review, raw) and division (manually added categorical label generated from the overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
BASIC Composite Ozone Time-Series Data
data.mendeley.com
Updated Sep 15, 2019
Cite
Justin Alsing (2019). BASIC Composite Ozone Time-Series Data [Dataset]. http://doi.org/10.17632/2mgx2xzzpk.3
Unique identifier
https://doi.org/10.17632/2mgx2xzzpk.3
Dataset updated
Sep 15, 2019
Authors
Justin Alsing
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
BAyeSian Integrated and Consolidated (BASIC) composite ozone time-series dataset built from a Bayesian joint self-calibration analysis of multiple composite ozone datasets. The construction of the BASIC composite is described in detail in the paper:

Ball et al, Reconciling differences in stratospheric ozone composites, ACP (2017).

If you use the BASIC dataset, please cite both the DOI for this data page and Ball et al 2017 (ACP).

The netCDF file includes variables for time, pressure and latitude giving the Julian dates* and pressure and latitude grid respectively. The ozone time-series data is given in the variable o3[time, pressure, latitude] and associated (time-varying) 1-sigma uncertainties are given in sigma_o3[time, pressure, latitude].

BASIC_V1_swooshV2.6_gozcardsV1.0_sbuvmodV8.6_sbuvmer.nc is built from SWOOSH v2.6, GOZCARDS v1.0, SBUV-MOD v8.6 and SBUV-MER (as described in Tummon et al 2015). This corresponds to the BASIC composite presented in Ball et al 2017 (ACP); the data runs up until Dec 2012.

BASIC_V1_swooshV2.6_gozcardsV2.20.nc is built from SWOOSH v2.6 and GOZCARDS v2.20; the updated data runs up until Dec 2018. This data was used in the revised version of Ball et al, Continuous decline in lower stratospheric ozone offsets ozone layer recovery, 2017 (ACPD) (referred to as merged-swoosh/gozcards in that paper).

*00:00:00.0 on 1/1/1980=2444239.5
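The footnote gives a reference point for the Julian dates stored in the time variable, which is enough to convert them to calendar dates. The sketch below assumes the time values are plain Julian dates in days. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Reference point stated in the dataset description:
# 00:00:00.0 on 1/1/1980 corresponds to Julian date 2444239.5.
JD_REFERENCE = 2444239.5
REFERENCE_TIME = datetime(1980, 1, 1)

def julian_to_datetime(jd: float) -> datetime:
    """Convert a Julian date (in days) to a calendar datetime."""
    return REFERENCE_TIME + timedelta(days=jd - JD_REFERENCE)

print(julian_to_datetime(2444239.5))
print(julian_to_datetime(2444239.5 + 366))  # 1980 is a leap year
```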
Catchment and river network netCDF files for river routing using mizuRoute:...
zenodo.org
Updated Jun 18, 2025
Cite
Nicolas Cortes-Salazar; Nicolas Cortes-Salazar; Nicolás Vásquez; Nicolás Vásquez; Pablo Mendoza; Pablo Mendoza (2025). Catchment and river network netCDF files for river routing using mizuRoute: study case for 198 catchments in continental Chile [Dataset]. http://doi.org/10.5281/zenodo.15691109
Unique identifier
https://doi.org/10.5281/zenodo.15691109
Dataset updated
Jun 18, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nicolas Cortes-Salazar; Nicolas Cortes-Salazar; Nicolás Vásquez; Nicolás Vásquez; Pablo Mendoza; Pablo Mendoza
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jun 18, 2025
Area covered
Chile
Description
This dataset contains the netCDF files used to run mizuRoute across continental Chile.
Each folder includes the following files:

[Catchment_BNACode]_grid.gpkg: the VIC model grid used for the catchment.

mapping_[Catchment_BNACode].nc: the mapping file that links the VIC grid to the GIS features.

ntopo_[Catchment_BNACode].nc: a river network netCDF file containing river reach-to-reach topology, reach-to-HRU topology, and physical parameters for both rivers and HRUs.

This research was funded by the Fondecyt Project 11200142 “Robust estimates of current and future water resources across a hydroclimatic gradient in Chile” (Principal Investigator: Pablo A. Mendoza).

The use of these files requires citing this dataset, and the paper that describes the approach used to produce the data:
Cortés-Salazar, N., Vásquez, N., Mizukami, N., Mendoza, P. A., & Vargas, X. (2023). To what extent does river routing matter in hydrological modeling?. Hydrology and Earth System Sciences, 27(19), 3505-3524. (doi.org/10.5194/hess-27-3505-2023).
TIGER/Line Shapefile, 2023, County, Bertie County, NC, Address Ranges...
datasets.ai
catalog.data.gov
Cite
U.S. Census Bureau, Department of Commerce, TIGER/Line Shapefile, 2023, County, Bertie County, NC, Address Ranges Relationship File [Dataset]. https://datasets.ai/datasets/tiger-line-shapefile-2023-county-bertie-county-nc-address-ranges-relationship-file
Dataset provided by
United States Census Bureau (http://census.gov/)
Authors
U.S. Census Bureau, Department of Commerce
Area covered
Bertie County, North Carolina
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Ranges Relationship File (ADDR.dbf) contains the attributes of each address range. Each address range applies to a single edge and has a unique address range identifier (ARID) value. The edge to which an address range applies can be determined by linking the address range to the All Lines Shapefile (EDGES.shp) using the permanent topological edge identifier (TLID) attribute. Multiple address ranges can apply to the same edge since an edge can have multiple address ranges. Note that the most inclusive address range associated with each side of a street edge already appears in the All Lines Shapefile (EDGES.shp). The TIGER/Line Files contain potential address ranges, not individual addresses. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side relative to the direction in which the edge is coded. The address ranges in the TIGER/Line Files are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.
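The definition of an address range as all numbers of a given parity between the first and last structure number can be illustrated in a few lines. This is a simplified sketch. It assumes both endpoints are plain integers with the same parity, and the function name is hypothetical rather than part of any Census Bureau tooling.

```python
def potential_structure_numbers(first: int, last: int) -> list:
    """List every potential structure number from first to last (inclusive),
    keeping the parity of the endpoints and the direction of the edge."""
    if first % 2 != last % 2:
        raise ValueError("endpoints must share the same parity")
    step = 2 if last >= first else -2
    # Extend the stop value by one unit in the direction of travel
    # so that the last structure number is included.
    return list(range(first, last + step // 2, step))

print(potential_structure_numbers(101, 109))
print(potential_structure_numbers(208, 200))
```

As the description notes, these are potential numbers only. An actual structure need not exist at each of them.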
TIGER/Line Shapefile, 2023, County, Alamance County, NC, Address...
datasets.ai
catalog.data.gov
Cite
U.S. Census Bureau, Department of Commerce, TIGER/Line Shapefile, 2023, County, Alamance County, NC, Address Range-Feature Name Relationship File [Dataset]. https://datasets.ai/datasets/tiger-line-shapefile-2023-county-alamance-county-nc-address-range-feature-name-relationship-fil
Dataset provided by
United States Census Bureau (http://census.gov/)
Authors
U.S. Census Bureau, Department of Commerce
Area covered
Alamance County, North Carolina
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Range / Feature Name Relationship File (ADDRFN.dbf) contains a record for each address range / linear feature name relationship. The purpose of this relationship file is to identify all street names associated with each address range. An edge can have several feature names; an address range located on an edge can be associated with one or any combination of the available feature names (an address range can be linked to multiple feature names). The address range is identified by the address range identifier (ARID) attribute that can be used to link to the Address Ranges Relationship File (ADDR.dbf). The linear feature name is identified by the linear feature identifier (LINEARID) attribute that can be used to link to the Feature Names Relationship File (FEATNAMES.dbf).
l477nc.m77t - MGD77 data file for Geophysical data from field activity...
datasets.ai
data.usgs.gov
Updated Sep 28, 2024
Cite
Department of the Interior (2024). l477nc.m77t - MGD77 data file for Geophysical data from field activity L-4-77-NC in Northern California from 05/10/1977 to 05/21/1977 [Dataset]. https://datasets.ai/datasets/l477nc-m77t-mgd77-data-file-for-geophysical-data-from-field-activity-l-4-77-nc-in-north-21
Dataset updated
Sep 28, 2024
Dataset authored and provided by
Department of the Interior
Area covered
Northern California, California
Description
Single-beam bathymetry, gravity, and magnetic data, along with DGPS navigation data, were collected as part of field activity L-4-77-NC in Northern California from 05/10/1977 to 05/21/1977, http://walrus.wr.usgs.gov/infobank/l/l477nc/html/l-4-77-nc.meta.html. These data are reformatted from space-delimited ASCII text files located in the Coastal and Marine Geology Program (CMGP) InfoBank field activity catalog at http://walrus.wr.usgs.gov/infobank/l/l477nc/html/l-4-77-nc.bath.html, http://walrus.wr.usgs.gov/infobank/l/l477nc/html/l-4-77-nc.grav.html, and http://walrus.wr.usgs.gov/infobank/l/l477nc/html/l-4-77-nc.mag.html into the MGD77T format provided by NOAA's National Geophysical Data Center (NGDC). The MGD77T format includes a header (documentation) file (.h77t) and a data file (.m77t). More information regarding this format can be found in the publication listed in the Cross_reference section of this metadata file.
Global temperature data from NCAR CSM - Dataset - NIASRA
hpc.niasra.uow.edu.au
Updated Dec 21, 2014
Cite
(2014). Global temperature data from NCAR CSM - Dataset - NIASRA [Dataset]. https://hpc.niasra.uow.edu.au/ckan/dataset/global-temperature-data
Dataset updated
Dec 21, 2014
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
There are 21 files in the directory. The file "TREFHT_1980-1999.nc" (in netCDF format) contains the 2-meter air temperature (128 lon x 64 lat x 240 months) from 1980-1999. This file has also been converted into 20 ASCII files, x1-x20, for the years 1980-1999, respectively. There are 128x64x12 elements in each ASCII file, which can be read into an array in R with, for example, array(scan("x1"), dim=c(128,64,12)). The data were generated from the NCAR Climate System Model. A part of this data set was used in Shen et al. (2002). Reference:
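For readers not using R, the reshaping step can be sketched in plain Python. R fills arrays with the first index varying fastest, so the sketch assumes each ASCII file lists its values in that order (longitude fastest, then latitude, then month). That ordering is an assumption to check against the netCDF file, and the function name is hypothetical.

```python
def reshape_column_major(values, dims):
    """Arrange a flat sequence into a nested list cube[i][j][k],
    with the first index varying fastest (as R's array() does)."""
    n1, n2, n3 = dims
    if len(values) != n1 * n2 * n3:
        raise ValueError("number of values does not match dims")
    return [[[values[i + n1 * (j + n2 * k)] for k in range(n3)]
             for j in range(n2)]
            for i in range(n1)]

# Small demonstration; a real yearly file would use dims=(128, 64, 12).
cube = reshape_column_major(list(range(12)), (2, 3, 2))
print(cube[1][0][0], cube[0][1][0], cube[0][0][1])
```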
1-km monthly precipitation dataset for China (1901-2023)
tpdc.ac.cn
data.tpdc.ac.cn
zip
Updated Jul 18, 2024
Cite
Shouzhang PENG (2024). 1-km monthly precipitation dataset for China (1901-2023) [Dataset]. http://doi.org/10.5281/zenodo.3114194
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3114194
Dataset updated
Jul 18, 2024
Dataset provided by
TPDC
Authors
Shouzhang PENG
Description
This dataset provides monthly precipitation data for China at a spatial resolution of 0.0083333° (about 1 km) for the period 1901.1-2023.12. The data are stored in NetCDF (.nc) format. The dataset was generated for China with the Delta spatial downscaling scheme, based on the global 0.5° climate dataset released by CRU and the global high-resolution climate dataset released by WorldClim. In addition, data from 496 independent meteorological observation points were used for verification, and the verification results are reliable. The dataset covers the main land areas of China (including Hong Kong, Macao and Taiwan), excluding islands and reefs in the South China Sea. To facilitate storage, the data are all of type int16 and stored in nc files, with precipitation in units of 0.1 mm. The nc data can be mapped using ArcMap software; Matlab can also be used for extraction and processing. Matlab provides functions to read and write nc files. The read function is ncread; after switching to the folder containing the nc file, the call is written as ncread('XXX.nc', 'var', [i j t], [leni lenj lent]), where XXX.nc is the file name and var is the name of the variable to read from XXX.nc, both given as strings enclosed in ''; i, j and t are the starting row, column and time of the data to read, respectively; and leni, lenj and lent are the lengths of the data to read in the row, column and time dimensions, respectively. In this way, the function can read any region and any time period within the study area. The Matlab help documents many further commands for nc data. WGS84 is the recommended coordinate system for the data.
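The start/count window used by ncread and the 0.1 mm storage unit both translate directly to other languages. The sketch below shows the two conversions in plain Python. The helper names are hypothetical, and no netCDF library is used, so the slices would still need to be applied to an array opened with a reader of your choice (whose dimension order may differ from Matlab's).

```python
def matlab_window_to_slices(start, count):
    """Convert 1-based Matlab start indices and lengths,
    e.g. [i j t] and [leni lenj lent], to 0-based Python slices."""
    return tuple(slice(s - 1, s - 1 + n) for s, n in zip(start, count))

def to_millimetres(raw):
    """Convert a stored int16 value (units of 0.1 mm) to millimetres."""
    return raw * 0.1

window = matlab_window_to_slices([1, 11, 5], [100, 50, 12])
print(window)
print(to_millimetres(253))  # stored value 253 corresponds to about 25.3 mm
```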
TIGER/Line Shapefile, 2023, County, Forsyth County, NC, Address Ranges...
datasets.ai
catalog.data.gov
Updated Dec 15, 2023
Cite
U.S. Census Bureau, Department of Commerce (2023). TIGER/Line Shapefile, 2023, County, Forsyth County, NC, Address Ranges Relationship File [Dataset]. https://datasets.ai/datasets/tiger-line-shapefile-2023-county-forsyth-county-nc-address-ranges-relationship-file
Dataset updated
Dec 15, 2023
Dataset provided by
United States Census Bureau (http://census.gov/)
Authors
U.S. Census Bureau, Department of Commerce
Area covered
Forsyth County, North Carolina
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Ranges Relationship File (ADDR.dbf) contains the attributes of each address range. Each address range applies to a single edge and has a unique address range identifier (ARID) value. The edge to which an address range applies can be determined by linking the address range to the All Lines Shapefile (EDGES.shp) using the permanent topological edge identifier (TLID) attribute. Multiple address ranges can apply to the same edge since an edge can have multiple address ranges. Note that the most inclusive address range associated with each side of a street edge already appears in the All Lines Shapefile (EDGES.shp). The TIGER/Line Files contain potential address ranges, not individual addresses. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side relative to the direction in which the edge is coded. The address ranges in the TIGER/Line Files are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.
Dataset on the evolution pattern and development trend of the arid...
data.tpdc.ac.cn
tpdc.ac.cn
zip
Updated Nov 18, 2008
Cite
Zhisheng AN (2008). Dataset on the evolution pattern and development trend of the arid environment since 3600 kyr BP in Western China [Dataset]. http://doi.org/10.11888/Paleoenv.tpdc.270093
Available download formats: zip
Unique identifier
https://doi.org/10.11888/Paleoenv.tpdc.270093
Dataset updated
Nov 18, 2008
Dataset provided by
TPDC
Authors
Zhisheng AN
Description
The project studying the evolution pattern and development trend of the arid environment in western China was a major research component of the project Environmental and Ecological Science for West China, which was funded by the National Natural Science Foundation of China. The leading executive of the project was Academician Zhisheng An from the Institute of Earth Environment of the Chinese Academy of Sciences. The project ran from January 2002 to December 2004. The data collected by the project include the following:
1. History and variability data for arid regions in western China:
1) Chinese Loess Plateau mass accumulation rate data (3600-0 kyr BP): fields include age and mass accumulation rate (MAR) (txt file).
2) Chinese Loess Plateau grain size and magnetic susceptibility data (3600-0 kyr BP): fields include age, stacked mean grain size, and stacked magnetic susceptibility (txt file).
2. Sporopollen content data of different loess strata since 12 kyr BP in the Yaozhou District of Shanxi Province (Excel table): the distributions of 27 species of sporopollen (0-397 cm) from 67 different layers of loess samples are included.
3. 10Be record data (table): 10Be concentration, magnetic susceptibility and bulk density data of loess with different thicknesses (79.67-0.09 kyr BP).
4. Simulation data on the modulation of the East Asian monsoon resulting from orbital variability driven by the uplift of the Tibetan Plateau (nc files): ah0-sum.nc, hh0-sum.nc, jfh0-sum.nc, kdh0-sum.nc, lfh0-sum.nc, mask.nc and phis.nc.

Cite

Wolfgang Preimesberger; Wolfgang Preimesberger; Pietro Stradiotti; Pietro Stradiotti; Wouter Arnoud Dorigo; Wouter Arnoud Dorigo (2025). ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture from merged multi-satellite observations [Dataset]. http://doi.org/10.48436/3fcxr-cde10

ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture from merged multi-satellite observations

Available download formats: zip

Unique identifier

https://doi.org/10.48436/3fcxr-cde10

Dataset updated

Jun 6, 2025

Dataset provided by

TU Wien

Authors

Wolfgang Preimesberger; Wolfgang Preimesberger; Pietro Stradiotti; Pietro Stradiotti; Wouter Arnoud Dorigo; Wouter Arnoud Dorigo

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset was produced with funding from the European Space Agency (ESA) Climate Change Initiative (CCI) Plus Soil Moisture Project (CCN 3 to ESRIN Contract No: 4000126684/19/I-NB "ESA CCI+ Phase 1 New R&D on CCI ECVS Soil Moisture"). Project website: https://climate.esa.int/en/projects/soil-moisture/

This dataset contains information on the Surface Soil Moisture (SM) content derived from satellite observations in the microwave domain.

Dataset paper (public preprint)

A description of this dataset, including the methodology and validation results, is available at:

Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: An independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2024-610, in review, 2025.

Abstract

ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations coming from 19 satellites (as of v09.1) operating in the microwave domain. The wealth of satellite information, particularly over the last decade, facilitates the creation of a data record with the highest possible data consistency and coverage.
However, data gaps are still found in the record. This is particularly notable in earlier periods when a limited number of satellites were in operation, but can also arise from various retrieval issues, such as frozen soils, dense vegetation, and radio frequency interference (RFI). These data gaps present a challenge for many users, as they have the potential to obscure relevant events within a study area or are incompatible with (machine learning) software that often relies on gap-free inputs.
Since the requirement of a gap-free ESA CCI SM product was identified, various studies have demonstrated the suitability of different statistical methods to achieve this goal. A fundamental feature of such gap-filling methods is that they rely only on the original observational record, without the need for ancillary variables or model-based information. Due to the intrinsic challenge, no global, long-term univariate gap-filled product has been available until now. In this version of the record, data gaps due to missing satellite overpasses and invalid measurements are filled using the Discrete Cosine Transform (DCT) Penalized Least Squares (PLS) algorithm (Garcia, 2010). A linear interpolation is applied over periods of (potentially) frozen soils with little to no variability in (frozen) soil moisture content. Uncertainty estimates are based on models calibrated in experiments to fill satellite-like gaps introduced to GLDAS Noah reanalysis soil moisture (Rodell et al., 2004), and consider the gap size and local vegetation conditions as parameters that affect the gap-filling performance.

Summary

Gap-filled global estimates of volumetric surface soil moisture from 1991-2023 at 0.25° sampling
Fields of application (partial): climate variability and change, land-atmosphere interactions, global biogeochemical cycles and ecology, hydrological and land surface modelling, drought applications, and meteorology
Method: Modified version of DCT-PLS (Garcia, 2010) interpolation/smoothing algorithm, linear interpolation over periods of frozen soils. Uncertainty estimates are provided for all data points.
More information: See Preimesberger et al. (2025) and the ESA CCI SM Algorithm Theoretical Baseline Document [Chapter 7.2.9] (Dorigo et al., 2023), https://doi.org/10.5281/zenodo.8320869

Programmatic Download

You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following script downloads and extracts the complete data set to the local directory ~/Downloads on Linux or macOS systems.

#!/bin/bash

# Set download directory
DOWNLOAD_DIR=~/Downloads

base_url="https://researchdata.tuwien.at/records/3fcxr-cde10/files"

# Loop through years 1991 to 2023 and download & extract data
for year in {1991..2023}; do
  echo "Downloading $year.zip..."
  wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip"
  unzip -o "$DOWNLOAD_DIR/$year.zip" -d "$DOWNLOAD_DIR"
  rm "$DOWNLOAD_DIR/$year.zip"
done
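
On systems where wget and unzip are not available, the same steps might be sketched with the Python standard library. The base URL and year range are copied from the shell script above; the helper names (`archive_url`, `download_year`, `download_all`) are hypothetical, and the sketch has not been tested against the repository.

```python
import urllib.request
import zipfile
from pathlib import Path

# Same base URL and year range as in the shell script
BASE_URL = "https://researchdata.tuwien.at/records/3fcxr-cde10/files"
YEARS = range(1991, 2024)


def archive_url(year: int) -> str:
    """Return the URL of the zip archive for one year."""
    return f"{BASE_URL}/{year}.zip"


def download_year(year: int, target_dir: Path) -> None:
    """Download one yearly archive, extract it into target_dir, delete the zip."""
    target_dir.mkdir(parents=True, exist_ok=True)
    zip_path = target_dir / f"{year}.zip"
    urllib.request.urlretrieve(archive_url(year), zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    zip_path.unlink()


def download_all(target_dir: Path) -> None:
    """Download and extract all years, one after the other."""
    for year in YEARS:
        print(f"Downloading {year}.zip...")
        download_year(year, target_dir)
```

Calling `download_all(Path.home() / "Downloads")` would then mirror the behaviour of the shell loop.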

Data details

The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY); each yearly subdirectory contains one netCDF image file per day (DD) and month (MM), on a 2-dimensional (longitude, latitude) grid (CRS: WGS84). File names follow this convention:

ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-YYYYMMDD000000-fv09.1r1.nc
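
As a small illustration of this convention, the expected path of a daily file can be built from a date. The sketch assumes that the extracted archives keep the yearly subdirectory layout described above; `daily_file_path` and the root directory are hypothetical names.

```python
from datetime import date
from pathlib import PurePosixPath

# Pattern taken from the naming convention above; the version suffix
# "fv09.1r1" is hard-coded for this release.
FILENAME_TEMPLATE = (
    "ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-{day:%Y%m%d}000000-fv09.1r1.nc"
)


def daily_file_path(root: str, day: date) -> PurePosixPath:
    """Return <root>/<YYYY>/<file name> for the given day."""
    return PurePosixPath(root) / f"{day:%Y}" / FILENAME_TEMPLATE.format(day=day)


print(daily_file_path("~/Downloads", date(2020, 7, 1)))
```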

Data Variables

Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:

sm: (float) The Soil Moisture variable reflects estimates of daily average volumetric soil moisture content (m3/m3) in the soil surface layer (~0-5 cm) over a whole grid cell (0.25 degree).
sm_uncertainty: (float) The Soil Moisture Uncertainty variable reflects the uncertainty (random error) of the original satellite observations and of the predictions used to fill observation data gaps.
sm_anomaly: Soil moisture anomalies (reference period 1991-2020) derived from the gap-filled values (`sm`)
sm_smoothed: Contains DCT-PLS predictions used to fill data gaps in the original soil moisture field. These values are also provided for cases where an observation was initially available (compare `gapmask`). In this case, they provide a smoothed version of the original data.
gapmask: (0 | 1) Indicates grid cells where a satellite observation is available (1), and where the interpolated (smoothed) values are used instead (0) in the `sm` field.
frozenmask: (0 | 1) Indicates grid cells where ERA5 soil temperature is <0 °C. In this case, a linear interpolation over time is applied.

Additional information for each variable is given in the netCDF attributes.
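
The role of `gapmask` can be illustrated with a short sketch that keeps only values backed by a satellite observation and reports how much of a series was gap-filled. It works on plain lists for the time series of one grid cell; the function names and the example values are made up for illustration and are not part of the dataset or any library.

```python
import math


def observed_only(sm, gapmask):
    """Keep values where gapmask == 1 (observation available); NaN elsewhere."""
    return [value if flag == 1 else math.nan for value, flag in zip(sm, gapmask)]


def gap_fraction(gapmask):
    """Fraction of time steps that were gap-filled (gapmask == 0)."""
    return sum(1 for flag in gapmask if flag == 0) / len(gapmask)


# Made-up values for one grid cell over five days (not real data)
sm = [0.21, 0.22, 0.24, 0.23, 0.20]
gapmask = [1, 0, 0, 1, 1]

print(observed_only(sm, gapmask))
print(gap_fraction(gapmask))
```

When a file is opened with Xarray as a dataset `ds` (an assumed variable name), the same selection over the whole grid would presumably be `ds["sm"].where(ds["gapmask"] == 1)`.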

Version Changelog

Changes in v09.1r1 (previous version was v09.1):

This version uses a novel uncertainty estimation scheme as described in Preimesberger et al. (2025).

Software to open netCDF files

These data can be read by any software that supports the Climate and Forecast (CF) metadata conventions for netCDF files, such as:

Xarray (Python): https://github.com/pydata/xarray
netCDF4 (Python): https://unidata.github.io/netcdf4-python/
esa_cci_sm (Python): https://github.com/TUW-GEO/esa_cci_sm
Similar tools exist for other programming languages (Matlab, R, etc.)
Software packages and GIS tools can open netCDF files, e.g. CDO, NCO, QGIS, ArcGIS
You can also use the GUI software Panoply to view the contents of each file

References

Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: An independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2024-610, in review, 2025.
Dorigo, W., Preimesberger, W., Stradiotti, P., Kidd, R., van der Schalie, R., van der Vliet, M., Rodriguez-Fernandez, N., Madelon, R., & Baghdadi, N. (2023). ESA Climate Change Initiative Plus - Soil Moisture Algorithm Theoretical Baseline Document (ATBD) Supporting Product Version 08.1 (version 1.1). Zenodo. https://doi.org/10.5281/zenodo.8320869
Garcia, D., 2010. Robust smoothing of gridded data in one and higher dimensions with missing values. Computational Statistics & Data Analysis, 54(4), pp.1167-1178. Available at: https://doi.org/10.1016/j.csda.2009.09.020
Rodell, M., Houser, P. R., Jambor, U., Gottschalck, J., Mitchell, K., Meng, C.-J., Arsenault, K., Cosgrove, B., Radakovich, J., Bosilovich, M., Entin, J. K., Walker, J. P., Lohmann, D., and Toll, D.: The Global Land Data Assimilation System, Bulletin of the American Meteorological Society, 85, 381-394, https://doi.org/10.1175/BAMS-85-3-381, 2004.

Related Records

The following records are all part of the Soil Moisture Climate Data Records from satellites community

ESA CCI SM MODELFREE Surface Soil Moisture Record

https://doi.org/10.48436/svr1r-27j77
