IQR is proposed for the image-text retrieval task. We use 200,000 queries and the corresponding images as the annotated image-query pairs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics, mean ± SD, range, median and interquartile range (IQR).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Formula for converting median and interquartile range (IQR) into mean and standard deviation (SD).
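The exact formula used by this dataset is not reproduced in this listing. As a rough illustration only, the sketch below uses a common approximation that assumes roughly normal data: the mean is taken as the average of the three quartiles, and the SD as the IQR divided by the width of the central 50% of a standard normal distribution (about 1.349). The function name `mean_sd_from_quartiles` is hypothetical.

```python
from statistics import NormalDist

# Width of the central 50% of a standard normal distribution (about 1.349).
IQR_PER_SD = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)


def mean_sd_from_quartiles(q1: float, median: float, q3: float) -> tuple[float, float]:
    """Approximate (mean, SD) from quartiles, assuming roughly normal data."""
    mean = (q1 + median + q3) / 3.0   # average of the three quartiles
    sd = (q3 - q1) / IQR_PER_SD       # rescale the IQR to an SD
    return mean, sd


# Illustrative values only (not taken from the dataset).
mean, sd = mean_sd_from_quartiles(q1=4.0, median=5.0, q3=6.0)
```

The approximation degrades for strongly skewed data, which is often the reason medians and IQRs were reported in the first place.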
The Precipitation Estimation from Remotely Sensed Information using an Artificial Neural Network-Climate Data Record (PERSIANN-CDR) is a satellite-based precipitation dataset for hydrological and climate studies, spanning from 1983 to present. It is the longest satellite-based precipitation record available, with daily data at 0.25° resolution for the 60°S–60°N latitude band. PERSIANN rain rate estimates are generated at 0.25° resolution and calibrated to a monthly merged in-situ and satellite product from the Global Precipitation Climatology Project (GPCP). The model uses Gridded Satellite (GridSat-B1) infrared data at 3-hourly time steps, with the raw output (PERSIANN-B1) bias-corrected and accumulated to produce the daily PERSIANN-CDR. The maps show 31 years (1984–2014) of annual and seasonal median and interquartile range (IQR) data. The median represents the 50th percentile of precipitation, and the IQR reflects the range between the 75th and 25th percentiles, showing data variability. Median and IQR are preferred over mean and standard deviation as they are less influenced by extreme values and better represent non-normally distributed data, such as precipitation, which is skewed and zero-limited. Data and Metadata: NCEI. This is a component of the Gulf Data Atlas (V1.0) for the Physical topic area.
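As a minimal sketch of how per-pixel median and IQR maps of this kind could be computed, the example below applies NumPy percentiles along the time axis of a stack of annual totals. The array `annual_precip` is synthetic (gamma-distributed, so skewed and non-negative like precipitation); it is an assumption for illustration and not PERSIANN-CDR data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for 31 years of annual precipitation on a small grid,
# with shape (years, lat, lon). Real data would be read from the product files.
annual_precip = rng.gamma(shape=2.0, scale=300.0, size=(31, 4, 5))

# Percentiles are taken along the time axis, giving one value per grid cell.
q25, q50, q75 = np.percentile(annual_precip, [25, 50, 75], axis=0)

median_map = q50       # 50th percentile per grid cell
iqr_map = q75 - q25    # spread between the 75th and 25th percentiles
```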
This dataset is a collection of 1, 2, or 3 images from: BIPED, BSDS500, BSDS300, DIV2K, WIRE-FRAME, CID, CITYSCAPES, ADE20K, MDBD, NYUD, THANGKA, PASCAL-Context, SET14, URBAN10, and the camera-man image. The image selection process consists of computing the inter-quartile range (IQR) of the intensity values of all the images; images larger than 720×720 pixels were not considered. In datasets whose images are high resolution (HR), the images were cropped. We thank all the dataset owners for making them public. This dataset is intended only for edge detection, not for contour or boundary tasks.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Geoscience Australia's GEOMACS model was utilised to produce hindcast hourly time series of continental shelf (~20 to 300 m depth) bed shear stress (unit of measure: Pascal, Pa) on a 0.1 degree grid covering the period March 1997 to February 2008 (inclusive). The hindcast data represents the combined contribution to the bed shear stress by waves, tides, wind and density-driven circulation. Included in the parameters that will be calculated to represent the magnitude of the bulk of the data are the quartiles of the distribution: Q25, Q50 and Q75 (i.e. the values below which 25, 50 and 75 percent of the observations fall). The interquartile range of the GEOMACS output takes the observations from between Q25 and Q75 to provide an accurate representation of the spread of observations. The interquartile range was shown to provide a more robust representation of the observations than the standard deviation, which produced highly skewed observations (Hughes and Harris 2008). This dataset is a contribution to the CERF Marine Biodiversity Hub and is hosted temporarily by CMAR on behalf of Geoscience Australia.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*n = 1041 (35 missing data). BMI = body mass index (kg/m2); SD = standard deviation; IQR = interquartile range; EI = energy intake (MJ/d); BMR = basal metabolic rate (MJ/d).
Originally constructed in 1995, the Global Oscillation Network Group (GONG) is a network of six identical ground-based solar telescopes distributed around the Earth in order to obtain continuous observations of the Sun. Those sites are located in Big Bear, California (BB); Mauna Loa, Hawaii (ML); Learmonth, Australia (LE); Udaipur, India (UD); El Teide, Spain (TD); and Cerro Tololo, Chile (CT). Additionally, there are three engineering/testbed sites in Boulder, Colorado (TC, TE, and TS). Owned by the National Science Foundation, GONG is operated and maintained by the National Solar Observatory (NSO) with significant funding from NOAA’s Space Weather Prediction Center (SWPC). Each minute, weather permitting, the GONG network observes the Sun at two spectral wavelengths: 676.78nm (a Ni I absorption line) and 656.28nm (the H-alpha absorption line).
The U.S. Climate Reference Network (USCRN) was designed to monitor the climate of the United States using research quality instrumentation located within representative pristine environments. This Standardized Soil Moisture (SSM) and Soil Moisture Climatology (SMC) product set is derived using the soil moisture observations from the USCRN. The hourly soil moisture anomaly (SMANOM) is derived by subtracting the MEDIAN from the soil moisture volumetric water content (SMVWC) and dividing the difference by the interquartile range (IQR = 75th percentile - 25th percentile) for that hour: SMANOM = (SMVWC - MEDIAN) / (IQR). The soil moisture percentile (SMPERC) is derived by taking all the values that were used to create the empirical cumulative distribution function (ECDF) that yielded the hourly MEDIAN and adding the current observation to the set, recalculating the ECDF, and determining the percentile value of the current observation. Finally, the soil temperature for the individual layers is provided for the convenience of dataset users. The SMC files contain the MEAN, MEDIAN, IQR, and decimal fraction of available data that are valid for each hour of the year at 5, 10, 20, 50, and 100 cm depth soil layers as well as for a top soil layer (TOP) and column soil layer (COLUMN). The TOP layer consists of an average of the 5 and 10 cm depths, while the COLUMN layer includes all available depths at a location, either two layers or five layers depending on soil depth. The SSM files contain the mean VWC, SMANOM, SMPERC, and TEMPERATURE for each of the depth layers described above. File names are structured as CRNSSM0101-STATIONNAME.csv and CRNSMC0101-STATIONNAME.csv. SSM stands for Standardized Soil Moisture and SMC stands for Soil Moisture Climatology. The first two digits of the trailing integer indicate the major version and the second two digits the minor version of the product.
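The SMANOM and SMPERC definitions above can be sketched as follows. This is a minimal illustration, not the USCRN production code: the function names and the sample climatology values are hypothetical, and returning the percentile as a fraction between 0 and 1 is an assumption.

```python
import numpy as np


def soil_moisture_anomaly(smvwc: float, climatology: np.ndarray) -> float:
    """SMANOM = (SMVWC - MEDIAN) / IQR for one hour of the year."""
    median = np.median(climatology)
    q25, q75 = np.percentile(climatology, [25, 75])
    return (smvwc - median) / (q75 - q25)


def soil_moisture_percentile(smvwc: float, climatology: np.ndarray) -> float:
    """Add the observation to the climatological set and read off its ECDF value."""
    sample = np.append(climatology, smvwc)
    return float(np.mean(sample <= smvwc))  # fraction of values <= observation


# Hypothetical volumetric water content values for one hour of the year.
climatology = np.array([0.10, 0.15, 0.20, 0.25, 0.30])
anomaly = soil_moisture_anomaly(0.25, climatology)
percentile = soil_moisture_percentile(0.25, climatology)
```

Because both the centre (median) and the scale (IQR) are rank-based, a few extreme wet or dry hours in the climatology have little effect on the standardized anomaly.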
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A live version of the data record, which will be kept up-to-date with new estimates, can be downloaded from the Humanitarian Data Exchange: https://data.humdata.org/dataset/covid-19-mobility-italy.
If you find the data helpful or you use the data for your research, please cite our work:
Pepe, E., Bajardi, P., Gauvin, L., Privitera, F., Lake, B., Cattuto, C., & Tizzoni, M. (2020). COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown. Scientific Data 7, 230 (2020).
The data record is structured into 4 comma-separated value (CSV) files, as follows:
id_provinces_IT.csv. Table of the administrative codes of the 107 Italian provinces. The fields of the table are:
COD_PROV is an integer field that is used to identify a province in all other data records;
SIGLA is a two-letter code that identifies the province according to the ISO_3166-2 standard (https://en.wikipedia.org/wiki/ISO_3166-2:IT);
DEN_PCM is the full name of the province.
OD_Matrix_daily_flows_norm_full_2020_01_18_2020_04_17.csv. The file contains the daily fraction of users moving between Italian provinces. Each line corresponds to an entry of matrix (i, j). The fields of the table are:
p1: COD_PROV of origin,
p2: COD_PROV of destination,
day: in the format yyyy-mm-dd.
median_q1_q3_rog_2020_01_18_2020_04_17.csv. The file contains the median and interquartile range (IQR) of users’ radius of gyration in a province by week. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
week: median value of the radius of gyration on week week, with week in the format dd/mm-DD/MM where dd/mm and DD/MM are the first and the last day of the week, respectively.
week Q1: first quartile (Q1) of the distribution of the radius of gyration on week week;
week Q3: third quartile (Q3) of the distribution of the radius of gyration on week week.
average_network_degree_2020_01_18_2020_04_17.csv. The file contains daily time-series of the average degree 〈k〉 of the proximity network. Each entry of the table is a value of 〈k〉 on a given day. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
day in the format yyyy-mm-dd.
ESRI shapefiles of the Italian provinces updated to the most recent definition are available from the website of the Italian National Office of Statistics (ISTAT): https://www.istat.it/it/archivio/222527.
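As a small sketch of how the weekly IQR of the radius of gyration could be derived from the median/Q1/Q3 file described above, the example below parses an in-memory CSV with the standard library. The column layout (a `dd/mm-DD/MM` column for the median plus `... Q1` and `... Q3` columns) is an interpretation of the field description, and the rows and numeric values are invented for illustration.

```python
import csv
import io

# Made-up rows following the assumed column layout of the real file.
sample_csv = io.StringIO(
    "COD_PROV,SIGLA,DEN_PCM,18/01-24/01,18/01-24/01 Q1,18/01-24/01 Q3\n"
    "1,TO,Torino,10.0,4.0,25.0\n"
    "15,MI,Milano,12.0,5.0,30.0\n"
)

week = "18/01-24/01"

# IQR = Q3 - Q1 for the chosen week, keyed by the province code SIGLA.
iqr_by_province = {
    row["SIGLA"]: float(row[f"{week} Q3"]) - float(row[f"{week} Q1"])
    for row in csv.DictReader(sample_csv)
}
```

For the real file, `sample_csv` would be replaced by an open handle on median_q1_q3_rog_2020_01_18_2020_04_17.csv, after checking the actual header names.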
Dataset from the article Secchi F, Monti CB, Alì M, Carbone FS, Cannaò PM, Sardanelli F. Diagnostic Value of Global Cardiac Strain in Patients With Myocarditis. J Comput Assist Tomogr. 2020 Jul/Aug;44(4):591-598. doi: 10.1097/RCT.0000000000001062. PMID: 32697530.
Abstract
Background: Cardiac strain represents an imaging biomarker of contractile dysfunction.
Purpose: The purpose of this study was to investigate the diagnostic value of cardiac strain obtained by feature-tracking cardiac magnetic resonance (MR) in acute myocarditis.
Materials and methods: Cardiac MR examinations of 46 patients with myocarditis and preserved ejection fraction at acute phase and follow-up were analyzed along with cardiac MR of 46 healthy age- and sex-matched controls. Global circumferential strain and global radial strain were calculated for each examination, along with myocardial edema and late gadolinium enhancement, and left ventricle functional parameters, through manual contouring of the myocardium. Correlations were assessed using Spearman ρ. Wilcoxon and Mann-Whitney U test were used to assess differences between data. Receiver operating characteristics curves and reproducibility were obtained to assess the diagnostic role of strain parameters.
Results: Global circumferential strain was significantly lower in controls (median, -20.4%; interquartile range [IQR], -23.4% to -18.7%) than patients in acute phase (-18.4%; IQR, -21.0% to -16.1%; P = 0.001) or at follow-up (-19.2%; IQR, -21.5% to -16.1%; P = 0.020). Global radial strain was significantly higher in controls (82.4%; IQR, 62.8%-104.9%) than in patients during the acute phase (65.8%; IQR, 52.9%-79.5%; P = 0.001). Correlations were found between global circumferential strain and global radial strain in all groups (acute, ρ = -0.580, P < 0.001; follow-up, ρ = -0.399, P = 0.006; controls, ρ = -0.609, P < 0.001), and between global circumferential strain and late gadolinium enhancement only in myocarditis patients (acute, ρ = 0.035, P = 0.024; follow-up, ρ = 0.307, P = 0.038).
Conclusions: Cardiac strain could potentially have a role in detecting acute myocarditis in low-risk acute myocarditis patients where cardiac MR is the main diagnosing technique.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset gathers the trajectories of 161 Lagrangian surface drifters that were deployed in the western Mediterranean Sea in 2023 by three campaigns of the SWOT Adopt-A-Crossover consortium: C-SWOT-2023, BioSWOT-Med and FaSt-SWOT. Drifter trajectories are available between March 27th 2023 and January 22nd 2024. The deployment strategy involved releasing drifters to target specific mesoscale and submesoscale structures in the vicinity of selected SWOT passes. These structures were identified using the SPASSO software, which combined near-real-time remote data from Copernicus (DUACS) and early SWOT data provided by CLS/CNES. Several drifter designs are used in these experiments: SVP drifters drogued at 15 m, 50 m, and 100 m; SVP-B drifters at 15 m depth; a customized BGC-SVP drifter drogued at 15 m and equipped with additional sensors such as a CTD (for temperature and salinity) and an optical triplet measuring biochemical properties of the sea; surface drifters such as the CODE, CARTHE and Hereon types, with a drogue within the first meter of depth; and Spotter and MELODI-eOdyn devices as wave drifters. The original nominal sampling rates range from 5 minutes to 1 hour. Drifters were deployed in passes 3 and 16 of the SWOT orbit during its fast-sampling (Cal-Val) phase (1-day revisit until July 10th), and some of the drifters further crossed the satellite ground tracks afterwards, when the satellite science orbit was set to 21 days. This dataset is a collaborative effort between the SWOT-AdAC consortium and the FaSt-SWOT, BioSWOT-Med and C-SWOT cruises. To provide a single interoperable dataset, all drifter trajectories from the different campaigns were processed with the same scripts in a similar manner, resulting in three distinct levels of processing.
L0 – Harmonised and preprocessed trajectories
All initial trajectories are merged into a single dataset with variables renamed to match database standards.
The following steps are applied: removing rows with missing date/time, ordering by ascending time, trimming to valid deployment/recovery periods, dropping rows with missing values, eliminating duplicates, removing rows with repeated times but different positions, and excluding rows with erroneous latitude/longitude (e.g., outliers outside the Mediterranean Sea).
L1 – Processed trajectories
L1 trajectories are filtered based on acceleration. Velocity and acceleration are calculated at each timestep, and positions with accelerations exceeding 4 times the interquartile range (IQR) are removed. This results in irregularly spaced trajectories that retain the original GPS positions, and therefore the overall current dynamics signal with its multiscale components, but exclude GPS fix outliers as defined above.
L2 – Smoothed and regularly interpolated trajectories
L2 trajectories are obtained from the L1 trajectories, which are regularly interpolated and smoothed in order to reduce noise, especially on acceleration. Two methods are used: the LOWESS method (inspired by Elipot et al. 2016) and a variational method developed by M. Demol and A. Ponte (inspired by Yaremchuk and Coelho, 2014). L2 trajectories are available with time steps of 10 minutes, 30 minutes, or 1 hour. For more details on the smoothing and interpolating processing, please refer to the attached PDF.
Data export in NetCDF format
Each drifter trajectory is stored in eight separate NetCDF files, organised into eight distinct folders based on the processing stage and temporal resolution.
For a given drifter, the following files are available:
l0_data/bioswot_carthe_4388553.nc
l1_data/bioswot_carthe_4388553.nc
l2_data_variational_10min/bioswot_carthe_4388553.nc
l2_data_variational_30min/bioswot_carthe_4388553.nc
l2_data_variational_1hour/bioswot_carthe_4388553.nc
l2_data_lowess_10min/bioswot_carthe_4388553.nc
l2_data_lowess_30min/bioswot_carthe_4388553.nc
l2_data_lowess_1hour/bioswot_carthe_4388553.nc
Contact list: Maristella Berta (maristella.berta@sp.ismar.cnr.it), Margot Demol (margot.demol@ifremer.fr), Laura Gómez Navarro (laura.gomez@uib.es) and Lloyd Izard (lloyd.izard@locean.ipsl.fr)
PI contacts for the different projects involved: BioSWOT-Med: Andrea Doglioli (andrea.doglioli@univ-amu.fr); C-SWOT: Pierre Garreau (pierre.garreau@ifremer.fr), Franck Dumas (franck.dumas@shom.fr) and Aurélien Ponte (aurelien.ponte@ifremer.fr); FaSt-SWOT: Ananda Pascual (ananda.pascual@imedea.uib-csic.es) and Baptiste Mourre (bmourre@imedea.uib-csic.es).
References
Davis, Russ E. “Drifter Observations of Coastal Surface Currents during CODE: The Method and Descriptive View.” Journal of Geophysical Research: Oceans 90, no. C3 (1985): 4741–55. https://doi.org/10.1029/jc090ic03p04741.
Elipot, Shane, Rick Lumpkin, Renellys C Perez, Jonathan M Lilly, Jeffrey J Early, and Adam M Sykulski. “A Global Surface Drifter Data Set at Hourly Resolution.” Journal of Geophysical Research: Oceans 121, no. 5 (2016):[...]
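The L1 acceleration filter described for this dataset can be sketched roughly as below. This is not the consortium's processing script: it assumes positions already projected to metres on a local plane (`x`, `y`), times in seconds, and a literal reading of the rule in which a fix is dropped when its acceleration magnitude exceeds 4 times the IQR of all acceleration magnitudes. The function name and the synthetic track are hypothetical.

```python
import numpy as np


def iqr_acceleration_mask(t, x, y, factor=4.0):
    """Return a boolean mask that is True for positions to keep."""
    # Velocity and acceleration by finite differences (handles uneven time steps).
    u, v = np.gradient(x, t), np.gradient(y, t)
    ax, ay = np.gradient(u, t), np.gradient(v, t)
    acc = np.hypot(ax, ay)  # acceleration magnitude at each fix
    q25, q75 = np.percentile(acc, [25, 75])
    return acc <= factor * (q75 - q25)


# Synthetic track: steady 0.2 m/s drift, 5 m position noise, 10-minute sampling.
rng = np.random.default_rng(42)
n = 200
t = np.arange(n) * 600.0
x = 0.2 * t + rng.normal(0.0, 5.0, n)
y = rng.normal(0.0, 5.0, n)
x[10] += 5000.0  # simulated GPS fix outlier

keep = iqr_acceleration_mask(t, x, y)
t_l1, x_l1, y_l1 = t[keep], x[keep], y[keep]  # irregularly spaced after filtering
```

With centred differences, an isolated bad fix also inflates the acceleration at nearby points, so a few neighbours of the outlier are typically removed as well.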
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*Those loading most heavily (component load ≥|0.5|) in principal component analyses are identified in bold.
Background: Severe pneumonia is a pathological manifestation of Coronavirus Disease 2019 (COVID-19); however, complications have been reported in COVID-19 patients with a worse prognosis. The aim of this study was to evaluate the role of high-sensitivity cardiac troponin I (hs-TnI) in patients with SARS-CoV-2 infection.
Methods: We retrospectively analysed hs-TnI values measured in 523 patients (median age 64 years, 68% men) admitted to a university hospital in Milan, Italy, and diagnosed with COVID-19.
Results: A significant difference in hs-TnI concentrations was found between deceased patients (98 patients) and discharged patients (425 patients) (36.05 ng/L, IQR 16.5–94.9 vs 6.3 ng/L, IQR 2.6–13.9; p < 0.001). Hs-TnI measurements were independent predictors of mortality in multivariate analysis adjusted for confounding parameters such as age (HR 1.004 for each 10-point increase in troponin, 95% CI 1.002–1.006, p < 0.001). The survival rate after one week was 97.94% in patients with hs-TnI values under 6 ng/L, 90.87% between 6 ng/L and the normal value, 86.98% between the normal value and 40 ng/L, and 59.27% over 40 ng/L.
Conclusion: An increase in hs-TnI was associated with elevated mortality in patients with COVID-19. Troponin appears to be a useful biomarker of disease progression and worse prognosis in COVID-19 patients.
https://spdx.org/licenses/CC0-1.0.html
Objectives: This study aimed to ascertain utility and vision-related quality of life in patients awaiting access to specialist eye care. A secondary aim was to evaluate the association of utility indices with demographic profile and waiting time. Methods: Consecutive patients that had been waiting for ophthalmology care answered the 25-item National Eye Institute Visual Function Questionnaire (NEI VFQ-25). The questionnaire was administered when patients arrived at the clinics for their first visit. We derived a utility index (VFQ-UI) from the patients’ responses, then calculated the correlation between this index and waiting time and compared utility across demographic subgroups stratified by age, sex, and care setting. Results: 536 individuals participated in the study (mean age 52.9±16.6 years; 370 women, 69% women). The median utility index was 0.85 (interquartile range [IQR] 0.70–0.92; minimum 0.40, maximum 0.97). The mean VFQ-25 score was 70.88±14.59. Utility correlated weakly and nonsignificantly with waiting time (-0.05, P = 0.24). It did not vary across age groups (P = 0.85) or care settings (P = 0.77). Utility was significantly lower for women (0.84, IQR 0.70–0.92) than men (0.87, IQR 0.73–0.93, P = 0.03), but the magnitude of this difference was small (Cohen’s d = 0.13). Conclusion: Patients awaiting access to ophthalmology care had a utility index of 0.85 on a scale of 0 to 1. This measurement was not previously reported in the literature. Utility measures can provide insight into patients’ perspectives and support economic health analyses and inform health policies.
The dataset was derived by the Bioregional Assessment Programme from multiple datasets. The source dataset is identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
Hydrological Response Variables (HRVs) are the hydrological characteristics of the system that potentially change due to coal resource development. These data refer to the HRVs related to the AWRA-R model for the Namoi subregion for the 54 simulation nodes. The nine hydrological response variables (AF, P99, FD, IQR, ZFD, P01, LFD, LFS, LLFS) were computed under CRDP and Baseline conditions, respectively and the ACRD is the difference between the Baseline and CRDP.
Abbreviation meaning
AF - the annual streamflow volume (GL/year)
P01 - the daily streamflow rate at the first percentile (ML/day)
IQR - the inter-quartile range in daily streamflow (ML/day). That is, the difference between the daily streamflow rate at the 75th percentile and at the 25th percentile.
LFD - the number of low streamflow days per year. The threshold for low streamflow days is the 10th percentile from the simulated 90-year period (2013 to 2102)
LFS - the number of low streamflow spells per year (perennial streams only). A spell is defined as a period of contiguous days of streamflow below the 10th percentile threshold
LLFS - the length (days) of the longest low streamflow spell each year
P99 - the daily streamflow rate at the 99th percentile (ML/day)
FD - flood days, the number of days with streamflow greater than the 90th percentile from the simulated 90-year period (2013 to 2102)
ZFD - Zero flow days
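To make the definitions above concrete, the sketch below computes several of these variables from a daily streamflow series. It is a simplification and not the AWRA-R post-processing: counts are totals over the whole series rather than per year, the 10th and 90th percentile thresholds are taken from the same series rather than from the simulated 90-year period, and the function name and demo values are hypothetical.

```python
import numpy as np


def flow_metrics(q):
    """Compute simplified hydrological response variables from daily flow (ML/day)."""
    q = np.asarray(q, dtype=float)
    p01, p10, p25, p75, p90, p99 = np.percentile(q, [1, 10, 25, 75, 90, 99])

    low = q < p10  # low streamflow days
    # A spell starts on a low-flow day that is not preceded by another low-flow day.
    spell_starts = low & ~np.concatenate(([False], low[:-1]))

    # Length of the longest run of consecutive low-flow days.
    longest = current = 0
    for is_low in low:
        current = current + 1 if is_low else 0
        longest = max(longest, current)

    return {
        "P01": p01,
        "P99": p99,
        "IQR": p75 - p25,            # 75th minus 25th percentile
        "ZFD": int(np.sum(q == 0)),  # zero flow days
        "LFD": int(low.sum()),
        "LFS": int(spell_starts.sum()),
        "LLFS": longest,
        "FD": int(np.sum(q > p90)),  # days above the 90th percentile
    }


# Illustrative 20-day series (values 1..20 in a chosen order).
q_demo = [10, 1, 2, 12, 20, 19, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18]
metrics = flow_metrics(q_demo)
```

Running the same function on Baseline and CRDP series and differencing the results would mirror the Baseline versus CRDP comparison described for this dataset.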
This is the dataset used for the Namoi 2.6.1 product to evaluate additional coal mine and coal resource development impacts on hydrological response variables at 54 simulation nodes.
The Namoi AWRA-R model outputs were used to determine the impacts on the HRVs to produce these data. Readme files within the folders in the dataset provide an explanation on how the resource was created. The nine HRVs (AF, P99, FD, IQR, ZFD, P01, LFD, LFS, LLFS) were computed under CRDP and Baseline conditions, respectively. The difference between CRDP and Baseline is used for predicting ACRD impacts on hydrological response variables at 54 simulation nodes.
Bioregional Assessment Programme (2017) Namoi standard Hydrological Response Variables (HRVs). Bioregional Assessment Derived Dataset. Viewed 11 December 2018, http://data.bioregionalassessments.gov.au/dataset/189f4c7a-29e1-41f9-868d-b7f5184d829f.
Derived From Historical Mining Footprints DTIRIS NAM 20150914
Derived From Namoi AWRA-R (restricted input data implementation)
Derived From River Styles Spatial Layer for New South Wales
Derived From Namoi Surface Water Mine Footprints - digitised
Derived From Namoi AWRA-R model implementation (post groundwater input)
Derived From National Surface Water sites Hydstra
Derived From Namoi AWRA-L model
Derived From Namoi Hydstra surface water time series v1 extracted 140814
Derived From GEODATA 9 second DEM and D8: Digital Elevation Model Version 3 and Flow Direction Grid 2008
Derived From Namoi Environmental Impact Statements - Mine footprints
Derived From Namoi Existing Mine Development Surface Water Footprints
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Introduction
Datasets are used to evaluate the performance of a Kalman filter approach to estimate daily discharge. This is a perturbed version of synthetic SWOT datasets consisting of 15 river sections, which are commonly agreed datasets for evaluating the performance of SWOT discharge algorithms (Frasson et al., 2020, 2021). The benchmarking manuscript entitled “A Kalman Filter Approach for Estimating Daily Discharge Using Space-based Discharge Estimates” is currently under review at Water Resources Research. Once the manuscript is accepted, its DOI will be included here.
2. File description
The datasets are generally divided into two categories: river information (River_Info) and time series data (Timeseries_Data). River information provides fundamental and general river characteristics, whereas time series data offers daily reach-averaged data for each reach. The time series data mainly contains three components: true data, perturbed measurements, and true and perturbed flow law parameters (A0, na, and b). For each reach, there are 10000 realizations of perturbed measurements per time step and 100 realizations of time-invariant perturbed flow law parameters, generated through a Monte Carlo simulation (Frasson et al., 2023). Moreover, to support our proposed Kalman filter approach to estimate daily discharge, the datasets provide the median of the perturbed discharge, river width, water surface slope, and change in the cross-sectional area, as well as the uncertainty of the perturbed discharge and change in the cross-sectional area based on the interquartile range (Fox, 2015).
Datasets are contained in one .mat file per river. The groups and variables are detailed below:
River_Info
Name: River name, data type: char
QWBM: Mean annual discharge from the water balance model WBMsed (Cohen et al., 2014)
rch_bnd: Reach boundaries measured in meters from the upstream end of the model
gdrch: Good reaches in the study. They were used to exclude small reaches defined around low-head dams and other obstacles where Manning’s equation should not be applied.
Timeseries_Data
t: Time measured in days since the first day or “0-January-0000” for cases when specific dates were available. Dimension: 1, time step.
A: Reach-averaged cross-sectional area of flow in m2. Dimension: Reach, time step.
Q_true: True reach-averaged discharge (m3/s). Dimension: Reach, time step.
Q_ptb: Perturbed discharge (m3/s), including 10000 realizations for each measurement. Dimension: Good reach, time step, 10000.
med_Q_ptb: Median perturbed discharge (m3/s) across the 10000 realizations. Dimension: Good reach, time step.
sigma_Q_ptb: Uncertainty of the perturbed discharge (m3/s), calculated based on the interquartile range. Dimension: Good reach, time step.
W_true: True reach-averaged river width (m). Dimension: Reach, time step.
W_ptb: Perturbed river width (m), including 10000 realizations for each measurement. Dimension: Good reach, time step, 10000.
med_W_ptb: Median perturbed river width (m) across the 10000 realizations. Dimension: Good reach, time step.
H_true: True reach-averaged water surface elevation (m). Dimension: Reach, time step.
H_ptb: Perturbed water surface elevation (m), including 10000 realizations for each measurement. Dimension: Good reach, time step, 10000.
S_true: True reach-averaged water surface slope (m/m). Dimension: Reach, time step.
S_ptb: Perturbed water surface slope (m/m), including 10000 realizations for each measurement. Dimension: Good reach, time step, 10000.
med_S_ptb: Median perturbed water surface slope (m/m) across the 10000 realizations. Dimension: Good reach, time step.
dA_true: True reach-averaged change in the cross-sectional area (m2). Dimension: Good reach, time step.
dA_ptb: Perturbed change in the cross-sectional area (m2), including 10000 realizations for each measurement. Dimension: Good reach, time step, 10000.
med_dA_ptb: Median perturbed change in the cross-sectional area (m2) across the 10000 realizations. Dimension: Good reach, time step.
sigma_dA_ptb: Uncertainty of the perturbed change in the cross-sectional area (m2), calculated based on the interquartile range. Dimension: Good reach, time step.
A0_true: True baseline cross-sectional area (m2). Dimension: Good reach, 1.
A0: Perturbed baseline cross-sectional area (m2), including 100 realizations for each parameter. Dimension: Good reach, 100.
na_true: True friction coefficient. Dimension: Good reach, 1.
na: Perturbed friction coefficient, including 100 realizations for each parameter. Dimension: Good reach, 100.
b_true: True exponent coefficient. Dimension: Good reach, 1.
b: Perturbed exponent coefficient, including 100 realizations for each parameter. Dimension: Good reach, 100.
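As a rough illustration of how the median and the IQR-based uncertainty variables (for example med_Q_ptb and sigma_Q_ptb) relate to the perturbed realizations, the sketch below reduces a synthetic array along its realization axis. The description only states that the uncertainty is "based on the interquartile range"; dividing the IQR by 1.349 (the IQR of a normal distribution in units of its standard deviation) is an assumption here, as are the function name and the demo arrays.

```python
import numpy as np

# Assumed scaling: for a normal distribution, IQR is about 1.349 standard deviations.
IQR_TO_SIGMA = 1.349


def median_and_sigma(ptb):
    """Reduce an array of shape (reach, time step, realization) along realizations."""
    q25, q50, q75 = np.percentile(ptb, [25, 50, 75], axis=-1)
    return q50, (q75 - q25) / IQR_TO_SIGMA


# Synthetic stand-in for Q_ptb: 2 reaches, 3 time steps, 10000 realizations
# scattered around a true discharge of 100 m3/s with a 10 m3/s error.
rng = np.random.default_rng(1)
Q_true_demo = np.full((2, 3), 100.0)
Q_ptb_demo = Q_true_demo[..., None] + rng.normal(0.0, 10.0, size=(2, 3, 10000))

med_Q_demo, sigma_Q_demo = median_and_sigma(Q_ptb_demo)
```

The actual .mat files could be opened with a MATLAB file reader before applying the same reduction; the stored med_* and sigma_* variables should be preferred over recomputed values.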