16 datasets found

Handling of Missing Data Induced by Time-Varying Covariates in Comparative...
icpsr.umich.edu
Updated Oct 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Desai, Manisha (2025). Handling of Missing Data Induced by Time-Varying Covariates in Comparative Effectiveness Research HIV Patients [Methods Study], 2013-2018 [Dataset]. http://doi.org/10.3886/ICPSR39528.v1
Explore at:
Unique identifier
https://doi.org/10.3886/ICPSR39528.v1
Dataset updated
Oct 9, 2025
Dataset provided by
Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
Authors
Desai, Manisha
License
https://www.icpsr.umich.edu/web/ICPSR/studies/39528/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/39528/terms
Time period covered
2013 - 2018
Description
Researchers can use data from health registries or electronic health records to compare two or more treatments. Registries store data about patients with a specific health problem. These data include how well those patients respond to treatments and information about patient traits, such as age, weight, or blood pressure. But sometimes data about patient traits are missing. Missing data about patient traits can lead to incorrect study results, especially when traits change over time. For example, weight can change over time, and the patient may not report their weight at some points along the way. Researchers use statistical methods to fill in these missing data. In this study, the research team compared a new statistical method to fill in missing data with traditional methods. Traditional methods remove patients with missing data or fill in each missing number with a single estimate. The new method creates multiple possible estimates to fill in each missing number. To access the methods, software, and R package, please visit the SimulateCER GitHub and SimTimeVar CRAN website.
Example of how to manually extract incubation bouts from interactive plots...
figshare.com
txt
Updated Jan 22, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Martin Bulla (2016). Example of how to manually extract incubation bouts from interactive plots of raw data - R-CODE and DATA [Dataset]. http://doi.org/10.6084/m9.figshare.2066784.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.2066784.v1
Dataset updated
Jan 22, 2016
Dataset provided by
Figsharehttp://figshare.com/
Authors
Martin Bulla
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
{# General information# The script runs with R (Version 3.1.1; 2014-07-10) and packages plyr (Version 1.8.1), XLConnect (Version 0.2-9), utilsMPIO (Version 0.0.25), sp (Version 1.0-15), rgdal (Version 0.8-16), tools (Version 3.1.1) and lattice (Version 0.20-29)# --------------------------------------------------------------------------------------------------------# Questions can be directed to: Martin Bulla (bulla.mar@gmail.com)# -------------------------------------------------------------------------------------------------------- # Data collection and how the individual variables were derived is described in: #Steiger, S.S., et al., When the sun never sets: diverse activity rhythms under continuous daylight in free-living arctic-breeding birds. Proceedings of the Royal Society B: Biological Sciences, 2013. 280(1764): p. 20131016-20131016. # Dale, J., et al., The effects of life history and sexual selection on male and female plumage colouration. Nature, 2015. # Data are available as Rdata file # Missing values are NA. # --------------------------------------------------------------------------------------------------------# For better readability the subsections of the script can be collapsed # --------------------------------------------------------------------------------------------------------}{# Description of the method # 1 - data are visualized in an interactive actogram with time of day on x-axis and one panel for each day of data # 2 - red rectangle indicates the active field, clicking with the mouse in that field on the depicted light signal generates a data point that is automatically (via custom made function) saved in the csv file. For this data extraction I recommend, to click always on the bottom line of the red rectangle, as there is always data available due to a dummy variable ("lin") that creates continuous data at the bottom of the active panel. The data are captured only if greenish vertical bar appears and if new line of data appears in R console). # 3 - to extract incubation bouts, first click in the new plot has to be start of incubation, then next click depict end of incubation and the click on the same stop start of the incubation for the other sex. If the end and start of incubation are at different times, the data will be still extracted, but the sex, logger and bird_ID will be wrong. These need to be changed manually in the csv file. Similarly, the first bout for a given plot will be always assigned to male (if no data are present in the csv file) or based on previous data. Hence, whenever a data from a new plot are extracted, at a first mouse click it is worth checking whether the sex, logger and bird_ID information is correct and if not adjust it manually. # 4 - if all information from one day (panel) is extracted, right-click on the plot and choose "stop". This will activate the following day (panel) for extraction. # 5 - If you wish to end extraction before going through all the rectangles, just press "escape". }{# Annotations of data-files from turnstone_2009_Barrow_nest-t401_transmitter.RData dfr-- contains raw data on signal strength from radio tag attached to the rump of female and male, and information about when the birds where captured and incubation stage of the nest1. who: identifies whether the recording refers to female, male, capture or start of hatching2. datetime_: date and time of each recording3. logger: unique identity of the radio tag 4. signal_: signal strength of the radio tag5. sex: sex of the bird (f = female, m = male)6. nest: unique identity of the nest7. day: datetime_ variable truncated to year-month-day format8. time: time of day in hours9. datetime_utc: date and time of each recording, but in UTC time10. cols: colors assigned to "who"--------------------------------------------------------------------------------------------------------m-- contains metadata for a given nest1. sp: identifies species (RUTU = Ruddy turnstone)2. nest: unique identity of the nest3. year_: year of observation4. IDfemale: unique identity of the female5. IDmale: unique identity of the male6. lat: latitude coordinate of the nest7. lon: longitude coordinate of the nest8. hatch_start: date and time when the hatching of the eggs started 9. scinam: scientific name of the species10. breeding_site: unique identity of the breeding site (barr = Barrow, Alaska)11. logger: type of device used to record incubation (IT - radio tag)12. sampling: mean incubation sampling interval in seconds--------------------------------------------------------------------------------------------------------s-- contains metadata for the incubating parents1. year_: year of capture2. species: identifies species (RUTU = Ruddy turnstone)3. author: identifies the author who measured the bird4. nest: unique identity of the nest5. caught_date_time: date and time when the bird was captured6. recapture: was the bird capture before? (0 - no, 1 - yes)7. sex: sex of the bird (f = female, m = male)8. bird_ID: unique identity of the bird9. logger: unique identity of the radio tag --------------------------------------------------------------------------------------------------------}
Z
Dataset from: High consistency and repeatability in the breeding migrations...
data.niaid.nih.gov
zenodo.org
Updated Jun 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anonymous (2024). Dataset from: High consistency and repeatability in the breeding migrations of a benthic shark [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11467088
Explore at:
Dataset updated
Jun 4, 2024
Authors
Anonymous
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset and scripts used for manuscript: High consistency and repeatability in the breeding migrations of a benthic shark.

Project title: High consistency and repeatability in the breeding migrations of a benthic sharkDate:23/04/2024

Folders:- 1_Raw_data - Perpendicular_Point_068151, Sanctuary_Point_068088, SST raw data, sst_nc_files, IMOS_animal_measurements, IMOS_detections, PS&Syd&JB tags, rainfall_raw, sample_size, Point_Perpendicular_2013_2019, Sanctuary_Point_2013_2019, EAC_transport- 2_Processed_data - SST (anomaly, historic_sst, mean_sst_31_years, week_1992_sst:week_2022_sst including week_2019_complete_sst) - Rain (weekly_rain, weekly_rainfall_completed) - Clean (clean, cleaned_data, cleaned_gam, cleaned_pj_data)- 3_Script_processing_data - Plots(dual_axis_plot (Fig. 1 & Fig. 4).R, period_plot (Fig. 2).R, sd_plot (Fig. 5).R, sex_plot (Fig. 3).R - cleaned_data.R, cleaned_data_gam.R, weekly_rainfall_completed.R, descriptive_stats.R, sst.R, sst_2019b.R, sst_anomaly.R- 4_Script_analyses - gam.R, gam_eac.R, glm.R, lme.R, Repeatability.R- 5_Output_doc - Plots (arrival_dual_plot_with_anomaly (Fig. 1).png, period_plot (Fig.2).png, sex_arrival_departure (Fig. 3).png, departure_dual_plot_with_anomaly (Fig. 4).png, standard deviation plot (Fig. 5).png) - Tables (gam_arrival_eac_selection_table.csv (Table S2), gam_departure_eac_selection_table (Table S5), gam_arrival_selection_table (Table. S3), gam_departure_selection_table (Table. S6), glm_arrival_selection_table, glm_departure_selection_table, lme_arrival_anova_table, lme_arrival_selection_table (Table S4), lme_departure_anova_table, lme_departure_selection_table (Table. S8))

Descriptions of scripts and files used:- cleaned_data.R: script to extract detections of sharks at Jervis Bay. Calculate arrival and departure dates over the seven breeding seasons. Add sex and length for each individual. Extract moon phase (numerical value) and period of the day from arrival and departure times. - IMOS_detections.csv: raw data file with detections of Port Jackson sharks over different sites in Australia. - IMOS_animal_measurements.csv: raw data file with morphological data of Port Jackson sharks - PS&Syd&JB tags: file with measurements and sex identification of sharks (different from IMOS, it was used to complete missing sex and length). - cleaned_data.csv: file with arrival and departure dates of the final sample size of sharks (N=49) with missing sex and length for some individuals. - clean.csv: completed file using PS&Syd&JB tags, note: tag ID 117393679 was wrongly identified as a male in IMOS and correctly identified as a female in PS&Syd&JB tags file as indicated by its large size. - cleaned_pj_data: Final data file with arrival and departure dates, sex, length, moon phase (numerical) and period of the day.

weekly_rainfall_completed.R: script to calculate average weekly rainfall and correlation between the two weather stations used (Point perpendicular and Sanctuary point). - weekly_rain.csv: file with the corresponding week number (1-28) for each date (01-06-2013 to 13-12-2019) - weekly_rainfall_completed.csv: file with week number (1-28), year (2013-2019) and weekly rainfall average completed with Sanctuary Point for week 2 of 2017 - Point_Perpendicular_2013_2019: Rainfall (mm) from 01-01-2013 to 31-12-2020 at the Point Perpendicular weather station - Sanctuary_Point_2013_2019: Rainfall (mm) from 01-01-2013 to 31-12-2020 at the Sanctuary Point weather station - IDCJAC0009_068088_2017_Data.csv: Rainfall (mm) from 01-01-2017 to 31-12-2017 at the Sanctuary Point weather station (to fill in missing value for average rainfall of week 2 of 2017)

cleaned_data_gam.R: script to calculate weekly counts of sharks to run gam models and add weekly averages of rainfall and sst anomaly - cleaned_pj_data.csv - anomaly.csv: weekly (1-28) average sst anomalies for Jervis Bay (2013-2019) - weekly_rainfall_completed.csv: weekly (1-28) average rainfall for Jervis Bay (2013-2019_ - sample_size.csv: file with the number of sharks tagged (13-49) for each year (2013-2019)

sst.R: script to extract daily and weekly sst from IMOS nc files from 01-05 until 31-12 for the following years: 1992:2022 for Jervis Bay - sst_raw_data: folder with all the raw weekly (1:28) csv files for each year (1992:2022) to fill in with sst data using the sst script - sst_nc_files: folder with all the nc files downloaded from IMOS from the last 31 years (1992-2022) at the sensor (IMOS - SRS - SST - L3S-Single Sensor - 1 day - night time – Australia). - SST: folder with the average weekly (1-28) sst data extracted from the nc files using the sst script for each of the 31 years (to calculate temperature anomaly).

sst_2019b.R: script to extract daily and weekly sst from IMOS nc file for 2019 (missing value for week 19) for Jervis Bay - week_2019_sst: weekly average sst 2019 with a missing value for week 19 - week_2019b_sst: sst data from 2019 with another sensor (IMOS – SRS – MODIS - 01 day - Ocean Colour-SST) to fill in the gap of week 19 - week_2019_complete_sst: completed average weekly sst data from the year 2019 for weeks 1-28.

sst_anomaly.R: script to calculate mean weekly sst anomaly for the study period (2013-2019) using mean historic weekly sst (1992-2022) - historic_sst.csv: mean weekly (1-28) and yearly (1992-2022) sst for Jervis Bay - mean_sst_31_years.csv: mean weekly (1-28) sst across all years (1992-2022) for Jervis Bay - anomaly.csv: mean weekly and yearly sst anomalies for the study period (2013-2019)

Descriptive_stats.R: script to calculate minimum and maximum length of sharks, mean Julian arrival and departure dates per individual per year, mean Julian arrival and departure dates per year for all sharks (Table. S10), summary of standard deviation of julian arrival dates (Table. S9) - cleaned_pj_data.csv

gam.R: script used to run the Generalized additive model for rainfall and sea surface temperature - cleaned_gam.csv

glm.R: script used to run the Generalized linear mixed models for the period of the day and moon phase - cleaned_pj_data.csv - sample_size.csv

lme.R: script used to run the Linear mixed model for sex and size - cleaned_pj_data.csv

Repeatability.R: script used to run the Repeatability for Julian arrival and Julian departure dates - cleaned_pj_data.csv
d
R script to calculate daily PAR from solar radiation data
search.dataone.org
hydroshare.org
+1more
Updated Dec 5, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sandra Villamizar (2021). R script to calculate daily PAR from solar radiation data [Dataset]. https://search.dataone.org/view/sha256%3Aa20a7df551f7173e49435a830aa43e394285b4bac8dfedcd1b7d4428bbacd3bf
Explore at:
Dataset updated
Dec 5, 2021
Dataset provided by
Hydroshare
Authors
Sandra Villamizar
Description
Generating estimates of daily reference photosynthetically Active Radiation (PAR). We show the procedure to generate estimates of daily reference PAR using solar radiation data. The input for the R script (CalculateDailyPAR.R) is a raw time series of hourly solar radiation (stored in variable ‘ws’) that for our case was obtained from the CIMIS website (station id: 105) [California Department of Water Resources, 2015]. The script processes the data set to format the date and time columns, and to identify missing data points reporting their position within the time series (variable ‘na.id’). The user fills the gaps using adequate strategies and creates a new input file (stored in variable ‘fill.points’) containing the values to fill in within the time series. A reference PAR estimate is obtained as a constant fraction of solar radiation using the conversion factor proposed by [Meek et al., 1984]. The script then calculates an average daily value of solar radiation and integrates the reference PAR over the daytime period to obtain a daily value. The script ends by generating a final table (‘ws.results’) reporting daily values of solar radiation (maximum and mean in W m-2), and maximum, mean, and minimum reference PAR values in units of (μmol m-2 d-1) and (mol m-2 d-1). DOI:10.6084/m9.figshare.3412765
t
ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture...
researchdata.tuwien.ac.at
researchdata.tuwien.at
zip
Updated Sep 5, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wolfgang Preimesberger; Wolfgang Preimesberger; Pietro Stradiotti; Pietro Stradiotti; Wouter Arnoud Dorigo; Wouter Arnoud Dorigo (2025). ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture from merged multi-satellite observations [Dataset]. http://doi.org/10.48436/3fcxr-cde10
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.48436/3fcxr-cde10
Dataset updated
Sep 5, 2025
Dataset provided by
TU Wien
Authors
Wolfgang Preimesberger; Wolfgang Preimesberger; Pietro Stradiotti; Pietro Stradiotti; Wouter Arnoud Dorigo; Wouter Arnoud Dorigo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was produced with funding from the European Space Agency (ESA) Climate Change Initiative (CCI) Plus Soil Moisture Project (CCN 3 to ESRIN Contract No: 4000126684/19/I-NB "ESA CCI+ Phase 1 New R&D on CCI ECVS Soil Moisture"). Project website: https://climate.esa.int/en/projects/soil-moisture/

This dataset contains information on the Surface Soil Moisture (SM) content derived from satellite observations in the microwave domain.

Dataset Paper (Open Access)

A description of this dataset, including the methodology and validation results, is available at:

Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: an independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data, 17, 4305–4329, https://doi.org/10.5194/essd-17-4305-2025, 2025.

Abstract

ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations coming from 19 satellites (as of v09.1) operating in the microwave domain. The wealth of satellite information, particularly over the last decade, facilitates the creation of a data record with the highest possible data consistency and coverage.
However, data gaps are still found in the record. This is particularly notable in earlier periods when a limited number of satellites were in operation, but can also arise from various retrieval issues, such as frozen soils, dense vegetation, and radio frequency interference (RFI). These data gaps present a challenge for many users, as they have the potential to obscure relevant events within a study area or are incompatible with (machine learning) software that often relies on gap-free inputs.
Since the requirement of a gap-free ESA CCI SM product was identified, various studies have demonstrated the suitability of different statistical methods to achieve this goal. A fundamental feature of such gap-filling method is to rely only on the original observational record, without need for ancillary variable or model-based information. Due to the intrinsic challenge, there was until present no global, long-term univariate gap-filled product available. In this version of the record, data gaps due to missing satellite overpasses and invalid measurements are filled using the Discrete Cosine Transform (DCT) Penalized Least Squares (PLS) algorithm (Garcia, 2010). A linear interpolation is applied over periods of (potentially) frozen soils with little to no variability in (frozen) soil moisture content. Uncertainty estimates are based on models calibrated in experiments to fill satellite-like gaps introduced to GLDAS Noah reanalysis soil moisture (Rodell et al., 2004), and consider the gap size and local vegetation conditions as parameters that affect the gapfilling performance.

Summary

Gap-filled global estimates of volumetric surface soil moisture from 1991-2023 at 0.25° sampling

Fields of application (partial): climate variability and change, land-atmosphere interactions, global biogeochemical cycles and ecology, hydrological and land surface modelling, drought applications, and meteorology

Method: Modified version of DCT-PLS (Garcia, 2010) interpolation/smoothing algorithm, linear interpolation over periods of frozen soils. Uncertainty estimates are provided for all data points.

More information: See Preimesberger et al. (2025) and https://doi.org/10.5281/zenodo.8320869" target="_blank" rel="noopener">ESA CCI SM Algorithm Theoretical Baseline Document [Chapter 7.2.9] (Dorigo et al., 2023)

Programmatic Download

You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following command will download and extract the complete data set to the local directory ~/Download on Linux or macOS systems.

#!/bin/bash

# Set download directory
DOWNLOAD_DIR=~/Downloads

base_url="https://researchdata.tuwien.at/records/3fcxr-cde10/files"

# Loop through years 1991 to 2023 and download & extract data
for year in {1991..2023}; do
echo "Downloading $year.zip..."
wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip"
unzip -o "$DOWNLOAD_DIR/$year.zip" -d $DOWNLOAD_DIR
rm "$DOWNLOAD_DIR/$year.zip"
done

Data details

The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY), each subdirectory containing one netCDF image file for a specific day (DD), month (MM) in a 2-dimensional (longitude, latitude) grid system (CRS: WGS84). The file name has the following convention:

ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-YYYYMMDD000000-fv09.1r1.nc

Data Variables

Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:

sm: (float) The Soil Moisture variable reflects estimates of daily average volumetric soil moisture content (m3/m3) in the soil surface layer (~0-5 cm) over a whole grid cell (0.25 degree).

sm_uncertainty: (float) The Soil Moisture Uncertainty variable reflects the uncertainty (random error) of the original satellite observations and of the predictions used to fill observation data gaps.

sm_anomaly: Soil moisture anomalies (reference period 1991-2020) derived from the gap-filled values (`sm`)

sm_smoothed: Contains DCT-PLS predictions used to fill data gaps in the original soil moisture field. These values are also provided for cases where an observation was initially available (compare `gapmask`). In this case, they provided a smoothed version of the original data.

gapmask: (0 | 1) Indicates grid cells where a satellite observation is available (1), and where the interpolated (smoothed) values are used instead (0) in the 'sm' field.

frozenmask: (0 | 1) Indicates grid cells where ERA5 soil temperature is <0 °C. In this case, a linear interpolation over time is applied.

Additional information for each variable is given in the netCDF attributes.

Version Changelog

Changes in v9.1r1 (previous version was v09.1):

This version uses a novel uncertainty estimation scheme as described in Preimesberger et al. (2025).

Software to open netCDF files

These data can be read by any software that supports Climate and Forecast (CF) conform metadata standards for netCDF files, such as:

https://github.com/pydata/xarray" target="_blank" rel="noopener">Xarray (python)

https://unidata.github.io/netcdf4-python/" target="_blank" rel="noopener">netCDF4 (python)

https://github.com/TUW-GEO/esa_cci_sm">esa_cci_sm (python)

Similar tools exists for other programming languages (Matlab, R, etc.)

Software packages and GIS tools can open netCDF files, e.g. CDO, NCO, QGIS, ArCGIS

You can also use the GUI software Panoply to view the contents of each file

References

Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: an independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data, 17, 4305–4329, https://doi.org/10.5194/essd-17-4305-2025, 2025.

Dorigo, W., Preimesberger, W., Stradiotti, P., Kidd, R., van der Schalie, R., van der Vliet, M., Rodriguez-Fernandez, N., Madelon, R., & Baghdadi, N. (2023). ESA Climate Change Initiative Plus - Soil Moisture Algorithm Theoretical Baseline Document (ATBD) Supporting Product Version 08.1 (version 1.1). Zenodo. https://doi.org/10.5281/zenodo.8320869

Garcia, D., 2010. Robust smoothing of gridded data in one and higher dimensions with missing values. Computational Statistics & Data Analysis, 54(4), pp.1167-1178. Available at: https://doi.org/10.1016/j.csda.2009.09.020

Rodell, M., Houser, P. R., Jambor, U., Gottschalck, J., Mitchell, K., Meng, C.-J., Arsenault, K., Cosgrove, B., Radakovich, J., Bosilovich, M., Entin, J. K., Walker, J. P., Lohmann, D., and Toll, D.: The Global Land Data Assimilation System, Bulletin of the American Meteorological Society, 85, 381 – 394, https://doi.org/10.1175/BAMS-85-3-381, 2004.

Related Records

The following records are all part of the ESA CCI Soil Moisture science data records community

1
ESA CCI SM MODELFREE Surface Soil Moisture Record
<a href="https://doi.org/10.48436/svr1r-27j77" target="_blank"
Missing the (tipping) point: the effect of information about climate tipping...
zenodo.org
Updated Jul 30, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christina Nadeau; Christina Nadeau (2024). Missing the (tipping) point: the effect of information about climate tipping points on public risk perceptions in Norway [Dataset] [Dataset]. http://doi.org/10.5281/zenodo.13133016
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.13133016
Dataset updated
Jul 30, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Christina Nadeau; Christina Nadeau
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Sep 4, 2023
Area covered
Norway
Description
This is all the data used for the redaction of the research paper "Missing the (tipping) point: the effect of information about climate tipping points on public risk perceptions in Norway".

The dataset is contained in Excel files (.xlsx) and code for statistical analysis can be found in R files (.R)
Z
Hydro-meteorological database for watersheds across the CIS
data.niaid.nih.gov
zenodo.org
Updated Oct 12, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abramov Dmitrii; Kurochkina Lyubov (2023). Hydro-meteorological database for watersheds across the CIS [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7789303
Explore at:
Dataset updated
Oct 12, 2023
Dataset provided by
State Hydrological Institute, Skoltech
State Hydrological Institute
Authors
Abramov Dmitrii; Kurochkina Lyubov
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The presented database is a set of hydrological, meteorological, environmental and geometric values for Russia Federation for the period from 2008 to 2020.

Database consist of next items:

Point geometry for hydrological observation stations from Roshydromet network across Russia

Geometry of the catchment for correspond observation station point

Daily hydrological values

Water level

In relative representation (sm)

In meters of Baltic system (m)

Water discharge

as an observed value (qms/s)

as a layer (mm/day)

Daily meteorological values

Maximum and minimum daily temperatures (°C) from ERA5 and ERA5-Land

Total precipitation (mm/day) from ERA5, ERA5-Land, IMERG v06, GPCP v3.2 and MSWEP

Different kind of evaporation (mm/day) corresponded to each variable calculated in GLEAM model

Set of hydro-environmental characteristics derived from HydroATLAS database

Each variable derived from the grid data was calculated for each watershed, taking into account the intersection weights of the watershed contour geometry and grid cells.

Coordinates of hydrological stations were obtained from resource of Federal Agency for Water Resources of Russia Federation—AIS GMVO

To calculate the contours of the catchment areas, a script was developed that builds the contours in accordance with the rasters of flow directions from MERIT Hydro. To assess the quality of the contour construction, the obtained value of the catchment area was compared with the archival value from the corresponded table from AIS GMVO. The average error in determining the area for 2080 catchments is approximately 2%

To derive values for different hydro-environmental values from HydroATLAS were developed approach which calculate aggregated values for catchment, leaning on type of variable: qualitative (Land cover classes, Lithological classes etc.) Or quantitive (Air temperature, Snow cover extent etc.). Every quantitive variable were calculated as mode value for intersected sub-basins and target catchment, e.g. most popular attribute from sub-basins will describe whole catchment which are they relating. Quantitative values were calculated as mean value of attribute from each sub-basin. More detail could be found in publication.

Files are distributed as follows:

Each file has some connection with the unique identifier of the hydrological observation post. Files in netcdf format (hydrological and meteorological series) are named in response to identifier.

Every file which describe geometry (point, polygon, static attributes) has and column named gauge_id with same correspondence.

attributes/static_data.csv – results from HydroATLAS aggregation

geometry/russia_gauges.gpkg – coordinates of hydrological observation stations

gauge_id name_ru name_en geometry 0 49001 р. Ковда – пос. Софпорог r.Kovda - pos. Sofporog POINT (31.41892 65.79876) 1 49014 р. Корпи-Йоки – пос. Пяозерский r.Korpi-Joki - pos. Pjaozerskij POINT (31.05794 65.77917) 2 49017 р. Тумча – пос. Алакуртти r.Tumcha - pos. Alakurtti POINT (30.33082 66.95957)

geometry/russia_ws.gpkg – catchments polygon for each hydrological observation stations

gauge_id name_ru name_en new_area ais_dif geometry 0 9002 р. Енисей – г. Кызыл r.Enisej - g.Kyzyl 115263.989 0.230 POLYGON ((96.87792 53.72792, 96.87792 53.72708... 1 9022 р. Енисей – пос. Никитино r.Enisej - pos. Nikitino 184499.118 1.373 POLYGON ((96.87792 53.72708, 96.88042 53.72708... 2 9053 р. Енисей – пос. Базаиха r.Enisej - pos.Bazaiha 302690.417 0.897 POLYGON ((92.38292 56.11042, 92.38292 56.10958...

Column ais_diff is corresponded to % error in area definition

nc_all_q

netcdf files for hydrological observation stations which has no missing values on discharge for 2008-2020 period

nc_all_h

netcdf files for hydrological observation stations which has no missing values on level for 2008-2020 period

nc_all_q_h

netcdf files for hydrological observation stations which has no missing values on discharge and level for 2008-2020 period

nc_concat

data for all available geometry provided in dataset

More details on processing scripts which were used for development of this database can be found in folder of GitHub repository where I store results for my PhD dissertation

05.04.2023 – Significant data changes. Removed catchments and related files that have more than ±15% absolute error in calculated area relative to AIS GMVO information. Now these are data for 1886 catchments across the Russia.

17.05.2023 – Significant data changes. Major review of parsing algorithm for AIS GMVO data. Fixed the way of how 0.0xx values were read. Use previous versions with caution.

11.10.2023 – Significant data changes. Added 278 catchments for CIS region from GRDC resource. Calculate meteorological and environmental attributes for each catchment. New folder /nc_all_q_h with no missing observations on discharge and level. Now these are data for 2164 catchments across CIS.
d
Data for: Intra- and interspecific variation in trace element concentrations...
datadryad.org
data.niaid.nih.gov
+2more
zip
Updated Feb 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nicolas Ordax Sommer; Arto Luttinen; Aleksi Lehikoinen (2023). Data for: Intra- and interspecific variation in trace element concentrations in feathers of North European Trans-African migrants [Dataset]. http://doi.org/10.5061/dryad.hx3ffbgjw
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.hx3ffbgjw
Dataset updated
Feb 28, 2023
Dataset provided by
Dryad
Authors
Nicolas Ordax Sommer; Arto Luttinen; Aleksi Lehikoinen
Time period covered
Feb 26, 2023
Area covered
Northern Europe
Description
The available dataset and R Script support the paper "Intra- and interspecific variation in trace element concentrations in feathers of North European Trans-African migrants", which will be published by the Journal of Avian Biology (2023). The uploaded files are the following:

Trace_element_feathers.csv: this is the dataset that contains the information used in the paper's analyses. Each row referst to a measurement spot along a feather's rachis. The columns contain the following information:

A running index An indication of which spot is being measured The concentration of each of the 20 elements analysed for the paper at the given spot An identifier for the feather to which the measurement spot belongs An identifier for the individual to which the feather belongs Season during which the feather was collected (spring / autumn) Species to which the feather belonged (barn swallow / willow warbler) Number of the measurement spot along the rachis A code for the individual (same informati...
m
Cross Regional Eucalyptus Growth and Environmental Data
data.mendeley.com
Updated Oct 7, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christopher Erasmus (2024). Cross Regional Eucalyptus Growth and Environmental Data [Dataset]. http://doi.org/10.17632/2m9rcy3dr9.3
Explore at:
Unique identifier
https://doi.org/10.17632/2m9rcy3dr9.3
Dataset updated
Oct 7, 2024
Authors
Christopher Erasmus
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset is provided in a single .xlsx file named "eucalyptus_growth_environment_data_V2.xlsx" and consists of fifteen sheets:

Codebook: This sheet details the index, values, and descriptions for each field within the dataset, providing a comprehensive guide to understanding the data structure.

ALL NODES: Contains measurements from all devices, totalling 102,916 data points. This sheet aggregates the data across all nodes.

GWD1 to GWD10: These subset sheets include measurements from individual nodes, labelled according to the abbreviation “Generic Wireless Dendrometer” followed by device IDs 1 through 10. Each sheet corresponds to a specific node, representing measurements from ten trees (or nodes).

Metadata: Provides detailed metadata for each node, including species, initial diameter, location, measurement frequency, battery specifications, and irrigation status. This information is essential for identifying and differentiating the nodes and their specific attributes.

Missing Data Intervals: Details gaps in the data stream, including start and end dates and times when data was not uploaded. It includes information on the total duration of each missing interval and the number of missing data points.

Missing Intervals Distribution: Offers a summary of missing data intervals and their distribution, providing insight into data gaps and reasons for missing data.

All nodes utilize LoRaWAN for data transmission. Please note that intermittent data gaps may occur due to connectivity issues between the gateway and the nodes, as well as maintenance activities or experimental procedures.

Software considerations: The provided R code named “Simple_Dendro_Imputation_and_Analysis.R” is a comprehensive analysis workflow that processes and analyses Eucalyptus growth and environmental data from the "eucalyptus_growth_environment_data_V2.xlsx" dataset. The script begins by loading necessary libraries, setting the working directory, and reading the data from the specified Excel sheet. It then combines date and time information into a unified DateTime format and performs data type conversions for relevant columns. The analysis focuses on a specified device, allowing for the selection of neighbouring devices for imputation of missing data. A loop checks for gaps in the time series and fills in missing intervals based on a defined threshold, followed by a function that imputes missing values using the average from nearby devices. Outliers are identified and managed through linear interpolation. The code further calculates vapor pressure metrics and applies temperature corrections to the dendrometer data. Finally, it saves the cleaned and processed data into a new Excel file while conducting dendrometer analysis using the dendRoAnalyst package, which includes visualizations and calculations of daily growth metrics and correlations with environmental factors such as vapour pressure deficit (VPD).
SILO Patched Point Datasets for Queensland
researchdata.edu.au
data.qld.gov.au
+1more
Updated Feb 26, 2014
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.qld.gov.au (2014). SILO Patched Point Datasets for Queensland [Dataset]. https://researchdata.edu.au/silo-patched-point-datasets-queensland/660352
Explore at:
Dataset updated
Feb 26, 2014
Dataset provided by
Queensland Governmenthttp://qld.gov.au/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Queensland
Description
SILO (Scientific Information for Land Owners) is a daily time series of meteorological data at point locations, consisting of station records which have been supplemented by interpolated estimates where observed data are missing. \r Patched Point Datasets for Queensland are available free of charge. To qualify for free access, the user must first register with SILO. For further information about SILO and registration, see the SILO webpage.
S
Geographical distribution and climate data of Cycas taiwaniana
scidb.cn
Updated Jan 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CHUNPING XIE (2025). Geographical distribution and climate data of Cycas taiwaniana [Dataset]. http://doi.org/10.57760/sciencedb.19432
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.19432
Dataset updated
Jan 3, 2025
Dataset provided by
Science Data Bank
Authors
CHUNPING XIE
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Dataset Description: Geographical Distribution and Climate Data of Cycas taiwaniana (Taiwanese Cycad)This dataset contains the geographical distribution and climate data for Cycas taiwaniana, focusing on its presence across regions in Fujian, Guangdong, and Hainan provinces of China. The dataset includes geographical coordinates (longitude and latitude), monthly climate data (minimum and maximum temperature, and precipitation) across different months, as well as bioclimatic variables based on the WorldClim dataset.**Temporal and Spatial Information** The data covers long-term climate information, with monthly data for each location recorded over a 12-month period (January to December). The dataset includes spatial data in terms of longitude and latitude, corresponding to various locations where Cycas taiwaniana populations are present. The spatial resolution is specific to each point location, and the temporal resolution reflects the monthly climate data for each year.**Data Structure and Units** The dataset consists of 36 records, each representing a unique location with corresponding climate and geographical data. The table includes the following columns: 1. No.: Unique identifier for each data record 2. Longitude: Geographic longitude in decimal degrees 3. Latitude: Geographic latitude in decimal degrees 4. tmin1 to tmin12: Minimum temperature (°C) for each month (January to December) 5. tmax1 to tmax12: Maximum temperature (°C) for each month (January to December) 6. prec1 to prec12: Precipitation (mm) for each month (January to December) 7. bio1 to bio19: Bioclimatic variables (e.g., annual mean temperature, temperature seasonality, precipitation, etc.) derived from WorldClim data (unit varies depending on the variable)The units for each measurement are as follows: - Temperature: Degrees Celsius (°C) - Precipitation: Millimeters (mm) - Bioclimatic variables: Varies depending on the specific variable (e.g., °C, mm)**Data Gaps and Missing Values** The dataset contains some missing values, particularly in the "precipitation" columns for certain months and locations. These missing values may result from gaps in climate station data or limitations in data collection for specific regions. Missing values are indicated as "NA" (Not Available) in the dataset. In cases where data gaps exist, estimations were not made, and the absence of the data is acknowledged in the record.**File Format and Software Compatibility** The dataset is provided in CSV format for ease of use and compatibility with various data analysis tools. It can be opened and processed using software such as Microsoft Excel, R, or Python (with Pandas). Users can download the dataset and work with it in software such as R (https://cran.r-project.org/) or Python (https://www.python.org/). The dataset is compatible with any software that supports CSV files.This dataset provides valuable information for research related to the geographical distribution and climate preferences of Cycas taiwaniana and can be used to inform conservation strategies, ecological studies, and climate change modeling.
Study Hours vs Grades Dataset
kaggle.com
zip
Updated Oct 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrey Silva (2025). Study Hours vs Grades Dataset [Dataset]. https://www.kaggle.com/datasets/andreylss/study-hours-vs-grades-dataset
Explore at:
zip(33964 bytes)Available download formats
Dataset updated
Oct 12, 2025
Authors
Andrey Silva
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This synthetic dataset contains 5,000 student records exploring the relationship between study hours and academic performance.

Dataset Features

student_id: Unique identifier for each student (1-5000)

study_hours: Hours spent studying (0-12 hours, continuous)

grade: Final exam score (0-100 points, continuous)

Potential Use Cases

Linear regression modeling and practice

Data visualization exercises

Statistical analysis tutorials

Machine learning for beginners

Educational research simulations

Data Quality

No missing values

Normally distributed residuals

Realistic educational scenario

Ready for immediate analysis

Data Generation Code

This dataset was generated using R.

R Code

# Set seed for reproducibility set.seed(42) # Define number of observations (students) n <- 5000 # Generate study hours (independent variable) # Uniform distribution between 0 and 12 hours study_hours <- runif(n, min = 0, max = 12) # Create relationship between study hours and grade # Base grade: 40 points # Each study hour adds an average of 5 points # Add normal noise (standard deviation = 10) theoretical_grade <- 40 + 5 * study_hours # Add normal noise to make it realistic noise <- rnorm(n, mean = 0, sd = 10) # Calculate final grade grade <- theoretical_grade + noise # Limit grades between 0 and 100 grade <- pmin(pmax(grade, 0), 100) # Create the dataframe dataset <- data.frame( student_id = 1:n, study_hours = round(study_hours, 2), grade = round(grade, 2) )
mtcars-parquet
kaggle.com
zip
Updated Aug 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MUHAMMAD ABDAL (2025). mtcars-parquet [Dataset]. https://www.kaggle.com/datasets/muhammadabdal123/mtcars-parquet
Explore at:
zip(1040 bytes)Available download formats
Dataset updated
Aug 17, 2025
Authors
MUHAMMAD ABDAL
Description
Dataset Title: Motor Trend Car Road Tests (mtcars) Description: The data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). It is a classic, foundational dataset used extensively in statistics and data science for learning exploratory data analysis, regression modeling, and hypothesis testing.

This dataset is a staple in the R programming language (?mtcars) and is now provided here in a clean CSV format for easy access in Python, Excel, and other data analysis environments.

Acknowledgements: This dataset was originally compiled and made available by the journal Motor Trend in 1974. It has been bundled with the R statistical programming language for decades, serving as an invaluable resource for learners and practitioners alike.

Data Dictionary: Each row represents a different car model. The columns (variables) are as follows:

Column Name Data Type Description model object (String) The name and model of the car. mpg float Miles/(US) gallon. A measure of fuel efficiency. cyl integer Number of cylinders (4, 6, 8). disp float Displacement (cubic inches). Engine size. hp integer Gross horsepower. Engine power. drat float Rear axle ratio. Affects torque and fuel economy. wt float Weight (1000 lbs). Vehicle mass. qsec float 1/4 mile time (seconds). A measure of acceleration. vs binary Engine shape (0 = V-shaped, 1 = Straight). am binary Transmission (0 = Automatic, 1 = Manual). gear integer Number of forward gears (3, 4, 5). carb integer Number of carburetors (1, 2, 3, 4, 6, 8). Key Questions & Potential Use Cases: This dataset is perfect for exploring relationships between a car's specifications and its performance. Some classic analysis questions include:

Fuel Efficiency: What factors are most predictive of a car's miles per gallon (mpg)? Is it engine size (disp), weight (wt), or horsepower (hp)?

Performance: How does transmission type (am) affect acceleration (qsec) and fuel economy (mpg)? Do manual cars perform better?

Classification: Can we accurately predict the number of cylinders (cyl) or the type of engine (vs) based on other car features?

Clustering: Are there natural groupings of cars (e.g., performance cars, economy cars) based on their specifications?

Inspiration: This is one of the most famous datasets in statistics. You can find thousands of examples, tutorials, and analyses using it online. It's an excellent starting point for:

Practicing multiple linear regression and correlation analysis.

Building your first EDA (Exploratory Data Analysis) notebook.

Learning about feature engineering and model interpretation.

Comparing statistical results from R and Python (e.g., statsmodels vs scikit-learn).

File Details: mtcars-parquet.csv: The main dataset file in CSV format.

Number of instances (rows): 32

Number of attributes (columns): 12

Missing Values? No, this is a complete dataset.
n
Data from: A systematic evaluation of normalization methods and probe...
data.niaid.nih.gov
dataone.org
+2more
zip
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra (2023). A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data [Dataset]. http://doi.org/10.5061/dryad.cnp5hqc7v
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.cnp5hqc7v
Dataset updated
May 30, 2023
Dataset provided by
Universidade de São Paulo
Hospital for Sick Children
University of Toronto
Authors
H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Background The Infinium EPIC array measures the methylation status of > 850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe type bias as well as other issues such as background and dye bias.
Methods This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearson’s correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data.
Results The method we define as SeSAMe 2, which consists of the application of the regular SeSAMe pipeline with an additional round of QC, pOOBAH masking, was found to be the best-performing normalization method, while quantile-based methods were found to be the worst performing methods. Whole-array Pearson’s correlations were found to be high. However, in agreement with previous studies, a substantial proportion of the probes on the EPIC array showed poor reproducibility (ICC < 0.50). The majority of poor-performing probes have beta values close to either 0 or 1, and relatively low standard deviations. These results suggest that probe reliability is largely the result of limited biological variation rather than technical measurement variation. Importantly, normalizing the data with SeSAMe 2 dramatically improved ICC estimates, with the proportion of probes with ICC values > 0.50 increasing from 45.18% (raw data) to 61.35% (SeSAMe 2). Methods

Study Participants and Samples

The whole blood samples were obtained from the Health, Well-being and Aging (Saúde, Ben-estar e Envelhecimento, SABE) study cohort. SABE is a cohort of census-withdrawn elderly from the city of São Paulo, Brazil, followed up every five years since the year 2000, with DNA first collected in 2010. Samples from 24 elderly adults were collected at two time points for a total of 48 samples. The first time point is the 2010 collection wave, performed from 2010 to 2012, and the second time point was set in 2020 in a COVID-19 monitoring project (9±0.71 years apart). The 24 individuals were 67.41±5.52 years of age (mean ± standard deviation) at time point one; and 76.41±6.17 at time point two and comprised 13 men and 11 women.

All individuals enrolled in the SABE cohort provided written consent, and the ethic protocols were approved by local and national institutional review boards COEP/FSP/USP OF.COEP/23/10, CONEP 2044/2014, CEP HIAE 1263-10, University of Toronto RIS 39685.

Blood Collection and Processing

Genomic DNA was extracted from whole peripheral blood samples collected in EDTA tubes. DNA extraction and purification followed manufacturer’s recommended protocols, using Qiagen AutoPure LS kit with Gentra automated extraction (first time point) or manual extraction (second time point), due to discontinuation of the equipment but using the same commercial reagents. DNA was quantified using Nanodrop spectrometer and diluted to 50ng/uL. To assess the reproducibility of the EPIC array, we also obtained technical replicates for 16 out of the 48 samples, for a total of 64 samples submitted for further analyses. Whole Genome Sequencing data is also available for the samples described above.

Characterization of DNA Methylation using the EPIC array

Approximately 1,000ng of human genomic DNA was used for bisulphite conversion. Methylation status was evaluated using the MethylationEPIC array at The Centre for Applied Genomics (TCAG, Hospital for Sick Children, Toronto, Ontario, Canada), following protocols recommended by Illumina (San Diego, California, USA).

Processing and Analysis of DNA Methylation Data

The R/Bioconductor packages Meffil (version 1.1.0), RnBeads (version 2.6.0), minfi (version 1.34.0) and wateRmelon (version 1.32.0) were used to import, process and perform quality control (QC) analyses on the methylation data. Starting with the 64 samples, we first used Meffil to infer the sex of the 64 samples and compared the inferred sex to reported sex. Utilizing the 59 SNP probes that are available as part of the EPIC array, we calculated concordance between the methylation intensities of the samples and the corresponding genotype calls extracted from their WGS data. We then performed comprehensive sample-level and probe-level QC using the RnBeads QC pipeline. Specifically, we (1) removed probes if their target sequences overlap with a SNP at any base, (2) removed known cross-reactive probes (3) used the iterative Greedycut algorithm to filter out samples and probes, using a detection p-value threshold of 0.01 and (4) removed probes if more than 5% of the samples having a missing value. Since RnBeads does not have a function to perform probe filtering based on bead number, we used the wateRmelon package to extract bead numbers from the IDAT files and calculated the proportion of samples with bead number < 3. Probes with more than 5% of samples having low bead number (< 3) were removed. For the comparison of normalization methods, we also computed detection p-values using out-of-band probes empirical distribution with the pOOBAH() function in the SeSAMe (version 1.14.2) R package, with a p-value threshold of 0.05, and the combine.neg parameter set to TRUE. In the scenario where pOOBAH filtering was carried out, it was done in parallel with the previously mentioned QC steps, and the resulting probes flagged in both analyses were combined and removed from the data.

Normalization Methods Evaluated

The normalization methods compared in this study were implemented using different R/Bioconductor packages and are summarized in Figure 1. All data was read into R workspace as RG Channel Sets using minfi’s read.metharray.exp() function. One sample that was flagged during QC was removed, and further normalization steps were carried out in the remaining set of 63 samples. Prior to all normalizations with minfi, probes that did not pass QC were removed. Noob, SWAN, Quantile, Funnorm and Illumina normalizations were implemented using minfi. BMIQ normalization was implemented with ChAMP (version 2.26.0), using as input Raw data produced by minfi’s preprocessRaw() function. In the combination of Noob with BMIQ (Noob+BMIQ), BMIQ normalization was carried out using as input minfi’s Noob normalized data. Noob normalization was also implemented with SeSAMe, using a nonlinear dye bias correction. For SeSAMe normalization, two scenarios were tested. For both, the inputs were unmasked SigDF Sets converted from minfi’s RG Channel Sets. In the first, which we call “SeSAMe 1”, SeSAMe’s pOOBAH masking was not executed, and the only probes filtered out of the dataset prior to normalization were the ones that did not pass QC in the previous analyses. In the second scenario, which we call “SeSAMe 2”, pOOBAH masking was carried out in the unfiltered dataset, and masked probes were removed. This removal was followed by further removal of probes that did not pass previous QC, and that had not been removed by pOOBAH. Therefore, SeSAMe 2 has two rounds of probe removal. Noob normalization with nonlinear dye bias correction was then carried out in the filtered dataset. Methods were then compared by subsetting the 16 replicated samples and evaluating the effects that the different normalization methods had in the absolute difference of beta values (|β|) between replicated samples.
n
Data from: Effects of growth rate, size, and light availability on tree...
narcis.nl
search.dataone.org
+1more
Updated Mar 3, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Moustakas, Aristides; Evans, Matthew R. (2015). Data from: Effects of growth rate, size, and light availability on tree survival across life stages: a demographic analysis accounting for missing values and small sample sizes [Dataset]. http://doi.org/10.5061/dryad.6f4qs
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.6f4qs
Dataset updated
Mar 3, 2015
Dataset provided by
Data Archiving and Networked Services (DANS)
Authors
Moustakas, Aristides; Evans, Matthew R.
Description
Background: Plant survival is a key factor in forest dynamics and survival probabilities often vary across life stages. Studies specifically aimed at assessing tree survival are unusual and so data initially designed for other purposes often need to be used; such data are more likely to contain errors than data collected for this specific purpose. Results: We investigate the survival rates of ten tree species in a dataset designed to monitor growth rates. As some individuals were not included in the census at some time points we use capture-mark-recapture methods both to allow us to account for missing individuals, and to estimate relocation probabilities. Growth rates, size, and light availability were included as covariates in the model predicting survival rates. The study demonstrates that tree mortality is best described as constant between years and size-dependent at early life stages and size independent at later life stages for most species of UK hardwood. We have demonstrated that even with a twenty-year dataset it is possible to discern variability both between individuals and between species. Conclusions: Our work illustrates the potential utility of the method applied here for calculating plant population dynamics parameters in time replicated datasets with small sample sizes and missing individuals without any loss of sample size, and including explanatory covariates.
Supplement 2. R code used for wolf analysis.
wiley.figshare.com
html
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jason Matthiopoulos; Mark Hebblewhite; Geert Aarts; John Fieberg (2023). Supplement 2. R code used for wolf analysis. [Dataset]. http://doi.org/10.6084/m9.figshare.3550839.v1
Explore at:
htmlAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.3550839.v1
Dataset updated
May 30, 2023
Dataset provided by
Wileyhttps://www.wiley.com/
Authors
Jason Matthiopoulos; Mark Hebblewhite; Geert Aarts; John Fieberg
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
File List Wolf code.r – Source code to run wolf analysis Description This is provided for illustration only, the wolf data are not offered online. The code operates on a data frame in which rows correspond to points in space. The data frame contains a column for use (1 for a telemetry observation, 0 for a control point selected from the wolf’s home range). It also contains columns for x and y coordinates of the point, environmental covariates at that location, wolf ID and wolf pack membership. 1. Data frame preparation The data set is first thinned, for computational expediency, the covariates are standardized to improve convergence and the data frame is augmented with columns for wolf-pack-level covariate expectations (required by the GFR approach). 2. Leave-one-out validation The code allows the removal of a single wolf from the data set. Two models (one with just random effects, the second with GFR interactions) are fit to the data and predictions are made for the missing wolf. The function gof() generates goodness-of-fit diagnostics.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Desai, Manisha (2025). Handling of Missing Data Induced by Time-Varying Covariates in Comparative Effectiveness Research HIV Patients [Methods Study], 2013-2018 [Dataset]. http://doi.org/10.3886/ICPSR39528.v1

Handling of Missing Data Induced by Time-Varying Covariates in Comparative Effectiveness Research HIV Patients [Methods Study], 2013-2018

Explore at:

Unique identifier

https://doi.org/10.3886/ICPSR39528.v1

Dataset updated

Oct 9, 2025

Dataset provided by

Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/

Authors

Desai, Manisha

License

https://www.icpsr.umich.edu/web/ICPSR/studies/39528/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/39528/terms

Time period covered

2013 - 2018

Description

Researchers can use data from health registries or electronic health records to compare two or more treatments. Registries store data about patients with a specific health problem. These data include how well those patients respond to treatments and information about patient traits, such as age, weight, or blood pressure. But sometimes data about patient traits are missing. Missing data about patient traits can lead to incorrect study results, especially when traits change over time. For example, weight can change over time, and the patient may not report their weight at some points along the way. Researchers use statistical methods to fill in these missing data. In this study, the research team compared a new statistical method to fill in missing data with traditional methods. Traditional methods remove patients with missing data or fill in each missing number with a single estimate. The new method creates multiple possible estimates to fill in each missing number. To access the methods, software, and R package, please visit the SimulateCER GitHub and SimTimeVar CRAN website.

Clear search

Close search

Google apps

Main menu

Handling of Missing Data Induced by Time-Varying Covariates in Comparative...

Example of how to manually extract incubation bouts from interactive plots...

Dataset from: High consistency and repeatability in the breeding migrations...

R script to calculate daily PAR from solar radiation data

ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture...

Dataset Paper (Open Access)

Abstract

Summary

Programmatic Download

Data details

Data Variables

Version Changelog

Software to open netCDF files

References

Related Records

Missing the (tipping) point: the effect of information about climate tipping...

Hydro-meteorological database for watersheds across the CIS

Data for: Intra- and interspecific variation in trace element concentrations...

Cross Regional Eucalyptus Growth and Environmental Data

SILO Patched Point Datasets for Queensland

Geographical distribution and climate data of Cycas taiwaniana

Study Hours vs Grades Dataset

Dataset Features

Potential Use Cases

Data Quality

Data Generation Code

R Code

mtcars-parquet

Data from: A systematic evaluation of normalization methods and probe...

Data from: Effects of growth rate, size, and light availability on tree...

Supplement 2. R code used for wolf analysis.

Handling of Missing Data Induced by Time-Varying Covariates in Comparative Effectiveness Research HIV Patients [Methods Study], 2013-2018