Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R code used for each data set to perform negative binomial regression, calculate the overdispersion statistic, generate summary statistics, and remove outliers.
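A minimal R sketch of this kind of analysis (not the released code), assuming a data frame df with a count response y and a single predictor x (names illustrative):

library(MASS)

fit <- glm.nb(y ~ x, data = df)   # negative binomial regression
summary(fit)                      # summary statistics for the fitted model

# Overdispersion statistic: Pearson chi-square divided by residual degrees of freedom
sum(residuals(fit, type = "pearson")^2) / df.residual(fit)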
This dataset tracks the updates made to the dataset "MeSH 2023 Update - Delete Report" and serves as a repository for previous versions of the data and metadata.
This data release contains lake and reservoir water surface temperature summary statistics calculated from Landsat 8 Analysis Ready Dataset (ARD) images available within the Conterminous United States (CONUS) from 2013-2023. All zip files within this data release contain nested directories using .parquet files to store the data. The file example_script_for_using_parquet.R contains example code for using the R arrow package (Richardson and others, 2024) to open and query the nested .parquet files.
Limitations of this dataset include:
- All biases inherent to the Landsat Surface Temperature product are retained in this dataset, which can produce unrealistically high or low estimates of water temperature. This is observed to happen, for example, in cases with partial cloud coverage over a waterbody.
- Some waterbodies are split between multiple Landsat Analysis Ready Data tiles or orbit footprints. In these cases, multiple waterbody-wide statistics may be reported, one for each data tile. The deepest point values are extracted and reported for the tile covering the deepest point. A total of 947 waterbodies are split between multiple tiles (see the multiple_tiles = "yes" column of site_id_tile_hv_crosswalk.csv).
- Temperature data were not extracted from satellite images with more than 90% cloud cover.
- Temperature data represent skin temperature at the water surface and may differ from temperature observations from below the water surface.
Potential methods for addressing limitations of this dataset (see the R sketch after the file listing below):
- Identifying and removing unrealistic temperature estimates:
  - Calculate the total percentage of cloud pixels over a given waterbody as percent_cloud_pixels = wb_dswe9_pixels/(wb_dswe9_pixels + wb_dswe1_pixels), and filter percent_cloud_pixels by a desired percentage of cloud coverage.
  - Remove lakes with a limited number of water pixel values available (wb_dswe1_pixels < 10).
  - Filter waterbodies where the deepest point is identified as water (dp_dswe = 1).
- Handling waterbodies split between multiple tiles:
  - These waterbodies can be identified using the site_id_tile_hv_crosswalk.csv file (column multiple_tiles = "yes"). A user could combine sections of the same waterbody by spatially weighting the values using the number of water pixels available within each section (wb_dswe1_pixels). This should be done with caution, as some sections of the waterbody may have data available on different dates.
All zip files within this data release contain nested directories using .parquet files to store the data. The file example_script_for_using_parquet.R contains example code for using the R arrow package to open and query the nested .parquet files.
- "year_byscene=XXXX.zip" – includes temperature summary statistics for individual waterbodies and the deepest points (the furthest point from land within a waterbody) within each waterbody by scene_date (when the satellite passed over). Individual waterbodies are identified by the National Hydrography Dataset (NHD) permanent_identifier included within the site_id column. Some of the .parquet files within the _byscene datasets may only include one dummy row of data (identified by tile_hv="000-000"). This happens when no tabular data are extracted from the raster images because of clouds obscuring the image, a tile that covers mostly ocean with a very small amount of land, or other possible reasons.
An example file path for this dataset follows: year_byscene=2023/tile_hv=002-001/part-0.parquet
- "year=XXXX.zip" – includes the summary statistics for individual waterbodies and the deepest points within each waterbody by year (dataset=annual), month (year=0, dataset=monthly), and year-month (dataset=yrmon). The year_byscene=XXXX data are used as input for generating these summary tables, which aggregate temperature data by year, month, and year-month. Aggregated data are not available for the following tiles: 001-004, 001-010, 002-012, 028-013, and 029-012, because these tiles primarily cover ocean with limited land, and no output data were generated. An example file path for this dataset follows: year=2023/dataset=lakes_annual/tile_hv=002-001/part-0.parquet
- "example_script_for_using_parquet.R" – This script includes code to download zip files directly from ScienceBase, identify HUC04 basins within a desired Landsat ARD grid tile, download NHDPlus High Resolution data for visualization, use the R arrow package to compile .parquet files in nested directories, and create example static and interactive maps.
- "nhd_HUC04s_ingrid.csv" – This crosswalk file identifies the HUC04 watersheds within each Landsat ARD tile grid.
- "site_id_tile_hv_crosswalk.csv" – This crosswalk file identifies the site_id (nhdhr_{permanent_identifier}) within each Landsat ARD tile grid. This file also includes a column (multiple_tiles) to identify site_ids that fall within multiple Landsat ARD tile grids.
- "lst_grid.png" – a map of the Landsat grid tiles labelled by the horizontal–vertical ID.
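A minimal sketch (not the provided example_script_for_using_parquet.R) of the screening steps suggested above, assuming year_byscene=2023.zip has been unzipped into the working directory; the thresholds are illustrative:

library(arrow)
library(dplyr)

scenes <- open_dataset("year_byscene=2023") |>
  mutate(percent_cloud_pixels = wb_dswe9_pixels / (wb_dswe9_pixels + wb_dswe1_pixels)) |>
  filter(percent_cloud_pixels < 0.25,   # keep scenes with < 25% cloud pixels over the waterbody
         wb_dswe1_pixels >= 10,         # drop waterbodies with few water pixels
         dp_dswe == 1) |>               # deepest point identified as water
  collect()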
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Identification of errors or anomalous values, collectively considered outliers, assists in exploring data, and removing outliers improves statistical analysis. In biomechanics, outlier detection methods have explored the 'shape' of entire cycles, although exploring fewer points using a 'moving window' may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected in two stages for the strides (mean 38 cycles) from treadmill running. Cycles were removed in stage 1 for one-dimensional (spatial) outliers at each time point using the median absolute deviation, and in stage 2 for two-dimensional (spatial-temporal) outliers using a moving-window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window size, requiring more stringent scaling at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and outliers assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time-series data.
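A minimal R sketch of the stage-1 idea (the authors supply Matlab code; this is only an illustration, and the scaling here is a plain threshold k rather than the t-statistic-based scaling described above). cycles is assumed to be a matrix with one row per cycle and one column per normalised time point:

flag_mad_outlier_cycles <- function(cycles, k = 3) {
  flagged <- apply(cycles, 2, function(x) {
    m <- median(x)
    s <- median(abs(x - m)) * 1.4826     # MAD with consistency constant for normal data
    abs(x - m) > k * s
  })
  which(rowSums(flagged) > 0)            # cycles flagged at one or more time points
}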
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fig 2
Bone marrow (Fig 2B, D, E, F, H, Supplementary Fig 1A, 2, 3)
1. Fig 2/BM/Reference/Fig2_BM_prepare_data.R: Prepare bone marrow data for CellFuse
2. Fig 2/BM/BM_CellFuse_Integration.R: Run CellFuse
3. Fig 2/BM/BM_Running_Benchmark_Methods.R: Run benchmarking methods (Harmony, Seurat, FastMNN)
4. Fig 2/BM/BM_scIB_Benchmarking.ipynb: Evaluate performance of CellFuse and other benchmarking methods using the scIB framework proposed by Luecken et al.
5. Fig 2/BM/BM_scIB_prepare_figures.R: Visualize results of the scIB framework
6. Fig 2/BM/Sequential_Feature_drop/Prepare_data.R: Prepare data for evaluating sequential feature drop
7. Fig 2/BM/Sequential_Feature_drop/Run_methods.R: Run CellFuse, Harmony, Seurat and FastMNN for sequential feature drop
8. Fig 2/BM/Sequential_Feature_drop/Evaluate_results.R: Evaluate results of the sequential feature drop and visualize data.
PBMC (Fig 2G, I, Supplementary Fig 1B and 4)
1. Fig 2/PBMC/Reference/Fig2_PBMC_prepare_data.R: Prepare PBMC data for CellFuse
2. Fig 2/PBMC/PBMC_CellFuse_Integration.R: Run CellFuse
3. Fig 2/PBMC/PBMC_Running_Benchmark_Methods.R: Run benchmarking methods (Harmony, Seurat, FastMNN)
4. Fig 2/PBMC/PBMC_scIB_Benchmarking.ipynb: Evaluate performance of CellFuse and other benchmarking methods using the scIB framework proposed by Luecken et al., 2021
5. Fig 2/PBMC/PBMC_scIB_prepare_figures.R: Visualize results of the scIB framework
6. Fig 2/PBMC/RunTime_benchmark/Run_Benchmark.R: Prepare data, run benchmarking methods and evaluate results.
Fig 3 and Supplementary Fig 5
1. Fig 3/Reference/Fig3_CyTOF_prepare_data.R: Prepare CyTOF and CITE-Seq data for CellFuse
2. Fig 3/CellFuse_Integration_CyTOF.R: Run CellFuse to remove batch effect and integrate CyTOF data from day 7 post-infusion
3. Fig 3/CellFuse_Integration_CITESeq.R: Run CellFuse to integrate CyTOF and CITE-Seq data
4. Fig 3/CART_Data_visualisation.R: Visualize data
Fig 4
HuBMAP CODEX data (Fig. 4A, B, C, D and Supplementary Fig 6)
1. Fig 4/CODEX_colorectal/Reference/CODEX_HuBMAP_prepare_data.R: Prepare CODEX data from the annotated and unannotated donors
2. Fig 4/CODEX_colorectal/CODEX_HuBMAP_CellFuse_Predict.R: Run CellFuse on cells from the annotated and unannotated donors
3. Fig 4/CODEX_colorectal/CODEX_HuBMAP_Data_visualisation.R: Visualize data and prepare figures.
4. Fig 4/CODEX_colorectal/CODEX_HuBMAP_Benchmark.R: Benchmark CellFuse against CELESTA, SVM and Seurat using cells from annotated donors and prepare figures.
a. Astir is a Python package, so run the following Python notebook: Fig 4/CODEX_colorectal/Benchmarking/Astir/Astrir.ipynb
5. Fig 4/CODEX_colorectal/CODEX_HuBMAP_Suppl_figure_heatmap.R: F1 score calculation per cell type per benchmarking method, and heatmap comparing cell types from annotated and unannotated donors (Supplementary Fig 6)
IMC breast cancer data (Fig. 4E, F, G and Supplementary Fig 7)
1. Fig 4/IMC_Breast_Cancer/IMC_prepare_data.R: Prepare CODEX data from the annotated and unannotated donors
2. Fig 4/IMC_Breast_Cancer/IMC_CellFuse_Predict.R: Run CellFuse to predict cell types
3. Fig 4/IMC_Breast_Cancer/IMC_dat_visualization.R: Visualize data and prepare figures.
Fig 5
1. Fig5/Reference/Fig5_CyTOF_Data_prep.R: Prepare CyTOF data from healthy PBMC and healthy colon single cells
2. Fig5/MIBI_CellFuse_Predict.R: Run CellFuse to predict cells from colon cancer patients
3. Fig5/MIBI_PostPrediction.R: Visualize data and prepare figures
4. Fig5/Predicted_Data/mask_generation.ipynb: Annotate cell types in segmented images after CellFuse prediction. This will generate Fig 5C and D.
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions about the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that occurred over a period of time. The retailer will use the results to grow the business and offer customers itemset suggestions, so we can increase customer engagement, improve the customer experience, and identify customer behaviour. I will solve this problem using Association Rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most often used to build associations between different objects in a set and to find frequent patterns in a transaction database. It can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(computer mouse) = 0.08/0.10 = 0.8
- lift = confidence / P(mouse mat) = 0.8/0.09 = 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
Number of Attributes: 7
First, we need to load the required libraries. Below, I briefly describe each library.
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
Next, we will clean our data frame and remove missing values.
To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together on one invoice will be in ...
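A minimal sketch of the Apriori step with the arules package, assuming the cleaned data frame retail has BillNo and Itemname columns (names illustrative); the support and confidence thresholds would need tuning for the real data:

library(arules)

# one transaction per invoice, duplicate items within an invoice removed
trans <- as(lapply(split(retail$Itemname, retail$BillNo), unique), "transactions")

rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.5, minlen = 2))
inspect(head(sort(rules, by = "lift"), 10))   # top 10 rules by lift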
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Featall.Rdata: dataset used to generate the results, including the MTR calculated for both birds and insects.
01_MTR_tidy.Rmd: R script to combine bird and insect data & remove precipitation and technical contamination
02_M_L_separation.Rmd: R script to calculate the proportion of migration for each day/night and estimate the total number of birds and insects per year
03_trend_construction.Rmd: R script to construct and plot the trend of animal movement
04_phenology_figs.Rmd: R script to plot flight direction (Fig. 2) and proportion of migration (Fig. S1)
https://spdx.org/licenses/CC0-1.0.html
These data and computer code (written in R, https://www.r-project.org) were created to statistically evaluate a suite of spatiotemporal covariates that could potentially explain pronghorn (Antilocapra americana) mortality risk in the Northern Sagebrush Steppe (NSS) ecosystem (50.0757° N, −108.7526° W). Known-fate data were collected from 170 adult female pronghorn monitored with GPS collars from 2003-2011, which were used to construct a time-to-event (TTE) dataset with a daily timescale and an annual recurrent origin of 11 November. Seasonal risk periods (winter, spring, summer, autumn) were defined by median migration dates of collared pronghorn. We linked this TTE dataset with spatiotemporal covariates that were extracted and collated from pronghorn seasonal activity areas (estimated using 95% minimum convex polygons) to form a final dataset. Specifically, average fence and road densities (km/km2), average snow water equivalent (SWE; kg/m2), and maximum decadal normalized difference vegetation index (NDVI) were considered as predictors. We tested for the main effects of spatiotemporal risk covariates as well as the hypotheses that pronghorn mortality risk from roads or fences could be intensified during severe winter weather (i.e., interactions: SWE*road density and SWE*fence density). We also compared an analogous frequentist implementation to estimate model-averaged risk coefficients. Ultimately, the study aimed to develop the first broad-scale, spatially explicit map of predicted annual pronghorn survivorship based on anthropogenic features and environmental gradients to identify areas for conservation and habitat restoration efforts.
Methods: We combined relocations from GPS-collared adult female pronghorn (n = 170) with raster data that described potentially important spatiotemporal risk covariates. We first collated relocation and time-to-event data to remove individual pronghorn from the analysis that had no spatial data available. We then constructed seasonal risk periods based on the median migration dates determined from a previous analysis; thus, we defined 4 seasonal periods as winter (11 November–21 March), spring (22 March–10 April), summer (11 April–30 October), and autumn (31 October–10 November). We used the package 'amt' in Program R to rarify relocation data to a common 4-hr interval using a 30-min tolerance. We used the package 'adehabitatHR' in Program R to estimate seasonal activity areas using 95% minimum convex polygons. We constructed annual- and seasonal-specific risk covariates by averaging values within individual activity areas. We specifically extracted values for linear features (road and fence densities), a proxy for snow depth (SWE), and a measure of forage productivity (NDVI). We resampled all raster data to a common resolution of 1 km2. Given that fence density models characterized regional-scale variation in fence density (i.e., 1.5 km2), this resolution seemed appropriate for our risk analysis. We fit Bayesian proportional hazards (PH) models using a time-to-event approach to model the effects of spatiotemporal covariates on pronghorn mortality risk. We aimed to develop a model to understand the relative effects of risk covariates for pronghorn in the NSS. The effect of fence or road densities may depend on SWE such that the variables interact in affecting mortality risk. Thus, our full candidate model included four main effects and two interaction terms. We used reversible-jump Markov Chain Monte Carlo (RJMCMC) to determine relative support for a nested set of Bayesian PH models. This allowed us to conduct Bayesian model selection and averaging in one step by using two custom samplers provided for the R package 'nimble'. For brevity, we provide the final time-to-event dataset and analysis code rather than include all of the code, GIS, etc. used to estimate seasonal activity areas and extract and collate spatial risk covariates for each individual. Rather, we provide the data and all code to reproduce the risk regression results presented in the manuscript.
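A minimal sketch of the relocation rarefication step with 'amt' (this is not the released analysis code; the data frame, column names, animal ID, and CRS below are illustrative):

library(amt)
library(lubridate)
library(dplyr)

# Shown for a single animal; with multiple animals, nest by ID and resample each track.
trk <- pronghorn_df |>
  filter(animal_id == "F001") |>                       # hypothetical animal ID
  make_track(x, y, timestamp, crs = 26913)             # projected CRS assumed

trk_4h <- track_resample(trk, rate = hours(4), tolerance = minutes(30))  # common 4-hr interval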
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The zip file contains the data files and R analysis script used in the manuscript titled 'Attentional bias modification in virtual reality - a VR-based dot-probe task with 2D and 3D stimuli'.
Analysis_script.R is a script file that can be opened by the statistical software R (https://www.r-project.org/) and RStudio (https://www.rstudio.com/). All analysis steps and code are found within this file.
All files under the Data_files folder are directly called by Analysis_script.R from R, therefore please ensure that the folder structure and file names remain the same.
The folder dot_probe_raw_data_files and its subfolders contain *.xml files with attentional bias (reaction time) data from the participants, generated by the VR program.
outcome_measures_and_demographic_data.xlsx contains participant demographic data and questionnaire measures, generated by the iTerapi platform. This data file has been cleaned to remove information irrelevant to the analysis (e.g. number of reminder emails sent etc.).
lsas_pre_individual_items.xlsx contains participant responses to individual items of the LSAS-SR questionnaire, generated by the iTerapi platform.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This resource is for historic purposes only and was provided for the GovHack competition (3-5 July 2015). After the event it was discovered that the latitude and longitude columns had been inadvertently inverted. For any project using this data please use the updated version of the resource (link) located here.
We have elected not to remove this resource at this time so as to ensure that any GovHack entries using this data are not disadvantaged during the judging process. We intend to remove this version of the data after the GovHack judging has been completed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data contain bathymetric data from the Namibia continental slope. The data were acquired on R/V Meteor research expedition M76/1 in 2008, and R/V Maria S. Merian expedition MSM19/1c in 2011. The purpose of the data was the exploration of the Namibian continental slope and especially the investigation of large seafloor depressions. The bathymetric data were acquired with the 191-beam 12 kHz Kongsberg EM120 system. The data were processed using the public software package MBSystems. The loaded data were cleaned semi-automatically and manually, removing outliers and other erroneous data. Initial velocity fields were adjusted to remove artifacts from the data. Gridding was done in 10x10 m grid cells for the MSM19-1c dataset and 50x50 m for the M76 dataset using the Gaussian Weighted Mean algorithm.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
See below for details of the files included.
delly_vanc.vcf.gz # Raw output of Delly
b.vanc.fully.filtered.100k.plus.recode.vcf.gz # output of freebayes which was filtered using VCFtools v0.1.13 (Danecek et al. 2011) with the following flags: --remove-indels --min-alleles 2 --max-alleles 2 --minQ 20 --minDP 4 --max-missing 0.75
b.vanc.fully.filtered.100k.plus.recode.maf05.recode.ANN.vcf.gz #Fully filtered variant file (see manuscript for details) with annotation information
b.vanc.fully.filtered.100k.plus.recode.maf05.recode.impute.vcf.gz #Fully filtered variant file (see manuscript for details) after imputation with beagle
Trim_N_QC.sh #Trim raw sequencing data and run fastQC to evaluate trimmed data
BWA_PICARD_vanc1.sh #Example of script used to align sequence data to the reference genome using BWA. Also, uses Picard tools to sort, deduplicate and index bam files
P_call_test-2-vanc.sh #First part of pipeline for calling SNPS with freebayes (calls freebayes-parallel-part1_vanc.sh)
freebayes-parallel-part1_vanc.sh #see above
Filter_vanc.sh #Create list of SVs to filter from DELLY output
filter_delly.sh #filter based on generated list of SVs
delly_vanc.sh #call SVs using DELLY
bcf2vcf.sh # convert bcf from DELLY to vcf format
freebayes-parallel-part2.sh #Second part of freebayes pipeline
merge_vanc_vars.sh #Second part of freebayes pipeline (calls freebayes-parallel-part2.sh)
site_depth_vanc.sh #Gets site depth per SNP
remove_highdepth_vanc.sh #removes SNPs above depth threshold
hardy_vanc.sh #calculates HWE per SNP
remove_hwe_vanc.sh #removes SNPs based on HWE threshold
filter_vcf_size.sh #Removes SNPs on scaffolds less than 100Kb in size
filter_vcf_maf05.sh #filters SNPs based on 5% MAF filter
beagle.sh #imputes using beagle
LEA_con.R #converts vcf file into LFMM and geno format
Snpeff_ANN.sh # annotate vcf file using SNPeff
plink_for_sambaR.sh # convert vcf file into format ready for use in sambaR
LD_test.sh #example of script used to calculate LD per scaffold
vcf_stats.sh #Gets various stats from final filtered vcf
get_pi_diversity.sh #gets per population nucleotide diversity
sambaR.R #Runs SambaR
lfmm2_analysis.R #Code for running analysis on output of LFMM2 and generating graphs
Max_ent_map.R #Generates maxent map
RDA_script.R #Code for RDA analysis of structural variants
snprelate_script.R #runs SNPrelate as well as makes graphs of Fst and pi along scaffolds of interest
repeat_correctedfst.R #Analysis for correlation between repeat density and Fst
LD_script.R #analysis of linkage
A 150-kHz narrowband RD Instruments Acoustic Doppler Current Profiler (ADCP) internally recorded 34,805 current ensembles in 362 days from an Ice-Ocean Buoy (IOEB) deployed during the SHEBA project. The IOEB was initially deployed about 50 km from the main camp and drifted from 75.1 N, 141 W to 80.6 N, 160 W between October 1, 1997 and September 30, 1998. The ADCP was located at a depth of 14 m below the ice surface and was configured to record data at 15-minute intervals from 40 8-m-wide bins extending downward to 320 m below the instrument. The retrieved 24 Mb of raw data were processed to remove noise, correct for platform drift and geomagnetic declination, remove bottom hits, and output 2-hr average Earth-referenced current profiles along with ancillary data.
This dataset contains cleaned GBIF (www.gbif.org) occurrence records and associated climate and environmental data for all arthropod prey of listed species in California drylands as identified in Lortie et al. (2023): https://besjournals.onlinelibrary.wiley.com/doi/full/10.1002/2688-8319.12251. All arthropod records were downloaded from GBIF (https://doi.org/10.15468/dl.ngym3r) on 14 November 2022. Records were imported into R using the rgbif package and cleaned with the CoordinateCleaner package to remove occurrence data with likely errors. Environmental data include bioclimatic variables from WorldClim (www.worldclim.org), landcover and NDVI data from MODIS and the LPDAAC (https://lpdaac.usgs.gov/), elevation data from the USGS (https://www.sciencebase.gov/catalog/item/542aebf9e4b057766eed286a), and distance to the nearest road from the Census Bureau's TIGER/Line road shapefile (https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html). All environmental data were combined into a stacked raster and we extracted the environmental variables for each occurrence record from this raster to make the final dataset.
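A minimal sketch of this type of workflow (not the original processing code); the species name, raster file names, and column choices below are illustrative:

library(rgbif)
library(CoordinateCleaner)
library(terra)

occ <- occ_search(scientificName = "Bombus vosnesenskii", limit = 5000)$data  # hypothetical taxon

occ_clean <- clean_coordinates(occ,
                               lon = "decimalLongitude",
                               lat = "decimalLatitude",
                               species = "species",
                               value = "clean")          # keep only records passing the tests

env   <- rast(c("bioclim.tif", "ndvi.tif", "elevation.tif"))  # hypothetical stacked rasters
vals  <- extract(env, as.matrix(occ_clean[, c("decimalLongitude", "decimalLatitude")]))
final <- cbind(occ_clean, vals)                               # occurrences + environmental values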
https://guides.library.uq.edu.au/deposit-your-data/license-reuse-data-agreement
The SNP dataset for each species investigated in this study is present. These datasets are saved as R data objects in list format with metadata for samples and post-filtering DArTseq SNPs. Sample filtering included removing samples which we suspected to be mis-identified taxa, hybrids and those with >50% missing data, after which any samples from populations with fewer than 5 suitable samples remaining were also removed. SNP filtering included removing loci with reproducibility values below 0.96 or missingness of >20%, followed by subsampling to one SNP per locus to remove any linkage effects. Datasets can be read into R, where they are formatted as list objects.
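A minimal sketch for reading one of these objects into R, assuming the objects were saved as .RData files (file and object names are illustrative, not the actual names in this release):

load("species_snp_data.RData")         # loads the list object into the workspace
str(species_snp_data, max.level = 1)   # inspect the sample metadata and filtered SNP components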
Along-track temperature, salinity, backscatter, chlorophyll fluorescence, and normalized water-leaving radiance (nLw).
On the bow of the vessel was a Satlantic SeaWiFS Aircraft Simulator (MicroSAS) system, used to estimate water-leaving radiance from the ship, analogous to the nLw derived by the SeaWiFS and MODIS satellite sensors, but free from atmospheric error (hence, it can provide data below clouds).
The system consisted of a down-looking radiance sensor and a sky-viewing radiance sensor, both mounted on a steerable holder on the bow. A downwelling irradiance sensor was mounted at the top of the ship's meteorological mast, on the bow, far from any potentially shading structures. These data were used to estimate normalized water-leaving radiance as a function of wavelength. The radiance detector was set to view the water at 40deg from nadir as recommended by Mueller et al. [2003b]. The water radiance sensor was able to view over an azimuth range of ~180deg across the ship's heading with no viewing of the ship's wake. The direction of the sensor was adjusted to view the water 90-120deg from the sun's azimuth, to minimize sun glint. This was continually adjusted as the time and ship's gyro heading were used to calculate the sun's position using an astronomical solar position subroutine interfaced with a stepping motor which was attached to the radiometer mount (designed and fabricated at Bigelow Laboratory for Ocean Sciences). Protocols for operation and calibration were performed according to Mueller [Mueller et al., 2003a; Mueller et al., 2003b; Mueller et al., 2003c]. Before 1000h and after 1400h, data quality was poorer as the solar elevation was too low. Post-cruise, the 10Hz data were filtered to remove as much residual white cap and glint as possible (we accept the lowest 5% of the data). Reflectance plaque measurements were made several times at local apparent noon on sunny days to verify the radiometer calibrations.
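A minimal sketch of that post-cruise screen (keep the lowest 5% of the 10 Hz Lt samples within each averaging interval); the object and column names are assumptions, not the original processing code:

library(dplyr)

lt_screened <- lt_10hz |>
  group_by(interval_id) |>                       # hypothetical averaging-interval label
  filter(Lt <= quantile(Lt, 0.05)) |>            # accept only the lowest 5% of Lt samples
  summarise(Lt_mean = mean(Lt), .groups = "drop")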
Within an hour of local apparent noon each day, a Satlantic OCP sensor was deployed off the stern of the vessel after the ship oriented so that the sun was off the stern. The ship would secure the starboard Z-drive, and use port Z-drive and bow thruster to move the ship ahead at about 25cm s-1. The OCP was then trailed aft and brought to the surface ~100m aft of the ship, then allowed to sink to 100m as downwelling spectral irradiance and upwelling spectral radiance were recorded continuously along with temperature and salinity. This procedure ensured there were no ship shadow effects in the radiometry.
Instruments include a WETLabs wetstar fluorometer, a WETLabs ECOTriplet and a SeaBird microTSG.
Radiometry was done using a Satlantic 7 channel microSAS system with Es, Lt and Li sensors.
Chl data are based on intercalibrating discrete surface chlorophyll measurements with the temporally closest fluorescence measurement and applying the regression results to all fluorescence data.
Data have been corrected for instrument biofouling and drift based on weekly pure-water calibrations of the system. Radiometric data have been processed using standard Satlantic processing software and have been checked with periodic plaque measurements using a 2% Spectralon standard.
Lw is calculated from Lt and Lsky and is "what Lt would be if the sensor were looking straight down". Since our sensors are mounted at 40deg, based on various NASA protocols, we need to do that conversion. Lwn adds Es to the mix: Es is used to normalize Lw. nLw is related to Rrs, the remote sensing reflectance.
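For reference (these standard ocean-optics relations are not stated in the original record, so treat them as background rather than a description of the exact processing), the quantities are typically related as:

Rrs(lambda) = Lw(lambda) / Es(lambda)
nLw(lambda) = Rrs(lambda) x F0(lambda)

where F0(lambda) is the mean extraterrestrial solar irradiance.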
Techniques used are as described in:
Balch WM, Drapeau DT, Bowler BC, Booth ES, Windecker LA, Ashe A (2008) Space–time variability of carbon standing stocks and fixation rates in the Gulf of Maine, along the GNATS transect between Portland, ME, USA, and Yarmouth, Nova Scotia, Canada. J Plankton Res 30:119–139
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the final occurrence record dataset produced for the manuscript "Depth Matters for Marine Biodiversity". Detailed methods for the creation of the dataset, below, have been excerpted from Appendix I: Extended Methods. Detailed citations for the occurrence datasets from which these data were derived can also be found in Appendix I of the manuscript.
We assembled a list of all recognized species of fishes from the orders Scombriformes (sensu Betancur-R et al., 2017), Gadiformes, and Beloniformes by accessing FishBase (Boettiger et al., 2012; Froese & Pauly, 2017) and the Ocean Biodiversity Information System (OBIS; OBIS, 2022; Provoost & Bosch, 2019) through queries in R (R Core Team, 2021). Species were considered Atlantic if their FishBase distribution or occurrence records on OBIS included any area within the Atlantic or Mediterranean major fishing regions as defined by the Food and Agriculture Organization of the United Nations (FAO Regions 21, 27, 31, 34, 37, 41, 47, and 48; FAO, 2020). The database query script can be found on the project code repository (https://github.com/hannahlowens/3DFishRichness/blob/main/1_OccurrenceSearch.R). We then curated the list of names to resolve discrepancies in taxonomy and known distributions through comparison with the Eschmeyer Catalog of Fishes (Eschmeyer & Fricke, 2015), accessed in September of 2020, as our ultimate taxonomic authority. The resulting list of species was then mapped onto the Global Biodiversity Information Facility's backbone taxonomy (Chamberlain et al., 2021; GBIF, 2020a) to ensure taxonomic concurrence across databases (Appendix I Table 1). The final taxonomic list was used to download occurrence records from OBIS (OBIS, 2022) and GBIF (GBIF, 2020b) in R through robis and occCite (Chamberlain et al., 2020; Provoost & Bosch, 2019; Owens et al., 2021).
Once the resulting data were mapped and curated to remove records with putatively spurious coordinates, under-sampled regions and species were augmented with data from publicly available digital museum collection databases not served through OBIS or GBIF, as well as a literature search. For each species, duplicate points were removed from two- and three-dimensional species occurrence datasets separately, and inaccurate depth records were removed from 3D datasets. Inaccuracy was determined based on extreme statistical outliers (values greater than 2 or less than -2 when occurrence depths were centered and scaled), depth ranges that exceeded bathymetry at occurrence coordinates, and occurrences far outside known depth ranges compared to information from FishBase, Eschmeyer's Catalog of Fishes, and congeneric depth ranges in the dataset. Finally, for datasets with more than 20 points remaining after cleaning, occurrence data were downsampled to the resolution of the environmental data; that is, to 1 point per 1 degree grid cell in the 2D dataset, and to one point per depth slice per 1 degree grid cell in the 3D dataset. Counts of raw and cleaned records for each species can be found in Appendix I Table 1.
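A minimal sketch of the 2D downsampling step (one record per species per 1-degree grid cell), not the project's 1_OccurrenceSearch.R script; column names are illustrative:

library(dplyr)

occ_2d <- occ_clean |>
  mutate(cell_lon = floor(decimalLongitude),     # 1-degree grid cell index
         cell_lat = floor(decimalLatitude)) |>
  distinct(species, cell_lon, cell_lat, .keep_all = TRUE)   # keep one record per species per cell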
References:
Betancur-R, R., Wiley, E. O., Arratia, G., Acero, A., Bailly, N., Miya, M., Lecointre, G., & Ortí, G. (2017). Phylogenetic classification of bony fishes. BMC Evolutionary Biology, 17(1), 162. https://doi.org/10.1186/s12862-017-0958-3
Boettiger, C., Lang, D. T., & Wainwright, P. C. (2012). rfishbase: exploring, manipulating and visualizing FishBase data from R. Journal of Fish Biology, 81(6), 2030–2039. https://doi.org/10.1111/j.1095-8649.2012.03464.x
Chamberlain, S., Barve, V., McGlinn, D., Oldoni, D., Desmet, P., Geffert, L., & Ram, K. (2021). rgbif: Interface to the Global Biodiversity Information Facility API. https://CRAN.R-project.org/package=rgbif
Eschmeyer, W. N., & Fricke, R. (2015). Taxonomic checklist of fish species listed in the CITES Appendices and EC Regulation 338/97 (Elasmobranchii, Actinopteri, Coelacanthi, and Dipneusti, except the genus Hippocampus). Catalog of Fishes, Electronic Version. Accessed September, 2020. https://www.calacademy.org/scientists/projects/eschmeyers-catalog-of-fishes
FAO. (2020). FAO Major Fishing Areas. United Nations Fisheries and Aquaculture Division. https://www.fao.org/fishery/en/collection/area
Froese, R., & Pauly, D. (2017). FishBase. Accessed September, 2022. www.fishbase.org
GBIF.org. (2020a). GBIF Backbone Taxonomy. Accessed September, 2020. GBIF.org
GBIF.org. (2020b). GBIF Occurrence Download. Accessed November, 2020. https://doi.org/10.15468
OBIS. (2020). Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. Accessed November, 2020. www.obis.org
Owens, H. L., Merow, C., Maitner, B. S., Kass, J. M., Barve, V., & Guralnick, R. P. (2021). occCite: Tools for querying and managing large biodiversity occurrence datasets. Ecography, 44(8), 1228–1235. https://doi.org/10.1111/ecog.05618
Provoost, P., & Bosch, S. (2019). robis: R Client to access data from the OBIS API. https://cran.r-project.org/package=robis
R Core Team. (2021). R: A Language and Environment for Statistical Computing. https://www.R-project.org/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary:
Marine geophysical exchange files for R/V Kilo Moana, 2002 to 2018: this collection includes 328 geophysical archive files spanning km0201, the vessel's very first expedition, through km1812, the last survey included in this data synthesis.
Data formats (you will likely require only one of these):
MGD77T (M77T): ASCII - the current standard format for marine geophysical data exchange, tab delimited, low human readability
MGD77: ASCII - legacy format for marine geophysical data exchange (no longer recommended due to truncated data precision and low human readability)
GMT DAT: ASCII - the Generic Mapping Tools format in which these archive files were built, best human readability but largest file size
MGD77+: highly flexible and disk-space-saving binary NetCDF-based format, enables adding additional columns and application of errata-based data correction methods (i.e., Chandler et al., 2012), not human readable
The process by which formats were converted is explained below.
Data Reduction and Explanation:
R/V Kilo Moana routinely acquired bathymetry data using two concurrently operated sonar systems; hence, for this analysis, a best effort was made to extract center-beam depth values from the appropriate sonar system. No resampling or decimation of center-beam depth data has been performed, with the exception that all depth measurements were required to be temporally separated by at least 1 second. The initial sonar systems were the Kongsberg EM120 for deep and EM1002 for shallow water mapping. The vessel's deep sonar system was upgraded to the Kongsberg EM122 in January of 2010 and the shallow system to the EM710 in March 2012.
The vessel deployed a Lacoste and Romberg spring-type gravity meter (S-33) from 2002 until March 2012 when it was replaced with a Bell Labs BGM-3 forced feedback-type gravity meter. Of considerable importance is that gravity tie-in logs were by and large inadequate for the rigorous removal of gravity drift and tares. Hence a best effort has been made to remove gravity meter drift via robust regression to satellite-derived gravity data. Regression slope and intercept are analogous to instrument drift and DC shift hence their removal markedly improves the agreement between shipboard and satellite gravity anomalies for most surveys. These drift corrections were applied to both observed gravity and free air anomaly fields. If the corrections are undesired by users, the correction coefficients have been supplied within the metadata headers for all gravity surveys, thereby allowing users to undo these drift corrections.
The L&R gravity meter had a 180-second hardware filter, so for this analysis the data were Gaussian filtered another 180 seconds and resampled at 10 seconds. BGM-3 data are not hardware filtered, hence a 360-second Gaussian filter was applied for this analysis. BGM-3 gravity anomalies were resampled at 15-second intervals. For both meter types, data gaps exceeding the filter length were not through-interpolated. Eotvos corrections were computed via the standard formula (e.g., Dehlinger, 1978) and were subjected to the same filtering as the respective gravity meter data.
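For reference, the standard Eotvos correction cited above can be sketched in R as follows (an illustration of the textbook formula, not the processing code used for these archives); speed in knots, heading and latitude in degrees, result in mGal:

eotvos_mgal <- function(speed_kn, heading_deg, lat_deg) {
  7.503 * speed_kn * cos(lat_deg * pi / 180) * sin(heading_deg * pi / 180) +
    0.004154 * speed_kn^2
}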
The vessel also deployed a Geometrics G-882 cesium vapor magnetometer on several expeditions. A Gaussian filter length of 135 seconds has been applied and resampling was performed at 15 second intervals with the same exception that no interpolation was performed through data gaps exceeding the filter length.
Archive file production:
At all depth, gravity and magnetic measurement times, vessel GPS navigation was resampled using linear interpolation as most geophysical measurement times did not exactly coincide with GPS position times. The geophysical fields were then merged with resampled vessel navigation and listed sequentially in the GMT DAT format to produce data records.
Archive file header fields were populated with relevant information such as port names, PI names, instrument and data processing details, and others whereas survey geographic and temporal boundary fields were automatically computed from the data records.
Archive file conversion:
Once completed, each marine geophysical data exchange file was converted to the other formats using the Generic Mapping Tools program known as mgd77convert. For example, conversions to the other formats were carried out as follows:
mgd77convert km0201.dat -Ft -Tm # gives mgd77t (m77t file extension)
mgd77convert km0201.dat -Ft -Ta # gives mgd77
mgd77convert km0201.dat -Ft -Tc # gives mgd77+ (nc file extension)
Disclaimers:
These data have not been edited in detail using a visual data editor and data outliers are known to exist. Several hardware malfunctions are known to have occurred during the 2002 to 2018 time frame and these malfunctions are apparent in some of the data sets. No guarantee is made that the data are accurate and they are not meant to be used for vessel navigation. Close scrutiny and further removal of outliers and other artifacts is recommended before making scientific determinations from these data.
The archive file production method employed for this analysis is explained in detail by Hamilton et al (2019).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the data in
Pereira, M., Faivre, N., Iturrate, I., Wirthlin, M., Serafini, L., Martin, S., Desvachez, A., Blanke, O., Van De Ville, D., Millan, JdR. (2020). Disentangling the origins of confidence in speeded perceptual judgments through multimodal imaging. Proceedings of the National Academy of Science, 117 (15) pp. 8382-8390 https://doi.org/10.1073/pnas.1918335117
Preprint: https://www.biorxiv.org/content/10.1101/496877v1
ABSTRACT The human capacity to compute the likelihood that a decision is correct—known as metacognition—has proven difficult to study in isolation as it usually cooccurs with decision making. Here, we isolated postdecisional from decisional contributions to metacognition by analyzing neural correlates of confidence with multimodal imaging. Healthy volunteers reported their confidence in the accuracy of decisions they made or decisions they observed. We found better metacognitive performance for committed vs. observed decisions, indicating that committing to a decision may improve confidence. Relying on concurrent electroencephalography and hemodynamic recordings, we found a common correlate of confidence following committed and observed decisions in the inferior frontal gyrus and a dissociation in the anterior prefrontal cortex and anterior insula. We discuss these results in light of decisional and postdecisional accounts of confidence and propose a computational model of confidence in which metacognitive performance naturally improves when evidence accumulation is constrained upon committing a decision.
preregistration: https://osf.io/a5qmv/
The dataset contains raw fMRI scans, raw EEG in BrainVision format as well as anatomical scans (T1) and field mapping. We also included preprocessed EEG and fMRI data in derivatives/eegprep and derivatives/fmriprep.
EEG PREPROCESSING MR-gradient artifacts were removed using sliding-window average template subtraction. The TP10 electrode on the right mastoid was used to detect heartbeats for ballistocardiogram (BCG) artifact removal using a semi-automatic procedure in BrainVision Analyzer 2. Data were then filtered using a Butterworth, 4th-order zero-phase (two-pass) bandpass filter between 1 and 10 Hz, epoched [-0.2, 0.6 s] around the response onset (i.e. the button press in the active condition or the appearance of the virtual hand in the observation condition), re-referenced to a common average, and input to independent component analysis (ICA) to remove residual BCG and ocular artifacts. In order to ensure numerical stability when estimating the independent components, we retained 99% of the variance from the electrode space, leading to an average of 19 (SD = 6) components estimated for each participant and condition. Independent components (ICs) were then fitted with a dipolar source localization method (66). ICs whose dipole lay outside the brain, or which resembled muscular or ocular artifacts, were eliminated. A total of 8 (SD = 3) components were finally kept. All preprocessing steps were performed using EEGLAB and in-house scripts under Matlab (The MathWorks, Inc., Natick, Massachusetts, United States).
FMRI PREPROCESSING We modeled the BOLD signal using a general linear model (GLM) with two separate regressors (stick functions at stimulus onset) for the active and observation conditions as well as their spatial and temporal derivatives. We then parametrically modulated the regressors with three behavioral variables: the confidence ratings, the response times, and the numerosity difference between the two arrays of dots (i.e., perceptual evidence). Empirical cross-correlation between regressors confirmed limited collinearity for the active (resp. observation) condition (max(abs(R)) = 0.26 ± 0.02, resp. max(abs(R)) = 0.25 ± 0.02). Bad trials as defined in the behavioral analysis section were modeled by two separate regressors (one for active and one for observation) and their spatial and temporal derivatives. We added six realignment parameters as regressors of no interest. All second-level (group-level) results are reported at a significance level of p < 0.05 using cluster-extent family-wise error (FWE) correction with a voxel-height threshold of p < 0.001. We used the anatomical automatic labelling (AAL) atlas for brain parcellation (Tzourio-Mazoyer et al., 2002).