8 datasets found
  1. R code

    • figshare.com
    txt
    Updated Jun 5, 2017
    Cite
    Christine Dodge (2017). R code [Dataset]. http://doi.org/10.6084/m9.figshare.5021297.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 5, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Christine Dodge
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R code used for each data set to perform negative binomial regression, calculate the overdispersion statistic, generate summary statistics, and remove outliers.
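As background to the description above: negative binomial regression is typically chosen when count data are overdispersed (variance exceeds the mean). Below is a minimal, hedged Python sketch of that check on hypothetical counts; the dataset itself provides R code, and this is only an illustration of the concept.

```python
# Illustrative sketch (hypothetical data): the dataset's R code fits a negative
# binomial model; shown here is only the overdispersion check that motivates it.
def dispersion_ratio(counts):
    """Variance-to-mean ratio; ~1 for Poisson data, >1 indicates overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance
    return var / mean

counts = [0, 1, 0, 2, 9, 0, 1, 14, 0, 3]  # hypothetical overdispersed counts
ratio = dispersion_ratio(counts)
print(ratio > 1)  # a ratio well above 1 suggests negative binomial over Poisson
```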

  2. MeSH 2023 Update - Delete Report - 4at4-q6rg - Archive Repository

    • healthdata.gov
    application/rdfxml +5
    Updated Jul 16, 2025
    Cite
    (2025). MeSH 2023 Update - Delete Report - 4at4-q6rg - Archive Repository [Dataset]. https://healthdata.gov/dataset/MeSH-2023-Update-Delete-Report-4at4-q6rg-Archive-R/bjnp-cusd
    Explore at:
    Available download formats: csv, application/rdfxml, json, tsv, application/rssxml, xml
    Dataset updated
    Jul 16, 2025
    Description

    This dataset tracks the updates made on the dataset "MeSH 2023 Update - Delete Report" as a repository for previous versions of the data and metadata.

  3. Water Temperature of Lakes in the Conterminous U.S. Using the Landsat 8 Analysis Ready Dataset Raster Images from 2013-2023

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Feb 22, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Water Temperature of Lakes in the Conterminous U.S. Using the Landsat 8 Analysis Ready Dataset Raster Images from 2013-2023 [Dataset]. https://catalog.data.gov/dataset/water-temperature-of-lakes-in-the-conterminous-u-s-using-the-landsat-8-analysis-ready-2013
    Explore at:
    Dataset updated
    Feb 22, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Contiguous United States
    Description

    This data release contains lake and reservoir water surface temperature summary statistics calculated from Landsat 8 Analysis Ready Dataset (ARD) images available within the Conterminous United States (CONUS) from 2013-2023. All zip files within this data release contain nested directories using .parquet files to store the data. The file example_script_for_using_parquet.R contains example code for using the R arrow package (Richardson and others, 2024) to open and query the nested .parquet files.

    Limitations of this dataset include:
    - All biases inherent to the Landsat Surface Temperature product are retained in this dataset, which can produce unrealistically high or low estimates of water temperature. This is observed to happen, for example, in cases of partial cloud coverage over a waterbody.
    - Some waterbodies are split between multiple Landsat Analysis Ready Data tiles or orbit footprints. In these cases, multiple waterbody-wide statistics may be reported, one for each data tile. The deepest-point values are extracted and reported for the tile covering the deepest point. A total of 947 waterbodies are split between multiple tiles (see the multiple_tiles = “yes” column of site_id_tile_hv_crosswalk.csv).
    - Temperature data were not extracted from satellite images with more than 90% cloud cover.
    - Temperature data represent skin temperature at the water surface and may differ from temperature observations from below the water surface.

    Potential methods for addressing these limitations:
    - Identifying and removing unrealistic temperature estimates: calculate the total percentage of cloud pixels over a given waterbody as percent_cloud_pixels = wb_dswe9_pixels/(wb_dswe9_pixels + wb_dswe1_pixels), and filter percent_cloud_pixels by a desired percentage of cloud coverage. Remove lakes with a limited number of water pixel values available (wb_dswe1_pixels < 10), and filter waterbodies where the deepest point is identified as water (dp_dswe = 1).
    - Handling waterbodies split between multiple tiles: these waterbodies can be identified using the site_id_tile_hv_crosswalk.csv file (column multiple_tiles = “yes”). A user could combine sections of the same waterbody by spatially weighting the values using the number of water pixels available within each section (wb_dswe1_pixels). This should be done with caution, as some sections of the waterbody may have data available on different dates.

    Contents of the data release:
    - "year_byscene=XXXX.zip" – temperature summary statistics for individual waterbodies and for the deepest point (the furthest point from land within a waterbody) of each waterbody, by scene_date (when the satellite passed over). Individual waterbodies are identified by the National Hydrography Dataset (NHD) permanent_identifier included within the site_id column. Some of the .parquet files in the byscene datasets may contain only one dummy row of data (identified by tile_hv="000-000"); this happens when no tabular data were extracted from the raster images because of clouds obscuring the image, a tile that covers mostly ocean with a very small amount of land, or other possible causes. An example file path for this dataset: year_byscene=2023/tile_hv=002-001/part-0.parquet
    - "year=XXXX.zip" – summary statistics for individual waterbodies and the deepest points within each waterbody by year (dataset=annual), month (year=0, dataset=monthly), and year-month (dataset=yrmon). The year_byscene=XXXX data are used as input for generating these summary tables, which aggregate temperature data by year, month, and year-month. Aggregated data are not available for tiles 001-004, 001-010, 002-012, 028-013, and 029-012, because these tiles primarily cover ocean with limited land and no output data were generated. An example file path for this dataset: year=2023/dataset=lakes_annual/tile_hv=002-001/part-0.parquet
    - "example_script_for_using_parquet.R" – a script with code to download zip files directly from ScienceBase, identify HUC04 basins within a desired Landsat ARD grid tile, download NHDPlus High Resolution data for visualization, use the R arrow package to compile .parquet files in nested directories, and create example static and interactive maps.
    - "nhd_HUC04s_ingrid.csv" – a cross-walk file identifying the HUC04 watersheds within each Landsat ARD tile grid.
    - "site_id_tile_hv_crosswalk.csv" – a cross-walk file identifying the site_id (nhdhr{permanent_identifier}) within each Landsat ARD tile grid. This file also includes a column (multiple_tiles) identifying site_ids that fall within multiple Landsat ARD tile grids.
    - "lst_grid.png" – a map of the Landsat grid tiles labelled by the horizontal-vertical ID.
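The cloud-screening rules quoted above can be sketched directly. This is a hedged Python illustration (the release's own example code is in R) applying the stated formula and thresholds to hypothetical rows shaped like the .parquet tables; the 50% cloud cutoff is a user-chosen example, not a recommendation from the release.

```python
# Sketch of the filtering steps described in the release notes, on hypothetical
# rows; column names (wb_dswe9_pixels, wb_dswe1_pixels, dp_dswe) come from the
# description above.
def keep_row(row, max_cloud_fraction=0.5):  # cloud threshold is user-chosen
    total = row["wb_dswe9_pixels"] + row["wb_dswe1_pixels"]
    if total == 0:
        return False
    percent_cloud_pixels = row["wb_dswe9_pixels"] / total
    return (percent_cloud_pixels <= max_cloud_fraction
            and row["wb_dswe1_pixels"] >= 10   # enough water pixels
            and row["dp_dswe"] == 1)           # deepest point classified as water

rows = [
    {"wb_dswe9_pixels": 5,  "wb_dswe1_pixels": 95, "dp_dswe": 1},  # kept
    {"wb_dswe9_pixels": 80, "wb_dswe1_pixels": 20, "dp_dswe": 1},  # too cloudy
    {"wb_dswe9_pixels": 1,  "wb_dswe1_pixels": 5,  "dp_dswe": 1},  # few water px
]
kept = [r for r in rows if keep_row(r)]
print(len(kept))  # 1
```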

  4. RISE-R

    • western-libraries-geospatial-hub-westernu.hub.arcgis.com
    Updated Jan 22, 2020
    Cite
    Western University (2020). RISE-R [Dataset]. https://western-libraries-geospatial-hub-westernu.hub.arcgis.com/datasets/rise-r
    Explore at:
    Dataset updated
    Jan 22, 2020
    Dataset authored and provided by
    Western University
    Description

    DO NOT DELETE OR MODIFY THIS ITEM. This item is managed by the ArcGIS Hub application. To make changes to this site, please visit https://hub.arcgis.com/admin/

  5. Genomic variant data and codes used for analysis in the manuscript - Whole genome sequencing reveals the structure of environment associated divergence in a broadly distributed montane bumble bee, Bombus vancouverensis

    • figshare.com
    bin
    Updated Jul 14, 2022
    Cite
    Sam Heraghty (2022). Genomic variant data and codes used for analysis in the manuscript - Whole genome sequencing reveals the structure of environment associated divergence in a broadly distributed montane bumble bee, Bombus vancouverensis [Dataset]. http://doi.org/10.6084/m9.figshare.20310522.v2
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 14, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sam Heraghty
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Details of the files included in this item are given below.

    delly_vanc.vcf.gz # Raw output of Delly

    b.vanc.fully.filtered.100k.plus.recode.vcf.gz # Output of freebayes, filtered using VCFtools v0.1.13 (Danecek et al. 2011) with the following flags: --remove-indels --min-alleles 2 --max-alleles 2 --minQ 20 --minDP 4 --max-missing 0.75

    The above file was also filtered to remove sites with unusually high coverage (>2x mean coverage) or excess heterozygosity. Finally, SNPs that fell on scaffolds less than 100 kb in length were removed.

    b.vanc.fully.filtered.100k.plus.recode.maf05.recode.ANN.vcf.gz #Fully filtered variant file (see manuscript for details) with annotation information

    b.vanc.fully.filtered.100k.plus.recode.maf05.recode.impute.vcf.gz #Fully filtered variant file (see manuscript for details) after imputation with beagle

    Description of each script contained in this directory

    Trim_N_QC.sh #Trim raw sequencing data and run FastQC to evaluate trimmed data

    BWA_PICARD_vanc1.sh #Example of script used to align sequence data to the reference genome using BWA. Also, uses Picard tools to sort, deduplicate and index bam files

    P_call_test-2-vanc.sh #First part of pipeline for calling SNPs with freebayes (calls freebayes-parallel-part1_vanc.sh)

    freebayes-parallel-part1_vanc.sh #see above

    Filter_vanc.sh #Create list of SVs to filter from DELLY output

    filter_delly.sh #Filter based on generated list of SVs

    delly_vanc.sh #Call SVs using DELLY

    bcf2vcf.sh # convert bcf from DELLY to vcf format

    freebayes-parallel-part2.sh #Second part of freebayes pipeline

    merge_vanc_vars.sh #Second part of freebayes pipeline (calls freebayes-parallel-part2.sh)

    site_depth_vanc.sh #Gets site depth per SNP

    remove_highdepth_vanc.sh #removes SNPs above depth threshold

    hardy_vanc.sh #calculates HWE per SNP

    remove_hwe_vanc.sh #removes SNPs based on HWE threshold

    filter_vcf_size.sh #Removes SNPs on scaffolds less than 100 kb in size

    filter_vcf_maf05.sh #filters SNPs based on 5% MAF filter

    beagle.sh #imputes using beagle

    LEA_con.R #converts vcf file into LFMM and geno format

    Snpeff_ANN.sh # annotate vcf file using SNPeff

    plink_for_sambaR.sh # convert vcf file into format ready for use in sambaR

    LD_test.sh #example of script used to calculate LD per scaffold

    vcf_stats.sh #Gets various stats from final filtered vcf

    get_pi_diversity.sh #gets per population nucleotide diversity

    sambaR.R #Runs SambaR

    lfmm2_analysis.R #Code for running analysis on output of LFMM2 and generating graphs

    Max_ent_map.R #Generates maxent map

    RDA_script.R #Code for RDA analysis of structural variants

    snprelate_script.R #runs SNPrelate as well as makes graphs of Fst and pi along scaffolds of interest

    repeat_correctedfst.R #Analysis for correlation between repeat density and Fst

    LD_script.R #analysis of linkage
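Several of the scripts above (site_depth_vanc.sh, remove_highdepth_vanc.sh) implement the depth filter described for the VCF: sites with unusually high coverage (>2x mean coverage) are removed. Below is a minimal Python sketch of that assumed logic on hypothetical per-site depths; the actual scripts are shell pipelines around VCFtools and related tools, so this is illustration only.

```python
# Hedged sketch of the assumed high-coverage filter: drop any site whose depth
# exceeds `factor` times the mean depth across sites. Data are hypothetical.
def remove_high_depth(site_depths, factor=2.0):
    mean_depth = sum(site_depths.values()) / len(site_depths)
    cutoff = factor * mean_depth
    return {site: d for site, d in site_depths.items() if d <= cutoff}

depths = {"chr1:100": 20, "chr1:200": 25, "chr1:300": 300, "chr1:400": 15}
filtered = remove_high_depth(depths)
print(sorted(filtered))  # the 300x site exceeds 2x the mean (~180x) and is dropped
```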

  6. Data from: Error and anomaly detection for intra-participant time-series data

    • tandf.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    David R. Mullineaux; Gareth Irwin (2023). Error and anomaly detection for intra-participant time-series data [Dataset]. http://doi.org/10.6084/m9.figshare.5189002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    David R. Mullineaux; Gareth Irwin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identification of errors or anomalous values, collectively considered outliers, assists in exploring data or, through removing outliers, improves statistical analysis. In biomechanics, outlier detection methods have explored the ‘shape’ of entire cycles, although exploring fewer points using a ‘moving window’ may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected in two stages for the strides (mean 38 cycles) from treadmill running. Cycles were removed in stage 1 for one-dimensional (spatial) outliers at each time point using the median absolute deviation, and in stage 2 for two-dimensional (spatial–temporal) outliers using a moving-window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window size, requiring more stringent scaling at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and outliers assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time-series data.
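As a rough illustration of stage 1 of the method described above (flagging cycles that are spatial outliers at any time point via the median absolute deviation), here is a hedged Python sketch; the paper's supplied code is Matlab, and the threshold and data below are hypothetical, not the paper's settings.

```python
# Hedged sketch of stage 1: per-time-point outlier flagging via the median
# absolute deviation (MAD). Threshold and cycles are hypothetical.
def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def mad_outlier_cycles(cycles, threshold=3.0):
    """cycles: list of equal-length time series; returns indices of flagged cycles."""
    flagged = set()
    for t in range(len(cycles[0])):
        vals = [c[t] for c in cycles]
        med = median(vals)
        mad = median([abs(v - med) for v in vals]) or 1e-12  # avoid divide-by-zero
        for i, v in enumerate(vals):
            if abs(v - med) / mad > threshold:
                flagged.add(i)
    return flagged

cycles = [[1.0, 2.0, 1.5], [1.1, 2.1, 1.4], [1.0, 9.0, 1.5], [0.9, 2.0, 1.6]]
print(mad_outlier_cycles(cycles))  # {2}: cycle 2 spikes at the middle time point
```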

  7. Dataset - High-resolution mapping of wood burning appliance hotspots using Energy Performance Certificates: A case study of England and Wales

    • zenodo.org
    tar
    Updated Feb 13, 2025
    + more versions
    Cite
    Calum Kennedy; Calum Kennedy; Laura Horsfall; Laura Horsfall (2025). Dataset - High-resolution mapping of wood burning appliance hotspots using Energy Performance Certificates: A case study of England and Wales [Dataset]. http://doi.org/10.5281/zenodo.14640852
    Explore at:
    Available download formats: tar
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Calum Kennedy; Calum Kennedy; Laura Horsfall; Laura Horsfall
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains open data and code to replicate the analysis in the manuscript "High-resolution mapping of wood burning appliance hotspots using Energy Performance Certificates: A case study of England and Wales".

    To recreate the analysis on your local device, please carry out the following steps:

    1. Clone the GitHub repository (available at: https://github.com/UCL-Wellcome-Trust-Air-Pollution/EPC_mapping_project_code) to your local device, or download the codebase from the 'Code.tar' folder and unzip in your project directory. Please ensure you use the directory with the R Project in it as your root directory.

    2. Download the 'Data.tar' file and unzip the file in the R Project directory. The data should be in a folder called 'Data' in the root directory. All non-EPC data is provided under the UK Open Government License version 3.0. EPC data is provided under licence from DLUHC: https://epc.opendatacommunities.org/docs/copyright.

    3. Download the main EPC data to your local device and unzip (see below for detailed instructions on how to do this). For Windows users, the 'Scripts' folder of the repository contains a .bat file which can be used to unzip the data. Note that this file requires the user to have installed 7Zip and added 7Zip to the system path. Otherwise, the .tar file can be unzipped manually.

    4. Run the 'run.R' file in the 'Scripts' folder of the directory. You may need to change the 'path_data_epc_folders' variable to the path to the unzipped EPC data folders on your local device (see step 3). The full pipeline should now run.

    5. Once you have run the pipeline for the first time, you should see a file called 'data_epc_raw.parquet' in the 'Data/raw/epc_data' folder. Once you have verified this is the case, you can safely delete the original unzipped EPC data folder, since it is very large (>40 GB). If you run the pipeline again, you will be notified that the raw EPC data .parquet file already exists and given the option to skip the merging of raw data files.
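The caching behaviour in step 5 can be sketched as follows. This is a hedged illustration, not the repository's code: the helper name is hypothetical, and only the folder path comes from the step above.

```python
# Hypothetical sketch of step 5's skip logic: the merge of raw EPC files is
# only needed when the combined .parquet file does not yet exist.
from pathlib import Path

def needs_merge(project_root):
    parquet = Path(project_root) / "Data" / "raw" / "epc_data" / "data_epc_raw.parquet"
    return not parquet.exists()

print(needs_merge("/tmp/nonexistent_project"))  # True: no cached file yet
```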

  8. Percentage (%) and number (n) of missing values in the explanatory variables and outcome by measurement occasion and sex

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 29, 2024
    Cite
    Lara Lusa; Cécile Proust-Lima; Carsten O. Schmidt; Katherine J. Lee; Saskia le Cessie; Mark Baillie; Frank Lawrence; Marianne Huebner (2024). Percentage (%) and number (n) of missing values in the explanatory variables and outcome by measurement occasion and sex. [Dataset]. http://doi.org/10.1371/journal.pone.0295726.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    May 29, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lara Lusa; Cécile Proust-Lima; Carsten O. Schmidt; Katherine J. Lee; Saskia le Cessie; Mark Baillie; Frank Lawrence; Marianne Huebner
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PA: physical activity. Here we show only the first-interview data for variables used as time-fixed in the model (height, education, and smoking, following the change suggested by IDA) and remove the observations missing by design.
