55 datasets found
  1. R code

    • figshare.com
    txt
    Updated Jun 5, 2017
    Cite
    Christine Dodge (2017). R code [Dataset]. http://doi.org/10.6084/m9.figshare.5021297.v1
    Available download formats: txt
    Dataset updated
    Jun 5, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Christine Dodge
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R code used for each dataset to perform negative binomial regression, calculate the overdispersion statistic, generate summary statistics, and remove outliers.
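
    A minimal R sketch of this kind of workflow, assuming a hypothetical data frame dat with a count response (count) and a single predictor (treatment); the file and variable names are illustrative, not taken from the dataset:

    # Illustrative sketch only: negative binomial regression with an
    # overdispersion check and simple outlier screening.
    library(MASS)

    dat <- read.csv("example_counts.csv")      # hypothetical input file

    # Remove extreme outliers in the response (here, beyond 3 SD of the mean)
    z <- as.numeric(scale(dat$count))
    dat_clean <- dat[abs(z) < 3, ]

    # Summary statistics
    summary(dat_clean)

    # Negative binomial regression
    fit <- glm.nb(count ~ treatment, data = dat_clean)
    summary(fit)

    # Overdispersion statistic: residual deviance over residual degrees of freedom
    deviance(fit) / df.residual(fit)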

  2. Water Temperature of Lakes in the Conterminous U.S. Using the Landsat 8...

    • gimi9.com
    • data.usgs.gov
    • +2more
    Updated Feb 22, 2025
    + more versions
    Cite
    (2025). Water Temperature of Lakes in the Conterminous U.S. Using the Landsat 8 Analysis Ready Dataset Raster Images from 2013-2023 [Dataset]. https://gimi9.com/dataset/data-gov_water-temperature-of-lakes-in-the-conterminous-u-s-using-the-landsat-8-analysis-ready-2013
    Dataset updated
    Feb 22, 2025
    Area covered
    Contiguous United States
    Description

    This data release contains lake and reservoir water surface temperature summary statistics calculated from Landsat 8 Analysis Ready Dataset (ARD) images available within the Conterminous United States (CONUS) from 2013-2023. All zip files within this data release contain nested directories using .parquet files to store the data. The file example_script_for_using_parquet.R contains example code for using the R arrow package (Richardson and others, 2024) to open and query the nested .parquet files.

    Limitations of this dataset include:
    - All biases inherent to the Landsat Surface Temperature product are retained in this dataset, which can produce unrealistically high or low estimates of water temperature. This is observed to happen, for example, in cases with partial cloud coverage over a waterbody.
    - Some waterbodies are split between multiple Landsat Analysis Ready Data tiles or orbit footprints. In these cases, multiple waterbody-wide statistics may be reported, one for each data tile. The deepest-point values are extracted and reported for the tile covering the deepest point. A total of 947 waterbodies are split between multiple tiles (see the multiple_tiles = "yes" column of site_id_tile_hv_crosswalk.csv).
    - Temperature data were not extracted from satellite images with more than 90% cloud cover.
    - Temperature data represent skin temperature at the water surface and may differ from temperature observations from below the water surface.

    Potential methods for addressing these limitations:
    - Identifying and removing unrealistic temperature estimates:
      - Calculate the total percentage of cloud pixels over a given waterbody as percent_cloud_pixels = wb_dswe9_pixels/(wb_dswe9_pixels + wb_dswe1_pixels), and filter percent_cloud_pixels by a desired percentage of cloud coverage.
      - Remove lakes with a limited number of water pixel values available (wb_dswe1_pixels < 10).
      - Filter waterbodies where the deepest point is identified as water (dp_dswe = 1).
    - Handling waterbodies split between multiple tiles:
      - These waterbodies can be identified using the "site_id_tile_hv_crosswalk.csv" file (column multiple_tiles = "yes"). A user could combine sections of the same waterbody by spatially weighting the values using the number of water pixels available within each section (wb_dswe1_pixels). This should be done with caution, as some sections of the waterbody may have data available on different dates.

    The data release includes the following files:
    - "year_byscene=XXXX.zip" includes temperature summary statistics for individual waterbodies and the deepest points (the furthest point from land within a waterbody) within each waterbody by scene_date (when the satellite passed over). Individual waterbodies are identified by the National Hydrography Dataset (NHD) permanent_identifier included within the site_id column. Some of the .parquet files in the _byscene datasets may include only one dummy row of data (identified by tile_hv="000-000"). This happens when no tabular data are extracted from the raster images because of clouds obscuring the image, a tile that covers mostly ocean with a very small amount of land, or other possible causes. An example file path for this dataset follows: year_byscene=2023/tile_hv=002-001/part-0.parquet
    - "year=XXXX.zip" includes the summary statistics for individual waterbodies and the deepest points within each waterbody by year (dataset=annual), month (year=0, dataset=monthly), and year-month (dataset=yrmon). The year_byscene=XXXX data are used as input for generating these summary tables that aggregate temperature data by year, month, and year-month. Aggregated data are not available for the following tiles: 001-004, 001-010, 002-012, 028-013, and 029-012, because these tiles primarily cover ocean with limited land, and no output data were generated. An example file path for this dataset follows: year=2023/dataset=lakes_annual/tile_hv=002-001/part-0.parquet
    - "example_script_for_using_parquet.R" includes code to download zip files directly from ScienceBase, identify HUC04 basins within a desired Landsat ARD grid tile, download NHDPlus High Resolution data for visualization, use the R arrow package to compile .parquet files in nested directories, and create example static and interactive maps.
    - "nhd_HUC04s_ingrid.csv" is a cross-walk file that identifies the HUC04 watersheds within each Landsat ARD tile grid.
    - "site_id_tile_hv_crosswalk.csv" is a cross-walk file that identifies the site_id (nhdhr_{permanent_identifier}) within each Landsat ARD tile grid. This file also includes a column (multiple_tiles) to identify site_ids that fall within multiple Landsat ARD tile grids.
    - "lst_grid.png" is a map of the Landsat grid tiles labelled by the horizontal-vertical ID.
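
    A minimal R sketch of querying one of these nested .parquet directories with the arrow package, assuming a year_byscene zip file has been downloaded and extracted locally (the path and filter thresholds below are illustrative, not part of the data release):

    # Illustrative sketch only; see example_script_for_using_parquet.R in the
    # data release for the authors' full workflow.
    library(arrow)
    library(dplyr)

    # Open the nested directory of .parquet files as a single dataset
    ds <- open_dataset("year_byscene=2023")

    # Compute cloud fraction per waterbody-scene, drop heavily clouded scenes
    # and waterbodies with very few water pixels, then pull into a data frame
    lake_temps <- ds %>%
      mutate(percent_cloud_pixels = wb_dswe9_pixels /
               (wb_dswe9_pixels + wb_dswe1_pixels)) %>%
      filter(percent_cloud_pixels < 0.25,   # illustrative threshold
             wb_dswe1_pixels >= 10) %>%
      collect()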

  3. Occurrence Record Dataset from "Depth Matters for Marine Biodiversity"

    • zenodo.org
    zip
    Updated Aug 4, 2025
    Cite
    Hannah Owens; Hannah Owens (2025). Occurrence Record Dataset from "Depth Matters for Marine Biodiversity" [Dataset]. http://doi.org/10.5281/zenodo.16739002
    Available download formats: zip
    Dataset updated
    Aug 4, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hannah Owens; Hannah Owens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the final occurrence record dataset produced for the manuscript "Depth Matters for Marine Biodiversity". Detailed methods for the creation of the dataset, below, have been excerpted from Appendix I: Extended Methods. Detailed citations for the occurrence datasets from which these data were derived can also be found in Appendix I of the manuscript.

    We first assembled a list of all recognized species of fishes from the orders Scombiformes (Betancur-R et al., 2017), Gadiformes, and Beloniformes by accessing FishBase (Boettiger et al., 2012; Froese & Pauly, 2017) and the Ocean Biodiversity Information System (OBIS; OBIS, 2022; Provoost & Bosch, 2019) through queries in R (R Core Team, 2021). Species were considered Atlantic if their FishBase distribution or occurrence records on OBIS included any area within the Atlantic or Mediterranean major fishing regions as defined by the Food and Agriculture Organization of the United Nations (FAO Regions 21, 27, 31, 34, 37, 41, 47, and 48; FAO, 2020). The database query script can be found on the project code repository (https://github.com/hannahlowens/3DFishRichness/blob/main/1_OccurrenceSearch.R). We then curated the list of names to resolve discrepancies in taxonomy and known distributions through comparison with the Eschmeyer Catalog of Fishes (Eschmeyer & Fricke, 2015), accessed in September of 2020, as our ultimate taxonomic authority. The resulting list of species was then mapped onto the Global Biodiversity Information Facility’s backbone taxonomy (Chamberlain et al., 2021; GBIF.org, 2020a) to ensure taxonomic concurrence across databases (Supplementary Table 1). The final taxonomic list was used to download occurrence records from OBIS (OBIS, 2022) and GBIF (GBIF.org, 2020b) in R through robis (Provoost & Bosch, 2019) and occCite (Owens et al., 2021).

    For each species, duplicate points were removed from two- and three-dimensional species occurrence datasets separately, and inaccurate depth records were removed from 3D datasets (all records with and without depth information were retained for the 2D dataset). Depth records were based on the “depth” field in both the GBIF and OBIS datasets, which define the field as “depth below the surface in meters”. We chose this value over incorporating information from “minimumDepthInMeters” and “maximumDepthInMeters” because more records contained information from the “depth” field than either of the two other fields (although when these fields were both supplied, “depth” appears to have been often, but not always, derived by calculating the mean between minimum and maximum depth). We also initially included the “depthAccuracy” field from both datasets but did not ultimately use this field as it was not complete enough to be useful. Instead, we determined depth inaccuracy based on extreme statistical outliers (values greater than 2 or less than -2 when occurrence depths were centered and scaled), depths that exceeded bathymetry at occurrence coordinates, and occurrence depths far outside known depth ranges obtained from FishBase, Eschmeyer’s Catalog of Fishes, and/or congeneric depth ranges in the dataset. Once the resulting data were mapped and curated to remove records with putatively spurious coordinates, under-sampled regions and species were augmented with data from publicly available digital museum collection databases not served through OBIS or GBIF, as well as a literature search. Finally, for datasets with more than 20 points remaining after data curation, occurrence data were downsampled to the resolution of the environmental data; that is, to 1 point per 1 degree grid cell in the 2D dataset, and to one point per depth slice per 1 degree grid cell in the 3D dataset.
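
    A minimal R sketch of the depth-based outlier screening and 2D downsampling described above, assuming a hypothetical data frame occ with decimalLongitude, decimalLatitude, and depth columns (thresholds follow the description; object and file names are illustrative):

    # Illustrative sketch only: duplicate removal, depth-outlier screening,
    # and one-record-per-degree downsampling.
    occ <- read.csv("species_occurrences.csv")   # hypothetical input

    # Remove duplicate records
    occ <- occ[!duplicated(occ[, c("decimalLongitude", "decimalLatitude", "depth")]), ]

    # Flag extreme statistical outliers in depth (centered and scaled values
    # greater than 2 or less than -2)
    depth_z <- as.numeric(scale(occ$depth))
    occ_3d <- occ[!is.na(depth_z) & abs(depth_z) <= 2, ]

    # Downsample the 2D dataset to one record per 1-degree grid cell
    cell <- paste(floor(occ$decimalLongitude), floor(occ$decimalLatitude))
    occ_2d_thin <- occ[!duplicated(cell), ]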

    References:

    Betancur-R, R., Wiley, E. O., Arratia, G., Acero, A., Bailly, N., Miya, M., Lecointre, G., & Ortí, G. (2017). Phylogenetic classification of bony fishes. BMC Evolutionary Biology, 17(1), 162. https://doi.org/10.1186/s12862-017-0958-3

    Boettiger, C., Lang, D. T., & Wainwright, P. C. (2012). rfishbase: exploring, manipulating and visualizing FishBase data from R. Journal of Fish Biology, 81(6), 2030–2039. https://doi.org/10.1111/j.1095-8649.2012.03464.x

    Chamberlain, S., Barve, V., McGlinn, D., Oldoni, D., Desmet, P., Geffert, L., & Ram, K. (2021). rgbif: Interface to the Global Biodiversity Information Facility API. https://CRAN.R-project.org/package=rgbif

    Eschmeyer, W. N., & Fricke, R. (2015). Taxonomic checklist of fish species listed in the CITES Appendices and EC Regulation 338/97 (Elasmobranchii, Actinopteri, Coelacanthi, and Dipneusti, except the genus Hippocampus). Catalog of Fishes, Electronic Version. Accessed September, 2020. https://www.calacademy.org/scientists/projects/eschmeyers-catalog-of-fishes

    FAO. (2020). FAO Major Fishing Areas. United Nations Fisheries and Aquaculture Division. https://www.fao.org/fishery/en/collection/area

    Froese, R., & Pauly, D. (2017). FishBase. Accessed September, 2022. www.fishbase.org

    GBIF.org. (2020a). GBIF Backbone Taxonomy. Accessed September, 2020. GBIF.org

    GBIF.org. (2020b). GBIF Occurrence Download. Accessed November, 2020. https://doi.org/10.15468

    OBIS. (2020). Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. Accessed November, 2020. www.obis.org

    Owens, H. L., Merow, C., Maitner, B. S., Kass, J. M., Barve, V., & Guralnick, R. P. (2021). occCite: Tools for querying and managing large biodiversity occurrence datasets. Ecography, 44(8), 1228–1235. https://doi.org/10.1111/ecog.05618

    Provoost, P., & Bosch, S. (2019). robis: R Client to access data from the OBIS API. https://cran.r-project.org/package=robis

    R Core Team. (2021). R: A Language and Environment for Statistical Computing. https://www.R-project.org/

  4. MeSH 2023 Update - Delete Report - 4at4-q6rg - Archive Repository

    • healthdata.gov
    application/rdfxml +5
    Updated Jul 16, 2025
    Cite
    (2025). MeSH 2023 Update - Delete Report - 4at4-q6rg - Archive Repository [Dataset]. https://healthdata.gov/dataset/MeSH-2023-Update-Delete-Report-4at4-q6rg-Archive-R/bjnp-cusd
    Available download formats: csv, application/rdfxml, json, tsv, application/rssxml, xml
    Dataset updated
    Jul 16, 2025
    Description

    This dataset tracks the updates made on the dataset "MeSH 2023 Update - Delete Report" as a repository for previous versions of the data and metadata.

  5. Genomic variant data and codes used for analysis in the manuscript - Whole...

    • figshare.com
    bin
    Updated Jul 14, 2022
    Cite
    Sam Heraghty (2022). Genomic variant data and codes used for analysis in the manuscript - Whole genome sequencing reveals the structure of environment associated divergence in a broadly distributed montane bumble bee, Bombus vancouverensis [Dataset]. http://doi.org/10.6084/m9.figshare.20310522.v2
    Available download formats: bin
    Dataset updated
    Jul 14, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sam Heraghty
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    See below for details of the included files.

    delly_vanc.vcf.gz # Raw output of Delly

    b.vanc.fully.filtered.100k.plus.recode.vcf.gz # output of freebayes which was filtered using VCFtools v0.1.13 (Danecek et al. 2011) with the following flags: --remove-indels --min-alleles 2 --max-alleles 2 --minQ 20 --minDP 4 --max-missing 0.75

    The above file was also filtered to remove sites with unusually high coverage (>2x mean coverage) or excess heterozygosity. Finally, SNPs that fell on scaffolds less than 100 kb in length were removed.

    b.vanc.fully.filtered.100k.plus.recode.maf05.recode.ANN.vcf.gz #Fully filtered variant file (see manuscript for details) with annotation information

    b.vanc.fully.filtered.100k.plus.recode.maf05.recode.impute.vcf.gz #Fully filtered variant file (see manuscript for details) after imputation with beagle

    Description of each script contained in this directory

    Trim_N_QC.sh #Trim raw sequencing data and run fastQC to evaluate trimmed data

    BWA_PICARD_vanc1.sh #Example of script used to align sequence data to the reference genome using BWA. Also, uses Picard tools to sort, deduplicate and index bam files

    P_call_test-2-vanc.sh #First part of pipeline for calling SNPS with freebayes (calls freebayes-parallel-part1_vanc.sh)

    freebayes-parallel-part1_vanc.sh #see above

    Filter_vanc.sh #Create list of SVs to filter from DELLY output

    filter_delly.sh #filter based on generated list of SVs

    delly_vanc.sh #call SVs using DELLY

    bcf2vcf.sh # convert bcf from DELLY to vcf format

    freebayes-parallel-part2.sh #Second part of freebayes pipeline

    merge_vanc_vars.sh #Second part of freebayes pipeline (calls freebayes-parallel-part2.sh)

    site_depth_vanc.sh #Gets site depth per SNP

    remove_highdepth_vanc.sh #removes SNPs above depth threshold

    hardy_vanc.sh #calculates HWE per SNP

    remove_hwe_vanc.sh #removes SNPs based on HWE threshold

    filter_vcf_size.sh #Removes SNPs on scaffolds less than 100Kb in size

    filter_vcf_maf05.sh #filters SNPs based on 5% MAF filter

    beagle.sh #imputes using beagle

    LEA_con.R #converts vcf file into LFMM and geno format

    Snpeff_ANN.sh # annotate vcf file using SNPeff

    plink_for_sambaR.sh # convert vcf file into format ready for use in sambaR

    LD_test.sh #example of script used to calculate LD per scaffold

    vcf_stats.sh #Gets various stats from final filtered vcf

    get_pi_diversity.sh #gets per population nucleotide diversity

    sambaR.R #Runs SambaR

    lfmm2_analysis.R #Code for running analysis on output of LFMM2 and generating graphs

    Max_ent_map.R #Generates maxent map

    RDA_script.R #Code for RDA analysis of structural variants

    snprelate_script.R #runs SNPrelate as well as makes graphs of Fst and pi along scaffolds of interest

    repeat_correctedfst.R #Analysis for correlation between repeat density and Fst

    LD_script.R #analysis of linkage

  6. Data and R code from: Spatiotemporal risk factors predict landscape-scale...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Aug 31, 2022
    Cite
    Daniel Eacker; Andrew Jakes; Paul Jones (2022). Data and R code from: Spatiotemporal risk factors predict landscape-scale survivorship for a northern ungulate [Dataset]. http://doi.org/10.5061/dryad.pvmcvdnnt
    Available download formats: zip
    Dataset updated
    Aug 31, 2022
    Dataset provided by
    Alberta Wildlife Association
    Smithsonian's National Zoo and Conservation Biology Institute
    Taurus Wildlife Consulting
    Authors
    Daniel Eacker; Andrew Jakes; Paul Jones
    License

    CC0 1.0 Universal: https://spdx.org/licenses/CC0-1.0.html

    Description

    These data and computer code (written in R, https://www.r-project.org) were created to statistically evaluate a suite of spatiotemporal covariates that could potentially explain pronghorn (Antilocapra americana) mortality risk in the Northern Sagebrush Steppe (NSS) ecosystem (50.0757° N, −108.7526° W). Known-fate data were collected from 170 adult female pronghorn monitored with GPS collars from 2003-2011, which were used to construct a time-to-event (TTE) dataset with a daily timescale and an annual recurrent origin of 11 November. Seasonal risk periods (winter, spring, summer, autumn) were defined by median migration dates of collared pronghorn. We linked this TTE dataset with spatiotemporal covariates that were extracted and collated from pronghorn seasonal activity areas (estimated using 95% minimum convex polygons) to form a final dataset. Specifically, average fence and road densities (km/km²), average snow water equivalent (SWE; kg/m²), and maximum decadal normalized difference vegetation index (NDVI) were considered as predictors. We tested for these main effects of spatiotemporal risk covariates as well as the hypotheses that pronghorn mortality risk from roads or fences could be intensified during severe winter weather (i.e., interactions: SWE*road density and SWE*fence density). We also compared an analogous frequentist implementation to estimate model-averaged risk coefficients. Ultimately, the study aimed to develop the first broad-scale, spatially explicit map of predicted annual pronghorn survivorship based on anthropogenic features and environmental gradients to identify areas for conservation and habitat restoration efforts.

    Methods: We combined relocations from GPS-collared adult female pronghorn (n = 170) with raster data that described potentially important spatiotemporal risk covariates. We first collated relocation and time-to-event data to remove individual pronghorn from the analysis that had no spatial data available. We then constructed seasonal risk periods based on the median migration dates determined from a previous analysis; thus, we defined 4 seasonal periods as winter (11 November–21 March), spring (22 March–10 April), summer (11 April–30 October), and autumn (31 October–10 November). We used the package 'amt' in Program R to rarify relocation data to a common 4-hr interval using a 30-min tolerance. We used the package 'adehabitatHR' in Program R to estimate seasonal activity areas using 95% minimum convex polygons. We constructed annual- and seasonal-specific risk covariates by averaging values within individual activity areas. We specifically extracted values for linear features (road and fence densities), a proxy for snow depth (SWE), and a measure of forage productivity (NDVI). We resampled all raster data to a common resolution of 1 km². Given that fence density models characterized regional-scale variation in fence density (i.e., 1.5 km²), this resolution seemed appropriate for our risk analysis. We fit Bayesian proportional hazards (PH) models using a time-to-event approach to model the effects of spatiotemporal covariates on pronghorn mortality risk. We aimed to develop a model to understand the relative effects of risk covariates for pronghorn in the NSS. The effect of fence or road densities may depend on SWE such that the variables interact in affecting mortality risk. Thus, our full candidate model included four main effects and two interaction terms. We used reversible-jump Markov Chain Monte Carlo (RJMCMC) to determine relative support for a nested set of Bayesian PH models. This allowed us to conduct Bayesian model selection and averaging in one step by using two custom samplers provided for the R package 'nimble'. For brevity, we provide the final time-to-event dataset and analysis code rather than include all of the code, GIS, etc. used to estimate seasonal activity areas and extract and collate spatial risk covariates for each individual. Rather, we provide the data and all code to reproduce the risk regression results presented in the manuscript.
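
    A minimal R sketch of the resampling and activity-area steps named above (the 'amt' and 'adehabitatHR' calls follow the description; the input file, column names, and coordinate handling are illustrative assumptions):

    # Illustrative sketch only: rarify GPS fixes to a 4-hr interval and
    # estimate 95% minimum convex polygon activity areas per individual.
    library(amt)
    library(adehabitatHR)
    library(sp)
    library(lubridate)

    locs <- read.csv("pronghorn_locs.csv")            # hypothetical input
    locs$timestamp <- as.POSIXct(locs$timestamp, tz = "UTC")

    # Resample relocations to a common 4-hr interval with a 30-min tolerance
    trk <- make_track(locs, x, y, timestamp, id = id) |>
      track_resample(rate = hours(4), tolerance = minutes(30))

    # 95% minimum convex polygons, one per individual
    pts <- SpatialPointsDataFrame(coords = as.data.frame(trk[, c("x_", "y_")]),
                                  data = data.frame(id = factor(trk$id)))
    mcp95 <- mcp(pts, percent = 95)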

  7. Air-conditioner Location Running Hours Data for GovHack 2015

    • researchdata.edu.au
    Updated Jul 9, 2014
    + more versions
    Cite
    Department of Industry, Science and Resources (DISR) (2014). Air-conditioner Location Running Hours Data for GovHack 2015 [Dataset]. https://researchdata.edu.au/air-conditioner-location-govhack-2015/2979421
    Dataset updated
    Jul 9, 2014
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Department of Industry, Science and Resources (DISR)
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This resource is for historic purposes only and was provided for the GovHack competition (3-5 July 2015). After the event it was discovered that the latitude and longitude columns had been inadvertently inverted. For any project using this data please use the updated version of the resource (link) located here.

    We have elected not to remove this resource at this time so as to ensure that any GovHack entries using this data are not disadvantaged during the judging process. We intend to remove this version of the data after the GovHack judging has been completed.

  8. Wenau, Stefan, Spieß, Volkhard, Zabel, Matthias (2021). Dataset: Multibeam...

    • service.tib.eu
    Updated Nov 29, 2024
    + more versions
    Cite
    (2024). Wenau, Stefan, Spieß, Volkhard, Zabel, Matthias (2021). Dataset: Multibeam bathymetry processed data (EM 120 echosounder dataset compilation) of RV METEOR & RV MARIA S. MERIAN during cruise M76/1 & MSM19/1c, Namibian continental slope. https://doi.org/10.1594/PANGAEA.932434 [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-932434
    Dataset updated
    Nov 29, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Namibia
    Description

    The data contain bathymetric data from the Namibian continental slope. The data were acquired on R/V Meteor research expedition M76/1 in 2008 and R/V Maria S. Merian expedition MSM19/1c in 2011. The purpose of the data was the exploration of the Namibian continental slope and especially the investigation of large seafloor depressions. The bathymetric data were acquired with the 191-beam 12 kHz Kongsberg EM120 system. The data were processed using the public software package MB-System. The loaded data were cleaned semi-automatically and manually, removing outliers and other erroneous data. Initial velocity fields were adjusted to remove artifacts from the data. Gridding was done in 10x10 m grid cells for the MSM19-1c dataset and 50x50 m for the M76 dataset using the Gaussian Weighted Mean algorithm.

  9. Data Mining Project - Boston

    • kaggle.com
    Updated Nov 25, 2019
    Cite
    SophieLiu (2019). Data Mining Project - Boston [Dataset]. https://www.kaggle.com/sliu65/data-mining-project-boston/discussion
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 25, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SophieLiu
    Area covered
    Boston
    Description

    Context

    To make this a seamless process, I cleaned the data and deleted many variables that I thought were not important to our dataset. I then uploaded all of those files to Kaggle for each of you to download. The rideshare_data file has both Lyft and Uber data, but it is still a cleaned version of the dataset we downloaded from Kaggle.

    Use of Data Files

    You can easily subset the data into the car types that you will be modeling by first loading the CSV into R. Here is the code for how to do this:

    This loads the file into R

    df <- read.csv('uber.csv')

    The next code subsets the data into specific car types. The example below keeps only the Uber 'Black' car type.

    df_black <- subset(df, df$name == 'Black')

    The next portion of code writes this data frame to a CSV file on your computer so that it can be loaded back into R later.

    write.csv(df_black, "nameofthefileyouwanttosaveas.csv")

    The file will appear in your working directory. If you are not familiar with your working directory, run this code:

    getwd()

    The output will be the file path to your working directory. You will find the file you just created in that folder.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  10. Compiled occurrence records for prey items of listed species found in...

    • knb.ecoinformatics.org
    • search.dataone.org
    • +1more
    Updated Aug 7, 2023
    + more versions
    Cite
    Rachel King; Jenna Braun; Michael Westphal; CJ Lortie (2023). Compiled occurrence records for prey items of listed species found in California drylands with associated environmental data [Dataset]. http://doi.org/10.5063/F1VM49RH
    Dataset updated
    Aug 7, 2023
    Dataset provided by
    Knowledge Network for Biocomplexity
    Authors
    Rachel King; Jenna Braun; Michael Westphal; CJ Lortie
    Time period covered
    Jan 1, 1945 - Jan 1, 2022
    Area covered
    Description

    This dataset contains cleaned GBIF (www.gbif.org) occurrence records and associated climate and environmental data for all arthropod prey of listed species in California drylands as identified in Lortie et al. (2023): https://besjournals.onlinelibrary.wiley.com/doi/full/10.1002/2688-8319.12251. All arthropod records were downloaded from GBIF (https://doi.org/10.15468/dl.ngym3r) on 14 November 2022. Records were imported into R using the rgbif package and cleaned with the CoordinateCleaner package to remove occurrence data with likely errors. Environmental data include bioclimatic variables from WorldClim (www.worldclim.org), landcover and NDVI data from MODIS and the LPDAAC (https://lpdaac.usgs.gov/), elevation data from the USGS (https://www.sciencebase.gov/catalog/item/542aebf9e4b057766eed286a), and distance to the nearest road from the Census Bureau's TIGER/Line road shapefile (https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html). All environmental data were combined into a stacked raster and we extracted the environmental variables for each occurrence record from this raster to make the final dataset.
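
    A minimal R sketch of this kind of cleaning and extraction workflow, assuming the GBIF download has already been read into a data frame recs and the environmental layers sit in a local folder (file paths, column names, and thresholds are illustrative assumptions, not the authors' exact settings):

    # Illustrative sketch only: clean occurrence coordinates and extract
    # environmental values from a raster stack.
    library(CoordinateCleaner)
    library(raster)

    recs <- read.csv("gbif_arthropod_records.csv")    # hypothetical input

    # Remove records with likely coordinate errors (zeros, equal lat/lon,
    # country centroids, institutions, etc.)
    recs_clean <- clean_coordinates(recs,
                                    lon = "decimalLongitude",
                                    lat = "decimalLatitude",
                                    species = "species",
                                    value = "clean")

    # Stack the bioclim, landcover, NDVI, elevation, and road-distance layers
    env_stack <- stack(list.files("env_layers", full.names = TRUE))

    # Extract environmental variables at each occurrence point
    env_vals <- extract(env_stack, recs_clean[, c("decimalLongitude", "decimalLatitude")])
    occ_env <- cbind(recs_clean, env_vals)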

  11. RISE-R

    • western-libraries-geospatial-hub-westernu.hub.arcgis.com
    Updated Jan 22, 2020
    Cite
    Western University (2020). RISE-R [Dataset]. https://western-libraries-geospatial-hub-westernu.hub.arcgis.com/datasets/rise-r
    Dataset updated
    Jan 22, 2020
    Dataset authored and provided by
    Western University
    Description

    DO NOT DELETE OR MODIFY THIS ITEM. This item is managed by the ArcGIS Hub application. To make changes to this site, please visit https://hub.arcgis.com/admin/

  12. R K.v2i.coco Remove_hashmarkerli Dataset

    • universe.roboflow.com
    zip
    Updated Aug 26, 2025
    Cite
    afb (2025). R K.v2i.coco Remove_hashmarkerli Dataset [Dataset]. https://universe.roboflow.com/afb-nnhqq/r-k.v2i.coco-remove_hashmarkerli-bjhdy
    Available download formats: zip
    Dataset updated
    Aug 26, 2025
    Dataset authored and provided by
    afb
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Variables measured
    YARD12
    Description

    R K.v2i.coco Remove_hashmarkerli

    ## Overview
    
    R K.v2i.coco Remove_hashmarkerli is a dataset for computer vision tasks - it contains YARD12 annotations for 263 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY-NC-SA 4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0/).
    
  13. Data from: SNP datasets used in the paper “Floristic classifications and...

    • researchdata.edu.au
    Updated Aug 28, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mr Richard Dimon; Mr Richard Dimon; Honorary Professor Maurizio Rossetto; Dr Patrick Fahey; Dr Patrick Fahey (2023). SNP datasets used in the paper “Floristic classifications and bioregionalizations are not predictors of intra-specific evolutionary patterns” [Dataset]. http://doi.org/10.48610/5C76DE2
    Dataset updated
    Aug 28, 2023
    Dataset provided by
    The University of Queensland
    Authors
    Mr Richard Dimon; Mr Richard Dimon; Honorary Professor Maurizio Rossetto; Dr Patrick Fahey; Dr Patrick Fahey
    License

    https://guides.library.uq.edu.au/deposit-your-data/license-reuse-data-agreement

    Description

    The SNP dataset for each species investigated in this study is provided. These datasets are saved as R data objects in list format with metadata for samples and post-filtering DArTseq SNPs. Sample filtering included removing samples that we suspected to be mis-identified taxa or hybrids and those with >50% missing data, after which any samples from populations with fewer than 5 suitable samples remaining were also removed. SNP filtering included removing loci with reproducibility values below 0.96 or missingness of >20%, followed by subsampling to one SNP per locus to remove any linkage effects. Datasets can be read into R, where they are formatted as list objects.
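
    A minimal R sketch of loading one of these objects, assuming a hypothetical file name (the actual file names and list elements come with the download):

    # Illustrative sketch only: load a saved R data object and inspect its
    # top-level list structure.
    snp_data <- readRDS("species_snp_dataset.rds")   # hypothetical file name
    str(snp_data, max.level = 1)

    # If the object was saved with save() rather than saveRDS(), use load():
    # load("species_snp_dataset.RData")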

  14. Buoy, IOEB Doppler Current Profiler (ADCP) Data [Krishfield, R.]

    • data.ucar.edu
    • search.dataone.org
    • +2more
    archive
    Updated Aug 1, 2025
    Cite
    Albert Plueddemann; Richard Krishfield; Takatoshi Takizawa (2025). Buoy, IOEB Doppler Current Profiler (ADCP) Data [Krishfield, R.] [Dataset]. http://doi.org/10.5065/D6XK8CZ9
    Available download formats: archive
    Dataset updated
    Aug 1, 2025
    Authors
    Albert Plueddemann; Richard Krishfield; Takatoshi Takizawa
    Time period covered
    Oct 1, 1997 - Sep 28, 1998
    Area covered
    Description

    A 150-kHz narrowband RD Instruments Acoustic Doppler Current Profiler (ADCP) internally recorded 34,805 current ensembles in 362 days from an Ice-Ocean Buoy (IOEB) deployed during the SHEBA project. The IOEB was initially deployed about 50 km from the main camp and drifted from 75.1 N, 141 W to 80.6 N, 160 W between October 1, 1997 and September 30, 1998. The ADCP was located at a depth of 14m below the ice surface and was configured to record data at 15-minute intervals from 40 8-m-wide bins extending downward 320m below the instrument. The retrieved 24 MB of raw data were processed to remove noise, correct for platform drift and geomagnetic declination, remove bottom hits, and output 2-hr average Earth-referenced current profiles along with ancillary data.

  15. LScD (Leicester Scientific Dictionary)

    • figshare.le.ac.uk
    docx
    Updated Apr 15, 2020
    + more versions
    Cite
    Neslihan Suzen (2020). LScD (Leicester Scientific Dictionary) [Dataset]. http://doi.org/10.25392/leicester.data.9746900.v3
    Available download formats: docx
    Dataset updated
    Apr 15, 2020
    Dataset provided by
    University of Leicester
    Authors
    Neslihan Suzen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Leicester
    Description

    LScD (Leicester Scientific Dictionary), April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com). Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

    [Version 3] The third version of LScD (Leicester Scientific Dictionary) is created from the updated LSC (Leicester Scientific Corpus), Version 2*. All pre-processing steps applied to build the new version of the dictionary are the same as in Version 2** and can be found in the description of Version 2 below; we did not repeat the explanation. After the pre-processing steps, the total number of unique words in the new version of the dictionary is 972,060. The files provided with this description are also the same as described for LScD Version 2 below.

    * Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
    ** Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v2

    [Version 2] Getting Started

    This document provides the pre-processing steps for creating an ordered list of words from the LSC (Leicester Scientific Corpus) [1] and the description of LScD (Leicester Scientific Dictionary). This dictionary is created to be used in future work on the quantification of the meaning of research texts. R code for producing the dictionary from LSC and instructions for usage of the code are available in [2]. The code can also be used for lists of texts from other sources; amendments to the code may be required.

    LSC is a collection of abstracts of articles and proceedings papers published in 2014 and indexed by the Web of Science (WoS) database [3]. Each document contains title, list of authors, list of categories, list of research areas, and times cited. The corpus contains only documents in English. The corpus was collected in July 2018 and contains the number of citations from publication date to July 2018. The total number of documents in LSC is 1,673,824.

    LScD is an ordered list of words from the texts of abstracts in LSC. The dictionary stores 974,238 unique words and is sorted by the number of documents containing the word in descending order. All words in the LScD are in stemmed form. The LScD contains the following information:
    1. Unique words in abstracts
    2. Number of documents containing each word
    3. Number of appearances of a word in the entire corpus

    Processing the LSC

    Step 1. Downloading the LSC Online: Use of the LSC is subject to acceptance of a request of the link by email. To access the LSC for research purposes, please email ns433@le.ac.uk. The data are extracted from Web of Science [3]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.

    Step 2. Importing the Corpus to R: The full R code for processing the corpus can be found on GitHub [2]. All following steps can be applied to an arbitrary list of texts from any source with changes of parameters. The structure of the corpus, such as file format and the names (also the positions) of fields, should be taken into account to apply our code. The organisation of the CSV files of LSC is described in the README file for LSC [1].

    Step 3. Extracting Abstracts and Saving Metadata: Metadata, which include all fields in a document excluding abstracts, and the field of abstracts are separated. Metadata are then saved as MetaData.R. Fields of metadata are: List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.

    Step 4. Text Pre-processing Steps on the Collection of Abstracts: In this section, we present our approaches to pre-process the abstracts of the LSC.
    1. Removing punctuation and special characters: This is the process of substituting all non-alphanumeric characters by space. We did not substitute the character “-” in this step, because we need to keep words like “z-score”, “non-payment” and “pre-processing” in order not to lose the actual meaning of such words. A process of uniting prefixes with words is performed in later steps of pre-processing.
    2. Lowercasing the text data: Lowercasing is performed to avoid considering the same words like “Corpus”, “corpus” and “CORPUS” differently. The entire collection of texts is converted to lowercase.
    3. Uniting prefixes of words: Words containing prefixes joined with the character “-” are united as one word. The list of prefixes united for this research is given in the file “list_of_prefixes.csv”. Most of the prefixes are extracted from [4]. We also added commonly used prefixes: ‘e’, ‘extra’, ‘per’, ‘self’ and ‘ultra’.
    4. Substitution of words: Some words joined with “-” in the abstracts of the LSC require an additional process of substitution to avoid losing the meaning of the word before removing the character “-”. Some examples of such words are “z-test”, “well-known” and “chi-square”. These words have been substituted to “ztest”, “wellknown” and “chisquare”. Identification of such words is done by sampling abstracts from LSC. The full list of such words and the decisions taken for substitution are presented in the file “list_of_substitution.csv”.
    5. Removing the character “-”: All remaining characters “-” are replaced by space.
    6. Removing numbers: All digits which are not included in a word are replaced by space. All words that contain digits and letters are kept because alphanumeric characters such as chemical formulas might be important for our analysis. Some examples are “co2”, “h2o” and “21st”.
    7. Stemming: Stemming is the process of converting inflected words into their word stem. This step results in uniting several forms of words with similar meaning into one form, and also saves memory space and time [5]. All words in the LScD are stemmed to their word stem.
    8. Stop word removal: Stop words are words that are extremely common but provide little value in a language. Some common stop words in English are ‘I’, ‘the’, ‘a’ etc. We used the ‘tm’ package in R to remove stop words [6]. There are 174 English stop words listed in the package.

    Step 5. Writing the LScD into CSV Format: There are 1,673,824 plain processed texts for further analysis. All unique words in the corpus are extracted and written in the file “LScD.csv”.

    The Organisation of the LScD

    The total number of words in the file “LScD.csv” is 974,238. Each field is described below:
    Word: It contains unique words from the corpus. All words are in lowercase and in their stem forms. The field is sorted by the number of documents that contain the word, in descending order.
    Number of Documents Containing the Word: In this context, a binary calculation is used: if a word exists in an abstract then there is a count of 1. If the word exists more than once in a document, the count is still 1. The total number of documents containing the word is counted as the sum of 1s over the entire corpus.
    Number of Appearances in Corpus: It contains how many times a word occurs in the corpus when the corpus is considered as one large document.

    Instructions for R Code

    LScD_Creation.R is an R script for processing the LSC to create an ordered list of words from the corpus [2]. Outputs of the code are saved as an RData file and in CSV format. Outputs of the code are:
    Metadata File: It includes all fields in a document excluding abstracts. Fields are List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.
    File of Abstracts: It contains all abstracts after the pre-processing steps defined in Step 4.
    DTM: It is the Document Term Matrix constructed from the LSC [6]. Each entry of the matrix is the number of times the word occurs in the corresponding document.
    LScD: An ordered list of words from LSC as defined in the previous section.

    The code can be used by:
    1. Download the folder ‘LSC’, ‘list_of_prefixes.csv’ and ‘list_of_substitution.csv’
    2. Open the LScD_Creation.R script
    3. Change parameters in the script: replace with the full path of the directory with source files and the full path of the directory to write output files
    4. Run the full code.

    References
    [1] N. Suzen. (2019). LSC (Leicester Scientific Corpus) [Dataset]. Available: https://doi.org/10.25392/leicester.data.9449639.v1
    [2] N. Suzen. (2019). LScD-LEICESTER SCIENTIFIC DICTIONARY CREATION. Available: https://github.com/neslihansuzen/LScD-LEICESTER-SCIENTIFIC-DICTIONARY-CREATION
    [3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
    [4] A. Thomas, "Common Prefixes, Suffixes and Roots," Center for Development and Learning, 2013.
    [5] C. Ramasubramanian and R. Ramya, "Effective pre-processing activities in text mining using improved porter’s stemming algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 12, pp. 4536-4538, 2013.
    [6] I. Feinerer, "Introduction to the tm Package: Text Mining in R," https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf, 2013.
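
    A minimal R sketch of the kind of pre-processing pipeline described in Step 4, using the ‘tm’ package named above (the input texts below are illustrative; this is not the authors' LScD_Creation.R script):

    # Illustrative sketch only: lowercase, strip punctuation and numbers,
    # remove English stop words, stem, and build a document-term matrix.
    library(tm)
    library(SnowballC)   # supplies the stemmer used by stemDocument()

    abstracts <- c("Pre-processing of z-scores in corpus analysis.",
                   "The corpus contains 1,673,824 documents.")   # illustrative texts

    corp <- VCorpus(VectorSource(abstracts))
    corp <- tm_map(corp, content_transformer(tolower))
    corp <- tm_map(corp, removePunctuation)
    corp <- tm_map(corp, removeNumbers)
    corp <- tm_map(corp, removeWords, stopwords("english"))
    corp <- tm_map(corp, stemDocument)
    corp <- tm_map(corp, stripWhitespace)

    dtm <- DocumentTermMatrix(corp)   # counts of each stemmed word per document
    inspect(dtm)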

  16. Underway Data from R/V Melville, R/V Roger Revelle cruises MV1101, RR1202 in...

    • search.dataone.org
    • bco-dmo.org
    • +1more
    Updated Dec 5, 2021
    Cite
    William M. Balch (2021). Underway Data from R/V Melville, R/V Roger Revelle cruises MV1101, RR1202 in the Southern Ocean (30-60S); 2011-2012 (Great Calcite Belt project) [Dataset]. https://search.dataone.org/view/http%3A%2F%2Flod.bco-dmo.org%2Fid%2Fdataset%2F560142
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    William M. Balch
    Area covered
    Southern Ocean
    Description

    Along-track temperature, salinity, backscatter, chlorophyll fluorescence, and normalized water-leaving radiance (nLw).

    On the bow of the vessel was a Satlantic SeaWiFS Aircraft Simulator (MicroSAS) system, used to estimate water-leaving radiance from the ship, analogous to the nLw derived by the SeaWiFS and MODIS satellite sensors, but free from atmospheric error (hence, it can provide data below clouds).

    The system consisted of a down-looking radiance sensor and a sky-viewing radiance sensor, both mounted on a steerable holder on the bow. A downwelling irradiance sensor was mounted at the top of the ship's meteorological mast, on the bow, far from any potentially shading structures. These data were used to estimate normalized water-leaving radiance as a function of wavelength. The radiance detector was set to view the water at 40deg from nadir as recommended by Mueller et al. [2003b]. The water radiance sensor was able to view over an azimuth range of ~180deg across the ship's heading with no viewing of the ship's wake. The direction of the sensor was adjusted to view the water 90-120deg from the sun's azimuth, to minimize sun glint. This was continually adjusted as the time and ship's gyro heading were used to calculate the sun's position using an astronomical solar position subroutine interfaced with a stepping motor which was attached to the radiometer mount (designed and fabricated at Bigelow Laboratory for Ocean Sciences). Protocols for operation and calibration were performed according to Mueller [Mueller et al., 2003a; Mueller et al., 2003b; Mueller et al., 2003c]. Before 1000h and after 1400h, data quality was poorer as the solar zenith angle was too low. Post-cruise, the 10Hz data were filtered to remove as much residual white cap and glint as possible (we accept the lowest 5% of the data). Reflectance plaque measurements were made several times at local apparent noon on sunny days to verify the radiometer calibrations.

    Within an hour of local apparent noon each day, a Satlantic OCP sensor was deployed off the stern of the vessel after the ship oriented so that the sun was off the stern. The ship would secure the starboard Z-drive, and use port Z-drive and bow thruster to move the ship ahead at about 25cm s-1. The OCP was then trailed aft and brought to the surface ~100m aft of the ship, then allowed to sink to 100m as downwelling spectral irradiance and upwelling spectral radiance were recorded continuously along with temperature and salinity. This procedure ensured there were no ship shadow effects in the radiometry.

    Instruments include a WETLabs wetstar fluorometer, a WETLabs ECOTriplet and a SeaBird microTSG.
    Radiometry was done using a Satlantic 7 channel microSAS system with Es, Lt and Li sensors.

    Chl data are based on inter-calibrating discrete surface chlorophyll measurements with the temporally closest fluorescence measurement and applying the regression results to all fluorescence data.

    Data have been corrected for instrument biofouling and drift based on weekly pure-water calibrations of the system. Radiometric data have been processed using standard Satlantic processing software and have been checked with periodic plaque measurements using a 2% Spectralon standard.

    Lw is calculated from Lt and Lsky and is "what Lt would be if the sensor were looking straight down". Since our sensors are mounted at 40°, based on various NASA protocols, we need to do that conversion.

    Lwn adds Es to the mix: Es is used to normalize Lw. nLw is related to Rrs, the remote-sensing reflectance.
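
    A minimal sketch of how these quantities relate, written in R with an assumed sea-surface reflectance factor rho and extraterrestrial solar irradiance F0 (the coefficient and example values are illustrative, not those used in the cruise processing):

    # Illustrative sketch only: above-water radiometry relations.
    # Lt = sea-viewing radiance, Lsky = sky radiance, Es = downwelling
    # irradiance, F0 = extraterrestrial solar irradiance at the same wavelength.
    rho <- 0.028                                      # assumed reflectance factor
    Lw  <- function(Lt, Lsky) Lt - rho * Lsky         # water-leaving radiance
    Rrs <- function(Lt, Lsky, Es) Lw(Lt, Lsky) / Es   # remote-sensing reflectance
    nLw <- function(Lt, Lsky, Es, F0) Rrs(Lt, Lsky, Es) * F0   # normalized Lw

    nLw(Lt = 1.2, Lsky = 3.0, Es = 140, F0 = 185)     # arbitrary example values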

    Techniques used are as described in:
    Balch WM, Drapeau DT, Bowler BC, Booth ES, Windecker LA, Ashe A (2008) Space–time variability of carbon standing stocks and fixation rates in the Gulf of Maine, along the GNATS transect between Portland, ME, USA, and Yarmouth, Nova Scotia, Canada.
    J Plankton Res 30:119–139

  17. Data from: Error and anomaly detection for intra-participant time-series...

    • tandf.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    David R. Mullineaux; Gareth Irwin (2023). Error and anomaly detection for intra-participant time-series data [Dataset]. http://doi.org/10.6084/m9.figshare.5189002
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    David R. Mullineaux; Gareth Irwin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identification of errors or anomalous values, collectively considered outliers, assists in exploring data or, through removing outliers, improves statistical analysis. In biomechanics, outlier detection methods have explored the ‘shape’ of entire cycles, although exploring fewer points using a ‘moving window’ may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected through two stages for the strides (mean 38 cycles) from treadmill running. Cycles were removed in stage 1 for one-dimensional (spatial) outliers at each time point using the median absolute deviation, and in stage 2 for two-dimensional (spatial–temporal) outliers using a moving-window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window size, requiring more stringent scaling at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and outliers assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time-series data.
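
    The dataset is supplied with Matlab code; purely to illustrate the stage-1 idea (flagging spatial outliers at each time point with the median absolute deviation), here is a hedged R sketch, assuming a hypothetical matrix cycles with one row per cycle and one column per normalized time point:

    # Illustrative sketch only: flag cycles containing outliers at any time
    # point using the median absolute deviation (MAD).
    set.seed(1)
    cycles <- matrix(rnorm(38 * 101), nrow = 38, ncol = 101)  # 38 cycles x 101 time points
    cycles[5, 60] <- 8                                         # inject an outlier

    threshold <- 3.5   # illustrative scaling; the paper derives scaling from t-statistics
    is_outlier <- apply(cycles, 2, function(col) {
      abs(col - median(col)) / mad(col) > threshold
    })

    flagged <- which(rowSums(is_outlier) > 0)   # cycles to inspect or remove
    flagged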

  18. (Other) Suburb/Locality Boundaries - Geoscape Administrative Boundaries

    • researchdata.edu.au
    • data.gov.au
    Updated Sep 9, 2014
    + more versions
    Cite
    Department of Industry, Science and Resources (DISR) (2014). (Other) Suburb/Locality Boundaries - Geoscape Administrative Boundaries [Dataset]. https://researchdata.edu.au/other-suburblocality-boundaries-administrative-boundaries/2976691
    Dataset updated
    Sep 9, 2014
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Department of Industry, Science and Resources (DISR)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    The digital Suburb/Locality Boundaries and their legal identifiers have been derived from the cadastre data from each state and territory jurisdiction and are available below.

    Suburb/Locality Boundaries are part of Geoscape Administrative Boundaries, which is built and maintained by Geoscape Australia using authoritative government data. Further information about contributors to Administrative Boundaries is available here.

    The full Administrative Boundaries dataset comprises seven Geoscape products:

    * Localities
    * Local Government Areas (LGAs)
    * Wards
    * Australian Bureau of Statistics (ABS) Boundaries
    * Electoral Boundaries
    * State Boundaries and
    * Town Points

    Updated versions of Administrative Boundaries are published on a quarterly basis. Users have the option to download datasets with feature coordinates referencing either GDA94 or GDA2020 datums.

    There were no updates in the May 2025 release.

    Notable changes in the August 2021 release:

    * The Localities, Local Government Areas and Wards products have been redesigned to provide a new flattened data model, offering a simpler, easier-to-use structure. This will also:
      - change the composition of identifiers in these products.
      - provide state identifiers as an abbreviation (eg. NSW) rather than a code.
      - remove the static SA “Hundreds” data from Localities.
    * More information on the changes to Localities, Local Government Areas and Wards is available here.
    * The Australian Bureau of Statistics (ABS) Boundaries will be updated to include the 2021 Australian Statistical Geography Standard (ASGS).
    * Further information on the August changes to Geoscape datasets is available here.

    Further information on Administrative Boundaries, including FAQs on the data, is available here through Geoscape Australia’s network of partners. They provide a range of commercial products based on Administrative Boundaries, including software solutions, consultancy and support.

    Note: On 1 October 2020, PSMA Australia Limited began trading as Geoscape Australia.

    License Information

    The Australian Government has negotiated the release of Administrative Boundaries to the whole economy under an open CC BY 4.0 licence.

    Users must only use the data in ways that are consistent with the Australian Privacy Principles issued under the Privacy Act 1988 (Cth).

    Users must also note the following attribution requirements:

    Preferred attribution for the Licensed Material:

    Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International licence (CC BY 4.0).

    Preferred attribution for Adapted Material:

    Incorporates or developed using Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International licence (CC BY 4.0).

  19. Underway Data (SAS) from R/V Roger Revelle KNOX22RR in the Patagonian Shelf...

    • search.dataone.org
    • bco-dmo.org
    • +1more
    Updated Dec 5, 2021
    Cite
    William M. Balch (2021). Underway Data (SAS) from R/V Roger Revelle KNOX22RR in the Patagonian Shelf (SW South Atlantic) from 2008-2009 (COPAS08 project) [Dataset]. https://search.dataone.org/view/sha256%3Aa1a62a58117682f1e0b0d541e30a6154992cce73db19169271a5f9b09df1ba23
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    William M. Balch
    Description

    Along-track temperature, salinity, backscatter, chlorophyll fluorescence, and normalized water-leaving radiance (nLw).

    On the bow of the R/V Roger Revelle was a Satlantic SeaWiFS Aircraft Simulator (MicroSAS) system, used to estimate water-leaving radiance from the ship, analogous to the nLw derived by the SeaWiFS and MODIS satellite sensors, but free from atmospheric error (hence, it can provide data below clouds).

    The system consisted of a down-looking radiance sensor and a sky-viewing radiance sensor, both mounted on a steerable holder on the bow. A downwelling irradiance sensor was mounted at the top of the ship's meteorological mast, on the bow, far from any potentially shading structures. These data were used to estimate normalized water-leaving radiance as a function of wavelength. The radiance detector was set to view the water at 40deg from nadir as recommended by Mueller et al. [2003b]. The water radiance sensor was able to view over an azimuth range of ~180deg across the ship's heading with no viewing of the ship's wake. The direction of the sensor was adjusted to view the water 90-120deg from the sun's azimuth, to minimize sun glint. This was continually adjusted as the time and ship's gyro heading were used to calculate the sun's position using an astronomical solar position subroutine interfaced with a stepping motor which was attached to the radiometer mount (designed and fabricated at Bigelow Laboratory for Ocean Sciences). Protocols for operation and calibration were performed according to Mueller [Mueller et al., 2003a; Mueller et al., 2003b; Mueller et al., 2003c]. Before 1000h and after 1400h, data quality was poorer as the solar zenith angle was too low. Post-cruise, the 10Hz data were filtered to remove as much residual white cap and glint as possible (we accept the lowest 5% of the data). Reflectance plaque measurements were made several times at local apparent noon on sunny days to verify the radiometer calibrations.

    Within an hour of local apparent noon each day, a Satlantic OCP sensor was deployed off the stern of the R/V Revelle after the ship was oriented so that the sun was off the stern. The ship would secure the starboard Z-drive and use the port Z-drive and bow thruster to move ahead at about 25 cm s-1. The OCP was trailed aft and brought to the surface ~100 m aft of the ship, then allowed to sink to 100 m while downwelling spectral irradiance and upwelling spectral radiance were recorded continuously, along with temperature and salinity. This procedure ensured there were no ship-shadow effects in the radiometry.

    Instruments include a WETLabs WETStar fluorometer, a WETLabs ECO Triplet, and a SeaBird microTSG.
    Radiometry was done using a Satlantic seven-channel MicroSAS system with Es, Lt, and Li sensors.

    Chl data are based on intercalibrating the discrete surface chlorophyll measurements with the temporally closest fluorescence measurement and applying the regression results to all fluorescence data.
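
    A minimal sketch of that intercalibration is below, assuming a data frame discrete (columns time, chl) of extracted chlorophyll samples and a data frame uw (columns time, fluor) of underway fluorescence; the names are illustrative only, not taken from the actual data files.

    calibrate_chl <- function(discrete, uw) {
      # match each discrete Chl sample to the temporally closest fluorescence reading
      nearest_fluor <- sapply(discrete$time, function(t) {
        uw$fluor[which.min(abs(as.numeric(uw$time) - as.numeric(t)))]
      })
      fit <- lm(discrete$chl ~ nearest_fluor)                 # regress discrete Chl on fluorescence
      uw$chl_est <- coef(fit)[1] + coef(fit)[2] * uw$fluor    # apply regression to all fluorescence data
      uw
    }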

    Data have been corrected for instrument biofouling and drift based on weekly pure-water calibrations of the system. Radiometric data have been processed using standard Satlantic processing software and checked with periodic plaque measurements using a 2% Spectralon standard.

    Lw is calculated from Lt and Lsky and is "what Lt would be if the sensor were looking straight down". Since the sensors are mounted at 40 degrees from nadir, following the NASA protocols, that conversion is required.

    nLw brings Es into the calculation: Es is used to normalize Lw. nLw is related to Rrs, the remote-sensing reflectance.
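
    One common form of that conversion (a sketch only, not necessarily the exact processing used here): Lt, Li, and Es are per-waveband measurements from the three sensors, F0 is the mean extraterrestrial solar irradiance, and the sea-surface reflectance factor rho is given a nominal value, although in practice it depends on wind speed and the viewing geometry.

    compute_nlw <- function(Lt, Li, Es, F0, rho = 0.028) {
      # rho = 0.028 is a nominal sea-surface reflectance factor (assumption)
      Lw  <- Lt - rho * Li   # remove sky radiance reflected at the sea surface
      Rrs <- Lw / Es         # remote-sensing reflectance
      Rrs * F0               # normalized water-leaving radiance, nLw
    }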

    Techniques used are as described in:
    Balch WM, Drapeau DT, Bowler BC, Booth ES, Windecker LA, Ashe A (2008) Space-time variability of carbon standing stocks and fixation rates in the Gulf of Maine, along the GNATS transect between Portland, ME, USA, and Yarmouth, Nova Scotia, Canada. J Plankton Res 30:119-139

  20. Data from: CellFuse enables multi-modal integration of single-cell and...

    • zenodo.org
    Updated Jul 17, 2025
    Cite
    Abhishek Koladiya; Abhishek Koladiya (2025). CellFuse enables multi-modal integration of single-cell and spatial proteomics data [Dataset]. http://doi.org/10.5281/zenodo.15858358
    Explore at:
    Dataset updated
    Jul 17, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Abhishek Koladiya; Abhishek Koladiya
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 19, 2025
    Description

    Fig 2

    Bone marrow (Fig 2B, D, E, F, H; Supplementary Fig 1A, 2, 3)

    1. Fig 2/BM/Reference/Fig2_BM_prepare_data.R: Prepare bone marrow data for CellFuse

    2. Fig 2/BM/BM_CellFuse_Integration.R: Run CellFuse

    3. Fig 2/BM/BM_Running_Benchmark_Methods.R: Run benchmarking methods (Harmony, Seurat, FastMNN)

    4. Fig 2/BM/BM_scIB_Benchmarking.ipynb: Evaluate performance of CellFuse and the other benchmarking methods using the scIB framework proposed by Luecken et al.

    5. Fig 2/BM/BM_scIB_prepare_figures.R: Visualize results of the scIB framework

    6. Fig 2/BM/Sequential_Feature_drop/Prepare_data.R: Prepare data for evaluating sequential feature drop

    7. Fig 2/BM/Sequential_Feature_drop/Run_methods.R: Run CellFuse, Harmony, Seurat, and FastMNN for sequential feature drop

    8. Fig 2/BM/Sequential_Feature_drop/Evaluate_results.R: Evaluate feature-drop results and visualize data.
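
    As a rough guide (not part of the archive), the numbered R scripts above are run in the order listed, with the scIB notebook in step 4 run separately in Jupyter; a sketch of driving the workflow from the repository root is below. The working-directory layout and the use of source() are assumptions for illustration.

    scripts <- c(
      "Fig 2/BM/Reference/Fig2_BM_prepare_data.R",   # 1. prepare data
      "Fig 2/BM/BM_CellFuse_Integration.R",          # 2. run CellFuse
      "Fig 2/BM/BM_Running_Benchmark_Methods.R",     # 3. run benchmarking methods
      "Fig 2/BM/BM_scIB_prepare_figures.R"           # 5. visualize scIB results
    )
    for (f in scripts) source(f, echo = TRUE)        # run each step in sequence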

    PBMC (Fig 2G, I; Supplementary Fig 1B and 4)

    1. Fig 2/PBMC/Reference/Fig2_PBMC_prepare_data.R: Prepare PBMC data for CellFuse

    2. Fig 2/PBMC/PBMC_CellFuse_Integration.R: Run CellFuse

    3. Fig 2/PBMC/PBMC_Running_Benchmark_Methods.R: Run benchmarking methods (Harmony, Seurat, FastMNN)

    4. Fig 2/PBMC/PBMC_scIB_Benchmarking.ipynb: Evaluate performance of CellFuse and the other benchmarking methods using the scIB framework proposed by Luecken et al., 2021

    5. Fig 2/PBMC/PBMC_scIB_prepare_figures.R: Visualize results of the scIB framework

    6. Fig 2/PBMC/RunTime_benchmark/Run_Benchmark.R: Prepare data, run benchmarking methods, and evaluate results.

    Fig 3 and Supplementary Fig 5

    1. Fig 3/Reference/Fig3_CyTOF_prepare_data.R: Prepare CyTOF and CITE-Seq data for CellFuse

    2. Fig 3/CellFuse_Integration_CyTOF.R: Run CellFuse to remove batch effect and integrate CyTOF data from day 7 post-infusion

    3. Fig 3/CellFuse_Integration_CITESeq.R: Run CellFuse to integrate CyTOF and CITE-Seq data

    4. Fig 3/CART_Data_visualisation.R: Visualize data

    Fig 4

    HuBMAP CODEX data (Fig. 4A, B, C, D and Supplementary Fig 6)

    1. Fig 4/CODEX_colorectal/Reference/CODEX_HuBMAP_prepare_data.R: Prepare CODEX data from the annotated and unannotated donors

    2. Fig 4/CODEX_colorectal/CODEX_HuBMAP_CellFuse_Predict.R: Run CellFuse on cells from the annotated and unannotated donors

    3. Fig 4/CODEX_colorectal/CODEX_HuBMAP_Data_visualisation.R: Visualize data and prepare figures.

    4. Fig 4/CODEX_colorectal/CODEX_HuBMAP_Benchmark.R: Benchmark CellFuse against CELESTA, SVM, and Seurat using cells from annotated donors and prepare figures.

    a. Astir is a Python package, so run the following Python notebook: Fig 4/CODEX_colorectal/Benchmarking/Astir/Astrir.ipynb

    5. Fig 4/CODEX_colorectal/CODEX_HuBMAP_Suppl_figure_heatmap.R: F1-score calculation per cell type per benchmarking method, and a heatmap comparing cell types from the annotated and unannotated donors (Supplementary Fig 6)

    IMC breast cancer data (Fig 4E, F, G and Supplementary Fig 7)

    1. Fig 4/IMC_Breast_Cancer/IMC_prepare_data.R: Prepare IMC data from the annotated and unannotated donors

    2. Fig 4/IMC_Breast_Cancer/IMC_CellFuse_Predict.R: Run CellFuse to predict cell types

    3. Fig 4/IMC_Breast_Cancer/IMC_dat_visualization.R: Visualize data and prepare figures.

    Fig 5

    1. Fig5/Reference/Fig5_CyTOF_Data_prep.R: Prepare CyTOF data from healthy PBMC and healthy colon single cells

    2. Fig5/MIBI_CellFuse_Predict.R: Run CellFuse to predict cells from colon cancer patients

    3. Fig5/MIBI_PostPrediction.R: Visualize data and prepare figures

    4. Fig5/Predicted_Data/mask_generation.ipynb: After CellFuse prediction, annotate cell types in the segmented images. This generates Fig 5C and D
