100+ datasets found
  1. Python and R Basics for Environmental Data Sciences

    • search.dataone.org
    • hydroshare.org
    Updated Dec 5, 2021
    Cite
    Tao Wen (2021). Python and R Basics for Environmental Data Sciences [Dataset]. https://search.dataone.org/view/sha256%3Aa4a66e6665773400ae76151d376607edf33cfead15ffad958fe5795436ff48ff
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    Tao Wen
    Description

    This resource collects teaching materials that were originally created for the in-person course 'GEOSC/GEOG 497 – Data Mining in Environmental Sciences' at Penn State University (co-taught by Tao Wen, Susan Brantley, and Alan Taylor) and later refined and revised by Tao Wen for use in the online teaching module 'Data Science in Earth and Environmental Sciences' hosted on the NSF-sponsored HydroLearn platform.

    This resource includes both R Notebooks and Python Jupyter Notebooks to teach the basics of R and Python coding, data analysis and data visualization, as well as building machine learning models in both programming languages by using authentic research data and questions. All of these R/Python scripts can be executed either on the CUAHSI JupyterHub or on your local machine.

    This resource is shared under the CC-BY license. Please contact the creator Tao Wen at Syracuse University (twen08@syr.edu) for any questions you have about this resource. If you identify any errors in the files, please contact the creator.

  2. UCF Google Street View Dataset 2014

    • academictorrents.com
    bittorrent
    Updated Apr 10, 2019
    Cite
    Amir R. Zamir and Mubarak Shah (2019). UCF Google Street View Dataset 2014 [Dataset]. https://academictorrents.com/details/e52a8978af7c2f734f2b30795075dbcd50efc983
    Available download formats: bittorrent (46247776646)
    Dataset updated
    Apr 10, 2019
    Dataset authored and provided by
    Amir R. Zamir and Mubarak Shah
    License

    No license specified (https://academictorrents.com/nolicensespecified)

    Description

    The dataset contains 62,058 high-quality Google Street View images covering the downtown and neighboring areas of Pittsburgh, PA, and Orlando, FL, and part of Manhattan, NY. Accurate GPS coordinates and the compass direction of each image are also provided. For each Street View placemark (i.e., each spot on a street), the 360° spherical view is broken down into 4 side views and 1 upward view. One additional image per placemark shows overlaid markers, such as the address and street names.

    Citation: Please cite the following paper, for which this data was (partially) collected: Image Geo-localization based on Multiple Nearest Neighbor Feature Matching using Generalized Graphs. Amir Roshan Zamir and Mubarak Shah. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014.

  3. Whitefish Lake Institute Long-Term Monitoring Dataset (2007-2021)

    • hydroshare.org
    • beta.hydroshare.org
    • +1 more
    zip
    Updated Feb 28, 2023
    Cite
    Meghan Robinson; W. Adam Sigler; Mike Koopal (2023). Whitefish Lake Institute Long-Term Monitoring Dataset (2007-2021) [Dataset]. http://doi.org/10.4211/hs.5ca7307fda8949299e6782885da95046
    Available download formats: zip (219.0 MB)
    Dataset updated
    Feb 28, 2023
    Dataset provided by
    HydroShare
    Authors
    Meghan Robinson; W. Adam Sigler; Mike Koopal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 27, 2007 - Nov 3, 2021
    Description

    This resource contains data collected by the Whitefish Lake Institute (WLI) as well as R code used to compile and conduct quality assurance on the data. This resource reflects joint publication efforts between WLI and the Montana State University Extension Water Quality (MSUEWQ) program. All data included here was uploaded to the National Water Quality Portal (WQX) in 2022. WLI intends to upload all future data to WQX, and this HydroShare resource may also be updated in the future with data from 2022 onward.

    Data Purpose: The ‘Data’ folder of this resource holds the final data products for the extensive dataset collected by WLI between 2007 and 2021. This folder is likely of interest to users who want data for research and analysis. The dataset contains physical water parameter field data collected by Hydrolab MS5 and DS5 loggers, including water temperature, specific conductance, dissolved oxygen concentration and saturation, barometric pressure, and turbidity. Additional field data that needs further quality assurance prior to use includes chlorophyll a, ORP, pH, and PAR. The dataset also contains water chemistry data analyzed at certified laboratories, including total nitrogen, total phosphorus, nitrate, orthophosphate, total suspended solids, organic carbon, and chlorophyll a. The data folder includes R scripts with example code for data visualization. This dataset can provide insight into water quality trends over time in lakes and streams of northwestern Montana.

    Data Summary: During this period, WLI collected water quality data for 63 lake sites and 17 stream and river sites in northwestern Montana under two separate monitoring projects. The Northwest Montana Lakes Network (NMLN) project currently visits 41 lake sites in northwestern Montana once per summer. Field data from Hydrolabs are collected at discrete depths throughout each lake's profile, and depth-integrated water chemistry samples are collected as well. The Whitefish Water Quality Monitoring Project (WWQMP) currently visits two sites on Whitefish Lake, one site on Tally Lake, and 11 stream and river sites in the Whitefish Lake and Upper Whitefish River watersheds monthly between April and November. Field data is collected at one depth for streams and at many depths throughout the lake profiles, and water chemistry samples are collected at discrete depths for Whitefish Lake and the streams. The final dataset for both programs includes over 112,000 data points that passed quality assurance assessment and an additional 72,000 data points that would need further quality assurance before use.

    Workflow Purpose: The ‘Workflow’ folder of this resource contains the raw data, folder structure, and R code used during this data compilation and upload process. This folder is likely of interest to users who have similar datasets and want code for automating data compilation or upload processes. The R scripts included here have code to stitch together many individual Hydrolab MS5 and DS5 logger files as well as lab electronic data deliverables (EDDs), which may be useful to users interested in compiling one or multiple seasons' worth of data into a single file. Reformatting scripts format the data to match the multi-sheet Excel workbook format required by the Montana Department of Environmental Quality for uploads to WQX, and may be useful to others hoping to automate database uploads.

    Workflow Summary: Compilation code in the workflow folder compiles data from its original forms, including Hydrolab sonde export files and lab EDDs. This compilation process includes extracting dates and times from comment fields and producing a single file from many input files. Formatting code then reformats the data to match WQX upload requirements. This includes generating unique activity IDs for data collected at the same site, date, and time, and then linking these activity IDs with results across worksheets in an Excel workbook. Code for generating all quality assurance figures used in the decision-making process outlined in the Quality Assurance Document is included here as well, along with the resulting data removal decisions. Finally, this folder includes code for converting data from the separate program uploads for WQX into the more user-friendly structure for analysis provided in the 'Data' folder of this HydroShare resource.
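    As a minimal sketch of the activity-ID step described above, and not WLI's R code, the Python below does two things. It groups result rows that share a site, date, and time under one generated ID, and it stamps that ID onto every result so the activity and result tables can be linked. The field names ('site', 'date', 'time') and the SITE-YYYYMMDD-HHMM ID pattern are hypothetical; the actual WQX/MDEQ template defines its own columns and ID rules.

```python
def assign_activity_ids(results):
    """Link results that share a site, date and time to one activity ID.

    `results` is a list of dicts with the hypothetical keys 'site',
    'date' (YYYY-MM-DD) and 'time' (HH:MM). Returns (activities, linked):
    `activities` maps each activity ID to its (site, date, time) key, and
    `linked` is a copy of the results with an 'activity_id' field added.
    """
    activities = {}
    linked = []
    for row in results:
        key = (row["site"], row["date"], row["time"])
        # Hypothetical ID pattern: SITE-YYYYMMDD-HHMM
        activity_id = "{}-{}-{}".format(
            key[0], key[1].replace("-", ""), key[2].replace(":", "")
        )
        activities.setdefault(activity_id, key)  # one entry per activity
        linked.append({**row, "activity_id": activity_id})
    return activities, linked


# Illustrative rows with placeholder site names and values
rows = [
    {"site": "SITE_A", "date": "2021-07-14", "time": "10:30",
     "characteristic": "Total nitrogen", "value": 0.1},
    {"site": "SITE_A", "date": "2021-07-14", "time": "10:30",
     "characteristic": "Total phosphorus", "value": 0.01},
    {"site": "SITE_B", "date": "2021-07-15", "time": "09:00",
     "characteristic": "Total nitrogen", "value": 0.2},
]
activities, linked = assign_activity_ids(rows)
```

    Writing `activities` and `linked` to separate worksheets would then mirror the activity/result split across an Excel workbook that the workflow description mentions.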

  4. Large Datasets in R - Plant Phenology & Temperature Data from NEON

    • qubeshub.org
    Updated May 10, 2018
    Cite
    Megan Jones Patterson; Lee Stanish; Natalie Robinson; Katherine Jones; Cody Flagg (2018). Large Datasets in R - Plant Phenology & Temperature Data from NEON [Dataset]. http://doi.org/10.25334/Q4DQ3F
    Dataset updated
    May 10, 2018
    Dataset provided by
    QUBES
    Authors
    Megan Jones Patterson; Lee Stanish; Natalie Robinson; Katherine Jones; Cody Flagg
    Description

    This module series covers how to import, manipulate, format, and plot time series data stored in .csv format in R. It was originally designed to teach researchers to use NEON plant phenology and air temperature data, and it has since been used in undergraduate classrooms.

  5. data_neo.Rdata

    • psycharchives.org
    Updated Dec 20, 2021
    Cite
    (2021). data_neo.Rdata [Dataset]. https://psycharchives.org/handle/20.500.12034/4717
    Dataset updated
    Dec 20, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R is a very powerful language for statistical computing used in many research disciplines, but it has a steep learning curve. The software is open source, freely available, and has a thriving community. This crash course provides an overview of Base-R concepts for beginners and covers the following topics: 1) introduction to R, 2) reading, saving, and viewing data, 3) selecting and changing objects in R, and 4) descriptive statistics. The course was held by Lisa Spitzer on September 3, 2021, as a precursor to the R tidyverse Workshop by Aurélien Ginolhac and Roland Krause (September 8-10, 2021). This entry features the slides, exercises/results, and chat messages of the crash course. Related to this entry are the recordings of the course and the R tidyverse workshop materials. Click on "related PsychArchives objects" to view or download the recordings of the workshop.

  6. John r krollpfeiffer USA Import & Buyer Data

    • seair.co.in
    Updated Dec 2, 2014
    + more versions
    Cite
    Seair Exim (2014). John r krollpfeiffer USA Import & Buyer Data [Dataset]. https://www.seair.co.in
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Dec 2, 2014
    Dataset provided by
    Authors
    Seair Exim
    Area covered
    United States
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.

  7. Clustering of samples and variables with mixed-type data

    • plos.figshare.com
    tiff
    Updated Jun 1, 2023
    Cite
    Manuela Hummel; Dominic Edelmann; Annette Kopp-Schneider (2023). Clustering of samples and variables with mixed-type data [Dataset]. http://doi.org/10.1371/journal.pone.0188274
    Available download formats: tiff
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Manuela Hummel; Dominic Edelmann; Annette Kopp-Schneider
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, integrating other features possibly measured on different scales, e.g., clinical or cytogenetic factors, is becoming increasingly important. The analysis results (e.g., a selection of relevant genes) are then visualized, with further information, such as clinical factors, added on top. However, a more integrative approach is desirable. In such an approach, all available data are analyzed jointly, and different data sources are also combined more naturally in the visualization. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables. Clustering of variables does not receive as much attention in the literature as clustering of samples. We extend the methodology for clustering variables with two new approaches, one based on the combination of different association measures and the other on distance correlation. With simulation studies, we evaluate and compare different clustering strategies. Applying methods specific to mixed-type data proves comparable to, and in many cases better than, applying standard approaches to the corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for visualization. Real data examples aim to give an impression of various potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices. We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix, available from https://cran.r-project.org/web/packages/CluMix.
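    The distance-correlation idea can be sketched generically in Python with NumPy. This is not the CluMix implementation, which is an R package. The sketch computes the standard sample distance correlation (Székely, Rizzo and Bakirov) between two variables and turns it into a variable-by-variable dissimilarity 1 - dCor. Using a 0/1 mismatch distance for categorical variables is an assumption made here for illustration.

```python
import numpy as np


def centered_distances(values, categorical=False):
    """Double-centred matrix of pairwise distances for one variable."""
    values = np.asarray(values)
    if categorical:
        # Assumed 0/1 mismatch distance for nominal variables
        d = (values[:, None] != values[None, :]).astype(float)
    else:
        v = values.astype(float)
        d = np.abs(v[:, None] - v[None, :])
    # Subtract column means and row means, add back the grand mean
    return d - d.mean(axis=0)[None, :] - d.mean(axis=1)[:, None] + d.mean()


def distance_correlation(x, y, x_categorical=False, y_categorical=False):
    """Sample distance correlation (V-statistic form), between 0 and 1."""
    a = centered_distances(x, x_categorical)
    b = centered_distances(y, y_categorical)
    dcov2 = (a * b).mean()
    dvar = np.sqrt((a * a).mean() * (b * b).mean())
    if dvar == 0.0:  # a constant variable carries no dependence information
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / dvar))


def variable_dissimilarity(columns, categorical):
    """Symmetric matrix of 1 - dCor between every pair of variables."""
    k = len(columns)
    diss = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            r = distance_correlation(columns[i], columns[j],
                                     categorical[i], categorical[j])
            diss[i, j] = diss[j, i] = 1.0 - r
    return diss


# Illustrative variables: two linearly related numeric ones and one label
x = np.arange(10.0)
labels = np.array(["a", "b"] * 5)
diss = variable_dissimilarity([x, 2 * x + 1, labels], [False, False, True])
```

    The resulting matrix is symmetric with a zero diagonal. It could therefore be passed to a hierarchical clustering routine (for example, after conversion with `scipy.spatial.distance.squareform`) to order the rows and columns of a heatmap.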

  8. Replication Data for: "A Topic-based Segmentation Model for Identifying...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 25, 2024
    Cite
    Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert (2024). Replication Data for: "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews" [Dataset]. http://doi.org/10.7910/DVN/EE3DE2
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert
    Description

    We provide instructions, code, and datasets for replicating the article by Kim, Lee and McCulloch (2024), "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews." This repository provides a user-friendly R package that researchers and practitioners can use to apply A Topic-based Segmentation Model with Unstructured Texts (latent class regression with group variable selection) to their own datasets. First, we provide R code to replicate the illustrative simulation study: see file 1. Second, we provide the user-friendly R package with a very simple example to help apply the model to real-world datasets: see file 2, Package_MixtureRegression_GroupVariableSelection.R and Dendrogram.R. Third, we provide a set of code files and instructions to replicate the empirical studies of customer-level and restaurant-level segmentation with Yelp review data: see files 3-a, 3-b, 4-a, and 4-b. Note that, due to Yelp's terms of use for the dataset and data size restrictions, we provide a link for downloading the same Yelp datasets (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/6). Fourth, we provide a set of code files and datasets to replicate the empirical study with professor rating review data: see file 5. Please see the description text and comments of each file for more details.

    [A guide on how to use the code to reproduce each study in the paper]

    1. Full codes for replicating Illustrative simulation study.txt -- [see Table 2 and Figure 2 in main text]: R source code to replicate the illustrative simulation study. Please run it from beginning to end in R. In addition to estimated coefficients (posterior means of coefficients), indicators of variable selection, and segment memberships, you will get the dendrograms of selected groups of variables shown in Figure 2. Computing time is approximately 20 to 30 minutes.

    3-a. Preprocessing raw Yelp Reviews for Customer-level Segmentation.txt: Code for preprocessing the downloaded unstructured Yelp review data and preparing the DV and IV matrices for the customer-level segmentation study.

    3-b. Instruction for replicating Customer-level Segmentation analysis.txt -- [see Table 10 in main text; Tables F-1, F-2, and F-3 and Figure F-1 in Web Appendix]: Code for replicating the customer-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selection, and segment memberships. Computing time is approximately 3 to 4 hours.

    4-a. Preprocessing raw Yelp reviews_Restaruant Segmentation (1).txt: R code for preprocessing the downloaded unstructured Yelp data and preparing the DV and IV matrices for the restaurant-level segmentation study.

    4-b. Instructions for replicating restaurant-level segmentation analysis.txt -- [see Tables 5, 6 and 7 in main text; Tables E-4 and E-5 and Figure H-1 in Web Appendix]: Code for replicating the restaurant-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selection, and segment memberships. Computing time is approximately 10 to 12 hours.

    [Guidelines for running Benchmark models in Table 6]

    • Unsupervised topic model: 'topicmodels' package in R. After determining the number of topics (e.g., with the 'ldatuning' R package), run the 'LDA' function in the 'topicmodels' package. Then, compute topic probabilities per restaurant (with the 'posterior' function in the package), which can be used as predictors. Finally, conduct prediction with regression.
    • Hierarchical topic model (HDP): 'gensimr' R package, using its 'model_hdp' function to identify topics (see https://radimrehurek.com/gensim/models/hdpmodel.html or https://gensimr.news-r.org/).
    • Supervised topic model: 'lda' R package, with the 'slda.em' function for training and 'slda.predict' for prediction.
    • Aggregate regression: the default 'lm' function in R.
    • Latent class regression without variable selection: the 'flexmix' function in the 'flexmix' R package. Run flexmix with a chosen number of segments (e.g., 3 segments in this study). Then, with the estimated coefficients and memberships, predict the dependent variable for each segment.
    • Latent class regression with variable selection: the 'Unconstraind_Bayes_Mixture' function in the package of Kim, Fong and DeSarbo (2012). Run the Kim et al. (2012) model with a chosen number of segments (e.g., 3 segments in this study). Then, with the estimated coefficients and memberships, predict the dependent variable for each segment. The same R package ('KimFongDeSarbo2012.zip') can be downloaded at: https://sites.google.com/scarletmail.rutgers.edu/r-code-packages/home

    5. Instructions for replicating Professor ratings review study.txt -- [see Tables G-1, G-2, G-4 and G-5, and Figures G-1 and H-2 in Web Appendix]: Code to replicate the professor rating review study. Computing time is approximately 10 hours.

    [A list of the versions of R, packages, and computer...
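    For readers who want to see what a latent class regression benchmark computes, here is a minimal Python/NumPy sketch of a mixture of linear regressions fitted by EM. It is a hypothetical stand-in for the idea behind latent class regression without variable selection. It is not flexmix itself, nor the authors' Bayesian model with group variable selection. Gaussian errors, random soft initial memberships, and a fixed iteration count are assumptions of this sketch.

```python
import numpy as np


def mixture_regression_em(X, y, n_segments=3, n_iter=200, seed=0):
    """Fit a mixture of linear regressions (latent class regression) by EM.

    X: (n, p) design matrix, including a column of ones for intercepts.
    y: (n,) response. Returns segment weights (k,), coefficients (k, p),
    noise variances (k,), posterior memberships (n, k) and the
    log-likelihood recorded after every iteration.
    """
    n, p = X.shape
    rng = np.random.default_rng(seed)
    resp = rng.dirichlet(np.ones(n_segments), size=n)  # random soft start
    beta = np.zeros((n_segments, p))
    sigma2 = np.ones(n_segments)
    loglik = []
    for _ in range(n_iter):
        # M-step: segment sizes, weighted least squares, weighted variances
        weights = np.clip(resp.mean(axis=0), 1e-12, None)
        for k in range(n_segments):
            sw = np.sqrt(resp[:, k])
            beta[k] = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
            r = y - X @ beta[k]
            total = max(resp[:, k].sum(), 1e-12)
            sigma2[k] = max((resp[:, k] * r**2).sum() / total, 1e-8)
        # E-step: posterior segment memberships via log-sum-exp
        resid = y[:, None] - X @ beta.T  # shape (n, k)
        log_dens = (np.log(weights) - 0.5 * np.log(2 * np.pi * sigma2)
                    - 0.5 * resid**2 / sigma2)
        top = log_dens.max(axis=1, keepdims=True)
        log_norm = top + np.log(np.exp(log_dens - top).sum(axis=1, keepdims=True))
        resp = np.exp(log_dens - log_norm)
        loglik.append(float(log_norm.sum()))
    return weights, beta, sigma2, resp, loglik


# Illustrative use on simulated two-segment data
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
segment = rng.integers(0, 2, 200)
y = np.where(segment == 0, 1 + 2 * x, 20 - 3 * x) + rng.normal(0, 0.5, 200)
X = np.column_stack([np.ones_like(x), x])
weights, beta, sigma2, resp, loglik = mixture_regression_em(X, y, n_segments=2, n_iter=100)
memberships = resp.argmax(axis=1)  # hard segment assignment per observation
```

    Predicting the dependent variable for each segment, as the benchmark guideline describes, would then use `X @ beta[k]` for the segment `k` that each observation is assigned to.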

  9. Raw Navigation Data acquired during R/V Falkor (too) expedition FKt250220...

    • marine-geo.org
    Updated Mar 21, 2025
    + more versions
    Cite
    MGDS > Marine Geoscience Data System (2025). Raw Navigation Data acquired during R/V Falkor (too) expedition FKt250220 (2025) [Dataset]. http://doi.org/10.60521/332371
    Dataset updated
    Mar 21, 2025
    Dataset authored and provided by
    MGDS > Marine Geoscience Data System
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Time period covered
    Feb 28, 2025 - Mar 20, 2025
    Description

    This data set was acquired with a Navigation System on ROV SuBastian during R/V Falkor (too) expedition FKt250220 conducted in 2025 (Chief Scientist: Dr. Michelle Taylor). These data files are in text (ASCII) format and contain unprocessed navigation data.

  10. Estimate of total Alaskan salmon abundance by region, 2000-2015

    • knb.ecoinformatics.org
    • search-dev.test.dataone.org
    • +3 more
    Updated Aug 13, 2021
    Cite
    Jeanette Clark; Robyn Thiessen-Bock (2021). Estimate of total Alaskan salmon abundance by region, 2000-2015 [Dataset]. http://doi.org/10.5063/F1BR8QG4
    Dataset updated
    Aug 13, 2021
    Dataset provided by
    Knowledge Network for Biocomplexity
    Authors
    Jeanette Clark; Robyn Thiessen-Bock
    Time period covered
    Jan 1, 2000 - Jan 1, 2015
    Variables measured
    species, SASAP.Region, meanCumAnnualCount
    Description

    This dataset compiles salmon escapement data from Alaska Department of Fish and Game reports and salmon harvest data from the commercial, personal use, sport fish, and subsistence sectors to estimate total salmon abundance in each of the regions defined by the State of Alaska's Salmon and People Project (SASAP). It was assembled to give a broad view of the salmon resource across regions, whether biological (escapement) or economic/cultural (harvest). With that intent in mind, a fish is counted within a region in either of two cases. It is counted if it escaped a fishery and was counted in a river contained within that region, or if it was caught in a fishery within the region. For commercial fisheries, each commercial fishing district was assigned to a region based on its location relative to the bounding watersheds of the region. No effort was made to determine the region of origin of any commercially caught fish. Thus, for some regions, fish caught in one region may have been headed to spawn in another. This is especially true of the Alaska Peninsula and Aleutian Islands commercial fishing areas, which are well known for catching large numbers of Bristol Bay-bound fish. Note that some regions are missing escapement data for some years. This dataset includes three kinds of files:
    • an R Markdown file that processes the original data and creates figures;
    • the rendered HTML file generated by running the R Markdown file, which includes many figures and explanations of the data;
    • several standalone versions of those figures.

    Data sources:
    • Jeanette Clark; Alaska Department of Fish and Game, Division of Commercial Fisheries; Alaska Department of Fish and Game, Division of Sport Fish; Alaska Department of Fish and Game, Division of Subsistence. Harvest of Salmon across Commercial, Subsistence, Personal Use, and Sport Fish sectors, Alaska, 1995-2016. Knowledge Network for Biocomplexity. doi:10.5063/F1TT4P73
    • Andrew Munro and Eric Volk. 2018. Summary of Pacific Salmon Escapement Goals in Alaska with a Review of Escapements from 2001 to 2009. Knowledge Network for Biocomplexity. doi:10.5063/F1416VB4
    • Andrew Munro and Eric Volk. 2017. Summary of Pacific Salmon Escapement Goals in Alaska with a Review of Escapements from 2007 to 2015. Knowledge Network for Biocomplexity. doi:10.5063/F1GX48V4
    • James Savereide. 2017. Estimated annual Chinook Salmon escapement at Copper River from 1980 to 2016. Knowledge Network for Biocomplexity. doi:10.5063/F1G15Z4D
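    The counting rule described for this dataset can be sketched in a few lines of Python. Escapement counts are added to the region containing the river, commercial harvest to the region that its fishing district was assigned to, and other harvest to the region where it was reported. The last point is an assumption of this sketch. The tuple layouts, sector labels, and district-to-region lookup are hypothetical, and the example counts are placeholders rather than values from the dataset.

```python
from collections import defaultdict


def regional_abundance(escapement, harvest, district_to_region):
    """Total salmon (escapement + harvest) per (region, species, year).

    escapement: iterable of (region, species, year, count), where region is
        the region containing the river where the fish were counted.
    harvest: iterable of (sector, area, species, year, count). For the
        hypothetical sector label "commercial", `area` is a fishing district
        looked up in `district_to_region`; for other sectors it is assumed
        to already be a region name.
    """
    totals = defaultdict(int)
    for region, species, year, count in escapement:
        totals[(region, species, year)] += count
    for sector, area, species, year, count in harvest:
        # Fish are credited to the region of the fishery, not the region
        # they were headed to spawn in (no stock-of-origin assignment).
        region = district_to_region[area] if sector == "commercial" else area
        totals[(region, species, year)] += count
    return dict(totals)


# Placeholder inputs, not values from the dataset
escapement = [("Region A", "sockeye", 2015, 1000)]
harvest = [
    ("commercial", "District 1", "sockeye", 2015, 400),
    ("subsistence", "Region A", "sockeye", 2015, 50),
    ("commercial", "District 2", "sockeye", 2015, 300),
]
district_to_region = {"District 1": "Region A", "District 2": "Region B"}
totals = regional_abundance(escapement, harvest, district_to_region)
```

    In this toy example, the District 2 catch is credited to Region B even if those fish were bound for Region A. This is the caveat the description raises for the Alaska Peninsula and Aleutian Islands fishing areas.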

  11. R k builders hawaii USA Import & Buyer Data

    • seair.co.in
    + more versions
    Cite
    Seair Exim, R k builders hawaii USA Import & Buyer Data [Dataset]. https://www.seair.co.in
    Available download formats: .bin, .xml, .csv, .xls
    Dataset provided by
    Authors
    Seair Exim
    Area covered
    United States
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.

  12. RailEnV-PASMVS: a dataset for multi-view stereopsis training and...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 18, 2024
    + more versions
    Cite
    Petrus Johannes Gräbe (2024). RailEnV-PASMVS: a dataset for multi-view stereopsis training and reconstruction applications [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5202742
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    André Broekman
    Petrus Johannes Gräbe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Perfectly Accurate, Synthetic dataset featuring a virtual railway EnVironment for Multi-View Stereopsis (RailEnV-PASMVS) is presented. It consists of 40 scenes and 79,800 renderings together with ground truth depth maps, extrinsic and intrinsic camera parameters, and binary segmentation masks of all the track components and surrounding environment. Every scene is rendered from a set of 3 cameras, each positioned relative to the track for optimal 3D reconstruction of the rail profile. The set of cameras is translated across the 100-meter length of tangent (straight) track to yield a total of 1,995 camera views per scene. Photorealistic lighting of each of the 40 scenes is achieved with high-definition, high dynamic range (HDR) environmental textures. Additional variation is introduced through camera focal lengths, random noise in the camera location and rotation parameters, and shader modifications of the rail profile. Representative track geometry data is used to generate random and unique vertical alignment data for the rail profile of every scene. This primary, synthetic dataset is augmented by a smaller collection of 320 manually annotated photographs for improved segmentation performance. The specular rail profile is the most challenging component for MVS reconstruction algorithms, pipelines, and neural network architectures, increasing the ambiguity and complexity of the data distribution. RailEnV-PASMVS is an application-specific dataset for railway engineering. Compared with existing datasets in the field of computer vision, it provides the precision required for novel research applications in transportation engineering.

    File descriptions

    RailEnV-PASMVS.blend (227 MB) - Blender file (developed using Blender version 2.8.1) used to generate the dataset. The Blender file packs only one of the HDR environmental textures as an example, along with all the other asset textures.

    RailEnV-PASMVS_sample.png (28 MB) - A visual collage of 30 scenes, illustrating the variability introduced by using different models, illumination, material properties, and camera focal lengths.

    geometry.zip (2 MB) - Geometry CSV files used for scenes 01 to 20. The Bezier curve defines the geometry of the rail profile (10 mm intervals).

    PhysicalDataset.7z (2.0 GB) - A smaller, secondary dataset of 320 manually annotated photographs of railway environments; only the railway profiles are annotated.

    01.7z-20.7z (2.0 GB each) - Archive of each scene (01 through 20).

    all_list.txt, training_list.txt, validation_list.txt - Text files used by MVSNet. They list all the scene names (all_list.txt) together with those used for training (training_list.txt) and validation (validation_list.txt).

    index.csv - A CSV file that provides a convenient reference for all the sample files, linking each file to its relative data path.

    NOTE: Only 20 of the original 40 scenes are made available owing to size limitations of the data repository. This is still adequate for training MVS neural networks. The Blender file is made available specifically so that users can render the scenes for other applications or adapt the camera framework altogether. Please refer to the corresponding manuscript for additional details.

    Steps to reproduce

    The open source Blender software suite (https://www.blender.org/) was used to generate the dataset, with the entire pipeline developed using its exposed Python API. The camera trajectory is kept fixed for all 40 scenes, except for small perturbations introduced as random noise to increase camera variation. The camera intrinsic information was initially exported as a single CSV file (scene.csv) for every scene, from which the camera information files were generated. This information includes the focal length (focalLengthmm), image sensor dimensions (pixelDimensionX, pixelDimensionY), position, coordinate vector (vectC), and rotation vector (vectR). The STL model files provided in this data repository were exported directly from Blender, so that the geometry and scenes can be reproduced. The data processing steps below are written in Python. They transform the information from Blender's coordinate system into universal rotation (R_world2cv) and translation (T_world2cv) matrices.

    import numpy as np
    from scipy.spatial.transform import Rotation as R

    The intrinsic matrix K is constructed using the following formulation:

    focalLengthPixel = focalLengthmm * pixelDimensionX / sensorWidthmm
    K = np.array([[focalLengthPixel, 0, pixelDimensionX / 2],
                  [0, focalLengthPixel, pixelDimensionY / 2],
                  [0, 0, 1]])
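    The same construction can be wrapped as a small function. This is a sketch that assumes square pixels (one focal length in pixels for both axes) and a principal point at the image centre, which is what the formula above encodes. It treats pixelDimensionX and pixelDimensionY as the image width and height in pixels. The 50 mm lens, 36 mm sensor width, and 1920 x 1080 image in the example are arbitrary values, not the dataset's camera settings.

```python
import numpy as np


def intrinsic_matrix(focal_length_mm, sensor_width_mm, width_px, height_px):
    """Pinhole intrinsic matrix K from a focal length given in millimetres.

    Assumes square pixels and a principal point at the image centre.
    """
    # Convert the focal length from millimetres to pixels
    f_px = focal_length_mm * width_px / sensor_width_mm
    return np.array([
        [f_px, 0.0, width_px / 2.0],
        [0.0, f_px, height_px / 2.0],
        [0.0, 0.0, 1.0],
    ])


K = intrinsic_matrix(50.0, 36.0, 1920, 1080)  # arbitrary example values
```

    Here the focal length in pixels is 50 x 1920 / 36 ≈ 2666.67, and the principal point sits at (960, 540).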

    The rotation vector as provided by Blender was first transformed to a rotation matrix:

    r = R.from_euler('xyz', vectR, degrees=True)
    matR = r.as_matrix()

    Transpose the rotation matrix to obtain the rotation from the WORLD to the BLENDER camera coordinate system:

    R_world2bcam = np.transpose(matR)

    The matrix describing the transformation from BLENDER to CV/STANDARD coordinates is:

    R_bcam2cv = np.array([[1, 0, 0], [0, -1, 0], [0, 0, -1]])

    Thus the representation from WORLD to CV/STANDARD coordinates is:

    R_world2cv = R_bcam2cv.dot(R_world2bcam)

    The camera location vector (vectC) requires a similar transformation to obtain the translation from WORLD to BLENDER camera coordinates and then to CV/STANDARD coordinates:

    T_world2bcam = -1 * R_world2bcam.dot(vectC)
    T_world2cv = R_bcam2cv.dot(T_world2bcam)
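    Taken together, the steps above can be sketched as one NumPy-only function. The Euler helper builds Rz · Ry · Rx, which is the matrix that the `R.from_euler('xyz', ...)` call produces: lowercase axes in SciPy mean extrinsic rotations about x, then y, then z. The function names and the test point are illustrative. As a sanity check, consider a Blender camera at (1, 2, 3) with vectR = (90, 0, 0). It looks along world +Y, because a Blender camera looks down its local -Z axis. A point 5 units ahead of it should therefore land at depth +5 on the CV z-axis.

```python
import numpy as np

# BLENDER camera axes (x right, y up, looking down -z) -> CV axes
# (x right, y down, looking down +z)
R_BCAM2CV = np.array([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]])


def euler_xyz_matrix(vect_r_deg):
    """Rotation matrix Rz @ Ry @ Rx for angles (rx, ry, rz) in degrees."""
    rx, ry, rz = np.radians(vect_r_deg)
    Rx = np.array([[1, 0, 0], [0, np.cos(rx), -np.sin(rx)], [0, np.sin(rx), np.cos(rx)]])
    Ry = np.array([[np.cos(ry), 0, np.sin(ry)], [0, 1, 0], [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0], [np.sin(rz), np.cos(rz), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx


def blender_to_cv(vect_r_deg, vect_c):
    """Return (R_world2cv, T_world2cv) from Blender rotation and location."""
    mat_r = euler_xyz_matrix(vect_r_deg)
    r_world2bcam = mat_r.T
    r_world2cv = R_BCAM2CV @ r_world2bcam
    t_world2bcam = -r_world2bcam @ np.asarray(vect_c, dtype=float)
    t_world2cv = R_BCAM2CV @ t_world2bcam
    return r_world2cv, t_world2cv


R_w2cv, T_w2cv = blender_to_cv((90.0, 0.0, 0.0), (1.0, 2.0, 3.0))
point_cv = R_w2cv @ np.array([1.0, 7.0, 3.0]) + T_w2cv  # approx. (0, 0, 5)
```

    Because R_bcam2cv has determinant +1, R_world2cv stays a proper rotation, and the camera centre itself maps to the CV origin.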

    The resulting R_world2cv and T_world2cv matrices are written to the camera information file using exactly the same format as that of BlendedMVS developed by Dr. Yao. The original rotation and translation information can be found by following the process in reverse. Note that additional steps were required to convert from Blender's unique coordinate system to that of OpenCV; this ensures universal compatibility in the way that the camera intrinsic and extrinsic information is provided.
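    Following the process in reverse can be sketched the same way, assuming the rotation was built as Rz · Ry · Rx, as in the `R.from_euler('xyz', ...)` step. R_bcam2cv is its own inverse, so R_world2bcam is recovered by multiplying by it again. Because T_world2cv = -R_world2cv · vectC, the camera location is -R_world2cvᵀ · T_world2cv. The Euler extraction below is a standard closed form. It breaks down near ry = ±90° (gimbal lock) and returns angles in (-180°, 180°], which may differ from the originals by an equivalent representation. The function name is hypothetical.

```python
import numpy as np

R_BCAM2CV = np.diag([1.0, -1.0, -1.0])  # satisfies R_BCAM2CV @ R_BCAM2CV == I


def cv_to_blender(r_world2cv, t_world2cv):
    """Recover Blender Euler angles (degrees) and camera location."""
    r_world2cv = np.asarray(r_world2cv, dtype=float)
    t_world2cv = np.asarray(t_world2cv, dtype=float)
    r_world2bcam = R_BCAM2CV @ r_world2cv   # undo the axis flip
    mat_r = r_world2bcam.T                  # camera-to-world rotation
    vect_c = -r_world2cv.T @ t_world2cv     # since T = -R @ C
    # Closed-form angles for mat_r = Rz @ Ry @ Rx (not valid at |ry| = 90 deg)
    ry = np.arcsin(np.clip(-mat_r[2, 0], -1.0, 1.0))
    rx = np.arctan2(mat_r[2, 1], mat_r[2, 2])
    rz = np.arctan2(mat_r[1, 0], mat_r[0, 0])
    return np.degrees([rx, ry, rz]), vect_c


# Round trip for an unrotated camera placed at (1, 2, 3)
euler, location = cv_to_blender(R_BCAM2CV, -R_BCAM2CV @ np.array([1.0, 2.0, 3.0]))
```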

    Equivalent GPS information is also provided (gps.csv). The local coordinate frame is transformed into GPS coordinates centered on the Engineering 4.0 campus, University of Pretoria, South Africa. This information is embedded within the JPG files as EXIF data.
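    One way a local frame might be mapped to GPS coordinates is a simple spherical-Earth (equirectangular) approximation, which is reasonable only for offsets of a few kilometres. This is an assumption for illustration, not necessarily the transformation used to produce gps.csv. The x axis is taken to point east, the y axis north, and distances are in metres. The origin latitude and longitude are inputs because the exact reference point on the campus is not given here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def local_to_gps(x_east_m, y_north_m, origin_lat_deg, origin_lon_deg):
    """Convert a local east/north offset in metres to latitude/longitude.

    Uses the equirectangular approximation around the origin, so accuracy
    degrades for large offsets and near the poles.
    """
    lat0 = math.radians(origin_lat_deg)
    dlat = y_north_m / EARTH_RADIUS_M
    dlon = x_east_m / (EARTH_RADIUS_M * math.cos(lat0))
    return origin_lat_deg + math.degrees(dlat), origin_lon_deg + math.degrees(dlon)


# Arbitrary origin, not the campus coordinates
lat, lon = local_to_gps(100.0, 250.0, 10.0, 20.0)
```

    A production pipeline would more likely use a proper geodetic library with an ellipsoidal Earth model. The sketch only shows the shape of the mapping.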

  13. Data from: Functional Additive Mixed Models

    • tandf.figshare.com
    txt
    Updated Jun 2, 2023
    Fabian Scheipl; Ana-Maria Staicu; Sonja Greven (2023). Functional Additive Mixed Models [Dataset]. http://doi.org/10.6084/m9.figshare.987098.v2
    Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Fabian Scheipl; Ana-Maria Staicu; Sonja Greven
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for spatial, temporal, or longitudinal functional data, among others. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework are based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well, and also scales to larger datasets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach.
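    To make the terms in this description concrete, one possible model of this kind is written below. It is an illustrative sketch in simplified notation, not the authors' exact formulation: Y_i(t) is the functional response of observation i at index t, mu(t) a functional intercept, z_i a scalar covariate whose effect beta_1(t) varies over t, x_i(s) a functional covariate with a linear effect through the surface beta_2(s, t), b_{g(i)}(t) a functional random intercept for the (hypothetical) group g(i) of observation i, and epsilon_i(t) residual error.

```latex
Y_i(t) = \mu(t) + z_i\,\beta_1(t) + \int x_i(s)\,\beta_2(s, t)\,ds + b_{g(i)}(t) + \varepsilon_i(t)
```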

  14. r ch

    • data.cityofnewyork.us
    • data.wu.ac.at
    application/rdf+xml +5
    Updated Jul 14, 2025
    Taxi and Limousine Commission (TLC) (2025). r ch [Dataset]. https://data.cityofnewyork.us/Transportation/r-ch/btu6-zcrd
    Available download formats: application/rdf+xml, xml, json, csv, application/rss+xml, tsv
    Dataset updated
    Jul 14, 2025
    Authors
    Taxi and Limousine Commission (TLC)
    Description
  15. Howerton justin r USA Import & Buyer Data

    • seair.co.in
    Seair Exim, Howerton justin r USA Import & Buyer Data [Dataset]. https://www.seair.co.in
    Available download formats: .bin, .xml, .csv, .xls
    Dataset provided by
    Authors
    Seair Exim
    Area covered
    United States
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.

  16. Plocamium reproductive system data and R code

    • usap-dc.org
    • search.dataone.org
    html, xml
    Updated Nov 21, 2022
    Amsler, Charles (2022). Plocamium reproductive system data and R code [Dataset]. http://doi.org/10.15784/601622
    Available download formats: xml, html
    Dataset updated
    Nov 21, 2022
    Dataset provided by
    United States Antarctic Program http://www.usap.gov/
    Authors
    Amsler, Charles
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    Data and R code from Sabrina Heiser's study of the reproductive system of Plocamium sp. in the Palmer Station region.

  17. S r burzynski clinic USA Import & Buyer Data

    • seair.co.in
    Seair Exim, S r burzynski clinic USA Import & Buyer Data [Dataset]. https://www.seair.co.in
    Available download formats: .bin, .xml, .csv, .xls
    Dataset provided by
    Authors
    Seair Exim
    Area covered
    United States
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.

  18. Matthew r quillen USA Import & Buyer Data

    • seair.co.in
    Seair Exim, Matthew r quillen USA Import & Buyer Data [Dataset]. https://www.seair.co.in
    Available download formats: .bin, .xml, .csv, .xls
    Dataset provided by
    Authors
    Seair Exim
    Area covered
    United States
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.

  19. CTD Data Acquired by R/V Xue Long in the Prydz Bay- Amery Ice Shelf Region,...

    • usap-dc.org
    • get.iedadata.org
    html, xml
    Updated May 2, 2016
    Yuan, Xiaojun (2016). CTD Data Acquired by R/V Xue Long in the Prydz Bay- Amery Ice Shelf Region, 2015-2017 [Dataset]. http://doi.org/10.15784/600174
    Available download formats: html, xml
    Dataset updated
    May 2, 2016
    Dataset provided by
    United States Antarctic Program http://www.usap.gov/
    Authors
    Yuan, Xiaojun
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Description

    This dataset contains inventories and location maps for CTD data acquired by the icebreaker R/V Xue Long in the Prydz Bay-Amery Ice Shelf region. A total of 68 stations were acquired in February 2015 and 24 stations in March 2017, as part of a joint US/China project to study Antarctic Bottom Water (AABW) formation.

  20. ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of...

    • data.niaid.nih.gov
    • elki-project.github.io
    Updated May 2, 2024
    Zimek, Arthur (2024). ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6355683
    Dataset updated
    May 2, 2024
    Dataset provided by
    Schubert, Erich
    Zimek, Arthur
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data sets were originally created for the following publications:

    M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek. Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.

    H.-P. Kriegel, E. Schubert, A. Zimek. Evaluation of Multiple Clustering Solutions. In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings, held in conjunction with ECML PKDD 2011, Athens, Greece, 2011.

    The outlier data set versions were introduced in:

    E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel. On Evaluation of Outlier Rankings and Outlier Scores. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.

    They are derived from the original image data available at https://aloi.science.uva.nl/

    The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, A. W. M. Smeulders. The Amsterdam Library of Object Images. Int. J. Comput. Vision, 61(1), 103-112, January 2005.

    Additional information is available at: https://elki-project.github.io/datasets/multi_view

    The following views are currently available:

    • Object number: sparse 1000-dimensional vectors that give the true object assignment. Files: objs.arff.gz
    • RGB color histograms: standard RGB color histograms (uniform binning). Files: aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz
    • HSV color histograms: standard HSV/HSB color histograms in various binnings. Files: aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz
    • Color similarity: average similarity to 77 reference colors (not histograms): 18 colors x 2 saturations x 2 brightnesses + 5 grey values (incl. white and black). Files: aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other)
    • Haralick features: first 13 Haralick features (radius 1 pixel). Files: aloi-haralick-1.csv.gz
    • Front to back: vectors representing front vs. back faces of individual objects. Files: front.arff.gz
    • Basic light: vectors indicating basic light situations. Files: light.arff.gz
    • Manual annotations: manually annotated groups of semantically related objects, such as cups. Files: manual1.arff.gz
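    The RGB feature dimensions listed (8, 27, 64, ..., 1000) are the cubes 2^3 through 10^3, which is consistent with k equal-width bins per color channel. The following is a minimal sketch of such a uniform-binning histogram; the function name, the 8-bit input range and the normalization to relative frequencies are assumptions, not necessarily the exact procedure used to produce these files.

```python
import numpy as np


def rgb_histogram(pixels, bins_per_channel):
    """Normalized RGB color histogram with uniform binning (hypothetical helper).

    pixels: array-like of shape (N, 3) with 8-bit values 0..255.
    Each channel is split into `bins_per_channel` equal-width bins, giving
    bins_per_channel**3 features (e.g. 3 -> 27, 4 -> 64, 10 -> 1000).
    """
    pixels = np.asarray(pixels, dtype=np.int64)
    # Map each 0..255 value to a bin index 0..bins_per_channel-1.
    idx = pixels * bins_per_channel // 256
    # Combine the three per-channel indices into one flat bin index.
    flat = (idx[:, 0] * bins_per_channel + idx[:, 1]) * bins_per_channel + idx[:, 2]
    counts = np.bincount(flat, minlength=bins_per_channel ** 3)
    # Normalize so the features sum to 1.
    return counts / counts.sum()


# Hypothetical 2x2 image flattened to 4 RGB pixels.
pixels = [[255, 0, 0], [250, 10, 5], [0, 0, 255], [128, 128, 128]]
print(rgb_histogram(pixels, 3).reshape(3, 3, 3))
```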

    Outlier Detection Versions

    Additionally, we generated a number of subsets for outlier detection:

    • RGB histograms, downsampled to 100000 objects (553 outliers): aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz
    • RGB histograms, downsampled to 75000 objects (717 outliers): aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz
    • RGB histograms, downsampled to 50000 objects (1508 outliers): aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz
