This resource collects teaching materials originally created for the in-person course 'GEOSC/GEOG 497 – Data Mining in Environmental Sciences' at Penn State University (co-taught by Tao Wen, Susan Brantley, and Alan Taylor) and later refined/revised by Tao Wen for the online teaching module 'Data Science in Earth and Environmental Sciences' hosted on the NSF-sponsored HydroLearn platform.
This resource includes both R Notebooks and Python Jupyter Notebooks that teach the basics of R and Python coding, data analysis and data visualization, and the building of machine learning models in both languages, using authentic research data and questions. All of the R/Python scripts can be executed either on the CUAHSI JupyterHub or on a local machine.
This resource is shared under the CC-BY license. Please contact the creator Tao Wen at Syracuse University (twen08@syr.edu) for any questions you have about this resource. If you identify any errors in the files, please contact the creator.
No license specified (https://academictorrents.com/)
The dataset contains 62,058 high-quality Google Street View images. The images cover the downtown and neighboring areas of Pittsburgh, PA and Orlando, FL, and parts of Manhattan, NY. Accurate GPS coordinates of the images and their compass directions are provided as well. For each Street View placemark (i.e. each spot on one street), the 360° spherical view is broken down into 4 side views and 1 upward view. There is one additional image per placemark which shows some overlaid markers, such as the address, names of streets, etc.

Citation: Please cite the following paper, for which this data was collected (in part): Amir Roshan Zamir and Mubarak Shah, "Image Geo-localization based on Multiple Nearest Neighbor Feature Matching using Generalized Graphs," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource contains data collected by the Whitefish Lake Institute (WLI) as well as R code used to compile and conduct quality assurance on the data. This resource reflects joint publication efforts between WLI and the Montana State University Extension Water Quality (MSUEWQ) program. All data included here was uploaded to the National Water Quality Portal (WQX) in 2022. It is the intention of WLI to upload all future data to WQX and this HydroShare resource may also be updated in the future with data for 2022 and forward.
Data Purpose: The 'Data' folder of this resource holds the final data products for the extensive dataset collected by WLI between 2007 and 2021. This folder is likely of interest to users who want data for research and analysis purposes. The dataset contains physical water parameter field data collected by Hydrolab MS5 and DS5 loggers, including water temperature, specific conductance, dissolved oxygen concentration and saturation, barometric pressure, and turbidity. Additional field data that need further quality assurance prior to use include chlorophyll a, ORP, pH, and PAR. The dataset also contains water chemistry data analyzed at certified laboratories, including total nitrogen, total phosphorus, nitrate, orthophosphate, total suspended solids, organic carbon, and chlorophyll a. The data folder includes R scripts with example code for data visualization. This dataset can provide insight into water quality trends in lakes and streams of northwestern Montana over time.

Data Summary: Over this period, WLI collected water quality data for 63 lake sites and 17 stream and river sites in northwestern Montana under two separate monitoring projects. The Northwest Montana Lakes Network (NMLN) project currently visits 41 lake sites in northwestern Montana once per summer. Field data from Hydrolabs are collected at discrete depths throughout a lake's profile, and depth-integrated water chemistry samples are collected as well. The Whitefish Water Quality Monitoring Project (WWQMP) currently visits two sites on Whitefish Lake, one site on Tally Lake, and 11 stream and river sites in the Whitefish Lake and Upper Whitefish River watersheds monthly between April and November. Field data are collected at one depth for streams and at many depths throughout the lake profiles, and water chemistry samples are collected at discrete depths for Whitefish Lake and the streams. The final dataset for both programs includes over 112,000 datapoints that passed quality assurance assessment and an additional 72,000 datapoints that would need further quality assurance before use.
Workflow Purpose: The 'Workflow' folder of this resource contains the raw data, folder structure, and R code used during the data compilation and upload process. This folder is likely of interest to users who have similar datasets and are interested in code for automating data compilation or upload processes. The R scripts included here stitch together many individual Hydrolab MS5 and DS5 logger files as well as lab electronic data deliverables (EDDs), which may be useful for users who want to compile one or multiple seasons' worth of data into a single file. Reformatting scripts format the data to match the multi-sheet Excel workbook format required by the Montana Department of Environmental Quality for uploads to WQX, and may be useful to others hoping to automate database uploads.

Workflow Summary: Compilation code in the workflow folder compiles data from its most original forms, including Hydrolab sonde export files and lab EDDs. This compilation process includes extracting dates and times from comment fields and producing a single file from many input files. Formatting code then reformats the data to match WQX upload requirements, which includes generating unique activity IDs for data collected at the same site, date, and time, and then linking these activity IDs with results across worksheets in an Excel workbook. Code for generating all quality assurance figures used in the decision-making process outlined in the Quality Assurance Document, and the resulting data removal decisions, is included here as well. Finally, this folder includes code for combining data from the separate program uploads for WQX into the more user-friendly structure for analysis provided in the 'Data' folder of this HydroShare resource.
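As a rough illustration of this compilation step, the following minimal R sketch stitches a folder of logger export CSVs into one table and builds an activity ID from site, date, and time; the folder name and the column names (Site, Date, Time) are placeholders, not the actual WLI field names.

# Minimal sketch: compile many logger export CSVs and build activity IDs.
# "hydrolab_exports" and the Site/Date/Time column names are placeholders.
files <- list.files("hydrolab_exports", pattern = "\\.csv$", full.names = TRUE)
compiled <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))
# one activity ID per unique combination of site, date, and time
compiled$ActivityID <- paste(compiled$Site, compiled$Date, compiled$Time, sep = "_")
write.csv(compiled, "compiled_hydrolab.csv", row.names = FALSE)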
This module series covers how to import, manipulate, format, and plot time series data stored in .csv format in R. It was originally designed to teach researchers to use NEON plant phenology and air temperature data, and it has also been used in undergraduate classrooms.
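For orientation, here is a minimal R sketch of the import-format-plot pattern the module teaches; the file name and the column names (date, value) are placeholders rather than actual NEON field names.

# Minimal sketch of the import/format/plot workflow; names are placeholders.
temps <- read.csv("NEON_air_temperature.csv", stringsAsFactors = FALSE)
temps$date <- as.Date(temps$date)          # convert text dates to Date class
plot(temps$date, temps$value, type = "l",
     xlab = "Date", ylab = "Air temperature (C)")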
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R is a very powerful language for statistical computing in many disciplines of research, but it has a steep learning curve. The software is open source, freely available, and has a thriving community. This crash course provides an overview of Base-R concepts for beginners and covers the topics 1) introduction to R, 2) reading, saving, and viewing data, 3) selecting and changing objects in R, and 4) descriptive statistics. The course was held by Lisa Spitzer on September 3, 2021, as a precursor to the R tidyverse workshop by Aurélien Ginolhac and Roland Krause (September 8-10, 2021). This entry features the slides, exercises/results, and chat messages of the crash course. Related to this entry are the recordings of the course and the R tidyverse workshop materials. Click on "related PsychArchives objects" to view or download the recordings of the workshop.
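As a flavor of what the four topics look like in practice, a few Base-R one-liners (illustrative only; 'mydata.csv' is a placeholder file name):

x <- c(4, 8, 15, 16, 23, 42)     # 1) creating an object in R
dat <- read.csv("mydata.csv")    # 2) reading data
saveRDS(dat, "mydata.rds")       #    saving data
head(dat); str(dat)              #    viewing data
x[2] <- 99                       # 3) selecting and changing elements of an object
first_rows <- dat[1:3, ]         #    selecting rows by index
summary(dat); mean(x); sd(x)     # 4) descriptive statistics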
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of data measured on different scales is a recurring challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need to integrate other features, possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then visualized, while further information, like clinical factors, is added on top. However, a more integrative approach is desirable, where all available data are analyzed jointly, and where the visualization also combines different data sources in a more natural way. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables. Clustering of variables does not receive as much attention in the literature as clustering of samples. We extend the variable clustering methodology by two new approaches, one based on the combination of different association measures and the other on distance correlation. With simulation studies we evaluate and compare different clustering strategies. Applying specific methods for mixed-type data proves to be comparable, and in many cases beneficial, compared to standard approaches applied to corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for the purpose of visualization. Real data examples aim to give an impression of various kinds of potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices. We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix, available from https://cran.r-project.org/web/packages/CluMix.
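The dissimilarity-matrix idea can be previewed with standard tools. The following minimal R sketch is not the authors' new method; it uses the standard Gower dissimilarity from cluster::daisy on toy mixed-type data and clusters it hierarchically, just to illustrate the kind of dissimilarity-based workflow described above.

# Standard-approach sketch, not the CluMix methods: Gower dissimilarity for
# mixed-type data, followed by hierarchical clustering. Toy data only.
library(cluster)
set.seed(1)
df <- data.frame(expr  = rnorm(50),                                # quantitative
                 stage = factor(sample(1:3, 50, replace = TRUE)),  # categorical/clinical
                 sex   = factor(sample(c("f", "m"), 50, replace = TRUE)))
d  <- daisy(df, metric = "gower")            # mixed-type dissimilarity between samples
hc <- hclust(as.dist(d), method = "average") # hierarchical clustering on the dissimilarity
plot(hc)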
We provide instructions, code, and datasets for replicating the article by Kim, Lee and McCulloch (2024), "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews." This repository provides a user-friendly R package that lets researchers or practitioners apply the topic-based segmentation model with unstructured texts (latent class regression with group variable selection) to their own datasets. First, we provide R code to replicate the illustrative simulation study: see file 1. Second, we provide the user-friendly R package with a very simple example code to help apply the model to real-world datasets: see file 2, Package_MixtureRegression_GroupVariableSelection.R and Dendrogram.R. Third, we provide a set of codes and instructions to replicate the empirical studies of customer-level segmentation and restaurant-level segmentation with Yelp reviews data: see files 3-a, 3-b, 4-a, 4-b. Note that, owing to Yelp's dataset terms of use and the restriction on data size, we provide a link to download the same Yelp datasets (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/6). Fourth, we provide a set of codes and datasets to replicate the empirical study with professor ratings reviews data: see file 5. Please see more details in the description text and comments of each file.

[A guide on how to use the code to reproduce each study in the paper]

1. Full codes for replicating Illustrative simulation study.txt -- [see Table 2 and Figure 2 in main text]: R source code to replicate the illustrative simulation study. Please run it from beginning to end in R. In addition to estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships, you will get dendrograms of the selected groups of variables shown in Figure 2. Computing time is approximately 20 to 30 minutes.

3-a. Preprocessing raw Yelp Reviews for Customer-level Segmentation.txt: Code for preprocessing the downloaded unstructured Yelp review data and preparing the DV and IV matrices for the customer-level segmentation study.

3-b. Instruction for replicating Customer-level Segmentation analysis.txt -- [see Table 10 in main text; Tables F-1, F-2, and F-3 and Figure F-1 in Web Appendix]: Code for replicating the customer-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 3 to 4 hours.

4-a. Preprocessing raw Yelp reviews_Restaruant Segmentation (1).txt: R code for preprocessing the downloaded unstructured Yelp data and preparing the DV and IV matrices for the restaurant-level segmentation study.

4-b. Instructions for replicating restaurant-level segmentation analysis.txt -- [see Tables 5, 6 and 7 in main text; Tables E-4 and E-5 and Figure H-1 in Web Appendix]: Code for replicating the restaurant-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 10 to 12 hours.

[Guidelines for running Benchmark models in Table 6]

Unsupervised topic model: 'topicmodels' package in R -- after determining the number of topics (e.g., with the 'ldatuning' R package), run the 'LDA' function in the 'topicmodels' package. Then compute topic probabilities per restaurant (with the 'posterior' function in the package), which can be used as predictors.
Then, conduct prediction with regression.

Hierarchical topic model (HDP): 'gensimr' R package -- 'model_hdp' function for identifying topics (see https://radimrehurek.com/gensim/models/hdpmodel.html or https://gensimr.news-r.org/).

Supervised topic model: 'lda' R package -- 'slda.em' function for training and 'slda.predict' for prediction.

Aggregate regression: the default 'lm' function in R.

Latent class regression without variable selection: 'flexmix' function in the 'flexmix' R package. Run flexmix with a chosen number of segments (e.g., 3 segments in this study); then, with the estimated coefficients and memberships, predict the dependent variable within each segment.

Latent class regression with variable selection: 'Unconstraind_Bayes_Mixture' function in Kim, Fong and DeSarbo (2012)'s package. Run the Kim et al. (2012) model with a chosen number of segments (e.g., 3 segments in this study); then, with the estimated coefficients and memberships, predict the dependent variable within each segment. The same R package ('KimFongDeSarbo2012.zip') can be downloaded at: https://sites.google.com/scarletmail.rutgers.edu/r-code-packages/home

5. Instructions for replicating Professor ratings review study.txt -- [see Tables G-1, G-2, G-4 and G-5, and Figures G-1 and H-2 in Web Appendix]: Code to replicate the professor ratings reviews study. Computing time is approximately 10 hours.

[A list of the versions of R, packages, and computer...
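To make one of these benchmarks concrete, here is a minimal R sketch of the 'latent class regression without variable selection' benchmark using flexmix on simulated toy data with 3 segments; the data are generated here purely for illustration.

# Sketch of the 'latent class regression without variable selection' benchmark
# using the flexmix package on simulated toy data with 3 segments.
library(flexmix)
set.seed(1)
n   <- 300
seg <- sample(1:3, n, replace = TRUE)                 # true (unobserved) segments
x1  <- rnorm(n); x2 <- rnorm(n)
y   <- c(1, -2, 0.5)[seg] * x1 + c(0, 1, -1)[seg] * x2 + rnorm(n, sd = 0.3)
toy <- data.frame(y, x1, x2)
fit <- flexmix(y ~ x1 + x2, data = toy, k = 3)        # mixture of 3 regressions
parameters(fit)                                       # per-segment coefficients
table(clusters(fit), seg)                             # recovered vs. true segments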
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
This data set was acquired with a Navigation System on ROV SuBastian during R/V Falkor (too) expedition FKt250220 conducted in 2025 (Chief Scientist: Dr. Michelle Taylor). These data files are of Text File (ASCII) format and include Navigation data that have not been processed.
This dataset compiles salmon escapement data from Alaska Department of Fish and Game reports and salmon harvest data from commercial, personal use, sport fish, and subsistence sectors to generate an estimate of total salmon abundance in each of the regions defined by the State of Alaska's Salmon and People Project (SASAP). This dataset was assembled to enable a broad view of the salmon resource, whether biological (escapement) or economic/cultural (harvest), across regions. With that intent in mind, a fish is counted within a region if it escaped a fishery and was counted in a river that is contained within that region, or if it was caught in a fishery that is within the region. For commercial fisheries, each commercial fishing district was assigned to a region based on the location of the commercial fishing district relative to the bounding watersheds of the region. No effort was made to determine the region of origin for any commercially caught fish - thus for some regions, fish caught in one region may have been headed to spawn in another. This is especially true of Alaska Peninsula and Aleutian Islands commercial fishing areas, which are well known for having large amounts of Bristol Bay bound fish. Note that some regions have missing escapement data during some years. This dataset includes an R Markdown file which processes the original data and creates figures, the rendered html file generated from running the R Markdown file (which includes many figures and data explanation), and several standalone versions of those figures.

Data sources:
- Jeanette Clark; Alaska Department of Fish and Game, Division of Commercial Fisheries; Alaska Department of Fish and Game, Division of Sport Fish; Alaska Department of Fish and Game, Division of Subsistence. Harvest of Salmon across Commercial, Subsistence, Personal Use, and Sport Fish sectors, Alaska, 1995-2016. Knowledge Network for Biocomplexity. doi:10.5063/F1TT4P73
- Andrew Munro and Eric Volk. 2018. Summary of Pacific Salmon Escapement Goals in Alaska with a Review of Escapements from 2001 to 2009. Knowledge Network for Biocomplexity. doi:10.5063/F1416VB4
- Andrew Munro and Eric Volk. 2017. Summary of Pacific Salmon Escapement Goals in Alaska with a Review of Escapements from 2007 to 2015. Knowledge Network for Biocomplexity. doi:10.5063/F1GX48V4
- James Savereide. 2017. Estimated annual Chinook Salmon escapement at Copper River from 1980 to 2016. Knowledge Network for Biocomplexity. doi:10.5063/F1G15Z4D
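The region-level totals described above amount to summing escapement (assigned by the river's region) and harvest (assigned by the fishery's region) per region and year. A minimal R/dplyr sketch, with hypothetical column names (region, year, count), is:

# Minimal sketch of the regional totals: escapement and harvest counts are
# stacked and summed per region and year. Column names are placeholders.
library(dplyr)
total_abundance <- bind_rows(
  escapement %>% select(region, year, count),   # fish counted in rivers, by region
  harvest    %>% select(region, year, count)    # fish caught in fisheries, by region
) %>%
  group_by(region, year) %>%
  summarise(total_salmon = sum(count, na.rm = TRUE), .groups = "drop")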
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Perfectly Accurate, Synthetic dataset featuring a virtual railway EnVironment for Multi-View Stereopsis (RailEnV-PASMVS) is presented, consisting of 40 scenes and 79,800 renderings together with ground truth depth maps, extrinsic and intrinsic camera parameters and binary segmentation masks of all the track components and surrounding environment. Every scene is rendered from a set of 3 cameras, each positioned relative to the track for optimal 3D reconstruction of the rail profile. The set of cameras is translated across the 100-meter length of tangent (straight) track to yield a total of 1,995 camera views. Photorealistic lighting of each of the 40 scenes is achieved with the implementation of high-definition, high dynamic range (HDR) environmental textures. Additional variation is introduced in the form of camera focal lengths, random noise for the camera location and rotation parameters and shader modifications of the rail profile. Representative track geometry data is used to generate random and unique vertical alignment data for the rail profile for every scene. This primary, synthetic dataset is augmented by a smaller image collection consisting of 320 manually annotated photographs for improved segmentation performance. The specular rail profile represents the most challenging component for MVS reconstruction algorithms, pipelines and neural network architectures, increasing the ambiguity and complexity of the data distribution. RailEnV-PASMVS represents an application specific dataset for railway engineering, against the backdrop of existing datasets available in the field of computer vision, providing the precision required for novel research applications in the field of transportation engineering.
File descriptions
RailEnV-PASMVS.blend (227 Mb) - Blender file (developed using Blender version 2.8.1) used to generate the dataset. The Blender file packs only one of the HDR environmental textures to use as an example, along with all the other asset textures.
RailEnV-PASMVS_sample.png (28 Mb) - A visual collage of 30 scenes, illustrating the variability introduced by using different models, illumination, material properties and camera focal lengths.
geometry.zip (2 Mb) - Geometry CSV files used for scenes 01 to 20. The Bezier curve defines the geometry of the rail profile (10 mm intervals).
PhysicalDataset.7z (2.0 Gb) - A smaller, secondary dataset of 320 manually annotated photographs of railway environments; only the railway profiles are annotated.
01.7z-20.7z (2.0 Gb each) - Archive of each scene (01 through 20).
all_list.txt, training_list.txt, validation_list.txt - Text files containing all the scene names, together with those used for validation (validation_list.txt) and training (training_list.txt), as used by MVSNet.
index.csv - CSV file providing a convenient reference for all the sample files, linking each file to its relative data path.
NOTE: Only 20 of the original 40 scenes are made available owing to size limitations of the data repository. This is still adequate for the purposes of training MVS neural networks. The Blender file is made available specifically so that the scenes can be re-rendered, or the camera framework adapted altogether, for different applications. Please refer to the corresponding manuscript for additional details.
Steps to reproduce
The open source Blender software suite (https://www.blender.org/) was used to generate the dataset, with the entire pipeline developed using the exposed Python API. The camera trajectory is kept fixed for all 40 scenes, except for small perturbations introduced in the form of random noise to increase the camera variation. The camera intrinsic information was initially exported as a single CSV file (scene.csv) for every scene, from which the camera information files were generated; this includes the focal length (focalLengthmm), image sensor dimensions (pixelDimensionX, pixelDimensionY), position coordinate vector (vectC) and rotation vector (vectR). The STL model files, as provided in this data repository, were exported directly from Blender, such that the geometry/scenes can be reproduced. The data processing below is written for a Python implementation, transforming the information from Blender's coordinate system into universal rotation (R_world2cv) and translation (T_world2cv) matrices.
import numpy as np
from scipy.spatial.transform import Rotation as R
# focalLengthmm, sensorWidthmm, pixelDimensionX/Y, vectR, vectC are read from the per-scene scene.csv
focalLengthPixel = focalLengthmm * pixelDimensionX / sensorWidthmm
K = np.array([[focalLengthPixel, 0, pixelDimensionX / 2],
              [0, focalLengthPixel, pixelDimensionY / 2],
              [0, 0, 1]])  # camera intrinsic matrix
r = R.from_euler('xyz', vectR, degrees=True)
matR = r.as_matrix()
R_world2bcam = np.transpose(matR)  # world -> Blender camera rotation
R_bcam2cv = np.array([[1, 0, 0], [0, -1, 0], [0, 0, -1]])  # Blender camera -> OpenCV axes
R_world2cv = R_bcam2cv.dot(R_world2bcam)
T_world2bcam = -1 * R_world2bcam.dot(vectC)
T_world2cv = R_bcam2cv.dot(T_world2bcam)
The resulting R_world2cv and T_world2cv matrices are written to the camera information file using exactly the same format as that of BlendedMVS developed by Dr. Yao. The original rotation and translation information can be found by following the process in reverse. Note that additional steps were required to convert from Blender's unique coordinate system to that of OpenCV; this ensures universal compatibility in the way that the camera intrinsic and extrinsic information is provided.
Equivalent GPS information is provided (gps.csv), whereby the local coordinate frame is transformed into equivalent GPS information, centered around the Engineering 4.0 campus, University of Pretoria, South Africa. This information is embedded within the JPG files as EXIF data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, for example, spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well, and also scales to larger datasets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach.
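For orientation, a minimal usage sketch for pffr() follows; it is kept as comments because it assumes a particular data layout (Y an n x G matrix of functional responses observed on grid tgrid, X a functional covariate on grid sgrid, z a scalar covariate), and all data names are placeholders.

library(refund)
# fit <- pffr(Y ~ ff(X, xind = sgrid)   # linear effect of a functional covariate
#               + s(z),                 # smooth, index-varying effect of a scalar covariate
#             yind = tgrid, data = dat)
# plot(fit)                             # estimated coefficient functions/surfaces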
New driver application status check. For historical data, please see- https://data.cityofnewyork.us/Transportation/Historical-Driver-Application-Status/p32s-yqxq
Data dictionary available here: https://data.cityofnewyork.us/api/views/dpec-ucu7/files/6cd40752-22c4-4c56-ba39-dfa51cc6e14c?download=true&filename=NewDriverAppStatusLookupLegend.pdf
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and R code from Sabrina Heiser's study of the reproductive system of Plocamium sp. in the Palmer Station region.
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains inventories and location maps for CTD data acquired by the icebreaker R/V Xue Long in the Prydz Bay-Amery Ice Shelf region. A total of 68 stations were acquired in February 2015 and 24 stations in March 2017, as part of a joint US/China project to study Antarctic Bottom Water (AABW) formation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data sets were originally created for the following publications:
M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek: Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.
H.-P. Kriegel, E. Schubert, A. Zimek: Evaluation of Multiple Clustering Solutions. In: 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings, held in conjunction with ECML PKDD 2011, Athens, Greece, 2011.
The outlier data set versions were introduced in:
E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel: On Evaluation of Outlier Rankings and Outlier Scores. In: Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.
They are derived from the original image data available at https://aloi.science.uva.nl/
The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders: The Amsterdam Library of Object Images. Int. J. Comput. Vision 61(1), 103-112, January 2005.
Additional information is available at: https://elki-project.github.io/datasets/multi_view
The following views are currently available (feature type: description; files):

- Object number: sparse 1000-dimensional vectors that give the true object assignment. Files: objs.arff.gz
- RGB color histograms: standard RGB color histograms (uniform binning). Files: aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz
- HSV color histograms: standard HSV/HSB color histograms in various binnings. Files: aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz
- Color similarity: average similarity to 77 reference colors (not histograms); 18 colors x 2 saturations x 2 brightnesses + 5 grey values (incl. white, black). Files: aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other)
- Haralick features: first 13 Haralick features (radius 1 pixel). Files: aloi-haralick-1.csv.gz
- Front to back: vectors representing front faces vs. back faces of individual objects. Files: front.arff.gz
- Basic light: vectors indicating basic light situations. Files: light.arff.gz
- Manual annotations: manually annotated object groups of semantically related objects such as cups. Files: manual1.arff.gz
Outlier Detection Versions
Additionally, we generated a number of subsets for outlier detection:
- RGB histograms, downsampled to 100,000 objects (553 outliers). Files: aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz
- RGB histograms, downsampled to 75,000 objects (717 outliers). Files: aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz
- RGB histograms, downsampled to 50,000 objects (1,508 outliers). Files: aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz