Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R code used for each data set to perform negative binomial regression, calculate the overdispersion statistic, generate summary statistics, and remove outliers.
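The dataset's own scripts are not reproduced here, but a minimal R sketch of the workflow it describes might look like the following; the data frame df and the variables count and group are hypothetical placeholders, not the dataset's actual names.

# Sketch: negative binomial regression with an overdispersion check (illustrative only).
# 'df', 'count', and 'group' are hypothetical placeholders.
library(MASS)

fit <- glm.nb(count ~ group, data = df)   # negative binomial regression
summary(fit)                              # summary statistics

# Pearson-based overdispersion statistic: values near 1 indicate adequate dispersion
overdispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
overdispersion

# Remove outliers flagged by large standardized residuals, then refit
keep <- abs(rstandard(fit)) < 3
fit_clean <- glm.nb(count ~ group, data = df[keep, ])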
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Identifying errors or anomalous values, collectively considered outliers, assists in exploring data, and removing outliers improves statistical analysis. In biomechanics, outlier detection methods have explored the 'shape' of entire cycles, although examining fewer points using a 'moving window' may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected in two stages for the strides (mean 38 cycles) from treadmill running. Cycles were removed in stage 1 for one-dimensional (spatial) outliers at each time point using the median absolute deviation, and in stage 2 for two-dimensional (spatial-temporal) outliers using a moving-window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window size, requiring more stringent scaling at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and outliers assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time-series data.
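The authors supply Matlab code; purely as an illustration, stage 1 (median absolute deviation screening at each time point) might be sketched in R as follows. Here cycles is a hypothetical matrix with one row per cycle and one column per normalised time point, and the fixed threshold multiplier stands in for the t-statistic significance-level scaling used in the paper.

# Illustrative sketch of stage 1 only; not the authors' Matlab implementation.
# 'cycles' is a hypothetical matrix: rows = cycles, columns = time points.
flag_stage1 <- function(cycles, threshold = 3) {
  outlier <- matrix(FALSE, nrow(cycles), ncol(cycles))
  for (t in seq_len(ncol(cycles))) {
    x     <- cycles[, t]
    med   <- median(x)
    mad_t <- median(abs(x - med))          # median absolute deviation at time t
    outlier[, t] <- abs(x - med) > threshold * mad_t
  }
  which(rowSums(outlier) > 0)              # cycles flagged at any time point
}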
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data contain bathymetric data from the Namibian continental slope. The data were acquired on R/V Meteor research expedition M76/1 in 2008 and R/V Maria S. Merian expedition MSM19/1c in 2011. The purpose of the data was the exploration of the Namibian continental slope and especially the investigation of large seafloor depressions. The bathymetric data were acquired with the 191-beam, 12 kHz Kongsberg EM120 system. The data were processed using the open-source software package MB-System. The loaded data were cleaned semi-automatically and manually, removing outliers and other erroneous data. Initial velocity fields were adjusted to remove artifacts from the data. Gridding was done in 10x10 m grid cells for the MSM19-1c dataset and 50x50 m for the M76 dataset using the Gaussian Weighted Mean algorithm.
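MB-System implements this gridding step directly (for example in its mbgrid program); purely as an illustration of the Gaussian weighted mean idea, the depth assigned to one grid cell can be sketched in R as follows, with all variable names hypothetical.

# Illustration of a Gaussian weighted mean for a single grid cell; not MB-System code.
# x, y, z: hypothetical sounding coordinates and depths; cx, cy: cell centre; s: kernel width.
gaussian_cell_mean <- function(x, y, z, cx, cy, s) {
  w <- exp(-((x - cx)^2 + (y - cy)^2) / (2 * s^2))  # Gaussian distance weights
  sum(w * z) / sum(w)                               # weighted mean depth for the cell
}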
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset estimates the location and size of trees in the District of Columbia that are not managed by the Urban Forestry Division (https://opendata.dc.gov/datasets/urban-forestry-street-trees/explore). Trees are modeled using an automated feature-extraction process applied to 2022 LiDAR data. All data are estimates intended for general representation purposes.
The DC 2022 LiDAR data were processed using the "Extract Trees using Cluster Analysis" script, which is included as part of Esri's 3D Basemap solution. All LiDAR-derived trees within 2 meters of an Urban Forestry Division tree were removed as duplicates.
Tree diameter (DBH, in inches) was estimated for the LiDAR-derived trees from the calculated tree height (in feet) using the equation DBH = 0.4003 * height - 1.9557. This equation was derived from a statistical analysis of a detailed park-inventory tree data set and has an R^2 = 0.7418.
Extreme outliers were also modified, with any DBH larger than 80 inches being converted to a DBH of 80 inches. The combined data set was then processed using the USDA Forest Service i-Tree Eco software, where structure and environmental benefits were estimated.
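Taken together, the height-to-DBH conversion and the 80-inch cap amount to a one-line transformation; a minimal R sketch, in which the trees data frame and its height_ft column are hypothetical names rather than the published schema:

# Estimate DBH (inches) from LiDAR-derived tree height (feet), capping extreme values.
# 'trees' and 'height_ft' are hypothetical names, not the published schema.
trees$dbh_in <- 0.4003 * trees$height_ft - 1.9557
trees$dbh_in <- pmin(trees$dbh_in, 80)   # cap extreme outliers at 80 inches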
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Measurement Configuration Dataset
This is the anonymous reviewing version; the source code repository will be added after the review.
This dataset provides reproduction data for performance measurement configuration at source-code level in Java. The measurement data can be obtained yourself using the precision-experiments repository https://anonymous.4open.science/r/precision-experiments-C613/ (Examining Different Repetition Counts). The data contained here are those we obtained from execution on an i7-4770 CPU @ 3.40 GHz.
The analysis was tested on Ubuntu 20.04 and gnuplot 5.2.8. It will not work with older gnuplot versions.
To execute the analysis, extract the data by
tar -xvf basic-parameter-comparison.tar
tar -xvf parallel-sequential-comparison.tar
and afterwards build the precision-experiments repo and execute the analysis by
cd precision-experiments/precision-analysis/
../gradlew fatJar
cd scripts/configuration-analysis/
./executeCompleteAnalysis.sh ../../../../basic-parameter-comparison ../../../../parallel-sequential-comparison
Afterwards, the following files will be present:
precision-experiments/precision-analysis/scripts/configuration-analysis/repetitionHeatmaps/heatmap_all_en.pdf (Heatmaps for different repetition counts)
precision-experiments/precision-analysis/scripts/configuration-analysis/repetitionHeatmaps/heatmap_outlierRemoval_en.pdf (Heatmap with and without outlier removal for 1000 repetitions)
precision-experiments/precision-analysis/scripts/configuration-analysis/histogram_outliers_en.pdf (Histogram of the outliers)
precision-experiments/precision-analysis/scripts/configuration-analysis/heatmap_parallel_en.pdf (Heatmap with sequential and parallel execution)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is Version 2 of the Depth of Regolith product of the Soil and Landscape Grid of Australia (produced 2015-06-01).
The Soil and Landscape Grid of Australia has produced a range of digital soil attribute products. The digital soil attribute maps are in raster format at a resolution of 3 arc sec (~90 x 90 m pixels).
Attribute definition: the regolith is the in situ and transported material overlying unweathered bedrock
Units: metres
Spatial prediction method: data mining using piecewise linear regression
Period (temporal coverage, approximate): 1900-2013
Spatial resolution: 3 arc seconds (approx. 90 m)
Total number of gridded maps for this attribute: 3
Number of pixels with coverage per layer: 2007M (49200 x 40800)
Total size before compression: about 8 GB
Total size after compression: about 4 GB
Data license: Creative Commons Attribution 4.0 (CC BY)
Variance explained (cross-validation): R^2 = 0.38
Target data standard: GlobalSoilMap specifications
Format: GeoTIFF
Lineage: The methodology consisted of the following steps: (i) drillhole data preparation, (ii) compilation and selection of the environmental covariate raster layers, and (iii) model implementation and evaluation.
Drillhole data preparation: Drillhole data was sourced from the National Groundwater Information System (NGIS) database. This spatial database holds nationally consistent information about bores that were drilled as part of the Bore Construction Licensing Framework (http://www.bom.gov.au/water/groundwater/ngis/). The database contains 357,834 bore locations with associated lithology, bore construction and hydrostratigraphy records. This information was loaded into a relational database to facilitate analysis.
Regolith depth extraction: The first step was to recognise and extract the boundary between the regolith and the bedrock within each drillhole record. This was done using a key-word look-up table of bedrock- or lithology-related words from the record descriptions; 1,910 unique descriptors were discovered. Using this list of standardised terms, the drillholes were analysed, and the depth value associated with the word in the description that unequivocally indicated fresh bedrock material had been reached was extracted from each record using a tool developed in C#.
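The extraction tool itself was written in C# and is not part of this dataset; as an illustration of the look-up idea only, a simplified version in R might match bedrock terms against each description. The logs data frame, its columns, and the term list below are hypothetical.

# Simplified illustration of the keyword look-up; the actual tool was written in C#.
# 'logs' (columns 'description', 'depth_m') and the term list are hypothetical.
bedrock_terms <- c("granite", "basalt", "fresh bedrock", "unweathered")
pattern <- paste(bedrock_terms, collapse = "|")
hits <- grepl(pattern, tolower(logs$description))
regolith_depth <- logs$depth_m[hits]   # depth at which fresh bedrock was logged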
The second step of regolith depth extraction involved the removal of drillhole bedrock depth records, which was deemed necessary because of the noisiness in depth records resulting from inconsistencies in drilling and description standards identified in the legacy database.
On completion of the filtering and removal of outliers, the drillhole database used in the model comprised 128,033 depth sites.
Selection and preparation of environmental covariates: The environmental-correlation style of DSM applies environmental covariate datasets to predict target variables, here regolith depth. Strongly performing environmental covariates operate as proxies for the factors that control regolith formation, including climate, relief, parent material, organisms and time.
Depth modelling was implemented using the PC-based R statistical software (R Core Team, 2014) and relied on the R Cubist package (Kuhn et al., 2013). To generate modelling uncertainty estimates, the following procedures were followed: (i) random withholding of a subset comprising 20% of the whole depth record dataset for external validation; (ii) bootstrap sampling of the remaining dataset 100 times to produce repeated model training datasets. The Cubist model was then run on each of these training sets to produce a unique rule set per set. Repeated model runs using different training sets, a procedure referred to as bagging or bootstrap aggregating, is a machine-learning ensemble procedure designed to improve the stability and accuracy of the model. The Cubist rule sets generated were then evaluated and applied spatially, calculating a mean predicted value (i.e. the final map). The 5% and 95% confidence intervals were estimated for each grid cell (pixel) in the prediction dataset by combining the variance from the bootstrapping process and the variance of the model residuals. Version 2 differs from version 1 in that the modelling of depths was performed on the log scale to better conform to assumptions of normality used in calculating the confidence intervals, and the method to estimate the confidence intervals was improved to better represent the full range of variability in the modelling process (Wilford et al., in press).
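As a rough sketch of the bagging procedure described above, using the R Cubist package: the covariates data frame and log_depth vector are hypothetical placeholders, and the simple bootstrap quantiles below stand in for the published interval method, which also folds in the variance of the model residuals.

# Sketch of the described bagging workflow with R's Cubist package; illustrative only.
# 'covariates' (data frame) and 'log_depth' (log-scale depths) are placeholders.
library(Cubist)

n <- nrow(covariates)
holdout <- sample(n, size = round(0.2 * n))        # withhold 20% for external validation
train_x <- covariates[-holdout, ]
train_y <- log_depth[-holdout]

preds <- replicate(100, {                          # 100 bootstrap resamples
  idx <- sample(length(train_y), replace = TRUE)
  m   <- cubist(x = train_x[idx, ], y = train_y[idx])
  predict(m, covariates[holdout, ])                # predict the withheld sites
})
mean_pred <- rowMeans(preds)                       # bagged (mean) prediction
ci <- apply(preds, 1, quantile, probs = c(0.05, 0.95))  # 5% and 95% bounds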
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the rapid increase of large-scale datasets, biomedical data visualization faces challenges: the data may be large, span different orders of magnitude, contain extreme values, and have unclear distributions. Here we present an R package, ggbreak, that allows users to create broken axes using ggplot2 syntax. It makes effective use of the plotting area to deal with large datasets (especially long sequential data), data of different magnitudes, and data containing outliers. The ggbreak package increases the available visual space for better presentation of the data and detailed annotation, thus improving our ability to interpret the data. The ggbreak package is fully compatible with ggplot2, making it easy to superpose additional layers and apply scales and themes using the ggplot2 syntax. The ggbreak package is open-source software released under the Artistic-2.0 license, and it is freely available on CRAN (https://CRAN.R-project.org/package=ggbreak) and GitHub (https://github.com/YuLab-SMU/ggbreak).
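A minimal usage example of a broken axis; the data and break points here are arbitrary illustrations, not from the package documentation.

# Minimal ggbreak example: a y-axis break to accommodate one extreme value.
# The data and break points are arbitrary illustrations.
library(ggplot2)
library(ggbreak)

d <- data.frame(x = letters[1:5], y = c(3, 5, 4, 6, 120))
ggplot(d, aes(x, y)) +
  geom_col() +
  scale_y_break(c(10, 110))   # cut the axis between 10 and 110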