40 datasets found

Supplementary material from "Visual comparison of two data sets: Do people...
figshare.com
xlsx
Updated Mar 14, 2017
Cite
Robin Kramer; Caitlin Telfer; Alice Towler (2017). Supplementary material from "Visual comparison of two data sets: Do people use the means and the variability?" [Dataset]. http://doi.org/10.6084/m9.figshare.4751095.v1
Available download formats
xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.4751095.v1
Dataset updated
Mar 14, 2017
Dataset provided by
Figshare (http://figshare.com/)
Authors
Robin Kramer; Caitlin Telfer; Alice Towler
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In our everyday lives, we are required to make decisions based upon our statistical intuitions. Often, these involve the comparison of two groups, such as luxury versus family cars and their suitability. Research has shown that the mean difference affects judgements where two sets of data are compared, but the variability of the data has only a minor influence, if any at all. However, prior research has tended to present raw data as simple lists of values. Here, we investigated whether displaying data visually, in the form of parallel dot plots, would lead viewers to incorporate variability information. In Experiment 1, we asked a large sample of people to compare two fictional groups (children who drank ‘Brain Juice’ versus water) in a one-shot design, where only a single comparison was made. Our results confirmed that only the mean difference between the groups predicted subsequent judgements of how much they differed, in line with previous work using lists of numbers. In Experiment 2, we asked each participant to make multiple comparisons, with both the mean difference and the pooled standard deviation varying across data sets they were shown. Here, we found that both sources of information were correctly incorporated when making responses. Taken together, we suggest that increasing the salience of variability information, through manipulating this factor across items seen, encourages viewers to consider this in their judgements. Such findings may have useful applications for best practices when teaching difficult concepts like sampling variation.
Sea Surface Temperature (SST) Standard Deviation of Long-term Mean,...
catalog.data.gov
data.ioos.us
Updated Jan 27, 2025
Cite
National Center for Ecological Analysis and Synthesis (NCEAS) (Point of Contact) (2025). Sea Surface Temperature (SST) Standard Deviation of Long-term Mean, 2000-2013 - Hawaii [Dataset]. https://catalog.data.gov/dataset/sea-surface-temperature-sst-standard-deviation-of-long-term-mean-2000-2013-hawaii
Dataset updated
Jan 27, 2025
Dataset provided by
National Center for Ecological Analysis and Synthesis (NCEAS) (Point of Contact)
Area covered
Hawaii
Description
Sea surface temperature (SST) plays an important role in a number of ecological processes and can vary over a wide range of time scales, from daily to decadal changes. SST influences primary production, species migration patterns, and coral health. If temperatures are anomalously warm for extended periods of time, drastic changes in the surrounding ecosystem can result, including harmful effects such as coral bleaching. This layer represents the standard deviation of SST (degrees Celsius) of the weekly time series from 2000-2013. Three SST datasets were combined to provide continuous coverage from 1985-2013. The concatenation applies bias adjustment derived from linear regression to the overlap periods of datasets, with the final representation matching the 0.05-degree (~5-km) near real-time SST product. First, a weekly composite, gap-filled SST dataset from the NOAA Pathfinder v5.2 SST 1/24-degree (~4-km), daily dataset (a NOAA Climate Data Record) for each location was produced following Heron et al. (2010) for January 1985 to December 2012. Next, weekly composite SST data from the NOAA/NESDIS/STAR Blended SST 0.1-degree (~11-km), daily dataset was produced for February 2009 to October 2013. Finally, a weekly composite SST dataset from the NOAA/NESDIS/STAR Blended SST 0.05-degree (~5-km), daily dataset was produced for March 2012 to December 2013. The standard deviation of the long-term mean SST was calculated by taking the standard deviation over all weekly data from 2000-2013 for each pixel.
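The last step of that description (a standard deviation over all weekly values, taken separately for each pixel) can be sketched as follows. This is only an illustration under stated assumptions: the array layout `(weeks, rows, cols)`, the function name `pixelwise_std`, the toy values, and the use of the population standard deviation with gaps marked as NaN are all hypothetical and are not taken from the dataset's documentation.

```python
import numpy as np

def pixelwise_std(weekly_sst):
    """Standard deviation along the time axis for every pixel.

    weekly_sst: array of shape (weeks, rows, cols), in degrees Celsius.
    NaN entries (assumed here to mark gaps) are ignored.
    """
    return np.nanstd(weekly_sst, axis=0)

# Toy stack: 4 weekly composites on a 2x2 grid (made-up values).
stack = np.full((4, 2, 2), 25.0)
stack[:, 0, 0] = [24.0, 25.0, 26.0, 27.0]  # one pixel that varies over time
stack[1, 1, 1] = np.nan                    # one missing weekly value

sst_std = pixelwise_std(stack)
print(sst_std)
```

Pixels whose weekly values never change come out as 0, and the varying pixel gets the spread of its own time series only.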
AVISO Level 4 Absolute Dynamic Topography for Climate Model Comparison...
catalog.data.gov
data.nasa.gov
Updated May 28, 2025
Cite
AVISO;NASA/JPL/PODAAC (2025). AVISO Level 4 Absolute Dynamic Topography for Climate Model Comparison Standard Error [Dataset]. https://catalog.data.gov/dataset/aviso-level-4-absolute-dynamic-topography-for-climate-model-comparison-standard-error
Dataset updated
May 28, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
These data are the standard error calculated from the AVISO Level 4 Absolute Dynamic Topography for Climate Model Comparison Number of Observations data set (in PO.DAAC Drive at https://podaac-tools.jpl.nasa.gov/drive/files/allData/aviso/L4/abs_dynamic_topo). This data set is not meant to be used alone, but with the absolute dynamic topography data. These data were generated to help support the CMIP5 (Coupled Model Intercomparison Project Phase 5) portion of PCMDI (Program for Climate Model Diagnosis and Intercomparison). The dynamic topography is derived from sea surface height measured by several satellites (Envisat, TOPEX/Poseidon, Jason-1, and OSTM/Jason-2) and referenced to the geoid. These data were provided by AVISO (French space agency data provider) and are based on a similar dynamic topography data set that AVISO already produces (http://www.aviso.oceanobs.com/index.php?id=1271).
Datasets from an interlaboratory comparison to characterize a multi-modal...
catalog.data.gov
datasets.ai
Updated Jul 29, 2022
Cite
National Institute of Standards and Technology (2022). Datasets from an interlaboratory comparison to characterize a multi-modal polydisperse sub-micrometer bead dispersion [Dataset]. https://catalog.data.gov/dataset/datasets-from-an-interlaboratory-comparison-to-characterize-a-multi-modal-polydisperse-sub
Dataset updated
Jul 29, 2022
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
These four data files contain datasets from an interlaboratory comparison that characterized a polydisperse five-population bead dispersion in water. A more detailed version of this description is available in the ReadMe file (PdP-ILC_datasets_ReadMe_v1.txt), which also includes definitions of abbreviations used in the data files. Paired samples were evaluated, so the datasets are organized as pairs associated with a randomly assigned laboratory number. The datasets are organized in the files by instrument type: PTA (particle tracking analysis), RMM (resonant mass measurement), ESZ (electrical sensing zone), and OTH (other techniques not covered in the three largest groups, including holographic particle characterization, laser diffraction, flow imaging, and flow cytometry). In the OTH group, the specific instrument type for each dataset is noted. Each instrument type (PTA, RMM, ESZ, OTH) has a dedicated file. Included in the data files for each dataset are: (1) the cumulative particle number concentration (PNC, (1/mL)); (2) the concentration distribution density (CDD, (1/mL·nm)) based upon five bins centered at each particle population peak diameter; (3) the CDD in higher resolution, varied-width bins. The lower-diameter bin edge (µm) is given for (2) and (3). Additionally, the PTA, RMM, and ESZ files each contain unweighted mean cumulative particle number concentrations and concentration distribution densities calculated from all datasets reporting values. The associated standard deviations and standard errors of the mean are also given. In the OTH file, the means and standard deviations were calculated using only data from one of the sub-groups (holographic particle characterization) that had n = 3 paired datasets. Where necessary, datasets not using the common bin resolutions are noted (PTA, OTH groups). 
The data contained here are presented and discussed in a manuscript to be submitted to the Journal of Pharmaceutical Sciences and presented as part of that scientific record.
‘Datasets from an interlaboratory comparison to characterize a multi-modal...
analyst-2.ai
Updated Jan 27, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Datasets from an interlaboratory comparison to characterize a multi-modal polydisperse sub-micrometer bead dispersion’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-datasets-from-an-interlaboratory-comparison-to-characterize-a-multi-modal-polydisperse-sub-micrometer-bead-dispersion-1603/c07ee221/?iid=004-407&v=presentation
Dataset updated
Jan 27, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Datasets from an interlaboratory comparison to characterize a multi-modal polydisperse sub-micrometer bead dispersion’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/7f7e5222-e579-486e-b5d7-c02d511d1964 on 27 January 2022.

--- Dataset description provided by original source is as follows ---

These four data files contain datasets from an interlaboratory comparison that characterized a polydisperse five-population bead dispersion in water. A more detailed version of this description is available in the ReadMe file (PdP-ILC_datasets_ReadMe_v1.txt), which also includes definitions of abbreviations used in the data files. Paired samples were evaluated, so the datasets are organized as pairs associated with a randomly assigned laboratory number. The datasets are organized in the files by instrument type: PTA (particle tracking analysis), RMM (resonant mass measurement), ESZ (electrical sensing zone), and OTH (other techniques not covered in the three largest groups, including holographic particle characterization, laser diffraction, flow imaging, and flow cytometry). In the OTH group, the specific instrument type for each dataset is noted. Each instrument type (PTA, RMM, ESZ, OTH) has a dedicated file. Included in the data files for each dataset are: (1) the cumulative particle number concentration (PNC, (1/mL)); (2) the concentration distribution density (CDD, (1/mL·nm)) based upon five bins centered at each particle population peak diameter; (3) the CDD in higher resolution, varied-width bins. The lower-diameter bin edge (µm) is given for (2) and (3). Additionally, the PTA, RMM, and ESZ files each contain unweighted mean cumulative particle number concentrations and concentration distribution densities calculated from all datasets reporting values. The associated standard deviations and standard errors of the mean are also given. In the OTH file, the means and standard deviations were calculated using only data from one of the sub-groups (holographic particle characterization) that had n = 3 paired datasets. Where necessary, datasets not using the common bin resolutions are noted (PTA, OTH groups). 
The data contained here are presented and discussed in a manuscript to be submitted to the Journal of Pharmaceutical Sciences and presented as part of that scientific record.

--- Original source retains full ownership of the source dataset ---
Benchmark Multi-Omics Datasets for Methods Comparison
zenodo.org
data.niaid.nih.gov
bin, zip
Updated Nov 14, 2021
Cite
Gabriel Odom; Lily Wang (2021). Benchmark Multi-Omics Datasets for Methods Comparison [Dataset]. http://doi.org/10.5281/zenodo.5683002
Available download formats
bin, zip
Unique identifier
https://doi.org/10.5281/zenodo.5683002
Dataset updated
Nov 14, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Gabriel Odom; Lily Wang
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pathway Multi-Omics Simulated Data

These are synthetic variations of the TCGA COADREAD data set (original data available at http://linkedomics.org/data_download/TCGA-COADREAD/). This data set is used as a comprehensive benchmark data set to compare multi-omics tools in the manuscript "pathwayMultiomics: An R package for efficient integrative analysis of multi-omics datasets with matched or un-matched samples".

There are 100 sets (stored as 100 sub-folders, the first 50 in "pt1" and the second 50 in "pt2") of random modifications to centred and scaled copy number, gene expression, and proteomics data saved as compressed data files for the R programming language. These data sets are stored in subfolders labelled "sim001", "sim002", ..., "sim100". Each folder contains the following contents:

(1) "indicatorMatricesXXX_ls.RDS" is a list of simple triplet matrices showing which genes (in which pathways) and which samples received the synthetic treatment (where XXX is the simulation run label: 001, 002, ...).
(2) "CNV_partitionA_deltaB.RDS" is the synthetically modified copy number variation data (where A represents the proportion of genes in each gene set to receive the synthetic treatment [partition 1 is 20%, 2 is 40%, 3 is 60% and 4 is 80%] and B is the signal strength in units of standard deviations).
(3) "RNAseq_partitionA_deltaB.RDS" is the synthetically modified gene expression data (same parameter legend as CNV).
(4) "Prot_partitionA_deltaB.RDS" is the synthetically modified protein expression data (same parameter legend as CNV).

Supplemental Files

The file "cluster_pathway_collection_20201117.gmt" is the collection of gene sets used for the simulation study in Gene Matrix Transpose format. Scripts to create and analyze these data sets are available at: https://github.com/TransBioInfoLab/pathwayMultiomics_manuscript_supplement
Standard Errors for Difference-in-Difference Regression (replication data)
journaldata.zbw.eu
zip
Updated Dec 11, 2024
Cite
Bruce Hansen (2024). Standard Errors for Difference-in-Difference Regression (replication data) [Dataset]. http://doi.org/10.15456/jae.2024337.1643252147
Available download formats
zip (1269092)
Unique identifier
https://doi.org/10.15456/jae.2024337.1643252147
Dataset updated
Dec 11, 2024
Dataset provided by
ZBW - Leibniz Informationszentrum Wirtschaft
Authors
Bruce Hansen
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All replication code for the above paper. This includes general-purpose R and Stata code, all simulation code, all empirical data sets, and R and Stata code to replicate the empirical results
Data from: SeaWIFS K490 Standard Deviation
researchdata.edu.au
devweb.dga.links.com.au
Updated Jul 31, 2008
Cite
Australian Ocean Data Network (2008). SeaWIFS K490 Standard Deviation [Dataset]. https://researchdata.edu.au/seawifs-k490-standard-deviation/687001
Dataset updated
Jul 31, 2008
Dataset provided by
Australian Ocean Data Network
Description
This data set contains the standard deviation of SeaWIFS K490 generated from the climatology monthly means; the monthly climatologies represent the mean values for each month across the whole dataset time series. K490 indicates the turbidity of the water column: how far visible light in the blue-green region of the spectrum penetrates the water column. It is directly related to the presence of scattering particles in the water column. The data are received as monthly composites, with a 4 km resolution, and are constrained to the region between 90E and 180E, and 10N to 60S. The data were sourced from http://oceancolor.gsfc.nasa.gov/SeaWiFS/. This dataset is a contribution to the CERF Marine Biodiversity Hub.
High School Heights Dataset
kaggle.com
Updated Aug 11, 2022
Cite
Yashmeet Singh (2022). High School Heights Dataset [Dataset]. https://www.kaggle.com/datasets/yashmeetsingh/high-school-heights-dataset
Dataset updated
Aug 11, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Yashmeet Singh
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
High School Heights Dataset

You will find three datasets containing heights of the high school students.

All heights are in inches.

The data is simulated. The heights are generated from a normal distribution with different sets of mean and standard deviation for boys and girls.

Height Statistics (inches)	Boys	Girls
Mean	67	62
Standard Deviation	2.9	2.2

There are 500 measurements for each gender.

Here are the datasets:

hs_heights.csv: contains a single column with heights for all boys and girls. There's no way to tell which of the values are for boys and which ones are for girls.

hs_heights_pair.csv: has two columns. The first column has boys' heights. The second column contains girls' heights.

hs_heights_flag.csv: has two columns. The first column has the flag is_girl. The second column contains a girl's height if the flag is 1. Otherwise, it contains a boy's height.
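As a rough illustration of how data with these three layouts could be simulated, here is a minimal sketch using the stated means, standard deviations, and sample size. It is not the author's generation script (that is the one linked in this listing); the seed, the variable names, the row ordering, and the shuffling of the combined column are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
N = 500                         # measurements per gender, as stated above

# Heights in inches, drawn from the normal distributions described above.
boys = rng.normal(loc=67.0, scale=2.9, size=N)
girls = rng.normal(loc=62.0, scale=2.2, size=N)

# Layout like hs_heights.csv: one unlabeled column (shuffling is an assumption).
combined = rng.permutation(np.concatenate([boys, girls]))

# Layout like hs_heights_pair.csv: column 1 = boys, column 2 = girls.
pair = np.column_stack([boys, girls])

# Layout like hs_heights_flag.csv: column 1 = is_girl flag, column 2 = height.
is_girl = np.concatenate([np.zeros(N), np.ones(N)])
flag = np.column_stack([is_girl, np.concatenate([boys, girls])])

print(boys.mean(), girls.mean())
```

With 500 draws per group, the sample means should land close to 67 and 62 but will not match them exactly.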

To see how I generated this dataset, check this out: https://github.com/ysk125103/datascience101/tree/main/datasets/high_school_heights

Image by Gillian Callison from Pixabay
SPI, KUH, and MCM derived lidar T datasets
data.niaid.nih.gov
Updated Jan 21, 2020
Cite
A.A. Kutepov (2020). SPI, KUH, and MCM derived lidar T datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1306735
Dataset updated
Jan 21, 2020
Dataset provided by
J. Hoeffner
A. Feofilov
X. Chu
J.M. Russell
X. Lu
E.C.M. Dawkins
M. Mlynczak
L. Rezac
A.A. Kutepov
D. Janches
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Spī Kōh
Description
These datasets are used within Dawkins et al., 'Validation of SABER v2.0 Operational Temperature Data with Ground-based Lidars in the Mesosphere-Lower Thermosphere Region (75-105 km).' The datasets correspond to the derived Spitsbergen, Kühlungsborn, and McMurdo temperature (T, units: K) datasets presented within the paper. Each text file contains header information necessary for interpreting the data, with each following a standard format.

For each season, the dataset comprises 31 rows and 5 columns. All rows correspond to altitudes: 75-105 km, at 1 km increments.
Column 1 = lidar mean for season [Units=K]
Column 2 = lidar standard deviation of mean for season [Units=K]
Column 3 = combined SABER and lidar error for season [Units=K]
Column 4 = standard deviation of individual (SABER-lidar) colocated pairs [Units=K]
Column 5 = T difference of seasonal means (SABER-lidar) [Units=K]

Also contains dataset presenting Figure 12 (n rows, 2 columns). Column 1: all SABER T data for all altitudes. Column 2: corresponding lidar T data. n rows corresponds to number of SABER-lidar T data pairs. All units: K.

DJF = December-January-February.

MAM = March-April-May.

JJA = June-July-August.

SON = September-October-November.
Performance (mean ± standard deviation) comparison among all competing...
figshare.com
xls
Updated Jun 1, 2023
Cite
Liye Wang; Chong-Yaw Wee; Heung-Il Suk; Xiaoying Tang; Dinggang Shen (2023). Performance (mean ± standard deviation) comparison among all competing methods. [Dataset]. http://doi.org/10.1371/journal.pone.0117295.t004
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0117295.t004
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Liye Wang; Chong-Yaw Wee; Heung-Il Suk; Xiaoying Tang; Dinggang Shen
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The prefix ’S’ denotes the use of a single-kernel SVR. (CC: Correlation Coefficient; RMSE: Root Mean Square Error) Performance (mean ± standard deviation) comparison among all competing methods.
Replication data for: A Bootstrap Method for Conducting Statistical...
datasearch.gesis.org
dataverse.harvard.edu
Updated Jan 22, 2020
Cite
Harden, Jeffrey (2020). Replication data for: A Bootstrap Method for Conducting Statistical Inference with Clustered Data [Dataset]. https://datasearch.gesis.org/dataset/httpsdataverse.unc.eduoai--hdl1902.2911599
Dataset updated
Jan 22, 2020
Dataset provided by
Odum Institute Dataverse Network
Authors
Harden, Jeffrey
Description
State politics researchers often analyze data with observations grouped into clusters. This structure commonly produces unmodeled correlation within clusters, leading to downward bias in the standard errors of regression coefficients. Estimating robust cluster standard errors (RCSE) is a common approach to correcting this bias. However, despite their frequent use, recent work indicates that RCSE can also be biased downward. Here I show evidence of that bias and offer a potential solution. Through Monte Carlo simulation of an Ordinary Least Squares (OLS) regression model, I compare conventional standard error (OLS-SE) and RCSE performance to that of a bootstrap method that resamples clusters of observations (BCSE). I show that both OLS-SE and RCSE are biased downward, with OLS-SE being the most biased. In contrast, BCSE are not biased and consistently outperform the other two methods. I conclude with three replications from recent work and offer recommendations to researchers.
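The general idea of a bootstrap that resamples whole clusters rather than individual observations can be sketched as below. This is a generic pairs-cluster bootstrap written for illustration, not the author's replication code; the function names, the simulated toy data, the number of replicates, and the seeds are all assumptions.

```python
import numpy as np

def ols_coefs(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cluster_bootstrap_se(X, y, cluster_ids, n_boot=200, seed=0):
    """Bootstrap standard errors obtained by resampling clusters with replacement."""
    rng = np.random.default_rng(seed)
    clusters = np.unique(cluster_ids)
    # Row indices belonging to each cluster.
    rows = [np.flatnonzero(cluster_ids == c) for c in clusters]
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        # Draw as many clusters as the data contain, keeping each drawn cluster intact.
        picked = rng.integers(0, len(clusters), size=len(clusters))
        idx = np.concatenate([rows[i] for i in picked])
        draws[b] = ols_coefs(X[idx], y[idx])
    # The spread of the replicated coefficients is the bootstrap standard error.
    return draws.std(axis=0, ddof=1)

# Toy clustered data: 30 clusters of 10 observations with a shared cluster effect.
rng = np.random.default_rng(1)
G, m = 30, 10
ids = np.repeat(np.arange(G), m)
x = np.repeat(rng.normal(size=G), m)  # cluster-level regressor
u = np.repeat(rng.normal(size=G), m)  # unmodeled within-cluster correlation
y = 1.0 + 2.0 * x + u + rng.normal(size=G * m)
X = np.column_stack([np.ones(G * m), x])

bcse = cluster_bootstrap_se(X, y, ids)
print(bcse)
```

Because each replicate keeps every sampled cluster whole, the within-cluster correlation is carried into the resampled data instead of being broken up.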
NZ Seabed Geomorphology - BTM - Standard deviation
hub.arcgis.com
doc-marine-data-deptconservation.hub.arcgis.com
Updated Sep 1, 2022
Cite
DOC_admin (2022). NZ Seabed Geomorphology - BTM - Standard deviation [Dataset]. https://hub.arcgis.com/documents/18c8fb8623ba4ba0b5bed43f5dc5ffac
Dataset updated
Sep 1, 2022
Dataset authored and provided by
DOC_admin
Area covered
New Zealand
Description
BTM Standard deviation – this mosaic dataset is part of a series of seafloor terrain datasets aimed at providing a consistent baseline to assist users in consistently characterizing Aotearoa New Zealand seafloor habitats. This series has been developed using the tools provided within the Benthic Terrain Model (BTM [v3.0]) across different multibeam echo-sounder datasets. The series includes derived outputs from 50 MBES survey sets conducted between 1999 and 2020 from throughout the New Zealand marine environment (where available) covering an area of approximately 52,000 km². Consistency and compatibility of the benthic terrain datasets have been achieved by utilising a common projected coordinate system (WGS84 Web Mercator), resolution (10 m), and by using a standard classification dictionary (also utilised by previous BTM studies in NZ). However, we advise caution when comparing the classification between different survey areas. Derived BTM outputs include the Bathymetric Position Index (BPI), Surface Derivative, Rugosity, Depth Statistics, and Terrain Classification. A standardised digital surface model, and derived hillshade and aspect datasets have also been made available. The index of the original MBES survey surface models used in this analysis can be accessed from https://data.linz.govt.nz/layer/95574-nz-bathymetric-surface-model-index/ The full report and description of available output datasets are available at: https://www.doc.govt.nz/globalassets/documents/science-and-technical/drds367entire.pdf
A monthly air temperature and precipitation gridded dataset on 0.025°...
doi.pangaea.de
html, tsv
Updated Nov 5, 2018
Cite
Fahu Chen; Hong Zhao; Wei Huang; Xian Wu; Yaowei Xie; Song Feng (2018). A monthly air temperature and precipitation gridded dataset on 0.025° spatial resolution in China during 1951-2011 [Dataset]. http://doi.org/10.1594/PANGAEA.895742
Available download formats
tsv, html
Unique identifier
https://doi.org/10.1594/PANGAEA.895742
Dataset updated
Nov 5, 2018
Dataset provided by
PANGAEA
Authors
Fahu Chen; Hong Zhao; Wei Huang; Xian Wu; Yaowei Xie; Song Feng
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
File name, File size, File format, File content, Uniform resource locator/link to file
Description
Monthly air temperature from 1153 stations and precipitation from 1202 stations in China and neighboring countries were collected to construct a monthly climate dataset for China at 0.025° resolution (approximately 2.5 km), named the LZU0025 dataset and designed by Lanzhou University (LZU), using a partial thin plate smoothing method embedded in the ANUSPLIN software. The accuracy of LZU0025 was evaluated in three ways: 1) Diagnostic statistics from the surface fitting model for 1951-2011 show a low mean square root of generalized cross validation (RTGCV) for the monthly air temperature surface (1.1 °C) and the monthly precipitation surface (2 mm^(1/2)), the latter interpolated on the square root of precipitation. This indicates accurate surface fitting models. 2) Error statistics based on data from 265 withheld stations for 1951-2011 show that predicted values closely tracked true values, with a mean absolute error (MAE) of 0.6 °C and 4 mm and a standard deviation of the mean error (STD) of 1.3 °C and 5 mm; monthly STDs varied consistently with RTGCV. 3) Comparisons to other datasets were made in two ways: one compared three indices (the standard deviation, mean, and time trend) derived from all datasets against a reference dataset released by the China Meteorological Administration (CMA) in Taylor diagrams; the other compared LZU0025 to the Camp Tibet dataset over a remote mountainous area. The Taylor diagrams showed that the standard deviation derived from LZU had a higher correlation with that derived from CMA (Pearson correlation R=0.76 for air temperature and R=0.96 for precipitation). The standard deviation for this index derived from LZU was closer to that derived from CMA, and the centered normalized root-mean-square difference for this index between LZU and CMA was lower.
The same superior performance of LZU was found when comparing the mean and time trend indices derived from LZU with those derived from other datasets. LZU0025 correlated highly with the Camp dataset for air temperature, although the correlation for precipitation was insignificant at a few stations. Based on these analyses, LZU0025 was judged to be a reliable dataset.
Normalized Difference Water Index (NDWI) - Seasonal Mean - Switzerland
geonetwork.swissdatacube.org
doi
Updated Sep 17, 2019
Cite
Université de Genève (2019). Normalized Difference Water Index (NDWI) - Seasonal Mean - Switzerland [Dataset]. https://geonetwork.swissdatacube.org/geonetwork/srv/api/records/af697b57-cf0a-4dc3-aa46-b394cf9f8c72
Available download formats
doi
Dataset updated
Sep 17, 2019
Dataset authored and provided by
Université de Genève
Time period covered
Mar 28, 1984 - Jun 15, 2019
Description
This dataset is a seasonal time-series of Landsat Analysis Ready Data (ARD)-derived Normalized Difference Water Index (NDWI) computed from Landsat 5 Thematic Mapper (TM) and Landsat 8 Operational Land Imager (OLI). To ensure a consistent dataset, Landsat 7 has not been used because the Scan Line Corrector (SLC) failure creates gaps in the data. NDWI quantifies plant water content by measuring the difference between Near-Infrared (NIR) and Short Wave Infrared (SWIR) (or Green) channels using this generic formula: (NIR - SWIR) / (NIR + SWIR)
For Landsat sensors, this corresponds to the following bands:
Landsat 5, NDWI = (Band 4 – Band 2) / (Band 4 + Band 2).
Landsat 8, NDWI = (Band 5 – Band 3) / (Band 5 + Band 3).
NDWI values range from -1 to +1. NDWI is a good proxy for plant water stress and therefore useful for drought monitoring and early warning. NDWI is sometimes also referred to as the Normalized Difference Moisture Index (NDMI). Standard Deviation is provided in a separate dataset for each time step.
Spring: March-April-May (_MAM)
Summer: June-July-August (_JJA)
Autumn: September-October-November (_SON)
Winter: December-January-February (_DJF)
Data format: GeoTiff
This dataset has been generated with the Swiss Data Cube (http://www.swissdatacube.ch)
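The generic formula (NIR - SWIR) / (NIR + SWIR) can be illustrated with a small sketch. The function name, the reflectance values, and the choice to return NaN where both bands sum to zero are assumptions for the example; this is not the Swiss Data Cube processing chain.

```python
import numpy as np

def normalized_difference(nir, swir):
    """Compute (NIR - SWIR) / (NIR + SWIR) element-wise.

    Pixels where NIR + SWIR equals 0 are returned as NaN instead of dividing by zero.
    """
    nir = np.asarray(nir, dtype=float)
    swir = np.asarray(swir, dtype=float)
    total = nir + swir
    out = np.full(total.shape, np.nan)
    np.divide(nir - swir, total, out=out, where=total != 0)
    return out

# Made-up reflectance values for three pixels.
ndwi = normalized_difference([0.4, 0.2, 0.0], [0.2, 0.2, 0.0])
print(ndwi)
```

For non-negative reflectances the result stays within the -1 to +1 range quoted in the description.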


Dataset: A Labeled Dataset for Osteoporosis Screening Based on...

zenodo.org

csv

Updated May 13, 2025

Cite

D. A. Carvalho Dionísio; Fernandes Felipe (2025). Dataset: A Labeled Dataset for Osteoporosis Screening Based on Electromagnetic Attenuation [Dataset]. http://doi.org/10.5281/zenodo.14259374

Available download formats

csv

Unique identifier

https://doi.org/10.5281/zenodo.14259374

Dataset updated

May 13, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

D. A. Carvalho Dionísio; Fernandes Felipe

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

README

Dataset name: osseus_dataset.csv

Version: 1.0

Dataset period: 07/01/2021 - 09/31/2023

Dataset Characteristics: Multivalued

Number of Instances: 669

Number of Attributes: 31

Missing Values: yes

Area(s): Health and technology

Sources:

Electronic Patient Record (EPR) - University Hospital Onofre Lopes of Federal University of Rio Grande do Norte (HUOL/UFRN), Brazil;
OSSEUS (Osteoporosis screening based on electromagnetic waves); and,
DXA (Dual-energy x-ray absorptiometry).

Description: The dataset “osseus_dataset.csv” (Table 1) contains elementary data related to risk factors and examinations performed by individuals in Rio Grande do Norte, Brazil, to investigate bone mineral density. Data were collected using the EPR of HUOL/UFRN, DXA, and OSSEUS, a low-cost device based on electromagnetic waves, which measures the attenuation of the signal when crossing the medial phalanx of the middle finger (PINHEIRO et al., 2021, ALBUQUERQUE et al., 2022).

Table 1: Description of Dataset Features.

Attribute	Description	Data type	Values
Electronic Patient Record (EPR)
id	Anonymous unique identifier for a person.	Categorical	Unique person identifier
gender	The person's gender.	Categorical	female, male
age	The person's age.	Numerical	Integer
weight	The person's weight in kilograms (kg).	Numerical	Integer
height	The person's height in centimeters (cm).	Numerical	Integer
ethnicity	The person's ethnicity.	Categorical	black, brown, white
target	The person's diagnosis or medical report.	Categorical	normal, osteoporosis, low bone mineral density
alcohol	Whether the person consumes alcoholic beverages.	Categorical	yes, no
smoking	Whether the person is a smoker.	Categorical	yes, no
activity	Whether the person practices physical activities.	Categorical	yes, no
milk	Whether the person consumes dairy drinks.	Categorical	yes, no
calcium	Whether the person uses calcium.	Categorical	yes, no
vitamin_d	Whether the person uses vitamin D.	Categorical	yes, no
fall	Whether the person has a history of falling.	Categorical	yes, no
parents_osteoporosis	Whether the person has a family history of osteoporosis.	Categorical	yes, no
parents_curved	Whether the person has a family history of "parents curved."	Categorical	yes, no
corticosteroids	Whether the person has used corticosteroid-type medications for three months or longer.	Categorical	yes, no
arthritis	Whether the person has arthritis.	Categorical	yes, no
diseases	Whether the person has comorbidities.	Categorical	yes, no
menopause	Whether the person has reached menopause.	Categorical	yes, no
testosterone	Whether the person uses testosterone.	Categorical	yes, no
OSSEUS (Osteoporosis screening based on electromagnetic waves)
medial_length	Length of the medial phalanx in mm.	Numerical	Integer
medial_height	Height of the medial phalanx in mm.	Numerical	Integer
medial_width	Width of the medial phalanx in mm.	Numerical	Integer
calibration	OSSEUS signal strength with no obstacle between the antennas.	Numerical	Float
attenuation	OSSEUS signal strength with an obstacle between the antennas.	Numerical	Float
DXA (Dual-energy x-ray absorptiometry)
spine_deviation	Spine standard deviation score: the difference between the measured bone density and the expected value.	Numerical	Float
femur_deviation	Femur standard deviation score: the difference between the measured bone density and the expected value.	Numerical	Float
body_deviation	Full-body standard deviation score: the difference between the measured bone density and the expected value.	Numerical	Float
forearm_deviation	Forearm standard deviation score: the difference between the measured bone density and the expected value.	Numerical	Float
worst_deviation	Worst standard deviation score among all deviations obtained from the record.	Numerical	Float
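
As an illustration of how the DXA columns relate to each other, the sketch below recomputes a "worst" score from the four site-specific deviation columns of one CSV row. It rests on two assumptions that the README does not state: that "worst" means the lowest (most negative) score, and that missing values appear as empty cells. The function name `worst_deviation` and the sample rows are hypothetical.

```python
import csv
import io
from typing import Mapping, Optional

# Column names taken from Table 1 (DXA section).
DEVIATION_COLUMNS = (
    "spine_deviation",
    "femur_deviation",
    "body_deviation",
    "forearm_deviation",
)


def worst_deviation(row: Mapping[str, str]) -> Optional[float]:
    """Return the lowest available deviation score in a row, or None.

    Assumption: "worst" means the most negative score, and a missing
    measurement is stored as an empty string.
    """
    scores = []
    for column in DEVIATION_COLUMNS:
        raw = (row.get(column) or "").strip()
        if raw:
            scores.append(float(raw))
    return min(scores) if scores else None


# Hypothetical two-row sample in the layout described by Table 1.
SAMPLE = io.StringIO(
    "id,spine_deviation,femur_deviation,body_deviation,forearm_deviation\n"
    "p1,-1.2,-2.6,,-0.4\n"
    "p2,,,,\n"
)

for record in csv.DictReader(SAMPLE):
    print(record["id"], worst_deviation(record))
```

Comparing the recomputed value with the published `worst_deviation` column would be one way to check whether these assumptions hold for the real file.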

REFERENCES
Albuquerque, G. et al. A method based on non-ionizing microwave radiation for ancillary diagnosis of osteoporosis: a pilot study. BioMedical Eng. OnLine 21, 70, https://doi.org/10.1186/s12938-022-01038-y (2022).

Pinheiro, B. d. M. et al. The influence of antenna gain and beamwidth used in osseus in the screening process for osteoporosis. Sci. Reports 11, 19148, https://doi.org/10.1038/s41598-021-98204-4 (2021).

Supporting data for 'DFENS: Diffusion chronometry using Finite Elements and...
data-search.nerc.ac.uk
html
Updated Feb 5, 2021
British Geological Survey (2021). Supporting data for 'DFENS: Diffusion chronometry using Finite Elements and Nested Sampling' (NERC Grant NE/L002507/1) [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/api/records/ba70f4a5-fb5a-303c-e054-002128a47908
Available download formats: html
Dataset updated
Feb 5, 2021
Dataset authored and provided by
British Geological Survey (https://www.bgs.ac.uk/)
License
http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations
Time period covered
Oct 1, 2014 - Jan 26, 2021
Description
This is supporting data for the manuscript entitled 'DFENS: Diffusion chronometry using Finite Elements and Nested Sampling' by E. J. F. Mutch, J. Maclennan, O. Shorttle, J. F. Rudge and D. Neave. Preprint: https://doi.org/10.1002/essoar.10503709.1
Data Set S1 (ds01.csv): Electron probe microanalysis (EPMA) profile data of olivine crystals used in this study. Standard deviations are averaged values of standard deviations from counting statistics and repeat measurements of secondary standards.
Data Set S2 (ds02.csv): Plagioclase compositional profiles used in this study, including SIMS, EPMA and step scan data. Standard deviations for EPMA analyses are averaged values of standard deviations from counting statistics and repeat measurements of secondary standards. Standard deviations for SIMS and step scan analyses are based on analytical precision of secondary standards.
Data Set S3 (ds03.csv): Angles between the EPMA profile and the main olivine crystallographic axes measured by electron backscatter diffraction (EBSD). 'angle100X' is the angle between the [100] crystallographic axis and the x direction of the EBSD map, 'angle100Y' is the angle between the [100] crystallographic axis and the y direction of the EBSD map, 'angle100Z' is the angle between the [100] crystallographic axis and the z direction of the EBSD map, and so on for the other axes. 'angle100P' is the angle between the EPMA profile and the [100] crystallographic axis, 'angle010P' is the angle between the EPMA profile and the [010] crystallographic axis, and 'angle001P' is the angle between the EPMA profile and the [001] crystallographic axis. All angles are in degrees.
Data Set S4 (ds04.csv): Median timescales and 1 sigma errors from the olivine crystals of this study. The +1 sigma (days) is the quantile value calculated at 0.841 (i.e. 0.5 + (0.6826 / 2)). The -1 sigma (days) is therefore the quantile calculated at approximately 0.158 (which is 1 - 0.841). The 2 sigma quantiles are defined in the same way, using 0.5 + (0.95 / 2). The value quoted as the +1 sigma (error) is the difference between the upper 1 sigma quantile and the median. Likewise, the -1 sigma (error) is the difference between the median and the lower 1 sigma quantile.
Data Set S5 (ds05.xlsx): Median timescales and 1 sigma errors from the plagioclase crystals of this study. Results from each of the parameterisations of the Mg-in-plagioclase diffusion data are included: Faak et al. (2013), Van Orman et al. (2014) and a combined expression.
Data Set S6 (ds06.xlsx): Spreadsheet containing the regression parameters and covariance matrices used in this study and in Mutch et al. (2019). Additional versions of the olivine regressions where the ln fO2 is expressed in Pa have been made for completeness. We recommend using the versions where ln fO2 is expressed in its native form (bars).
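
The quantile-based error definition used for Data Set S4 can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names, the linear-interpolation quantile rule and the toy sample list are assumptions, and only the quantile levels (0.5 ± 0.6826 / 2) come from the description.

```python
from statistics import NormalDist


def quantile(sorted_values, q):
    """Quantile of an already sorted list using linear interpolation.

    The interpolation rule is an assumption; the description does not
    say which rule was used.
    """
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def median_with_sigma_errors(samples, coverage=0.6826):
    """Return (median, plus_error, minus_error) for a list of timescales.

    plus_error  = upper quantile - median
    minus_error = median - lower quantile
    with quantile levels 0.5 + coverage / 2 and 1 - (0.5 + coverage / 2).
    """
    upper_level = 0.5 + coverage / 2
    lower_level = 1 - upper_level
    ordered = sorted(samples)
    median = quantile(ordered, 0.5)
    plus_error = quantile(ordered, upper_level) - median
    minus_error = median - quantile(ordered, lower_level)
    return median, plus_error, minus_error


# The upper level is close to the standard normal CDF at +1 sigma.
print(0.5 + 0.6826 / 2, NormalDist().cdf(1.0))

# Hypothetical timescale samples in days.
print(median_with_sigma_errors(list(range(1, 102))))
```

Passing `coverage=0.95` gives the corresponding 2 sigma errors. For a skewed set of samples the plus and minus errors generally differ, which is why the two are reported separately.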
ECMWF ERA5.1: 10 ensemble member surface level analysis parameter data for...
data-search.nerc.ac.uk
catalogue.ceda.ac.uk
Updated Dec 8, 2023
(2023). ECMWF ERA5.1: 10 ensemble member surface level analysis parameter data for 2000-2006 [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=ensemble%20runs
Dataset updated
Dec 8, 2023
Description
This dataset contains ERA5.1 surface level analysis parameter data for the period 2000-2006 from 10-member ensemble runs. ERA5.1 is the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis project re-run for 2000-2006 to improve upon the cold bias in the lower stratosphere seen in ERA5 (see technical memorandum 859 in the linked documentation section for further details). Ensemble means and spreads are calculated from this 10-member ensemble, which is run at a reduced resolution compared with the single high-resolution (hourly output at 31 km grid spacing) 'HRES' realisation and was produced to provide an uncertainty estimate for it. This dataset contains a limited selection of the available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during extraction from the ECMWF holdings. For a fuller set of variables, please see the Copernicus Data Store (CDS) data tool linked from this record. Note that the ensemble standard deviation is often referred to as the ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and was thus calculated by dividing by 10 rather than 9 (N-1). See linked datasets for ensemble mean and ensemble spread data. The main ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim reanalysis projects. An initial release of ERA5 data, ERA5t, is also available up to 5 days behind the present. A limited selection of data from these runs is also available via CEDA, whilst full access is available via the Copernicus Data Store.
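The distinction between dividing by 10 and dividing by 9 can be illustrated with a short sketch. The function name and the ten member values are hypothetical and only stand in for one grid point of one variable; this is not how the data were produced at ECMWF or CEDA.

```python
from math import sqrt
from statistics import mean, pstdev, stdev


def ensemble_mean_and_spread(members):
    """Return (ensemble mean, ensemble spread) for one grid point.

    The spread is the population standard deviation (divide by N),
    matching the description: 10 members, divided by 10 rather than 9.
    """
    return mean(members), pstdev(members)


# Hypothetical values from 10 ensemble members at a single grid point.
members = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

ens_mean, ens_spread = ensemble_mean_and_spread(members)
sample_sd = stdev(members)  # divides by N - 1 = 9 instead

print(ens_mean, ens_spread, sample_sd)

# For 10 members the sample value is larger by a factor of sqrt(10 / 9).
print(sample_sd / ens_spread, sqrt(10 / 9))
```

Anyone who needs the sample standard deviation can therefore rescale the published spread by sqrt(10 / 9), about 1.054.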
Example Groundwater-Level Datasets and Benchmarking Results for the...
catalog.data.gov
data.usgs.gov
Updated Oct 13, 2024
U.S. Geological Survey (2024). Example Groundwater-Level Datasets and Benchmarking Results for the Automated Regional Correlation Analysis for Hydrologic Record Imputation (ARCHI) Software Package [Dataset]. https://catalog.data.gov/dataset/example-groundwater-level-datasets-and-benchmarking-results-for-the-automated-regional-cor
Dataset updated
Oct 13, 2024
Dataset provided by
U.S. Geological Survey
Description
This data release provides two example groundwater-level datasets used to benchmark the Automated Regional Correlation Analysis for Hydrologic Record Imputation (ARCHI) software package (Levy and others, 2024). The first dataset contains groundwater-level records and site metadata for wells located on Long Island, New York (NY) and some surrounding mainland sites in New York and Connecticut. The second dataset contains groundwater-level records and site metadata for wells located in the southeastern San Joaquin Valley of the Central Valley, California (CA). For ease of exposition these are referred to as the NY and CA datasets, respectively. Both datasets are formatted with column headers that can be read by the ARCHI software package within the R computing environment.
These datasets were used to benchmark the imputation accuracy of three ARCHI model settings (OLS, ridge, and MOVE.1) against the widely used imputation program missForest (Stekhoven and Bühlmann, 2012). The ARCHI program was used to process the NY and CA datasets on monthly and annual timesteps, respectively, filter out sites with insufficient data for imputation, and create 200 test datasets from each of the example datasets with 5 percent of observations removed at random (herein referred to as "holdouts"). Imputation accuracy for test datasets was assessed using normalized root mean square error (NRMSE), which is the root mean square error divided by the standard deviation of the observed holdout values. ARCHI produces prediction intervals (PIs) using a non-parametric bootstrapping routine; these were assessed by computing a coverage rate (CR), defined as the proportion of holdout observations falling within the estimated PI. The multiple regression models included with the ARCHI package (OLS and ridge) were further tested on all test datasets at eleven different levels of the p_per_n input parameter, which limits the maximum ratio of regression model predictors (p) per observations (n) as a decimal fraction greater than zero and less than or equal to one.
This data release contains ten tables formatted as tab-delimited text files:
"CA_data.txt" and "NY_data.txt": 243,094 and 89,997 depth-to-groundwater measurement values (value, in feet below land surface) indexed by site identifier (site_no) and measurement date (date) for the CA and NY datasets, respectively.
"CA_sites.txt" and "NY_sites.txt": site metadata for the 4,380 and 476 unique sites included in the CA and NY datasets, respectively.
"CA_NRMSE.txt" and "NY_NRMSE.txt": NRMSE values computed by imputing 200 test datasets with 5 percent random holdouts to assess imputation accuracy for three different ARCHI model settings and missForest using the CA and NY datasets, respectively.
"CA_CR.txt" and "NY_CR.txt": CR values used to evaluate non-parametric PIs generated by bootstrapping regressions with three different ARCHI model settings using the CA and NY test datasets, respectively.
"CA_p_per_n.txt" and "NY_p_per_n.txt": mean NRMSE values computed for 200 test datasets with 5 percent random holdouts at 11 different levels of p_per_n for OLS and ridge models, compared to training error for the same models on the entire CA and NY datasets, respectively.
References Cited
Levy, Z.F., Stagnitta, T.J., and Glas, R.L., 2024, ARCHI: Automated Regional Correlation Analysis for Hydrologic Record Imputation, v1.0.0: U.S. Geological Survey software release, https://doi.org/10.5066/P1VVHWKE.
Stekhoven, D.J., and Bühlmann, P., 2012, MissForest—non-parametric missing value imputation for mixed-type data: Bioinformatics 28(1), 112-118, https://doi.org/10.1093/bioinformatics/btr597.
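The two evaluation metrics defined in the description, NRMSE and coverage rate, can be sketched in a few lines. This is an illustrative sketch and not part of the ARCHI package (which runs in R): the function names and sample values are hypothetical, the use of the sample (N-1) standard deviation is an assumption because the description does not say which form was used, and interval bounds are treated as inclusive by assumption.

```python
from math import sqrt
from statistics import stdev


def nrmse(observed, imputed):
    """Root mean square error divided by the standard deviation of the
    observed holdout values.

    Assumption: the sample (N-1) standard deviation is used.
    """
    if len(observed) != len(imputed):
        raise ValueError("observed and imputed must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, imputed)]
    rmse = sqrt(sum(squared_errors) / len(observed))
    return rmse / stdev(observed)


def coverage_rate(observed, lower, upper):
    """Proportion of holdout observations inside their prediction interval.

    Assumption: interval bounds are inclusive.
    """
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


# Hypothetical holdout values (feet below land surface) and imputations.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
imputed = [2.0, 3.0, 4.0, 5.0, 6.0]
pi_lower = [0.0, 0.0, 0.0, 0.0, 0.0]
pi_upper = [1.0, 2.0, 2.0, 5.0, 4.0]

print(nrmse(observed, imputed))
print(coverage_rate(observed, pi_lower, pi_upper))
```

Dividing by the standard deviation of the holdouts makes errors comparable across sites with very different water-level variability. An NRMSE near 1 means the imputation is roughly as imprecise as predicting every holdout with the holdout mean.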
Means and standard deviations of BMI for each polymorphism, and main effects...
plos.figshare.com
figshare.com
xls
Updated May 31, 2023
Chunhui Chen; Wen Chen; Chuansheng Chen; Robert Moyzis; Qinghua He; Xuemei Lei; Jin Li; Yunxin Wang; Bin Liu; Daiming Xiu; Bi Zhu; Qi Dong (2023). Means and standard deviations of BMI for each polymorphism, and main effects and post hoc comparisons of SNPs that showed significant main effects and were used in subsequent multiple regression analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0058717.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0058717.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Chunhui Chen; Wen Chen; Chuansheng Chen; Robert Moyzis; Qinghua He; Xuemei Lei; Jin Li; Yunxin Wang; Bin Liu; Daiming Xiu; Bi Zhu; Qi Dong
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Note: Empty cells mean no such genotypes were found in our sample. Maj: Major allele; Het: Heterozygote; Min: Minor allele. (a) Results (p values) of post hoc comparisons: mh = Maj versus Het, mm = Maj versus Min, hm = Het versus Min. (b) Post hoc comparison was not run because there were only 2 groups for this locus.


