Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In our everyday lives, we are required to make decisions based upon our statistical intuitions. Often, these involve the comparison of two groups, such as luxury versus family cars and their suitability. Research has shown that the mean difference affects judgements where two sets of data are compared, but the variability of the data has only a minor influence, if any at all. However, prior research has tended to present raw data as simple lists of values. Here, we investigated whether displaying data visually, in the form of parallel dot plots, would lead viewers to incorporate variability information. In Experiment 1, we asked a large sample of people to compare two fictional groups (children who drank ‘Brain Juice’ versus water) in a one-shot design, where only a single comparison was made. Our results confirmed that only the mean difference between the groups predicted subsequent judgements of how much they differed, in line with previous work using lists of numbers. In Experiment 2, we asked each participant to make multiple comparisons, with both the mean difference and the pooled standard deviation varying across data sets they were shown. Here, we found that both sources of information were correctly incorporated when making responses. Taken together, we suggest that increasing the salience of variability information, through manipulating this factor across items seen, encourages viewers to consider this in their judgements. Such findings may have useful applications for best practices when teaching difficult concepts like sampling variation.
Sea surface temperature (SST) plays an important role in a number of ecological processes and can vary over a wide range of time scales, from daily to decadal changes. SST influences primary production, species migration patterns, and coral health. If temperatures are anomalously warm for extended periods of time, drastic changes in the surrounding ecosystem can result, including harmful effects such as coral bleaching. This layer represents the standard deviation of SST (degrees Celsius) of the weekly time series from 2000-2013. Three SST datasets were combined to provide continuous coverage from 1985-2013. The concatenation applies bias adjustment derived from linear regression to the overlap periods of datasets, with the final representation matching the 0.05-degree (~5-km) near real-time SST product. First, a weekly composite, gap-filled SST dataset from the NOAA Pathfinder v5.2 SST 1/24-degree (~4-km), daily dataset (a NOAA Climate Data Record) for each location was produced following Heron et al. (2010) for January 1985 to December 2012. Next, weekly composite SST data from the NOAA/NESDIS/STAR Blended SST 0.1-degree (~11-km), daily dataset was produced for February 2009 to October 2013. Finally, a weekly composite SST dataset from the NOAA/NESDIS/STAR Blended SST 0.05-degree (~5-km), daily dataset was produced for March 2012 to December 2013. The standard deviation of the long-term mean SST was calculated by taking the standard deviation over all weekly data from 2000-2013 for each pixel.
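The final per-pixel step can be sketched in NumPy as follows; the array shape, variable names, and simulated values here are illustrative, not taken from the actual SST product:

```python
import numpy as np

# Illustrative stack of weekly SST composites, shaped (weeks, lat, lon).
# 14 years (2000-2013) of weekly data is roughly 728 composites.
rng = np.random.default_rng(0)
weekly_sst = 26 + rng.normal(scale=0.8, size=(728, 90, 180))

# Standard deviation over the weekly time series, computed per pixel.
sst_std = np.std(weekly_sst, axis=0)

print(sst_std.shape)  # (90, 180)
```

The key point is that the standard deviation is taken along the time axis only, leaving one value per grid cell.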
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in South Range, MI, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
[Chart: South Range, MI median household income, by household size (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for South Range median household income. You can refer to it here.
These data are the standard error calculated from the AVISO Level 4 Absolute Dynamic Topography for Climate Model Comparison Number of Observations data set (in PO.DAAC Drive at https://podaac-tools.jpl.nasa.gov/drive/files/allData/aviso/L4/abs_dynamic_topo). This data set is not meant to be used alone, but together with the absolute dynamic topography data. These data were generated to help support the CMIP5 (Coupled Model Intercomparison Project Phase 5) portion of PCMDI (Program for Climate Model Diagnosis and Intercomparison). The dynamic topography is derived from sea surface height measured by several satellites (Envisat, TOPEX/Poseidon, Jason-1, and OSTM/Jason-2) and referenced to the geoid. These data were provided by AVISO (the French space agency's data provider) and are based on a similar dynamic topography data set they already produce (http://www.aviso.oceanobs.com/index.php?id=1271).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
a. Mixed between-within subjects analysis of variance; reported: interaction effect of time × group (Wilks' lambda).
b. Cohen's d, calculated as the mean difference between groups divided by the pooled standard deviation at baseline.
*p
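The effect size in footnote b can be computed as follows; this is a generic sketch using Python's standard library, not the authors' code, and the group values are illustrative:

```python
import statistics

def cohens_d(group_a, group_b):
    """Mean difference between groups divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Illustrative baseline scores for two groups (not data from the table).
print(round(cohens_d([5, 6, 7, 8], [3, 4, 5, 6]), 3))  # 1.549
```

By the common convention, d around 0.2, 0.5, and 0.8 is often read as a small, medium, and large effect, respectively.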
These four data files contain datasets from an interlaboratory comparison that characterized a polydisperse five-population bead dispersion in water. A more detailed version of this description is available in the ReadMe file (PdP-ILC_datasets_ReadMe_v1.txt), which also includes definitions of abbreviations used in the data files. Paired samples were evaluated, so the datasets are organized as pairs associated with a randomly assigned laboratory number. The datasets are organized in the files by instrument type: PTA (particle tracking analysis), RMM (resonant mass measurement), ESZ (electrical sensing zone), and OTH (other techniques not covered in the three largest groups, including holographic particle characterization, laser diffraction, flow imaging, and flow cytometry). In the OTH group, the specific instrument type for each dataset is noted. Each instrument type (PTA, RMM, ESZ, OTH) has a dedicated file. Included in the data files for each dataset are: (1) the cumulative particle number concentration (PNC, (1/mL)); (2) the concentration distribution density (CDD, (1/mL·nm)) based upon five bins centered at each particle population peak diameter; (3) the CDD in higher resolution, varied-width bins. The lower-diameter bin edge (µm) is given for (2) and (3). Additionally, the PTA, RMM, and ESZ files each contain unweighted mean cumulative particle number concentrations and concentration distribution densities calculated from all datasets reporting values. The associated standard deviations and standard errors of the mean are also given. In the OTH file, the means and standard deviations were calculated using only data from one of the sub-groups (holographic particle characterization) that had n = 3 paired datasets. Where necessary, datasets not using the common bin resolutions are noted (PTA, OTH groups). 
The data contained here are presented and discussed in a manuscript to be submitted to the Journal of Pharmaceutical Sciences and presented as part of that scientific record.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Datasets from an interlaboratory comparison to characterize a multi-modal polydisperse sub-micrometer bead dispersion’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/7f7e5222-e579-486e-b5d7-c02d511d1964 on 27 January 2022.
--- Dataset description provided by original source is as follows ---
These four data files contain datasets from an interlaboratory comparison that characterized a polydisperse five-population bead dispersion in water. A more detailed version of this description is available in the ReadMe file (PdP-ILC_datasets_ReadMe_v1.txt), which also includes definitions of abbreviations used in the data files. Paired samples were evaluated, so the datasets are organized as pairs associated with a randomly assigned laboratory number. The datasets are organized in the files by instrument type: PTA (particle tracking analysis), RMM (resonant mass measurement), ESZ (electrical sensing zone), and OTH (other techniques not covered in the three largest groups, including holographic particle characterization, laser diffraction, flow imaging, and flow cytometry). In the OTH group, the specific instrument type for each dataset is noted. Each instrument type (PTA, RMM, ESZ, OTH) has a dedicated file. Included in the data files for each dataset are: (1) the cumulative particle number concentration (PNC, (1/mL)); (2) the concentration distribution density (CDD, (1/mL·nm)) based upon five bins centered at each particle population peak diameter; (3) the CDD in higher resolution, varied-width bins. The lower-diameter bin edge (µm) is given for (2) and (3). Additionally, the PTA, RMM, and ESZ files each contain unweighted mean cumulative particle number concentrations and concentration distribution densities calculated from all datasets reporting values. The associated standard deviations and standard errors of the mean are also given. In the OTH file, the means and standard deviations were calculated using only data from one of the sub-groups (holographic particle characterization) that had n = 3 paired datasets. Where necessary, datasets not using the common bin resolutions are noted (PTA, OTH groups). 
The data contained here are presented and discussed in a manuscript to be submitted to the Journal of Pharmaceutical Sciences and presented as part of that scientific record.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in State Line City, IN, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
[Chart: State Line City, IN median household income, by household size (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for State Line City median household income. You can refer to it here.
Propagated standard deviation of 12L-D measured via incubation, in mg C/m^3. Part of dataset Gradients 1-KOK1606 - Net Primary Productivity (via 14C method)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pathway Multi-Omics Simulated Data
These are synthetic variations of the TCGA COADREAD data set (original data available at http://linkedomics.org/data_download/TCGA-COADREAD/). It is used as a comprehensive benchmark data set for comparing multi-omics tools in the manuscript "pathwayMultiomics: An R package for efficient integrative analysis of multi-omics datasets with matched or un-matched samples".
There are 100 sets (stored as 100 sub-folders, the first 50 in "pt1" and the second 50 in "pt2") of random modifications to centred and scaled copy number, gene expression, and proteomics data, saved as compressed data files for the R programming language. These data sets are stored in subfolders labelled "sim001", "sim002", ..., "sim100". Each folder contains: (1) "indicatorMatricesXXX_ls.RDS", a list of simple triplet matrices showing which genes (in which pathways) and which samples received the synthetic treatment (where XXX is the simulation run label: 001, 002, ...); (2) "CNV_partitionA_deltaB.RDS", the synthetically modified copy number variation data (where A represents the proportion of genes in each gene set to receive the synthetic treatment [partition 1 is 20%, 2 is 40%, 3 is 60%, and 4 is 80%] and B is the signal strength in units of standard deviations); (3) "RNAseq_partitionA_deltaB.RDS", the synthetically modified gene expression data (same parameter legend as CNV); and (4) "Prot_partitionA_deltaB.RDS", the synthetically modified protein expression data (same parameter legend as CNV).
Supplemental Files
The file "cluster_pathway_collection_20201117.gmt" is the collection of gene sets used for the simulation study, in Gene Matrix Transpose (GMT) format. Scripts to create and analyze these data sets are available at: https://github.com/TransBioInfoLab/pathwayMultiomics_manuscript_supplement
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics of the dataset: mean, standard deviation (SD), median, and the lower (5% quantile) and upper (95% quantile) boundaries of the 90% confidence interval.
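A summary of this form can be produced along these lines; the sample values below are illustrative, not the dataset's:

```python
import numpy as np

# Illustrative sample; not the actual dataset values.
values = np.array([2.1, 3.4, 2.9, 5.0, 4.2, 3.8, 2.5, 4.9, 3.1, 3.6])

summary = {
    "mean": float(values.mean()),
    "SD": float(values.std(ddof=1)),   # sample standard deviation
    "median": float(np.median(values)),
    "quantile 5%": float(np.quantile(values, 0.05)),   # lower boundary
    "quantile 95%": float(np.quantile(values, 0.95)),  # upper boundary
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note `ddof=1`, which gives the sample (rather than population) standard deviation; the two quantiles together bracket the central 90% of the values.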
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparative analysis of ECG feature values and their standard deviation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in United States, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
[Chart: United States median household income, by household size (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for United States median household income. You can refer to it here.
https://creativecommons.org/publicdomain/zero/1.0/
You will find three datasets containing the heights of high school students.
All heights are in inches.
The data is simulated. The heights are generated from a normal distribution with different sets of mean and standard deviation for boys and girls.
| Height Statistics (inches) | Boys | Girls |
|---|---|---|
| Mean | 67 | 62 |
| Standard Deviation | 2.9 | 2.2 |
There are 500 measurements for each gender.
Here are the datasets:
hs_heights.csv: contains a single column with heights for all boys and girls. There's no way to tell which values are boys' and which are girls'.
hs_heights_pair.csv: has two columns. The first column has boys' heights. The second column contains girls' heights.
hs_heights_flag.csv: has two columns. The first column has the flag is_girl. The second column contains a girl's height if the flag is 1. Otherwise, it contains a boy's height.
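The three layouts can be generated along these lines; this is a sketch consistent with the stated parameters (means, standard deviations, n = 500 per gender), not necessarily the original generation script:

```python
import numpy as np

rng = np.random.default_rng(42)

# 500 draws per gender from the stated normal distributions (inches).
boys = rng.normal(loc=67, scale=2.9, size=500)
girls = rng.normal(loc=62, scale=2.2, size=500)

# hs_heights.csv: a single unlabeled column with all 1000 heights,
# shuffled so gender cannot be inferred from row order.
all_heights = np.concatenate([boys, girls])
mixed = rng.permutation(all_heights)

# hs_heights_pair.csv: two columns, boys' heights then girls' heights.
pair = np.column_stack([boys, girls])

# hs_heights_flag.csv: an is_girl flag (0/1) plus the corresponding height.
flags = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])
flagged = np.column_stack([flags, all_heights])

np.savetxt("hs_heights.csv", mixed, delimiter=",")
np.savetxt("hs_heights_pair.csv", pair, delimiter=",")
np.savetxt("hs_heights_flag.csv", flagged, delimiter=",")
```

With 500 draws per group, the sample means and standard deviations land close to, but not exactly on, the table values above.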
To see how I generated this dataset, check this out: https://github.com/ysk125103/datascience101/tree/main/datasets/high_school_heights
Image by Gillian Callison from Pixabay
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Virgil population over the last 20-plus years. It lists the population for each year, along with the year-on-year change in population, in both absolute and percentage terms. The dataset can be used to understand the population change of Virgil across the last two decades: for example, whether the population is declining or increasing, when it peaked, or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.
Key observations
In 2022, the population of Virgil was 25, a 0.00% change year-over-year from 2021. Previously, in 2021, Virgil's population was 25, a change of 0.00% compared to a population of 25 in 2020. Over the last 20-plus years, between 2000 and 2022, the population of Virgil decreased by 0. In this period, the peak population was 25, in the year 2000. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
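The year-over-year figures quoted above follow the usual change formula; here is a minimal sketch with an illustrative series:

```python
def yoy_change(series):
    """Absolute and percentage year-over-year changes for a population series."""
    changes = []
    for prev, curr in zip(series, series[1:]):
        pct = (curr - prev) / prev * 100 if prev else float("nan")
        changes.append((curr - prev, pct))
    return changes

# A flat series like Virgil's recent years yields 0.00% change.
print(yoy_change([25, 25, 25]))  # [(0, 0.0), (0, 0.0)]
```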
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Virgil Population by Year. You can refer to it here.
This data set contains the standard deviation of SeaWiFS K490 generated from the climatological monthly means; the monthly climatologies represent the mean values for each month across the whole dataset time series. K490 indicates the turbidity of the water column: how visible light in the blue-green region of the spectrum penetrates the water column. It is directly related to the presence of scattering particles in the water column. The data are received as monthly composites, with a 4 km resolution, and are constrained to the region between 90°E and 180°E, and 10°N to 60°S. The data were sourced from http://oceancolor.gsfc.nasa.gov/SeaWiFS/. This dataset is a contribution to the CERF Marine Biodiversity Hub.
Standard deviation of 24L measured via incubation, in mg C/m^3. Part of dataset Gradients 1-KOK1606 - Net Primary Productivity (via 14C method)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Monthly air temperature from 1,153 stations and monthly precipitation from 1,202 stations in China and neighboring countries were collected to construct a monthly climate dataset for China at 0.025° resolution (approximately 2.5 km), named LZU0025 and designed by Lanzhou University (LZU), using a partial thin-plate smoothing method embedded in the ANUSPLIN software. The accuracy of LZU0025 was evaluated in three ways. 1) Diagnostic statistics from the surface-fitting model for the period 1951-2011 show a low root mean square of the generalized cross-validation (RTGCV) for the monthly air temperature surface (1.1 °C) and the monthly precipitation surface (2 mm^1/2; precipitation was interpolated as its square root), indicating exact surface-fitting models. 2) Error statistics based on data from 265 withheld stations over the period 1951-2011 show that predicted values closely tracked true values, with a mean absolute error (MAE) of 0.6 °C and 4 mm and a standard deviation of the mean error (STD) of 1.3 °C and 5 mm; the monthly STDs varied consistently with the RTGCV. 3) Comparisons to other datasets were made in two ways: first, by comparing three indices (the standard deviation, mean, and time trend) derived from all datasets against the reference dataset released by the China Meteorological Administration (CMA) in Taylor diagrams; second, by comparing LZU0025 to the Camp Tibet dataset over a remote mountainous area. The Taylor diagrams showed that the standard deviation derived from LZU had a high correlation with that derived from CMA (Pearson correlation R = 0.76 for air temperature and R = 0.96 for precipitation), was closer to the CMA value, and had a lower centered normalized root-mean-square difference.
Similarly superior performance of LZU was found when comparing the mean and time-trend indices derived from LZU with those derived from other datasets. LZU0025 had a high correlation with the Camp dataset for air temperature, despite insignificant correlations for precipitation at a few stations. Based on the above comprehensive analyses, we conclude that LZU0025 is a reliable dataset.
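The withheld-station error statistics (MAE and STD of the error) correspond to the following computation; this is a generic sketch with made-up values, not the LZU data:

```python
import numpy as np

# Made-up predicted vs. observed monthly values at withheld stations.
predicted = np.array([12.1, 8.4, 15.0, 3.2, 9.9])
observed = np.array([11.5, 9.0, 14.2, 3.9, 10.1])

errors = predicted - observed
mae = np.mean(np.abs(errors))          # mean absolute error (MAE)
std_of_error = np.std(errors, ddof=1)  # standard deviation of the error (STD)

print(round(float(mae), 2), round(float(std_of_error), 3))
```

MAE measures the typical size of the error, while the STD measures its spread around the mean error; both are needed to characterize interpolation skill.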
BTM Standard Deviation: this mosaic dataset is part of a series of seafloor terrain datasets aimed at providing a consistent baseline to assist users in characterizing Aotearoa New Zealand seafloor habitats. The series has been developed using the tools provided within the Benthic Terrain Modeler (BTM v3.0) across different multibeam echo-sounder (MBES) datasets. It includes derived outputs from 50 MBES survey sets conducted between 1999 and 2020 throughout the New Zealand marine environment (where available), covering an area of approximately 52,000 km2. Consistency and compatibility of the benthic terrain datasets have been achieved by utilising a common projected coordinate system (WGS84 Web Mercator), a common resolution (10 m), and a standard classification dictionary (also utilised by previous BTM studies in NZ). However, we advise caution when comparing classifications between different survey areas. Derived BTM outputs include the Bathymetric Position Index (BPI), surface derivatives, rugosity, depth statistics, and terrain classification. A standardised digital surface model, and derived hillshade and aspect datasets, have also been made available. The index of the original MBES survey surface models used in this analysis can be accessed from https://data.linz.govt.nz/layer/95574-nz-bathymetric-surface-model-index/. The full report and description of available output datasets are available at: https://www.doc.govt.nz/globalassets/documents/science-and-technical/drds367entire.pdf
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes the NetCDF files and the calibration reports of the hydrophones.
The procedure followed to create the NetCDF is:
- Automatic detection of the tone time interval.
- This interval is divided into 100 elements.
- Each of these new intervals is evaluated with the tests: detection threshold, homogeneity, and frequency.
The structure of the NetCDF is detailed in the file itself, but is explained here in more detail:
- Cycle: the cycle number. The expected number is 50.
- Freq: the expected frequency.
- Freq_found: the frequency detected by the automatic tone-detection process.
- Offset: offset of the signal.
- SPL: the signal, expressed in 1/V, obtained from the 100 divisions of each tone detection.
- homogeneity: this parameter takes values 0 and 1. A good signal is 1; a value of 0 corresponds to values greater than 2 standard deviations from the mean.
- u_range: the contribution to the uncertainty from the range (max - min) of the 100 intervals.
- u_sta: the contribution to the uncertainty from the statistical deviation of the 100 intervals.
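The per-tone quantities described above can be sketched as follows; this is an illustrative reconstruction with simulated SPL values, not the original processing code:

```python
import random
import statistics

random.seed(1)
# Simulated SPL values from the 100 sub-intervals of one detected tone (1/V).
spl = [0.50 + random.gauss(0, 0.01) for _ in range(100)]

mean_spl = statistics.mean(spl)
sd_spl = statistics.stdev(spl)

# homogeneity: 1 for values within 2 standard deviations of the mean, else 0.
homogeneity = [1 if abs(x - mean_spl) <= 2 * sd_spl else 0 for x in spl]

# u_range: uncertainty contribution from the range (max - min) of the intervals.
u_range = max(spl) - min(spl)

# u_sta: uncertainty contribution from the statistical deviation of the intervals.
u_sta = sd_spl

print(sum(homogeneity), round(u_range, 4), round(u_sta, 4))
```

For a clean tone, nearly all of the 100 sub-intervals fall within 2 standard deviations, so the homogeneity count sits close to 100.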