Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
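The point that different distributions can produce the same bar graph can be illustrated with a small sketch. The two samples below are invented for illustration (they are not data from the review); they share the same mean and population standard deviation, so a bar with an error bar would look identical for both, while a univariate scatterplot would show that one is skewed with an outlier and the other is bimodal.

```python
from statistics import mean, pstdev

# Hypothetical small samples, chosen only to make the summary statistics match.
sample_a = [2, 4, 4, 4, 5, 5, 7, 9]   # skewed, one high value
sample_b = [3, 3, 3, 3, 7, 7, 7, 7]   # two separate clusters

for name, sample in (("A", sample_a), ("B", sample_b)):
    # Both samples report the same mean and (population) SD,
    # yet the sorted raw values reveal very different shapes.
    print(name, "mean =", mean(sample), "SD =", pstdev(sample),
          "values =", sorted(sample))
```

Showing the raw values, as a univariate scatterplot does, is what distinguishes the two samples.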
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available that allow the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs as well as for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed-designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match with individual needs. A variety of example applications of syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. The syntax collection is hoped to provide researchers, students, teachers, and others working with SPSS a valuable tool to move towards more transparency in data visualization.
https://www.icpsr.umich.edu/web/ICPSR/studies/8379/terms
This dataset consists of cartographic data in digital line graph (DLG) form for the northeastern states (Connecticut, Maine, Massachusetts, New Hampshire, New York, Rhode Island and Vermont). Information is presented on two planimetric base categories, political boundaries and administrative boundaries, each available in two formats: the topologically structured format and a simpler format optimized for graphic display. These DLG data can be used to plot base maps and for various kinds of spatial analysis. They may also be combined with other geographically referenced data to facilitate analysis, for example the Geographic Names Information System.
Our service provides Jupyter Notebooks in both Python and R, showcasing essential data visualization techniques. Each notebook includes: Line plots, scatter plots, histograms, and box plots
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
The temporary sample plot is a circular sample unit that covers an area of 400 m². For each tree, we observe and measure the species, diameter, sunlight, and floor in relation to other trees, as well as the defoliation and the quality of the stems of deciduous trees and certain pines. In addition, the age and height of three trees are determined. Finally, a survey of the station and the vegetation of the undergrowth is carried out, and the characteristics of the soil are noted. Plot data also includes information about the location of the plots and the sampling plan. These data are acquired as part of the fourth ecoforest inventory of southern Québec. They are used in particular to produce forest compilation results used to feed the calculation of forest opportunities in public forests in Quebec. They can also be useful in the development of private forests. The establishment of these plots took place from 2004 to 2018. This database covers almost all of the territory south of the 52nd parallel of Quebec's public and private forest. **This third party metadata element was translated using an automated translation tool (Amazon Translate).**
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Companion data for the creation of a banksia plot:
Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.
Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.
Results: In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.
Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.
This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
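The centring and scaling step described in the Methods can be sketched as follows. This is an illustrative reading of the description, not the authors' Stata or R code: the function name `centre_and_scale` and the example numbers are hypothetical, and the shift is taken to be the reference point estimate and the scale to be the reference CI width.

```python
def centre_and_scale(reference, comparator):
    """Centre and scale one reference/comparator pair.

    Each argument is a (estimate, ci_lower, ci_upper) tuple.
    The reference estimate is shifted to zero and the reference CI
    is scaled to a width of one; the comparator is adjusted by the
    same shift and the same scale factor.
    """
    ref_est, ref_lo, ref_hi = reference
    width = ref_hi - ref_lo
    if width <= 0:
        raise ValueError("reference CI must have positive width")

    def adjust(value):
        return (value - ref_est) / width

    return (tuple(adjust(v) for v in reference),
            tuple(adjust(v) for v in comparator))


# Hypothetical pair of analyses of the same dataset.
ref_scaled, cmp_scaled = centre_and_scale((10.0, 8.0, 12.0), (11.0, 8.0, 14.0))
print(ref_scaled)  # reference: estimate at 0, CI of width 1
print(cmp_scaled)  # comparator on the same scale
```

Because both analyses are divided by the same reference width, the comparator's scaled CI width is directly the ratio of the two CI widths, which is what lets pairs from datasets with very different outcome scales share one axis.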
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
The link "Access the data directory" is available in the section *Dataset Description Sheets; Additional Information*. The permanent sample plot is a circular sampling unit that covers an area of 400 m². For each tree, the species, diameter, defoliation of softwoods and the quality of hardwoods are observed and measured. Some of these stems are the subject of further studies in order to know their height and age. Finally, other surveys make it possible to identify the ecological characteristics of the station where the plot is located, whether at ground level or undergrowth plants. Since 1970, **more than 12,000 permanent sample plots have been established** and are re-measured every ten years. This data is an invaluable source of information on the growth and evolution of forests in Quebec, a source that is enriched each time additional measurements are added. In particular, they are used to establish forest growth rates, describe past changes, and model forest evolution. This database covers almost all of the territory south of the 52nd parallel of Quebec's public and private forest. ⚠️ Note: People who wish to use the permanent sample plots for training needs or for samples of any kind should contact the Forest Inventory Directorate at 📩 Inventaires.Forestiers@mrnf.gouv.qc.ca. **This third party metadata element was translated using an automated translation tool (Amazon Translate).**
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Note: none of the data sets published here contain actual data; they are for testing purposes only.
This data repository contains graph datasets, where each graph is represented by two CSV files: one for node information and another for edge details. To link the files to the same graph, their names include a common identifier based on the number of nodes. For example:
dataset_30_nodes_interactions.csv: contains 30 rows (nodes).
dataset_30_edges_interactions.csv: contains 47 rows (edges).
dataset_30 refers to the same graph.
Each dataset contains the following columns:
Name of the Column | Type | Description |
UniProt ID | string | protein identification |
label | string | protein label (type of node) |
properties | string | a dictionary containing properties related to the protein. |
Each dataset contains the following columns:
Name of the Column | Type | Description |
Relationship ID | string | relationship identification |
Source ID | string | identification of the source protein in the relationship |
Target ID | string | identification of the target protein in the relationship |
label | string | relationship label (type of relationship) |
properties | string | a dictionary containing properties related to the relationship. |
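As a rough sketch of how a node file and an edge file could be paired and read, the snippet below builds the two file names from the node count and turns the rows into an adjacency list. It assumes comma-separated files with a header row that uses the column names listed above; the delimiter, the header, the helper names and the in-memory sample rows are all assumptions made for illustration, not something the repository description confirms.

```python
import csv
import io


def graph_file_names(n_nodes):
    """Return (node_file, edge_file) names for a graph of n_nodes nodes."""
    prefix = f"dataset_{n_nodes}"
    return (f"{prefix}_nodes_interactions.csv",
            f"{prefix}_edges_interactions.csv")


def build_adjacency(node_rows, edge_rows):
    """Map each node ID to the list of target IDs it points to."""
    adjacency = {row["UniProt ID"]: [] for row in node_rows}
    for edge in edge_rows:
        adjacency.setdefault(edge["Source ID"], []).append(edge["Target ID"])
    return adjacency


# Hypothetical file contents, held in memory so the sketch is self-contained.
NODES_CSV = "UniProt ID,label,properties\nP1,Protein,{}\nP2,Protein,{}\n"
EDGES_CSV = ("Relationship ID,Source ID,Target ID,label,properties\n"
             "R1,P1,P2,interacts,{}\n")

node_rows = list(csv.DictReader(io.StringIO(NODES_CSV)))
edge_rows = list(csv.DictReader(io.StringIO(EDGES_CSV)))
adjacency = build_adjacency(node_rows, edge_rows)
print(graph_file_names(30))
print(adjacency)
```

With real files, `io.StringIO(...)` would be replaced by `open(name, newline="")` on the two names returned by `graph_file_names`.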
Graph | Number of Nodes | Number of Edges | Sparse graph |
dataset_30* | 30 | 47 | Y |
dataset_60* | 60 | 181 | Y |
dataset_120* | 120 | 689 | Y |
dataset_240* | 240 | 2819 | Y |
dataset_300* | 300 | 4658 | Y |
dataset_600* | 600 | 18004 | Y |
dataset_1200* | 1200 | 71785 | Y |
dataset_2400* | 2400 | 288600 | Y |
dataset_3000* | 3000 | 449727 | Y |
dataset_6000* | 6000 | 1799413 | Y |
dataset_12000* | 12000 | 7199863 | Y |
dataset_24000* | 24000 | 28792361 | Y |
dataset_30000* | 30000 | 44991744 | Y |
This repository includes two (2) additional tiny graph datasets to experiment with before dealing with larger datasets.
Each dataset contains the following columns:
Name of the Column | Type | Description |
ID | string | node identification |
label | string | node label (type of node) |
properties | string | a dictionary containing properties related to the node. |
Each dataset contains the following columns:
Name of the Column | Type | Description |
ID | string | relationship identification |
source | string | identification of the source node in the relationship |
target | string | identification of the target node in the relationship |
label | string | relationship label (type of relationship) |
properties | string | a dictionary containing properties related to the relationship. |
Graph | Number of Nodes | Number of Edges | Sparse graph |
dataset_dummy* | 3 | 6 | N |
dataset_dummy2* | 3 | 6 | N |
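The description does not say how the "Sparse graph" flag was assigned. One common way to quantify sparsity is edge density, sketched below under the assumption that the graphs are directed and have no self-loops (suggested by the source/target columns, but not confirmed). Under that assumption the dummy graphs, with 6 edges on 3 nodes, are complete, which is consistent with their "N" flag.

```python
def directed_density(n_nodes, n_edges):
    """Fraction of possible directed edges (no self-loops) that are present."""
    return n_edges / (n_nodes * (n_nodes - 1))


# Node and edge counts copied from the tables above.
graphs = {
    "dataset_30": (30, 47),
    "dataset_30000": (30000, 44991744),
    "dataset_dummy": (3, 6),
}

for name, (n_nodes, n_edges) in graphs.items():
    print(f"{name}: density = {directed_density(n_nodes, n_edges):.3f}")
```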
GIFplots
Files containing GIF images of spectral plots:
- GIFplots_splib07a.zip contains plots of measured spectra, including
  * plots showing the full wavelength range of the measured spectra, organized in chapter sub-folders as described previously for the ASCII data.
  * plots showing specific portions of the electromagnetic spectrum, organized in folders within the “plots_by_wavelength_region” folder, including:
    - range1_uv_to_visible (0.2 - 1.0 microns)
    - range2_visible_to_swir (0.2 - 2.5 microns)
    - range3_swir (1.5 - 5.5 microns)
    - range4_swir_to_mir (2.5 - 25 microns)
    - range5_swir_to_fir_wavenumber (4,000 - 50 cm⁻¹, which spans 2.5 - 200 microns)
- plots of spectra interpolated to a higher number of more finely-spaced channels showing the full wavelength range, organized in chapter sub-folders (GIFplots_splib07b.zip)
- plots of spectra convolved to other spectrometers showing the full wavelength range of the spectrometer, organized in chapter sub-folders, for example
  * Analytical Spectral Devices (GIFplots_splib07b_cvASD.zip)
  * AVIRIS-Classic 2014 characteristics (GIFplots_splib07b_cvAVIRISc2014.zip)
  * Hyperspectral Mapper 2014 characteristics (GIFplots_splib07b_cvHYMAP2014.zip)
  * and others
- plots of spectra resampled to multispectral sensors showing the full wavelength range of the sensor, organized in chapter sub-folders, for example:
  * Advanced Spaceborne Thermal Emission and Reflection Radiometer (GIFplots_splib07b_rsASTER.zip)
  * and others
GENERAL LIBRARY DESCRIPTION
This data release provides the U.S. Geological Survey (USGS) Spectral Library Version 7 and all related documents. The library contains spectra measured with laboratory, field, and airborne spectrometers. The instruments used cover wavelengths from the ultraviolet to the far infrared (0.2 to 200 microns). Laboratory samples of specific minerals, plants, chemical compounds, and man-made materials were measured.
In many cases, samples were purified, so that unique spectral features of a material can be related to its chemical structure. These spectro-chemical links are important for interpreting remotely sensed data collected in the field or from an aircraft or spacecraft. This library also contains physically-constructed as well as mathematically-computed mixtures. Measurements of rocks, soils, and natural mixtures of minerals have also been made with laboratory and field spectrometers. Spectra of plant components and vegetation plots, comprising many plant types and species with varying backgrounds, are also in this library. Measurements by airborne spectrometers are included for forested vegetation plots, in which the trees are too tall for measurement by a field spectrometer. The related U.S. Geological Survey Data Series publication, "USGS Spectral Library Version 7", describes the instruments used, metadata descriptions of spectra and samples, and possible artifacts in the spectral measurements (Kokaly and others, 2017). Four different spectrometer types were used to measure spectra in the library: (1) Beckman™ 5270 covering the spectral range 0.2 to 3 µm, (2) standard, high resolution (hi-res), and high-resolution Next Generation (hi-resNG) models of ASD field portable spectrometers covering the range from 0.35 to 2.5 µm, (3) Nicolet™ Fourier Transform Infra-Red (FTIR) interferometer spectrometers covering the range from about 1.12 to 216 µm, and (4) the NASA Airborne Visible/Infra-Red Imaging Spectrometer AVIRIS, covering the range 0.37 to 2.5 µm. Two fundamental spectrometer characteristics significant for interpreting and utilizing spectral measurements are sampling position (the wavelength position of each spectrometer channel) and bandpass (a parameter describing the wavelength interval over which each channel in a spectrometer is sensitive). 
Bandpass is typically reported as the Full Width at Half Maximum (FWHM) response at each channel (in wavelength units, for example nm or micron). The linked publication (Kokaly and others, 2017) includes a comparison plot of the various spectrometers used to measure the data in this release. Data for the sampling positions and the bandpass values (for each channel in the spectrometers) are included in this data release. These data are in the SPECPR files, as separate data records, and in the American Standard Code for Information Interchange (ASCII) text files, as separate files for wavelength and bandpass. Spectra are provided in files of ASCII text format (files with a .txt file extension). In the ASCII files, deleted channels (bad bands) are indicated by a value of -1.23e34. Metadata descriptions of samples, field areas, spectral measurements, and results from supporting material analyses – such as XRD – are provided in HyperText Markup Language (HTML) formatted ASCII text files (files with .html file extension). In addition, Graphics Interchange Format (GIF) images of plots of spectra are provided. For each spectrum a plot with wavelength in microns on the x-axis is provided. For spectra measured on the Nicolet spectrometer, an additional GIF image with wavenumber on the x-axis is provided. Data are also provided in SPECtrum Processing Routines (SPECPR) format (Clark, 1993), which packages spectra and associated metadata descriptions into a single file (see the linked publication, Kokaly and others, 2017, for additional details on the SPECPR format and freely-available software that can be used to read files in SPECPR format). The data measured on the source spectrometers are denoted by the “splib07a” tag in filenames. In addition to providing the original measurements, the spectra have been convolved and resampled to different spectrometer and multispectral sensor characteristics.
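When reading the ASCII spectra, the deleted-channel marker needs to be masked before any averaging or plotting. The sketch below shows one way to do that; the assumed file layout (one header line followed by one value per line) and the function name `read_spectrum` are assumptions for illustration, so the actual files should be checked before relying on it.

```python
import math

DELETED_CHANNEL = -1.23e34  # marker for deleted channels (bad bands)


def read_spectrum(lines, header_lines=1):
    """Parse spectrum values, replacing deleted channels with NaN.

    `lines` is an iterable of text lines; the first `header_lines`
    lines are skipped (assumed layout, see the note above).
    """
    values = []
    for line in list(lines)[header_lines:]:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        value = float(text)
        if math.isclose(value, DELETED_CHANNEL, rel_tol=1e-6):
            value = math.nan
        values.append(value)
    return values


# Hypothetical file content for demonstration only.
example = ["example record title", " -1.2300000e+34", " 0.25", " 0.5"]
spectrum = read_spectrum(example)
print(spectrum)
```

For a real file, the lines would come from `open(path)` rather than an in-memory list.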
The following list specifies the identifying tag for the measured and convolved libraries and gives brief descriptions of the sensors.
splib07a – this is the name of the SPECPR file containing the spectra measured on the Beckman, ASD, Nicolet and AVIRIS spectrometers. The data are provided with their original sampling positions (wavelengths) and bandpass values. The prefix “splib07a_” is at the beginning of the ASCII and GIF files pertaining to the measured spectra.
splib07b – this is the name of the SPECPR file containing a modified version of the original measurements. The results from using spectral convolution to convert measurements to other spectrometer characteristics can be improved by oversampling (increasing sample density). Thus, splib07b is an oversampled version of the library, computed using simple cubic-spline interpolation to produce spectra with fine sampling interval (therefore a higher number of channels) for Beckman and AVIRIS measurements. The spectra in this version of the library are the data used to create the convolved and resampled versions of the library. The prefix “splib07b_” is at the beginning of the ASCII and GIF files pertaining to the oversampled spectra.
s07_ASD – this is the name of the SPECPR file containing the spectral library measurements convolved to standard resolution ASD full range spectrometer characteristics. The standard reported wavelengths of the ASD spectrometers used by the USGS were used (2151 channels with wavelength positions starting at 350 nm and increasing in 1 nm increments). The bandpass values of each channel were determined by comparing measurements of reference materials made on ASD spectrometers in comparison to measurements made of the same materials on higher resolution spectrometers (the procedure is described in Kokaly, 2011, and discussed in Kokaly and Skidmore, 2015, and Kokaly and others, 2017). The prefix “s07ASD_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
s07_AV95 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1995 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV95_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
s07_AV96 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1996 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV96_” is at the beginning of the ASCII and GIF files.
s07_AV97 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1997 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV97_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
s07_AV98 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1998 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV98_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
s07_AV99 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 1999 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV99_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
s07_AV00 – this is the name of the SPECPR file containing the spectral library measurements convolved to AVIRIS-Classic with spectral characteristics determined in the year 2000 (wavelength and bandpass values for the 224 channels provided with AVIRIS data by NASA/JPL). The prefix “s07_AV00_” is at the beginning of the ASCII and GIF files pertaining to this spectrometer.
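To make the idea of convolving a finely sampled spectrum to a sensor's channels concrete, here is a minimal sketch. It assumes each channel has a Gaussian response whose FWHM equals the reported bandpass and that the input spectrum is uniformly sampled; both are simplifying assumptions, and the procedure actually used for this library is the one documented in Kokaly and others (2017), which may differ. The function names are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full width at half maximum to a standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def convolve_to_sensor(wavelengths, values, centers, fwhms):
    """Gaussian-weighted average of a spectrum at each sensor channel.

    wavelengths, values: the finely sampled input spectrum (NaN = bad band).
    centers, fwhms: sampling position and bandpass of each output channel,
    in the same wavelength units as `wavelengths`.
    """
    result = []
    for center, fwhm in zip(centers, fwhms):
        sigma = fwhm_to_sigma(fwhm)
        weighted_sum = 0.0
        weight_total = 0.0
        for wavelength, value in zip(wavelengths, values):
            if math.isnan(value):
                continue  # skip deleted channels
            weight = math.exp(-0.5 * ((wavelength - center) / sigma) ** 2)
            weighted_sum += weight * value
            weight_total += weight
        result.append(weighted_sum / weight_total if weight_total > 0 else math.nan)
    return result


# Input grid in microns: 2151 points from 0.350 to 2.500 in 0.001 steps.
grid = [0.35 + 0.001 * i for i in range(2151)]
flat = [0.4] * len(grid)  # hypothetical flat spectrum
convolved = convolve_to_sensor(grid, flat, centers=[0.5, 1.0, 2.0],
                               fwhms=[0.01, 0.01, 0.012])
print(convolved)
```

A flat spectrum is a convenient sanity check: any normalized response function should return the same constant at every channel.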
The Hillslope Study sites represent a gradient of landscapes, including forested, valley agriculture, and mountain housing developments. These locations and plots were used to collect samples of various matrices for numerous analyses at differing intervals. The data set consists of Open Office spreadsheet and other files that document all the Hillslope Study locations.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The data set includes a list of sample trees ≥10 cm DBH from the Swedish National Forest Inventory. Individual sample trees measured and cored within a ten-metre-wide transect during the 1923-1929 survey or circular sample plots during the 1953-1962, 1983-1992, and 2013-2022 surveys are included. The 10 cm threshold was used to exclude smaller diameter trees measured on small parts of sections or plots. Variables include cluster and plot ID, tree species, diameter, tree age at breast height and total tree age, land-cover class and expansion factors for estimation of number of trees and volumes. The current NFI is based on an annual sample of about 20,000 circular plots, grouped into clusters, of which about 12,000 are surveyed in the field each year. Data for additional sample trees and more variables for individual sample trees can be obtained from the Swedish National Forest Inventory. More details are presented in the article, see
Jacobsson, Jonas, Fridman, Jonas, Axelsson, Anna-Lena, Milberg, Per (2025). An aging population? A century of change among Swedish forest trees. Forest Ecology and Management. 580:122509. https://doi.org/10.1016/j.foreco.2025.122509
The data file contains 17 columns and 384790 rows.
Permanent forest mensuration sample plots that are re-measured every 5 or 10 years as part of a rolling annual programme. Data is collected from a sample of public and private woodland sites across Great Britain.
The Division of Forestry completed a forest inventory on Native corporation owned lands in 2018. The project area encompasses forest lands in the Lower Kuskokwim River near the communities of Lower Kalskag, Upper Kalskag and Aniak.
The impact evaluation study of the MCA-M PRP will be the first fully randomized evaluation of a large-scale land titling program. Randomization will occur at the geographic level akin to a neighborhood. Mongolian cities are divided up into a number of administrative units - the smallest being the “kheseg”. Khesegs were chosen as the unit of randomization for the study because they are a well-defined unit that is small and numerous enough to allow for sufficient statistical power. The baseline estimation strategy will be a differences-in-differences approach, where we compare the outcomes of households in the treatment group with the control group as well as before and after the completion of the formalization activities. Exposure to treatment was 66% in Darkhan and Erdenet, and 50% in Ulaanbaatar districts. There are no results to report as of now because only the baseline has been conducted so far.
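The differences-in-differences comparison described above reduces to a simple calculation on group means. The sketch below uses invented outcome values purely to show the arithmetic; no results have been reported for this study, and a real analysis would also need to account for randomization at the kheseg level (for example with clustered standard errors), which this sketch ignores.

```python
from statistics import fmean


def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in the treatment group minus change in the control group."""
    treatment_change = fmean(treat_after) - fmean(treat_before)
    control_change = fmean(control_after) - fmean(control_before)
    return treatment_change - control_change


# Hypothetical household outcomes (arbitrary units), for illustration only.
estimate = diff_in_diff(
    treat_before=[10, 12], treat_after=[15, 17],
    control_before=[9, 11], control_after=[11, 13],
)
print(estimate)
```

Subtracting the control group's change removes any trend that affected both groups equally between the baseline and the follow-up.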
Regionally: Ulaanbaatar, Darkhan and Erdenet
Kheseg (Neighborhood)
Households living in hashaa plots in the ger districts of Mongolia's three largest cities: Ulaanbaatar, Darkhan, and Erdenet.
Sample survey data [ssd]
8,552 plots were identified for surveying for the sample. Of these, 6,344 were occupied households and 5,816 were successfully interviewed for a response rate of 68%. 528 households refused to participate in the survey and 2,068 plots were unoccupied, had no one present at the time of any of the survey attempts, or were invalid plots. Plots found to be unoccupied or to be owned or occupied by a business or state entities were deemed unsuitable for the survey and were dropped from the sample. Geographic Information System data on all hashaa plots in the ger areas of the relevant districts of the capital and in Darkhan and Erdenet, were obtained from the PRP PIU. The ownership status of many of these plots was recorded in this GIS data set, though the ownership status information was known to be out of date and inaccurate. The boundaries of administrative units such as city, district, khoroo, and kheseg were also included. IPA processed the GIS data using ArcGIS and Stata computer software.
Once the GIS and administrative cadastral data sets were integrated, sample selection was stratified by kheseg, a geographical unit roughly equivalent to a neighborhood in the United States. First, the number of program-eligible plots per kheseg was calculated. Plots listed as “fully registered” in the GIS data were not included in this calculation since they would not be eligible for project assistance. Weights were then calculated for each kheseg unit that measured the proportion of the total number of eligible plots located in this unit. These weights were then multiplied by 8,000, the total number of plots it was deemed desirable and feasible to include in survey activities, to determine the number of plots to be sampled from each kheseg. After the sample size for each kheseg was determined, plots were randomly selected for inclusion in the survey.
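The allocation rule described above (each kheseg's share of eligible plots multiplied by the target total, followed by random selection within the kheseg) can be sketched as below. The kheseg names, plot counts and the small target total in the example are hypothetical, and rounding each quota to a whole number is an assumption; with rounding, the quotas may not sum exactly to the target.

```python
import random


def allocate(eligible_by_kheseg, total_sample=8000):
    """Proportional allocation: weight of each kheseg times the target total."""
    total_eligible = sum(eligible_by_kheseg.values())
    return {
        kheseg: round(total_sample * count / total_eligible)
        for kheseg, count in eligible_by_kheseg.items()
    }


def draw_sample(plots_by_kheseg, total_sample=8000, seed=0):
    """Randomly select the allocated number of plots within each kheseg."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    counts = {kheseg: len(plots) for kheseg, plots in plots_by_kheseg.items()}
    quota = allocate(counts, total_sample)
    return {
        kheseg: rng.sample(plots, min(quota[kheseg], len(plots)))
        for kheseg, plots in plots_by_kheseg.items()
    }


# Hypothetical frame: two khesegs with 300 and 100 eligible plots.
frame = {"kheseg_A": list(range(300)), "kheseg_B": list(range(100))}
quota = allocate({k: len(v) for k, v in frame.items()}, total_sample=40)
sample = draw_sample(frame, total_sample=40)
print(quota)
```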
In November of 2010, the survey contractor selected by MCA-M began administering the questionnaire to the households residing on and/or owning the plots selected during the sampling process. Due to the anticipated errors in the Geographic Information System data, not all of the hashaa plots selected for the SHPS sample were occupied. In addition, Mongolian households are extremely mobile. To minimize these challenges, the survey teams were required to make four attempts to locate the hashaa plot to determine the registration status and an additional four attempts to complete the survey questionnaire. Unfortunately, the SHPS had to be suspended after several weeks of data collection due to unforeseen delays in project implementation. The scope of the project was subsequently adjusted and the project implementation areas shifted due to the inflexibility of the data collection contract. The scope of the project was reduced from covering all districts in Ulaanbaatar to covering only the three largest districts, Bayanzurkh, Chingeltei, and Songinokhairkhan.
The household questionnaire was prepared in both Mongolian and English. The team organized four pilot tests involving 109 respondents. Modules:
- Log of attempts made to take survey
1. Registration section
2. Control section (filled by enumerator)
3. Introduction to survey
4. Basic information
5. Demographic, education level and residential information of household members
6. Economic activities and incomes of household members
7. Household assets and properties
8. Planned future investments
9. Registration status of plot being surveyed
10. Implementation level of the 2003 amendment to the Land Law
11. Accessibility of land registration information and service quality at General Authority of State Registration
12. Land conflicts
13. Hashaa plot sales and market value
14. Infrastructure of hashaa plots
15. Household spending
16. Household business activities
17. Insurance
18. Household loans
19. Government policy and thoughts on its implementation
20. Citizens' involvement and labor in common
21. Risk evaluation
IDs in the dataset were checked against the original sample frame to make sure that they were correctly entered and complete. In addition, team leaders manually inspected each survey to ensure the accuracy and logical consistency of the data collected. Back checks were also performed.
The response rate was 68%.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
The Jorat is one of the largest continuously forested areas in the Swiss Plateau. On a forested area of 778 ha, the Parc Naturel du Jorat (PNJ), a periurban park, was established in 2022. To document the initial state of the forest within the perimeter of the PNJ, a sample plot inventory (SPI) was carried out on 132 sample plots (SP) in winter 2021/22. This dataset contains results from this sample plot inventory. It consists of the following files:
- results_trees.csv: Results for living and dead trees.
- results_regeneration.csv: Results for trees with DBH < 7.0 cm, assessed in three height classes.
- results_lying_deadwood.csv: Results for lying deadwood, assessed on three line transects.
- results_trems.csv: Results for occurrence of tree-related microhabitats (TreMs).
- results_habitat_trees.csv: Results for occurrence / densities of trees carrying at least one TreM or with a DBH ≥ 80 cm for living trees or ≥ 36 cm for dead trees, respectively.
- lookup.csv: Contains lookup tables which describe the respective results in-depth.
- data_description.pdf: Briefly describes the datasets mentioned above.
MIT License https://opensource.org/licenses/MIT
LLM Distribution Evaluation Dataset
This dataset contains 1000 synthetic graphs with questions and answers about statistical distributions, designed to evaluate large language models' ability to analyze data visualizations.
Dataset Description
Dataset Summary
This dataset contains diverse statistical visualizations (bar charts, line plots, scatter plots, histograms, area charts, and step plots) with associated questions about:
Normality testing Distribution… See the full description on the dataset page: https://huggingface.co/datasets/robvanvolt/llm-distribution-sample.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
These data are a harmonised collation of vegetation survey plot datasets from across Australia, representing 205,084 plots. Source data were obtained from the main custodians of vegetation plot information in Australia, primarily state/territory and federal government agencies. The collated source data were harmonised to a common structured data format through customised scripts written in R. The data provided here are those for which the source data licenses permit adaptation and sharing; data were also collated and harmonised from a number of sources that could not be included in this data release due to license restrictions. Important methodological attributes have been incorporated where possible, including the taxonomic scope of each survey and the size/shape of the survey plots used. This harmonised dataset may enable a wide variety of analyses that improve our understanding of Australian plant diversity and vegetation patterns.
Lineage: Given the absence of an existing large-scale harmonised plant community survey plot dataset for Australia, we obtained and collated data primarily from the state/territory and federal agencies that are custodians of the largest survey datasets. In some cases we downloaded data directly from publicly accessible websites, while for others we required assistance and permission to obtain relevant data.
Given the different formats of the source data, we developed a simple, customised and structured data format to harmonise across sources, based broadly on the Veg-X schema (Wiser et al. 2011), with existing standards for data fields used wherever possible (Veg-X, Darwin Core). Source data were harmonised to the common format using a customised script in R. Taxonomic nomenclature was standardised to the Australian Plant Census (CHAH 2022), using code adapted from Falster et al. (2021), with only vascular plant species retained.
Key methodological aspects of the component datasets were also incorporated, including the taxonomic scope of the vegetation survey (e.g. all vascular plants, dominant species only) and the size/configuration of the plot that was surveyed. Obtaining such information often involved identifying publications related to the data and cataloguing the methods described.
Data products
The data were formatted and prepared into the following files, linked by common identifiers:
• project.csv – this file describes attributes of the projects undertaken to sample survey plots. Most projects are associated with surveying multiple plots over space and time, using a common methodology.
• plot.csv – this file describes attributes of the plots that have been surveyed. A plot is a fixed area/location in space that may be surveyed at one or more times, for one or more attributes (e.g. plant species, soil attributes).
• plotObservation.csv – this file describes the attributes associated with the observations taken at a plot at a particular time (date). There may be multiple plot observations for a single plot.
• aggregateOrganismObservation.csv – this file describes the attributes of plant species observed in a specific plot observation, such as the species observed and any measure of abundance that was made.
• aggregateSoilObservation.csv – this file describes the soil attributes that were recorded in a specific plot observation.
• speciesAttributes.csv – this file describes the attributes associated with species names included in the dataset, where the scientific name for each plant species is that accepted by the Australian Plant Census (CHAH 2022).
The contents (data fields) of each file listed above are described in the file: HAVPlot_Data_Format.csv
The sources of the plot data provided here are shown in the file: HAVPlot_source_citations.docx. Use of the data provided here should comply with the data license conditions of the source data.
A coded example (using R) for combining and manipulating the component HAVPlot data files is provided in the file: HAVPlot_data_query_example_R_code.R
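As a rough illustration of how files "linked by common identifiers" can be combined, the Python sketch below joins plot attributes onto plot observations using only the standard library. It is not the R example shipped with the dataset. The column names (`plotID`, `plotObservationID`, `area_m2`, `obsDate`) and the tiny in-memory tables are hypothetical stand-ins; the actual field names are documented in HAVPlot_Data_Format.csv.

```python
import csv
import io
from collections import defaultdict


def join_observations(plot_rows, observation_rows, key="plotID"):
    """Attach each plot's attributes to its observations (one-to-many join).

    `key` is a hypothetical name for the shared identifier column.
    """
    plots_by_id = {row[key]: row for row in plot_rows}
    joined = []
    for obs in observation_rows:
        plot = plots_by_id.get(obs[key])
        if plot is None:
            continue  # skip observations whose plot is not in the plot table
        joined.append({**plot, **obs})
    return joined


def observations_per_plot(observation_rows, key="plotID"):
    """Count how many observations were recorded for each plot."""
    counts = defaultdict(int)
    for obs in observation_rows:
        counts[obs[key]] += 1
    return dict(counts)


# Made-up stand-ins for plot.csv and plotObservation.csv (not real HAVPlot data).
PLOT_CSV = "plotID,area_m2\nP1,400\nP2,1000\n"
OBS_CSV = (
    "plotObservationID,plotID,obsDate\n"
    "O1,P1,2001-05-01\n"
    "O2,P1,2010-06-15\n"
    "O3,P2,2005-03-20\n"
)

plot_rows = list(csv.DictReader(io.StringIO(PLOT_CSV)))
observation_rows = list(csv.DictReader(io.StringIO(OBS_CSV)))

joined = join_observations(plot_rows, observation_rows)
counts = observations_per_plot(observation_rows)
```

With the real files, the `io.StringIO` objects would be replaced by opened CSV files, and `key` by whichever identifier field the data format file specifies. The same pattern would extend to linking the organism or soil observation tables to plot observations.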
Summary
The HAVPlot data comprise 213,101 observations across 205,084 plots. A summary of the full HAVPlot data is also available in Mokany et al. (2022).
References
CHAH (2022). Australian Plant Census, Centre of Australian National Biodiversity Research. Council of Heads of Australasian Herbaria (CHAH). https://id.biodiversity.org.au/tree/51354547
Falster, D., et al. (2021). AusTraits, a curated plant trait database for the Australian flora. Scientific Data 8: 254.
Mokany, K., et al. (2022). Patterns and drivers of plant diversity across Australia. Ecography e06426. https://doi.org/10.1111/ecog.06426
Wiser, S. K., et al. (2011). Veg-X - an exchange standard for plot-based vegetation data. Journal of Vegetation Science 22: 598-609.
The Division of Forestry completed a forest inventory on Alaska state-owned lands in 2016. The project area encompasses forest lands in the Upper Kuskokwim River near the communities of McGrath and Nikolai. The purpose of this GIS layer is to create a spatial coverage of vegetation on state lands to aid in forest management.
Vegetation cover types used to develop a forest inventory conducted by the State of Alaska Division of Forestry. Inventory with supporting ground plots on State, Federal and Native Corporation land in the Cordova Area.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Long-term monitoring of a forest tree community is a basis for understanding forest structure and dynamics, and for evaluating ecosystem functions such as primary production. Because global climate change has affected, and will continue to change, forest ecosystems from local to global scales, it is essential to document long-term forest monitoring data in order to examine temporal and geographical trends in forest change. We here report the monitoring data of 45 forest plots (average area: 0.69 ha) from 27 sites in Japan. The plots range in latitude from N 32.38 to N 43.36 and in elevation from 8 m to 2453 m above sea level. These plots include both old-growth and secondary forests, and cover various forest biomes, such as warm-temperate evergreen forests, temperate deciduous broadleaved forests, and boreal or sub-alpine coniferous forests. In each plot, all living trees and lianas larger than a certain minimum size (typically 15 cm stem girth at breast height) were repeatedly measured, and survival and recruitment of stems were recorded. The monitoring period varies among plots from 5 to 40 years, with an average of 17.3 years. The tree measurement data are presented in the same format as that of the preceding Monitoring Sites 1000 Project in Japan, and in a sample-based Darwin Core format. This dataset expands existing open monitoring data of Japanese forests and thereby facilitates further meta-analysis of forest community structures and their changes in relation to climate change and other drivers. This dataset is published as a data paper in Ecological Research (see https://doi.org/10.1111/1440-1703.12457).