79 datasets found
  1. Sea Surface Temperature (SST) Standard Deviation of Long-term Mean,...

    • catalog.data.gov
    • data.ioos.us
    • +2 more
    Updated Jan 27, 2025
    Cite
    National Center for Ecological Analysis and Synthesis (NCEAS) (Point of Contact) (2025). Sea Surface Temperature (SST) Standard Deviation of Long-term Mean, 2000-2013 - Hawaii [Dataset]. https://catalog.data.gov/dataset/sea-surface-temperature-sst-standard-deviation-of-long-term-mean-2000-2013-hawaii
    Explore at:
    Dataset updated
    Jan 27, 2025
    Dataset provided by
    National Center for Ecological Analysis and Synthesis (NCEAS) (Point of Contact)
    Area covered
    Hawaii
    Description

    Sea surface temperature (SST) plays an important role in a number of ecological processes and can vary over a wide range of time scales, from daily to decadal changes. SST influences primary production, species migration patterns, and coral health. If temperatures are anomalously warm for extended periods of time, drastic changes in the surrounding ecosystem can result, including harmful effects such as coral bleaching. This layer represents the standard deviation of SST (degrees Celsius) of the weekly time series from 2000-2013. Three SST datasets were combined to provide continuous coverage from 1985-2013. The concatenation applies bias adjustment derived from linear regression to the overlap periods of datasets, with the final representation matching the 0.05-degree (~5-km) near real-time SST product. First, a weekly composite, gap-filled SST dataset from the NOAA Pathfinder v5.2 SST 1/24-degree (~4-km), daily dataset (a NOAA Climate Data Record) for each location was produced following Heron et al. (2010) for January 1985 to December 2012. Next, weekly composite SST data from the NOAA/NESDIS/STAR Blended SST 0.1-degree (~11-km), daily dataset was produced for February 2009 to October 2013. Finally, a weekly composite SST dataset from the NOAA/NESDIS/STAR Blended SST 0.05-degree (~5-km), daily dataset was produced for March 2012 to December 2013. The standard deviation of the long-term mean SST was calculated by taking the standard deviation over all weekly data from 2000-2013 for each pixel.
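
    The per-pixel calculation described above can be sketched as follows. This is a minimal illustration, not the data provider's code: the function name, the use of None for missing pixels, and the choice of the population (rather than sample) standard deviation are all assumptions.

```python
import math

def pixelwise_std(weekly_grids):
    """Standard deviation over time for each pixel.

    weekly_grids: list of 2-D grids (lists of rows), one per week, same shape.
    Missing values are None and are skipped (an assumption for this sketch).
    Uses the population formula (divide by N), also an assumption.
    """
    n_rows = len(weekly_grids[0])
    n_cols = len(weekly_grids[0][0])
    result = [[None] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            series = [g[r][c] for g in weekly_grids if g[r][c] is not None]
            if not series:
                continue  # pixel never observed (for example, land)
            mean = sum(series) / len(series)
            var = sum((v - mean) ** 2 for v in series) / len(series)
            result[r][c] = math.sqrt(var)
    return result

# Three hypothetical weekly 2x2 SST grids in degrees Celsius.
weeks = [
    [[25.0, 26.0], [None, 27.0]],
    [[26.0, 26.0], [None, 28.0]],
    [[27.0, 26.0], [None, 29.0]],
]
sst_std = pixelwise_std(weeks)
```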

  2. Data from: How Do We Know What We Know? Learning from Monte Carlo...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Mar 1, 2023
    Cite
    Ali Kagalwala; Vincent Hopkins; Mark Pickup (2023). How Do We Know What We Know? Learning from Monte Carlo Simulations [Dataset]. http://doi.org/10.7910/DVN/UNEBPY
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 1, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Ali Kagalwala; Vincent Hopkins; Mark Pickup
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Monte Carlo simulations are commonly used to test the performance of estimators and models from rival methods under a range of data generating processes. This tool improves our understanding of the relative merits of rival methods in different contexts, such as varying sample sizes and violations of assumptions. When used, it is common to report the bias and/or the root mean squared error of the different methods. It is far less common to report the standard deviation, overconfidence, coverage probability, or power. Each of these six performance statistics provides important, and often differing, information regarding a method’s performance. Here, we present a structured way to think about Monte Carlo performance statistics. In replications of three prominent papers, we demonstrate the utility of our approach and provide new substantive results about the performance of rival methods.
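
    As an illustration of the six performance statistics named above, the sketch below runs a small Monte Carlo experiment for the sample mean of normally distributed data. It is not taken from the replication materials: the data generating process, the 1.96 critical value, and in particular the definition of overconfidence as the ratio of the standard deviation of the estimates to the mean estimated standard error are assumptions made for this example.

```python
import math
import random
import statistics

def monte_carlo_performance(true_mean=0.5, sigma=1.0, n=50, reps=2000, seed=1):
    """Simulate the sample mean and summarise its performance."""
    rng = random.Random(seed)
    estimates, std_errors = [], []
    covered = 0   # intervals that contain the true mean
    rejected = 0  # tests that reject H0: mean = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        est = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        estimates.append(est)
        std_errors.append(se)
        lo, hi = est - 1.96 * se, est + 1.96 * se
        covered += lo <= true_mean <= hi
        rejected += not (lo <= 0.0 <= hi)
    bias = statistics.fmean(estimates) - true_mean
    sd = statistics.stdev(estimates)
    rmse = math.sqrt(statistics.fmean([(e - true_mean) ** 2 for e in estimates]))
    return {
        "bias": bias,
        "std_dev": sd,
        "rmse": rmse,
        # Assumed definition: values above 1 mean the reported standard
        # errors understate the true sampling variability.
        "overconfidence": sd / statistics.fmean(std_errors),
        "coverage": covered / reps,
        "power": rejected / reps,
    }

performance = monte_carlo_performance()
```

    Reporting all six values side by side makes it visible when an estimator is nearly unbiased but still produces intervals that cover too rarely.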

  3. High School Heights Dataset

    • kaggle.com
    zip
    Updated Aug 11, 2022
    Cite
    Yashmeet Singh (2022). High School Heights Dataset [Dataset]. https://www.kaggle.com/datasets/yashmeetsingh/high-school-heights-dataset
    Explore at:
    Available download formats: zip (7297 bytes)
    Dataset updated
    Aug 11, 2022
    Authors
    Yashmeet Singh
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    High School Heights Dataset

    You will find three datasets containing heights of the high school students.

    All heights are in inches.

    The data is simulated. The heights are generated from a normal distribution with different sets of mean and standard deviation for boys and girls.

    Height Statistics (inches)    Boys    Girls
    Mean                          67      62
    Standard Deviation            2.9     2.2

    There are 500 measurements for each gender.

    Here are the datasets:

    • hs_heights.csv: contains a single column with heights for all boys and girls. There's no way to tell which of the values are for boys and which ones are for girls.

    • hs_heights_pair.csv: has two columns. The first column has boys' heights. The second column contains girls' heights.

    • hs_heights_flag.csv: has two columns. The first column has the flag is_girl. The second column contains a girl's height if the flag is 1. Otherwise, it contains a boy's height.
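
    A minimal sketch of how data with these three layouts could be simulated is shown below. It is not the author's generation script (which is linked in the description); the function name, the seed, and the in-memory return values are assumptions, while the means, standard deviations, and group size come from the description above.

```python
import random

def simulate_heights(n_per_group=500, seed=0):
    """Draw heights (inches) from two normal distributions and build
    the three layouts described above as in-memory lists."""
    rng = random.Random(seed)
    boys = [rng.gauss(67.0, 2.9) for _ in range(n_per_group)]
    girls = [rng.gauss(62.0, 2.2) for _ in range(n_per_group)]

    # Layout 1: one column, boys and girls mixed with no label.
    single = boys + girls
    rng.shuffle(single)

    # Layout 2: two columns, (boy height, girl height) per row.
    pair = list(zip(boys, girls))

    # Layout 3: two columns, (is_girl flag, height) per row.
    flag = [(0, h) for h in boys] + [(1, h) for h in girls]
    return single, pair, flag

single, pair, flag = simulate_heights()
```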

    To see how I generated this dataset, check this out: https://github.com/ysk125103/datascience101/tree/main/datasets/high_school_heights


  4. Standard deviation of the bathymetric DEM of the Sacramento River, from the...

    • catalog.data.gov
    • data.cnra.ca.gov
    • +1 more
    Updated Nov 27, 2025
    Cite
    U.S. Geological Survey (2025). Standard deviation of the bathymetric DEM of the Sacramento River, from the Feather River to Knights Landing, California in February 2011 [Dataset]. https://catalog.data.gov/dataset/standard-deviation-of-the-bathymetric-dem-of-the-sacramento-river-from-the-feather-river-t
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Sacramento River, Feather River, California, Knights Landing
    Description

    This part of the data release contains a grid of standard deviations of bathymetric soundings within each 0.5 m x 0.5 m grid cell. The bathymetry was collected on February 1, 2011, in the Sacramento River from the confluence of the Feather River to Knights Landing. The standard deviations represent one component of bathymetric uncertainty in the final digital elevation model (DEM), which is also available in this data release. The bathymetry data were collected by the USGS Pacific Coastal and Marine Science Center (PCMSC) team with collaboration and funding from the U.S. Army Corps of Engineers. This project used interferometric sidescan sonar to characterize the riverbed and channel banks along a 12 mile reach of the Sacramento River near the town of Knights Landing, California (River Mile 79 through River Mile 91) to aid in the understanding of fish response to the creation of safe habitat associated with levee restoration efforts in two 1.5 mile reaches of the Sacramento River between River Mile 80 and 86.
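
    The gridding step described above, a standard deviation of all soundings that fall inside each 0.5 m x 0.5 m cell, can be sketched as follows. This is an illustration only: the function name, the (x, y, z) input layout, and the use of the population formula are assumptions, not details of the USGS processing.

```python
import math
from collections import defaultdict

def cell_std(soundings, cell_size=0.5):
    """Group (x, y, z) soundings into square cells and return
    {(col, row): standard deviation of z} for every occupied cell."""
    cells = defaultdict(list)
    for x, y, z in soundings:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(z)
    out = {}
    for key, depths in cells.items():
        mean = sum(depths) / len(depths)
        # Population formula (divide by N); an assumption for this sketch.
        out[key] = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    return out

# Hypothetical soundings: three in cell (0, 0), one in cell (1, 0).
points = [(0.1, 0.1, -3.0), (0.2, 0.4, -3.2), (0.3, 0.2, -3.4), (0.7, 0.1, -5.0)]
grid = cell_std(points)
```

    A cell with a single sounding gets a standard deviation of zero here, which says nothing about its true uncertainty; a real workflow would likely flag such cells.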

  5. Chapter 3 of the Working Group I Contribution to the IPCC Sixth Assessment...

    • data-search.nerc.ac.uk
    Updated May 16, 2024
    Cite
    (2024). Chapter 3 of the Working Group I Contribution to the IPCC Sixth Assessment Report - data for Figure 3.39 (v20220614) [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=AR6
    Explore at:
    Dataset updated
    May 16, 2024
    Description

    Data for Figure 3.39 from Chapter 3 of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6). Figure 3.39 shows the observed and simulated Pacific Decadal Variability (PDV).

    How to cite this dataset

    When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates: Eyring, V., N.P. Gillett, K.M. Achuta Rao, R. Barimalala, M. Barreiro Parrillo, N. Bellouin, C. Cassou, P.J. Durack, Y. Kosaka, S. McGregor, S. Min, O. Morgenstern, and Y. Sun, 2021: Human Influence on the Climate System. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 423–552, doi:10.1017/9781009157896.005.

    Figure subpanels

    The figure has six panels. Files are not separated according to the panels.

    List of data provided

    pdv.obs.nc contains
    - Observed SST anomalies associated with the PDV pattern
    - Observed PDV index time series (unfiltered)
    - Observed PDV index time series (low-pass filtered)
    - Taylor statistics of the observed PDV patterns
    - Statistical significance of the observed SST anomalies associated with the PDV pattern

    pdv.hist.cmip6.nc contains
    - Simulated SST anomalies associated with the PDV pattern
    - Simulated PDV index time series (unfiltered)
    - Simulated PDV index time series (low-pass filtered)
    - Taylor statistics of the simulated PDV patterns based on CMIP6 historical simulations.

    pdv.hist.cmip5.nc contains
    - Simulated SST anomalies associated with the PDV pattern
    - Simulated PDV index time series (unfiltered)
    - Simulated PDV index time series (low-pass filtered)
    - Taylor statistics of the simulated PDV patterns based on CMIP5 historical simulations.

    pdv.piControl.cmip6.nc contains
    - Simulated SST anomalies associated with the PDV pattern
    - Simulated PDV index time series (unfiltered)
    - Simulated PDV index time series (low-pass filtered)
    - Taylor statistics of the simulated PDV patterns based on CMIP6 piControl simulations.

    pdv.piControl.cmip5.nc contains
    - Simulated SST anomalies associated with the PDV pattern
    - Simulated PDV index time series (unfiltered)
    - Simulated PDV index time series (low-pass filtered)
    - Taylor statistics of the simulated PDV patterns based on CMIP5 piControl simulations.

    Data provided in relation to figure

    Panel a:
    - ipo_pattern_obs_ref in pdv.obs.nc: shading
    - ipo_pattern_obs_signif (dataset = 1) in pdv.obs.nc: cross markers

    Panel b:
    - Multimodel ensemble mean of ipo_model_pattern in pdv.hist.cmip6.nc: shading, with their sign agreement for hatching

    Panel c:
    - tay_stats (stat = 0, 1) in pdv.obs.nc: black dots
    - tay_stats (stat = 0, 1) in pdv.hist.cmip6.nc: red crosses, and their multimodel ensemble mean for the red dot
    - tay_stats (stat = 0, 1) in pdv.hist.cmip5.nc: blue crosses, and their multimodel ensemble mean for the blue dot

    Panel d:
    - Lag-1 autocorrelation of tpi in pdv.obs.nc: black horizontal lines in left
      . ERSSTv5: dataset = 1
      . HadISST: dataset = 2
      . COBE-SST2: dataset = 3
    - Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.piControl.cmip5.nc: blue open box-whisker in the left
    - Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.piControl.cmip6.nc: red open box-whisker in the left
    - Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.hist.cmip5.nc: blue filled box-whisker in the left
    - Multimodel ensemble mean and percentiles of lag-1 autocorrelation of tpi in pdv.hist.cmip6.nc: red filled box-whisker in the left
    - Lag-10 autocorrelation of tpi_lp in pdv.obs.nc: black horizontal lines in right
      . ERSSTv5: dataset = 1
      . HadISST: dataset = 2
      . COBE-SST2: dataset = 3
    - Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.piControl.cmip5.nc: blue open box-whisker in the right
    - Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.piControl.cmip6.nc: red open box-whisker in the right
    - Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.hist.cmip5.nc: blue filled box-whisker in the right
    - Multimodel ensemble mean and percentiles of lag-10 autocorrelation of tpi_lp in pdv.hist.cmip6.nc: red filled box-whisker in the right

    Panel e:
    - Standard deviation of tpi in pdv.obs.nc: black horizontal lines in left
      . ERSSTv5: dataset = 1
      . HadISST: dataset = 2
      . COBE-SST2: dataset = 3
    - Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.piControl.cmip5.nc: blue open box-whisker in the left
    - Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.piControl.cmip6.nc: red open box-whisker in the left
    - Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.hist.cmip5.nc: blue filled box-whisker in the left
    - Multimodel ensemble mean and percentiles of standard deviation of tpi in pdv.hist.cmip6.nc: red filled box-whisker in the left
    - Standard deviation of tpi_lp in pdv.obs.nc: black horizontal lines in right
      . ERSSTv5: dataset = 1
      . HadISST: dataset = 2
      . COBE-SST2: dataset = 3
    - Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.piControl.cmip5.nc: blue open box-whisker in the right
    - Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.piControl.cmip6.nc: red open box-whisker in the right
    - Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.hist.cmip5.nc: blue filled box-whisker in the right
    - Multimodel ensemble mean and percentiles of standard deviation of tpi_lp in pdv.hist.cmip6.nc: red filled box-whisker in the right

    Panel f:
    - tpi_lp in pdv.obs.nc: black curves
      . ERSSTv5: dataset = 1
      . HadISST: dataset = 2
      . COBE-SST2: dataset = 3
    - tpi_lp in pdv.hist.cmip6.nc: 5th-95th percentiles in red shading, multimodel ensemble mean and its 5-95% confidence interval for red curves
    - tpi_lp in pdv.hist.cmip5.nc: 5th-95th percentiles in blue shading, multimodel ensemble mean for blue curve

    CMIP5 is the fifth phase of the Coupled Model Intercomparison Project. CMIP6 is the sixth phase of the Coupled Model Intercomparison Project. SST stands for Sea Surface Temperature.

    Notes on reproducing the figure from the provided data

    Multimodel ensemble means and percentiles of historical simulations of CMIP5 and CMIP6 are calculated after weighting individual members with the inverse of the ensemble size of the same model. ensemble_assign in each file provides the model number to which each ensemble member belongs. This weighting does not apply to the sign agreement calculation. piControl simulations from CMIP5 and CMIP6 consist of a single member from each model, so the weighting is not applied. Multimodel ensemble means of the pattern correlation in Taylor statistics in (c) and the autocorrelation of the index in (d) are calculated via Fisher z-transformation and back transformation.

    Sources of additional information

    The following weblinks are provided in the Related Documents section of this catalogue record:
    - Link to the report component containing the figure (Chapter 3)
    - Link to the Supplementary Material for Chapter 3, which contains details on the input data used in Table 3.SM.1
    - Link to the code for the figure, archived on Zenodo
    - Link to the figure on the IPCC AR6 website
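
    The two averaging rules in the reproduction notes, inverse-ensemble-size weighting and averaging correlations through the Fisher z-transformation, can be sketched as below. This is not the archived figure code: the function names and the sample values are hypothetical, and applying the same weights inside the Fisher average is an assumption of this sketch.

```python
import math
from collections import Counter

def inverse_ensemble_weights(ensemble_assign):
    """Weight each member by 1 / (number of members from the same model)."""
    counts = Counter(ensemble_assign)
    return [1.0 / counts[model] for model in ensemble_assign]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def fisher_mean_correlation(correlations, weights):
    """Average correlations in Fisher z-space (z = atanh(r)),
    then transform the mean back with tanh."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(weighted_mean(z_values, weights))

# Hypothetical case: model 1 has three members, model 2 has one.
ensemble_assign = [1, 1, 1, 2]
weights = inverse_ensemble_weights(ensemble_assign)

member_std = [0.9, 1.0, 1.1, 2.0]             # e.g. standard deviation of tpi
multimodel_std = weighted_mean(member_std, weights)

member_corr = [0.5, 0.5, 0.5, 0.9]            # e.g. pattern correlations
multimodel_corr = fisher_mean_correlation(member_corr, weights)
```

    With these weights each model contributes equally regardless of how many members it submitted, so the three-member model does not dominate the mean.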

  6. Guidelines for Computing Summary Statistics for Data-Sets Containing...

    • datahub.bvcentre.ca
    Updated Jun 3, 2024
    Cite
    (2024). Guidelines for Computing Summary Statistics for Data-Sets Containing Non-Detects - Dataset - BVRC DataHub [Dataset]. https://datahub.bvcentre.ca/dataset/guidelines-for-computing-summary-statistics-for-data-sets-containing-non-detects
    Explore at:
    Dataset updated
    Jun 3, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    INTRODUCTION

    As part of its responsibilities, the BC Ministry of Environment monitors water quality in the province’s streams, rivers, and lakes. Often, it is necessary to compile statistics involving concentrations of contaminants or other compounds. Quite often the instruments used cannot measure concentrations below certain values. Observations below these values are called non-detects or less-thans. Non-detects pose a difficulty when it is necessary to compute statistical measurements such as the mean, the median, and the standard deviation for a data set. The way non-detects are handled can affect the quality of any statistics generated.

    Non-detects, or censored data, are found in many fields such as medicine, engineering, biology, and environmetrics. In such fields, it is often the case that the measurements of interest are below some threshold. Dealing with non-detects is a significant issue, and statistical tools using survival or reliability methods have been developed. Basically, there are three approaches for treating data containing censored values: 1. substitution, which gives poor results and therefore is not recommended in the literature; 2. maximum likelihood estimation, which requires an assumption of some distributional form; and 3. nonparametric methods, which assess the shape of the data based on observed percentiles rather than a strict distributional form.

    This document provides guidance on how to record censored data, and on when and how to use certain analysis methods when the percentage of censored observations is less than 50%. The methods presented in this document are: 1. substitution; 2. Kaplan-Meier, as part of nonparametric methods; 3. lognormal model based on maximum likelihood estimation; and 4. robust regression on order statistics, which is a semiparametric method.

    Statistical software suitable for survival or reliability analysis is available for dealing with censored data. This software has been widely used in medical and engineering environments. In this document, methods are illustrated with both R and JMP software packages, when possible. JMP often requires some intermediate steps to obtain summary statistics with most of the methods described in this document. R, with the NADA package, is usually straightforward. The package NADA was developed specifically for computing statistics with non-detects in environmental data based on Helsel (2005b). The data used to illustrate the methods described for computing summary statistics for non-detects are either simulated or based on information acquired from the B.C. Ministry of Environment. This document is strongly based on the book Nondetects And Data Analysis written by Dennis R. Helsel in 2005 (Helsel, 2005b).
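
    To make the concern about substitution concrete, the sketch below computes the mean and standard deviation of one small data set three times, replacing each non-detect with zero, half the detection limit, or the full detection limit. It is a hypothetical illustration written in Python rather than the R or JMP workflows the guidelines use; the data values and the function name are invented for this example.

```python
import statistics

def substitute(observations, fraction):
    """Replace each censored value with fraction * detection limit.

    observations: list of (value, censored) pairs; for censored entries
    the value holds the detection limit.
    """
    return [v * fraction if censored else v for v, censored in observations]

# Hypothetical concentrations: three of seven values are non-detects
# with a detection limit of 1.0 (under 50% censoring).
observations = [
    (1.0, True), (1.0, True), (2.5, False), (3.1, False),
    (4.8, False), (1.0, True), (6.2, False),
]

summary = {}
for label, fraction in [("zero", 0.0), ("half DL", 0.5), ("DL", 1.0)]:
    data = substitute(observations, fraction)
    summary[label] = (statistics.mean(data), statistics.stdev(data))
```

    The three rows of summary differ in both mean and standard deviation even though the underlying measurements are identical, which shows how much the result depends on an arbitrary choice of substituted value.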

  7. Depth (Standard Deviation) Layer used to identify, delineate and classify...

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Mar 22, 2025
    Cite
    Department of Commerce (DOC), National Oceanic and Atmospheric Administration (NOAA), National Ocean Service (NOS), Center for Coastal Monitoring and Assessment (CCMA), Biogeography Branch (Point of Contact) (2025). Depth (Standard Deviation) Layer used to identify, delineate and classify moderate-depth benthic habitats around St. John, USVI [Dataset]. https://catalog.data.gov/dataset/depth-standard-deviation-layer-used-to-identify-delineate-and-classify-moderate-depth-benthic-h4
    Explore at:
    Dataset updated
    Mar 22, 2025
    Dataset provided by
    United States Department of Commerce (http://commerce.gov/)
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Area covered
    U.S. Virgin Islands, Saint John
    Description

    Standard deviation of depth was calculated from the bathymetry surface for each cell using the ArcGIS Spatial Analyst Focal Statistics "STD" parameter. Standard deviation of depth represents the dispersion of depth values (in meters) around the mean depth within a square 3x3 cell window. The 2x2 meter resolution standard deviation of depth GeoTIFF was exported and added as a new map layer to aid in benthic habitat classification. Acoustic imagery was acquired for the VICRNM on two separate missions onboard the NOAA ship, Nancy Foster. The first mission took place from 2/18/04 to 3/5/04. The second mission took place from 2/1/05 to 2/12/05. On both missions, seafloor depths between 14 to 55 m were mapped using a RESON SeaBat 8101 ER (240 kHz) MBES sensor. This pole-mounted system measured water depths across a 150 degree swath consisting of 101 individual 1.5 degree x 1.5 degree beams. The beams to the port and starboard of nadir (i.e., directly underneath the ship) overlapped adjacent survey lines by approximately 10 m. The vessel survey speed was between 5 and 8 kn. In 2004, the ship's location was determined by a Trimble DSM 132 DGPS system, which provided a RTCM differential data stream from the U.S. Coast Guard Continually Operating Reference Station (CORS) at Port Isabel, Puerto Rico. Gyro, heave, pitch and roll correctors were acquired using an Ixsea Octans gyrocompass. In 2005, the ship's positioning and orientation were determined by the Applanix POS/MV 320 V4, which is a GPS aided Inertial Motion Unit (IMU) providing measurements of roll, pitch and heading. The POS/MV obtained its positions from two dual frequency Trimble Zephyr GPS antennae. An auxiliary Trimble DSM 132 DGPS system provided a RTCM differential data stream from the U.S. Coast Guard CORS at Port Isabel, Puerto Rico. 
For both years, CTD (conductivity, temperature and depth) measurements were taken approximately every 4 hours using a Seabird Electronics SBE-19 to correct for the changing sound velocities in the water column. In 2004, raw data were logged in .xtf (extended triton format) using Triton ISIS software 6.2. In 2005, raw data were logged in .gsf (generic sensor format) using SAIC ISS 2000 software. Data from 2004 were referenced to the WGS84 UTM 20 N horizontal coordinate system, and data from 2005 were referenced to the NAD83 UTM 20 N horizontal coordinate system. Data from both projects were referenced to the Mean Lower Low Water (MLLW) vertical tidal coordinate system. The 2004 and 2005 MBES bathymetric data were both corrected for sensor offsets, latency, roll, pitch, yaw, static draft, the changing speed of sound in the water column and the influence of tides in CARIS Hips & Sips 5.3 and 5.4, respectively. The 2004 data were then binned to create a 1 x 1 m raster surface, and the 2005 data were binned to create a 2 x 2 m raster surface. After these final surfaces were created, the datum for the 2004 bathymetric surfaces was transformed from WGS84 to NAD83 using the "Project Raster" function in ArcGIS 9.1. The 2004 surface was transformed so that it would have the same datum as the 2005 surface. The 2004 bathymetric surface was then downsampled from 1 x 1 to 2 x 2 m using the "Resample" function in ArcGIS 9.1. The 2004 surface was resampled so it would have the same spatial resolution as the 2005 surface. Having the same coordinate systems and spatial resolutions, the final 2004 and 2005 bathymetry rasters were then merged using the Raster Calculator function "Merge" in ArcGIS's Spatial Analyst Extension to create a seamless bathymetry surface for the entire VICRNM area south of St. John. 
For a complete description of the data acquisition and processing parameters, please see the data acquisition and processing reports (DAPRs) for projects: NF-04-06-VI and NF-05-05-VI (Monaco & Rooney, 2004; Battista & Lazar, 2005).
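
    The focal standard deviation described at the start of this entry, the dispersion of depth values around the mean within a 3x3 cell window, can be sketched in plain Python as follows. This is not the ArcGIS implementation: truncating the window at the raster edge and using the population formula are assumptions of this sketch, and the depth values are hypothetical.

```python
import math

def focal_std(grid, radius=1):
    """Standard deviation of each cell's square neighbourhood.

    radius=1 gives a 3x3 window. Windows are truncated at the raster
    edge (an assumption; GIS tools may treat edges differently).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            window = [
                grid[i][j]
                for i in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for j in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            mean = sum(window) / len(window)
            out[r][c] = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    return out

# Hypothetical 3x3 depth raster (meters) with one deeper cell in the middle.
depths = [
    [10.0, 10.0, 10.0],
    [10.0, 19.0, 10.0],
    [10.0, 10.0, 10.0],
]
depth_std = focal_std(depths)
```

    High values in the output mark cells whose surroundings vary strongly in depth, which is why such a layer helps separate rough seafloor from flat seafloor.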

  8. Descriptive statistics.

    • plos.figshare.com
    xls
    Updated Oct 31, 2023
    Cite
    Mrinal Saha; Aparna Deb; Imtiaz Sultan; Sujat Paul; Jishan Ahmed; Goutam Saha (2023). Descriptive statistics. [Dataset]. http://doi.org/10.1371/journal.pgph.0002475.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    PLOS Global Public Health
    Authors
    Mrinal Saha; Aparna Deb; Imtiaz Sultan; Sujat Paul; Jishan Ahmed; Goutam Saha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Vitamin D insufficiency appears to be prevalent in SLE patients. Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLEDAI score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients. The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, which includes calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student’s t-test, are also employed. In the machine learning part of the analysis, this study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-Fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values. The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and MAE of 2.68 (CI: 1.83,3.52). 
Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients. The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. Therefore, it underscores the need for further large-scale studies to corroborate this hypothesis.
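
    The bootstrapped confidence intervals reported for RMSE and MAE can be illustrated with a percentile bootstrap, sketched below. The study's exact resampling scheme is not given in this description, so the percentile method, the number of resamples, and the sample values are all assumptions made for this example.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def bootstrap_ci(y_true, y_pred, stat=rmse, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample (truth, prediction) pairs
    with replacement and take empirical quantiles of the statistic."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(stat([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical observed and predicted vitamin D levels.
observed = [20.0, 25.0, 18.0, 30.0, 22.0, 27.0, 15.0, 24.0]
predicted = [22.0, 24.0, 21.0, 27.0, 22.5, 30.0, 13.0, 25.0]

point_estimate = rmse(observed, predicted)
ci_low, ci_high = bootstrap_ci(observed, predicted)
```

    Passing a different function as stat, for example a mean absolute error, yields an interval for that metric with the same resampling logic.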

  9. #PraCegoVer dataset

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jan 19, 2023
    Cite
    Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila (2023). #PraCegoVer dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5710561
    Explore at:
    Dataset updated
    Jan 19, 2023
    Dataset provided by
    Institute of Computing, University of Campinas
    Authors
    Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila
    Description

    Automatically describing images using natural sentences is essential to the inclusion of visually impaired people on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions described in other languages are scarce.

    The PraCegoVer movement arose on the Internet, encouraging social media users to publish images, tag them #PraCegoVer, and add a short description of their content. Inspired by this movement, we propose #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.

    PraCegoVer has 533,523 pairs with images and captions described in Portuguese collected from more than 14 thousand different profiles. Also, the average caption length in #PraCegoVer is 39.3 words and the standard deviation is 29.7.

    Dataset Structure

    The PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX containing the images. The file dataset.json comprises a list of JSON objects with the attributes:

    user: anonymized user that made the post;

    filename: image file name;

    raw_caption: raw caption;

    caption: clean caption;

    date: post date.

    Each instance in dataset.json is associated with exactly one image in the images directory, whose file name is given by the attribute filename. We also provide a sample with five instances, so users can download the sample to get an overview of the dataset before downloading it completely.
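
    A minimal sketch of reading records with this structure and computing caption-length statistics is shown below. The attribute names come from the list above; the sample records, whitespace tokenization, and the population standard deviation are assumptions, so the numbers will not necessarily match the figures reported by the authors.

```python
import json
import statistics

def caption_length_stats(records):
    """Mean and standard deviation of caption length in words.

    Words are split on whitespace (an assumption for this sketch).
    """
    lengths = [len(rec["caption"].split()) for rec in records]
    return statistics.mean(lengths), statistics.pstdev(lengths)

# Hypothetical records shaped like the entries of dataset.json.
sample_json = """
[
  {"user": "u1", "filename": "a.jpg", "raw_caption": "#PraCegoVer foto de praia",
   "caption": "foto de praia", "date": "2020-01-01"},
  {"user": "u2", "filename": "b.jpg", "raw_caption": "#PraCegoVer cachorro correndo no parque verde",
   "caption": "cachorro correndo no parque verde", "date": "2020-01-02"},
  {"user": "u3", "filename": "c.jpg", "raw_caption": "#PraCegoVer duas pessoas sorrindo em frente ao mar",
   "caption": "duas pessoas sorrindo em frente ao mar", "date": "2020-01-03"}
]
"""
records = json.loads(sample_json)
mean_len, std_len = caption_length_stats(records)
```

    For the real file, the same function could be applied to the list returned by json.load on an open handle to dataset.json.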

    Download Instructions

    If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to uncompress and join the files:

    cat images.tar.gz.part* > images.tar.gz
    tar -xzvf images.tar.gz

    Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:

    python download_dataset.py --access_token=

  10. ISLR Hands Only Sign Recognition Dataset

    • kaggle.com
    zip
    Updated Jan 19, 2024
    Cite
    Cris :) (2024). ISLR Hands Only Sign Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/cristaliss/islr-hands-only-dataset
    Explore at:
    Available download formats: zip (48782493 bytes)
    Dataset updated
    Jan 19, 2024
    Authors
    Cris :)
    License

    Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
    License information was derived automatically

    Description

    Created with this notebook; contains sign language data from the Isolated Sign Language Recognition competition.

    There were four types of landmarks in the original dataset:

    1. Face landmarks
    2. Pose landmarks
    3. Right hand landmarks
    4. Left hand landmarks

    However, because face and pose landmarks make up the majority of the data, we drop these two types of points. To achieve faster training, we save only the hand data.

    To create the final dataset, we grouped by landmark type and index and then calculated the standard deviation for each landmark.

    The whole dataset corresponds to the x, y, z standard deviations for each landmark of each movement-sequence parquet file from the original data.

    Final size:

    X shape: (94477, 42, 3)
    y shape: (94477,)
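    The grouping step described above can be sketched with pandas. This is an illustration, not the notebook's code: it assumes each sequence is a DataFrame with columns type, landmark_index, x, y and z, that the hand types are labelled left_hand and right_hand, and that each hand has 21 landmarks (which would give the 42 rows of the stated X shape). All of these names are assumptions.

```python
import numpy as np
import pandas as pd

HAND_TYPES = ["left_hand", "right_hand"]  # assumed labels in the "type" column
LANDMARKS_PER_HAND = 21                   # assumed: 2 hands x 21 = 42 landmarks


def hand_std_features(frames: pd.DataFrame) -> np.ndarray:
    """Return a (42, 3) array of per-landmark x, y, z standard deviations."""
    # Drop face and pose rows, keeping only the two hand types.
    hands = frames[frames["type"].isin(HAND_TYPES)]
    # Standard deviation over all frames for each (type, landmark) pair.
    std = hands.groupby(["type", "landmark_index"])[["x", "y", "z"]].std()
    # Reindex so the output always has 42 rows, with NaN where a hand is absent.
    full_index = pd.MultiIndex.from_product(
        [HAND_TYPES, range(LANDMARKS_PER_HAND)],
        names=["type", "landmark_index"],
    )
    return std.reindex(full_index).to_numpy(dtype=np.float32)
```

    Stacking the result of hand_std_features over all sequences would give an array of shape (number of sequences, 42, 3). Note that pandas uses the sample standard deviation (ddof=1) by default; the source does not say which variant was used.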

  11. Required number of samples per group (radiosensitive and non-radiosensitive,...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Cite
    Burkhard Greve; Tobias Bölling; Susanne Amler; Ute Rössler; Maria Gomolka; Claudia Mayer; Odilia Popanda; Kristin Dreffke; Astrid Rickinger; Eberhard Fritz; Friederike Eckardt-Schupp; Christina Sauerland; Herbert Braselmann; Wiebke Sauter; Thomas Illig; Dorothea Riesenbeck; Stefan Könemann; Normann Willich; Simone Mörtl; Hans Theodor Eich; Peter Schmezer (2023). Required number of samples per group (radiosensitive and non-radiosensitive, respectively) to detect a significant difference between both groups (power  = 80%, significance level  = 5%) for a given standard deviation and effect size. [Dataset]. http://doi.org/10.1371/journal.pone.0047185.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Burkhard Greve; Tobias Bölling; Susanne Amler; Ute Rössler; Maria Gomolka; Claudia Mayer; Odilia Popanda; Kristin Dreffke; Astrid Rickinger; Eberhard Fritz; Friederike Eckardt-Schupp; Christina Sauerland; Herbert Braselmann; Wiebke Sauter; Thomas Illig; Dorothea Riesenbeck; Stefan Könemann; Normann Willich; Simone Mörtl; Hans Theodor Eich; Peter Schmezer
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The calculations were restricted to the two endpoints of repair at 15 and 60 minutes after 5 Gy irradiation, since these time points appear to require the smallest sample sizes. For the standard deviation, the minimum and maximum values from the original dataset for the two time points were selected.
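    For orientation, a sample-size figure of this kind can be approximated as follows. This is a minimal sketch using the textbook normal-approximation formula for a two-sided comparison of two group means, n = 2 (z_(1-alpha/2) + z_(power))^2 (sd / effect)^2; the source does not state which formula or software the authors used, so the numbers it produces need not match the published table.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, std_dev, power=0.80, alpha=0.05):
    """Approximate per-group sample size for a two-sided two-sample comparison.

    effect_size and std_dev must be in the same units. This is the normal
    approximation, assumed here for illustration only.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = standard_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * std_dev / effect_size) ** 2
    return ceil(n)
```

    Because n scales with (std_dev / effect_size) squared, halving the detectable effect roughly quadruples the required group size, which is why the table is indexed by both quantities.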

  12. Chlorophyll-a Standard Deviation of Long-Term Mean, 1998-2018 - American...

    • catalog.data.gov
    • data.ioos.us
    • +1more
    Updated Dec 27, 2024
    Cite
    NOAA Pacific Islands Fisheries Science Center (PIFSC) (Point of Contact) (2024). Chlorophyll-a Standard Deviation of Long-Term Mean, 1998-2018 - American Samoa [Dataset]. https://catalog.data.gov/dataset/chlorophyll-a-standard-deviation-of-long-term-mean-1998-2018-american-samoa
    Explore at:
    Dataset updated
    Dec 27, 2024
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Area covered
    American Samoa
    Description

    Chlorophyll-a is a widely used proxy for phytoplankton biomass and an indicator of changes in phytoplankton production. As an essential source of energy in the marine environment, the extent and availability of phytoplankton biomass can be highly influential for fisheries production and dictate trophic structure in marine ecosystems. Changes in phytoplankton biomass are predominantly driven by changes in nutrient availability, through either natural (e.g., turbulent ocean mixing) or anthropogenic (e.g., agricultural runoff) processes. This layer represents the standard deviation of the 8-day time series of chlorophyll-a (mg/m3) from 1998-2018. The data products were generated by the Ocean Colour component of the European Space Agency (ESA) Climate Change Initiative (CCI) project. These files are 8-day 4-km composites of merged sensor products: Global Area Coverage (GAC), Local Area Coverage (LAC), MEdium Resolution Imaging Spectrometer (MERIS), Moderate Resolution Imaging Spectroradiometer (MODIS) Aqua, Ocean and Land Colour Instrument (OLCI), Sea-viewing Wide Field-of-view Sensor (SeaWiFS), and Visible Infrared Imaging Radiometer Suite (VIIRS). The standard deviation was calculated over all 8-day chlorophyll-a data from 1998-2018 for each pixel. A quality control mask was applied to remove spurious data associated with shallow water, following Gove et al. (2013). Nearshore map pixels with no data were filled with values from the nearest neighboring valid offshore pixel by using a grid of points and the Near Analysis tool in ArcGIS, then converting the points to a raster. Data source: https://oceanwatch.pifsc.noaa.gov/erddap/griddap/esa-cci-chla-8d-v5-0.graph
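    The per-pixel calculation described above can be sketched with NumPy. This is an illustration under stated assumptions, not the producers' workflow: the composites are assumed to be stacked into an array of shape (time, rows, columns) with NaN marking masked or missing values, and the population standard deviation (ddof=0) is used because the source does not say which variant was applied.

```python
import warnings

import numpy as np


def per_pixel_std(stack):
    """Standard deviation along the time axis for each pixel, ignoring NaN.

    stack: array-like of shape (time, rows, cols); NaN marks missing data.
    Pixels with no valid observation come back as NaN.
    """
    stack = np.asarray(stack, dtype=float)
    with warnings.catch_warnings():
        # nanstd warns on all-NaN pixels; a NaN result is what we want there.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanstd(stack, axis=0)
```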

  13. Data from: S1 Data set -

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 23, 2023
    Cite
    Ho, Jau-Der; Yeh, Jong-Shiuan; Hsueh, Chun-Mei (2023). S1 Data set - [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001116616
    Explore at:
    Dataset updated
    Jun 23, 2023
    Authors
    Ho, Jau-Der; Yeh, Jong-Shiuan; Hsueh, Chun-Mei
    Description

    Purpose: Identify risk factors for progression in treated normal-tension glaucoma (NTG) in highly myopic and non-highly myopic eyes.

    Methods: This retrospective, observational case series study included 42 highly myopic glaucoma (HMG, <-6D) eyes and 39 non-highly myopic glaucoma (NHG, ≧-6D) eyes. Glaucoma progression was determined by serial visual field data. Univariate and multivariate logistic regression methods were used to detect associations between potential risk factors and glaucoma progression.

    Results: Among 81 eyes from 81 normal-tension glaucoma patients (mean follow-up, 3.10 years), 20 of 42 eyes (45.24%) in the HMG group and 14 of 39 eyes (35.90%) in the NHG group showed progression. The HMG group had a larger optic disc tilt ratio (p = 0.007) and thinner inferior macular thickness (p = 0.03) than the NHG group. Changes in the linear regression values for MD for each group were as follows: -0.652 dB/year for the HMG group and -0.717 dB/year for the NHG group (p = 0.298). Basal pattern standard deviation (PSD) (OR: 1.55, p = 0.016) and post-treatment IOP (OR: 1.54, p = 0.043) were risk factors for visual field progression in normal-tension glaucoma patients. In the subgroup analysis of HMG patients, PSD (OR: 2.77, p = 0.017) was a risk factor for visual field progression.

    Conclusion: IOP reduction was postulated to contribute to the prevention of visual field progression, especially in highly myopic NTG patients with large basal pattern standard deviation.

  14. Data Set of Extracted Summary Statistics from Equipment Sensor Data

    • data.europa.eu
    • data.niaid.nih.gov
    unknown
    Updated Jan 24, 2021
    Cite
    Zenodo (2021). Data Set of Extracted Summary Statistics from Equipment Sensor Data [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4462777?locale=en
    Explore at:
    Available download formats: unknown (149706)
    Dataset updated
    Jan 24, 2021
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This data set was generated in accordance with the semiconductor industry and contains values of summary statistics from sensor recordings of the high-precision and high-tech production equipment. Basically, the semiconductor production consists of hundreds of process steps performing physical and chemical operations on so-called wafers, i.e. slices based on semiconductor material. In the production chain, each process equipment is equipped with several sensors recording physical parameters like gas flow, temperature, voltage, etc., resulting in so-called sensor data. Out of the sensor data, values of summary statistics are extracted. These are values like mean, standard deviation and gradients. To keep the entire production as stable as possible, these values are used to monitor the whole production in order to intervene in case of deviations. After the production, each device on the wafer is tested in the most careful way resulting in so-called wafer test data. In some cases, suspicious patterns occur in the wafer test data potentially leading to failure. In this case the root cause must be found in the production chain. For this purpose, the given data is provided. The aim is to find correlations between the wafer test data and the values of summary statistics in order to identify the root cause. The given data is divided into four data sets: "XTrain.csv", "YTrain.csv", "XTest.csv" and "YTest.csv". "XTrain.csv" and "XTest.csv" represent the values of summary statistics originating in the production chain separated for the purpose of training and validating a statistical model. Included are 114 observations of 77 parameters (values of summary statistics). The "YTrain.csv" and "YTest.csv" contain the corresponding wafer test data (144 observations of one parameter).
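    As a starting point for the correlation search described above, the parameters can be ranked by how strongly each one correlates with the wafer test result. The sketch below assumes the summary statistics are held in a pandas DataFrame (one column per parameter) and the wafer test data in a Series aligned on the same index; the function name rank_parameters is illustrative, and Pearson correlation is only one possible choice.

```python
import pandas as pd


def rank_parameters(x: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Rank parameters by absolute Pearson correlation with the test data.

    x: summary-statistic values, one column per parameter.
    y: wafer test values, aligned with x by index.
    Constant columns yield NaN and are sorted to the end.
    """
    correlations = x.corrwith(y)
    return correlations.abs().sort_values(ascending=False)
```

    The frames could be obtained with pd.read_csv on "XTrain.csv" and "YTrain.csv"; the header and index layout of those files is not described in the source, so the read options may need adjusting.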

  15. Multiscale Land Surface Parameters of GEDTM30: Spherical Standard Deviation...

    • data.europa.eu
    unknown
    Updated Jan 23, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zenodo (2022). Multiscale Land Surface Parameters of GEDTM30: Spherical Standard Deviation of the Normals [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-14920383?locale=el
    Explore at:
    Available download formats: unknown (234449)
    Dataset updated
    Jan 23, 2022
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Spherical Standard Deviation of the Normals

    This data is part of the Global Ensemble Digital Terrain Model (GEDTM30) dataset. Check the related identifiers section below to access other parts of the dataset.

    Disclaimer

    This is the first release of the Multiscale Land Surface Parameters (LSPs) of Global Ensemble Digital Terrain Model (GEDTM30). Use for testing purposes only. This work was funded by the European Union. However, the views and opinions expressed are solely those of the author(s) and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them. The data is provided "as is." The Open-Earth-Monitor project consortium, along with its suppliers and licensors, hereby disclaims all warranties of any kind, express or implied, including, without limitation, warranties of merchantability, fitness for a particular purpose, and non-infringement. Neither the Open-Earth-Monitor project consortium nor its suppliers and licensors make any warranty that the website will be error-free or that access to it will be continuous or uninterrupted. You understand that you download or otherwise obtain content or services from the website at your own discretion and risk.

    Description

    LSPs are derivative products of the GEDTM30 that represent measures of local topographic position, curvature, hydrology, light, and shadow. A pyramid representation is implemented to generate multiscale resolutions of 30m, 60m, 120m, 240m, 480m, and 960m for each LSP. The parametrization is powered by Whitebox Workflows in Python. To see the documentation, please visit our GEDTM30 GitHub (https://github.com/openlandmap/GEDTM30).

    Dataset Contents

    This dataset includes:

    Global Spherical Standard Deviation of the Normals 120m
    Global Spherical Standard Deviation of the Normals 240m
    Global Spherical Standard Deviation of the Normals 480m
    Global Spherical Standard Deviation of the Normals 960m

    Due to Zenodo's storage limitations, the high resolution LSP data are provided via external links:

    Global Spherical Standard Deviation of the Normals 30m
    Global Spherical Standard Deviation of the Normals 60m

    Related Identifiers

    Digital Terrain Model: GEDTM30
    Landform: Slope in Degree, Geomorphons
    Light and Shadow: Positive Openness, Negative Openness, Hillshade
    Curvature: Minimal Curvature, Maximal Curvature, Profile Curvature, Tangential Curvature, Ring Curvature, Shape Index
    Local Topographic Position: Difference from Mean Elevation, Spherical Standard Deviation of the Normals
    Hydrology: Specific Catchment Area, LS Factor, Topographic Wetness Index

    Data Details

    Time period: static.
    Type of data: properties derived from Digital Terrain Model
    How the data was collected or derived: The data was derived using Whitebox Workflows.
    Methods used: LSP algorithms.
    Limitations or exclusions in the data: The dataset does not include data for Antarctica.
    Coordinate reference system: EPSG:4326
    Bounding box (Xmin, Ymin, Xmax, Ymax): (-180, -65, 180, 85)
    Spatial resolution: 120m, 240m, 480m, 960m
    Image size: 360,000P x 178,219L; 180,000P x 89,110L; 45,000P x 22,282L
    File format: Cloud Optimized Geotiff (COG) format.

    Additional information:

    Layer                                          Scale    Data Type   No Data
    Difference from Mean Elevation                 100      Int16       32,767
    Geomorphons                                    1        Byte        255
    Hillshade                                      1        UInt16      65,535
    LS Factor                                      1,000    UInt16      65,535
    Maximal Curvature                              1,000    Int16       32,767
    Minimal Curvature                              1,000    Int16       32,767
    Negative Openness                              100      UInt16      65,535
    Positive Openness                              100      UInt16      65,535
    Profile Curvature                              1,000    Int16       32,767
    Ring Curvature                                 10,000   Int16       32,767
    Shape Index                                    1,000    Int16       32,767
    Slope in Degree                                100      UInt16      65,535
    Specific Catchment Area                        1,000    UInt16      65,535
    Spherical Standard Deviation of the Normals    100      Int16       32,767
    Tangential Curvature                           1,000    Int16       32,767
    Topographic Wetness Index                      100      Int16       32,767

    Support

    If you discover a bug, artifact, or inconsistency, or if you have a question, please raise a GitHub issue here

    Naming convention

    To ensure consistency and ease of use across and within the projects, we follow the standard Ai4SoilHealth and Open-Earth-Monitor file-naming convention. The convention works with 10 fields that describe important properties of the data. In this way users can search files, prepare data analysis etc., without needing to open files. For example, for twi_edtm_m_120m_s_20000101_20221231_go_epsg.4326_v20241230.tif, the fields are:

    generic variable name: twi = topographic wetness index
    variable procedure combination: edtm = derivative direct from global ensemble digital terrain model
    Position in the probability distribution/variable type: m = measurement
    Spatial support: 120m
    Depth reference: s = surface
    Time reference begin time: 20000101 = 2000-01-01
    Time reference end time: 20211231 = 2021-12-31
    Bounding box: go = global
    EPSG code: EPSG:4326
    Version code: v20241230 = version from 2024-12-30
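    The ten-field naming convention and the scale/no-data table above lend themselves to a small helper. The sketch below is illustrative only: the class and function names are hypothetical, and the decoding rule (stored integer divided by the scale factor, with the no-data value mapped to None) is an assumption about how the Scale column is meant to be applied, which the source does not spell out.

```python
from datetime import date, datetime
from typing import NamedTuple, Optional


class LayerName(NamedTuple):
    variable: str
    procedure: str
    variable_type: str
    spatial_support: str
    depth_reference: str
    begin: date
    end: date
    bounding_box: str
    epsg: str
    version: str


def parse_layer_name(filename: str) -> LayerName:
    """Split a file name into the 10 underscore-separated convention fields."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension, keep "epsg.4326"
    parts = stem.split("_")
    if len(parts) != 10:
        raise ValueError(f"expected 10 fields, got {len(parts)}: {filename}")
    begin = datetime.strptime(parts[5], "%Y%m%d").date()
    end = datetime.strptime(parts[6], "%Y%m%d").date()
    return LayerName(*parts[:5], begin, end, *parts[7:])


def decode_value(stored: int, scale: int, no_data: int) -> Optional[float]:
    """Convert a stored integer to a physical value (assumed: stored / scale)."""
    if stored == no_data:
        return None
    return stored / scale
```

    For the layer in this record, the table gives a scale of 100 and a no-data value of 32,767, so under the stated assumption decode_value(250, 100, 32767) would yield 2.5.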

  16. TRMM Precipitation Radar (PR) Level 2 Surface Cross-Section Product (TRMM...

    • data.wu.ac.at
    bin
    Updated Jun 19, 2015
    Cite
    National Aeronautics and Space Administration (2015). TRMM Precipitation Radar (PR) Level 2 Surface Cross-Section Product (TRMM Product 2A21) V7 [Dataset]. https://data.wu.ac.at/schema/data_gov/NTQ4NzlkM2ItZmVjZi00YTg4LWE5ZWQtN2U4OWRmMmFhMWFk
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 19, 2015
    Dataset provided by
    National Aeronautics and Space Administration
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Area covered
    e8dfc5ee9c05c281069d6c28f70c8f5cb6b968a0
    Description

    The Tropical Rainfall Measuring Mission (TRMM) is a joint U.S.-Japan satellite mission to monitor tropical and subtropical precipitation and to estimate its associated latent heating.

    The primary objective of the 2A21 is to compute the path integrated attenuation (PIA), using the surface reference technique (SRT). The SRT relies on the assumption that the difference between the measurements of the normalized surface cross section within and outside the rain provides a measure of the PIA.

    Two types of non-rain surface cross section (sigma-zero) reference estimates are used: spatial and temporal. In the spatial surface reference data set, the mean and standard deviation of the surface cross sections are calculated over a running window of Ns fields of view before rain is encountered. These operations are performed separately for each of the 49+2 incidence angles of TRMM (corresponding to the cross-track scan from -17 degrees to + 17 degrees with respect to nadir). The two additional angle bins (making the total 51 rather than 49) are to account for non-zero pitch/roll angles that can shift the incidence angle with respect to nadir outside the normal range.

    For the temporal surface reference data set, the running mean and standard deviation are computed over a 1 degree x 1 degree (latitude, longitude) grid. Within each 1 degree x 1 degree grid cell, the data are further categorized into incidence angle categories (26). The number of observations in each category, Nt, are also recorded. Note that, for the temporal reference data set, no distinction is made between the port and starboard incidence angles. So, instead of 49 incidence angles, there are only 25 + 1, where the additional bin corresponds to angles greater than the normal range.

    When rain is encountered, the mean and standard deviations of the reference sigma-zero values are retrieved from the spatial and temporal surface reference data sets. To determine which reference measurement is to be used, the algorithm checks whether Nt >= Ntmin and Ns >= Nsmin, where Ntmin and Nsmin are the minimum number of samples that are needed to be considered a valid reference estimate for the temporal and spatial reference data sets, respectively. (Presently, Ntmin = 50 and Nsmin = 8). If neither condition is satisfied, no estimate of the PIA is made and the flags are set accordingly. If only one condition is met, then the surface reference data which corresponds to this is used. If both conditions are satisfied, the surface reference data is taken from that set which has the smaller standard deviation.

    If a valid surface reference data set exists (i.e., either Nt >= Ntmin or Ns >= Nsmin or both) then the 2-way path attenuation (PIA) is estimated from the equation:

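    PIA = <sigma-zero(no rain)> - sigma-zero(in rain)

    where sigma-zero(in rain) is the value of the surface cross section over the rain volume of interest and <sigma-zero(no rain)> is the mean value of the surface cross section obtained from the selected rain-free reference data set.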

    To obtain information as to the reliability of this PIA estimate we consider the difference between the PIA, as derived in the above equation, and the standard deviation as calculated from the no-rain sigma-zero values and stored in the reference data set. Labeling this as std dev(reference value), then the reliability factor of the PIA estimate is obtained from:

    reliabFactor = PIA - std dev(reference value)

    When this quantity is large, the reliability is considered high and conversely. This is the basic...
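    The reference-selection rules and the reliability factor described above can be sketched as follows. This is an illustration, not the 2A21 algorithm itself: the class and function names are hypothetical, the thresholds are the ones quoted in the text (Ntmin = 50, Nsmin = 8), the PIA is assumed to be the mean rain-free sigma-zero minus the in-rain sigma-zero, and the reliability factor is taken as the difference exactly as this description states it.

```python
from dataclasses import dataclass
from typing import Optional

NT_MIN = 50  # minimum samples for a valid temporal reference (from the text)
NS_MIN = 8   # minimum samples for a valid spatial reference (from the text)


@dataclass
class Reference:
    mean: float  # mean rain-free sigma-zero
    std: float   # standard deviation of rain-free sigma-zero
    n: int       # number of samples (Ns or Nt)


def choose_reference(spatial: Reference, temporal: Reference) -> Optional[Reference]:
    """Pick the reference set following the rules in the description."""
    spatial_ok = spatial.n >= NS_MIN
    temporal_ok = temporal.n >= NT_MIN
    if spatial_ok and temporal_ok:
        # Both valid: use the set with the smaller standard deviation.
        return spatial if spatial.std < temporal.std else temporal
    if spatial_ok:
        return spatial
    if temporal_ok:
        return temporal
    return None  # no PIA estimate is made


def pia_and_reliability(reference: Reference, sigma0_in_rain: float):
    """Return (PIA, reliabFactor) under the assumptions stated above."""
    pia = reference.mean - sigma0_in_rain  # assumed form of the PIA equation
    return pia, pia - reference.std
```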

  17. Chapter 3 of the Working Group I Contribution to the IPCC Sixth Assessment...

    • data-search.nerc.ac.uk
    Updated Apr 24, 2024
    Cite
    (2024). Chapter 3 of the Working Group I Contribution to the IPCC Sixth Assessment Report - data for Figure 3.40 (v20220614) [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=modes%20of%20variability
    Explore at:
    Dataset updated
    Apr 24, 2024
    Description

    Data for Figure 3.40 from Chapter 3 of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6). Figure 3.40 shows the observed and simulated Atlantic Multidecadal Variability (AMV).

    ---------------------------------------------------
    How to cite this dataset
    ---------------------------------------------------
    When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates: Eyring, V., N.P. Gillett, K.M. Achuta Rao, R. Barimalala, M. Barreiro Parrillo, N. Bellouin, C. Cassou, P.J. Durack, Y. Kosaka, S. McGregor, S. Min, O. Morgenstern, and Y. Sun, 2021: Human Influence on the Climate System. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 423–552, doi:10.1017/9781009157896.005.

    ---------------------------------------------------
    Figure subpanels
    ---------------------------------------------------
    The figure has six panels. Files are not separated according to the panels.

    ---------------------------------------------------
    List of data provided
    ---------------------------------------------------
    amv.obs.nc contains
    - Observed SST anomalies associated with the AMV pattern
    - Observed AMV index time series (unfiltered)
    - Observed AMV index time series (low-pass filtered)
    - Taylor statistics of the observed AMV patterns

    amv.hist.cmip6.nc contains
    - Statistical significance of the observed SST anomalies associated with the AMV pattern
    - Simulated SST anomalies associated with the AMV pattern
    - Simulated AMV index time series (unfiltered)
    - Simulated AMV index time series (low-pass filtered)
    - Taylor statistics of the simulated AMV patterns based on CMIP6 historical simulations.

    amv.hist.cmip5.nc contains
    - Simulated SST anomalies associated with the AMV pattern
    - Simulated AMV index time series (unfiltered)
    - Simulated AMV index time series (low-pass filtered)
    - Taylor statistics of the simulated AMV patterns based on CMIP5 historical simulations.

    amv.piControl.cmip6.nc contains
    - Simulated SST anomalies associated with the AMV pattern
    - Simulated AMV index time series (unfiltered)
    - Simulated AMV index time series (low-pass filtered)
    - Taylor statistics of the simulated AMV patterns based on CMIP6 piControl simulations.

    amv.piControl.cmip5.nc contains
    - Simulated SST anomalies associated with the AMV pattern
    - Simulated AMV index time series (unfiltered)
    - Simulated AMV index time series (low-pass filtered)
    - Taylor statistics of the simulated AMV patterns based on CMIP5 piControl simulations.

    ---------------------------------------------------
    Data provided in relation to figure
    ---------------------------------------------------
    Panel a:
    - amv_pattern_obs_ref in amv.obs.nc: shading
    - amv_pattern_obs_signif (dataset = 1) in amv.obs.nc: cross markers

    Panel b:
    - Multimodel ensemble mean of amv_pattern in amv.hist.cmip6.nc: shading, with their sign agreement for hatching

    Panel c:
    - tay_stats (stat = 0, 1) in amv.obs.nc: black dots
    - tay_stats (stat = 0, 1) in amv.hist.cmip6.nc: red crosses, and their multimodel ensemble mean for the red dot
    - tay_stats (stat = 0, 1) in amv.hist.cmip5.nc: blue crosses, and their multimodel ensemble mean for the blue dot

    Panel d:
    - Lag-1 autocorrelation of amv_timeseries_raw in amv.obs.nc: black horizontal lines in left
      . ERSSTv5: dataset = 1
      . HadISST: dataset = 2
      . COBE-SST2: dataset = 3
    - Multimodel ensemble mean and percentiles of lag-1 autocorrelation of amv_timeseries_raw in amv.piControl.cmip5.nc: blue open box-whisker in the left
    - Multimodel ensemble mean and percentiles of lag-1 autocorrelation of amv_timeseries_raw in amv.piControl.cmip6.nc: red open box-whisker in the left
    - Multimodel ensemble mean and percentiles of lag-1 autocorrelation of amv_timeseries_raw in amv.hist.cmip5.nc: blue filled box-whisker in the left
    - Multimodel ensemble mean and percentiles of lag-1 autocorrelation of amv_timeseries_raw in amv.hist.cmip6.nc: red filled box-whisker in the left
    - Lag-10 autocorrelation of amv_timeseries in amv.obs.nc: black horizontal lines in right
      . ERSSTv5: dataset = 1
      . HadISST: dataset = 2
      . COBE-SST2: dataset = 3
    - Multimodel ensemble mean and percentiles of lag-10 autocorrelation of amv_timeseries in amv.piControl.cmip5.nc: blue open box-whisker in the right
    - Multimodel ensemble mean and percentiles of lag-10 autocorrelation of amv_timeseries in amv.piControl.cmip6.nc: red open box-whisker in the right
    - Multimodel ensemble mean and percentiles of lag-10 autocorrelation of amv_timeseries in amv.hist.cmip5.nc: blue filled box-whisker in the right
    - Multimodel ensemble mean and percentiles of lag-10 autocorrelation of amv_timeseries in amv.hist.cmip6.nc: red filled box-whisker in the right

    Panel e:
    - Standard deviation of amv_timeseries_raw in amv.obs.nc: black horizontal lines in left
      . ERSSTv5: dataset = 1
      . HadISST: dataset = 2
      . COBE-SST2: dataset = 3
    - Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries_raw in amv.piControl.cmip5.nc: blue open box-whisker in the left
    - Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries_raw in amv.piControl.cmip6.nc: red open box-whisker in the left
    - Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries_raw in amv.hist.cmip5.nc: blue filled box-whisker in the left
    - Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries_raw in amv.hist.cmip6.nc: red filled box-whisker in the left
    - Standard deviation of amv_timeseries in amv.obs.nc: black horizontal lines in right
      . ERSSTv5: dataset = 1
      . HadISST: dataset = 2
      . COBE-SST2: dataset = 3
    - Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries in amv.piControl.cmip5.nc: blue open box-whisker in the right
    - Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries in amv.piControl.cmip6.nc: red open box-whisker in the right
    - Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries in amv.hist.cmip5.nc: blue filled box-whisker in the right
    - Multimodel ensemble mean and percentiles of standard deviation of amv_timeseries in amv.hist.cmip6.nc: red filled box-whisker in the right

    Panel f:
    - amv_timeseries in amv.obs.nc: black curves
      . ERSSTv5: dataset = 1
      . HadISST: dataset = 2
      . COBE-SST2: dataset = 3
    - amv_timeseries in amv.hist.cmip6.nc: 5th-95th percentiles in red shading, multimodel ensemble mean and its 5-95% confidence interval for red curves
    - amv_timeseries in amv.hist.cmip5.nc: 5th-95th percentiles in blue shading, multimodel ensemble mean for blue curve

    CMIP5 is the fifth phase of the Coupled Model Intercomparison Project. CMIP6 is the sixth phase of the Coupled Model Intercomparison Project. SST stands for Sea Surface Temperature.

    ---------------------------------------------------
    Notes on reproducing the figure from the provided data
    ---------------------------------------------------
    Multimodel ensemble means and percentiles of historical simulations of CMIP5 and CMIP6 are calculated after weighting individual members with the inverse of the ensemble size of the same model. ensemble_assign in each file provides the model number to which each ensemble member belongs. This weighting does not apply to the sign agreement calculation. piControl simulations from CMIP5 and CMIP6 consist of a single member from each model, so the weighting is not applied. Multimodel ensemble means of the pattern correlation in Taylor statistics in (c) and the autocorrelation of the index in (d) are calculated via Fisher z-transformation and back transformation.

    ---------------------------------------------------
    Sources of additional information
    ---------------------------------------------------
    The following weblinks are provided in the Related Documents section of this catalogue record:
    - Link to the report component containing the figure (Chapter 3)
    - Link to the Supplementary Material for Chapter 3, which contains details on the input data used in Table 3.SM.1
    - Link to the code for the figure, archived on Zenodo
    - Link to the figure on the IPCC AR6 website
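    The two averaging rules in the notes on reproducing the figure (inverse-ensemble-size weighting, and Fisher z-transformation for correlations) can be sketched with NumPy. This is not the archived figure code: the function names are hypothetical, ensemble_assign is assumed to be a one-dimensional array of model numbers with one entry per member, and whether the member weights are also applied inside the Fisher average is left as an optional argument because the notes do not say.

```python
import numpy as np


def member_weights(ensemble_assign):
    """Weight each member by 1 / (number of members from the same model)."""
    assign = np.asarray(ensemble_assign)
    _, inverse, counts = np.unique(assign, return_inverse=True, return_counts=True)
    return 1.0 / counts[inverse]


def weighted_ensemble_mean(values, ensemble_assign):
    """Multimodel mean in which every model contributes equally."""
    return float(np.average(values, weights=member_weights(ensemble_assign)))


def fisher_mean_correlation(correlations, weights=None):
    """Average correlations via Fisher z-transform and back-transform."""
    z = np.arctanh(np.asarray(correlations, dtype=float))
    return float(np.tanh(np.average(z, weights=weights)))
```

    With values [1, 3, 10] and ensemble_assign [1, 1, 2], the two members of model 1 each get weight 0.5, so the result equals the plain mean of the two per-model means.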

  18. University SET data, with faculty and courses characteristics

    • openicpsr.org
    Updated Sep 12, 2021
    Cite
    Under blind review in refereed journal (2021). University SET data, with faculty and courses characteristics [Dataset]. http://doi.org/10.3886/E149801V1
    Explore at:
    Dataset updated
    Sep 12, 2021
    Authors
    Under blind review in refereed journal
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This paper explores a unique dataset of all the student evaluation of teaching (SET) ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys completed by students in all fields and levels of study offered by the university. In the period analysed, the university operated entirely online amid the Covid-19 pandemic. While the expected learning outcomes were not formally changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper.

    The average SET scores were matched with the characteristics of the teacher (degree, seniority, gender, and SET scores in the past six semesters); the course characteristics (time of day, day of the week, course type, course breadth, class duration, and class size); the attributes of the SET survey responses (the percentage of students providing SET feedback); and the grades of the course (mean, standard deviation, and percentage failed). Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested and new hypotheses to be formulated, as presented in the introduction section.

    The unit of observation, or single row in the data set, is identified by three parameters: teacher unique id (j), course unique id (k), and the question number in the SET questionnaire (n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}). For each pair (j,k) there are therefore nine rows, one for each SET survey question, or sometimes fewer when students did not answer one of the SET questions at all. For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j = John Smith, k = Calculus, n = 2) is calculated as the average of all Likert-scale answers to question no. 2 in the SET survey distributed to all students who took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached file section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, the variable takes the same value for all n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}.

    Two attachments:

    • Word file with the description of the variables
    • Rdata file with the data set (for the R language)

    Appendix 1. The SET questionnaire used for this paper.

    Evaluation survey of the teaching staff of [university name]

    Please complete the following evaluation form, which aims to assess the lecturer's performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don't agree; 1 - I strongly don't agree.

    Questions (each answered on the scale 1 2 3 4 5):

    1. I learnt a lot during the course.
    2. I think that the knowledge acquired during the course is very useful.
    3. The professor used activities to make the class more engaging.
    4. If it was possible, I would enroll for the course conducted by this lecturer again.
    5. The classes started on time.
    6. The lecturer always used time efficiently.
    7. The lecturer delivered the class content in an understandable and efficient way.
    8. The lecturer was available when we had doubts.
    9. The lecturer treated all students equally regardless of their race, background and ethnicity.
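
    The construction of SET_score_avg(j,k,n) described above can be sketched in R as a grouped mean. This is only an illustration: the student-level table, its column names (teacher_id, course_id, question, likert) and all values are hypothetical and are not taken from the attached Rdata file, which already contains the averaged scores.

```r
# Hypothetical student-level answers; ids and values are made up for illustration.
answers <- data.frame(
  teacher_id = c("T1", "T1", "T1", "T1", "T2", "T2"),
  course_id  = c("C1", "C1", "C1", "C1", "C1", "C1"),
  question   = c(1, 1, 2, 2, 1, 1),
  likert     = c(5, 4, 3, 5, 2, 4)
)

# One row per (teacher j, course k, question n): the mean of all Likert answers.
set_scores <- aggregate(
  likert ~ teacher_id + course_id + question,
  data = answers,
  FUN = mean
)
names(set_scores)[names(set_scores) == "likert"] <- "SET_score_avg"

print(set_scores)
```

    A (teacher, course, question) combination that received no answers at all simply produces no row, which matches the remark that some pairs (j,k) have fewer than nine rows.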

  19. Chapter 6 of the Working Group I Contribution to the IPCC Sixth Assessment...

    • catalogue.ceda.ac.uk
    Updated May 16, 2023
    Cite
    Steven Turnock; Sophie Szopa; Vaishali Naik; Rita Van Dingenen (2023). Chapter 6 of the Working Group I Contribution to the IPCC Sixth Assessment Report - data for Figure 6.SM.4 (v20220930) [Dataset]. https://catalogue.ceda.ac.uk/uuid/56c283c79666449ebe0235e809bdb69f
    Explore at:
    Dataset updated
    May 16, 2023
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    Steven Turnock; Sophie Szopa; Vaishali Naik; Rita Van Dingenen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2015 - Dec 31, 2100
    Area covered
    Earth
    Description

    Data for Figure 6.SM.4 from Chapter 6 Supplementary Material of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).

    Figure 6.SM.4 shows future global and regional changes in annual mean surface PM2.5, relative to the 2005-2014 mean, for the different SSPs used in CMIP6. Each line represents a multi-model mean across the region with shading representing the ±1 standard deviation in the mean.
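
    As a rough illustration of what a multi-model mean with a ±1 standard deviation envelope means, the following R sketch computes both from a small made-up table. The model names, periods and values are hypothetical and are not read from the dataset files; taking the change relative to each model's own 2005-2014 value before averaging is also an assumption, not a confirmed description of how the figure was produced.

```r
# Hypothetical regional surface PM2.5 values (ug/m3) from five models; numbers are made up.
pm25 <- data.frame(
  model  = rep(paste0("model_", 1:5), each = 3),
  period = rep(c("2005-2014", "2050", "2100"), times = 5),
  value  = c(10.0,  8.0, 6.0,
             12.0,  9.5, 7.0,
             11.0,  9.0, 6.5,
              9.0,  7.5, 5.5,
             13.0, 10.0, 8.0)
)

# Change relative to each model's own 2005-2014 value (assumed baseline handling).
is_base <- pm25$period == "2005-2014"
baseline <- setNames(pm25$value[is_base], as.character(pm25$model[is_base]))
pm25$change <- pm25$value - baseline[as.character(pm25$model)]

# Multi-model mean and standard deviation of the change for each period.
mm_mean <- tapply(pm25$change, pm25$period, mean)
mm_sd   <- tapply(pm25$change, pm25$period, sd)

# The line would follow `mean`; the shading would span `lower` to `upper`.
envelope <- data.frame(
  period = names(mm_mean),
  mean   = as.numeric(mm_mean),
  lower  = as.numeric(mm_mean - mm_sd),
  upper  = as.numeric(mm_mean + mm_sd)
)
print(envelope)
```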

    How to cite this dataset

    When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates: Szopa, S., V. Naik, B. Adhikary, P. Artaxo, T. Berntsen, W.D. Collins, S. Fuzzi, L. Gallardo, A. Kiendler-Scharr, Z. Klimont, H. Liao, N. Unger, and P. Zanis, 2021: Short-Lived Climate Forcers Supplementary Material. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Available from https://www.ipcc.ch/

    Figure subpanels

    The figure has 11 panels, with data provided for all panels in 4 files placed in the main directory.

    List of data provided

    This dataset contains precomputed values of surface PM2.5 concentrations across world regions for:

    • A 10-year mean period (2005 to 2014) from the historical simulation to represent present day regional mean values. Regional multi-model annual mean and standard deviation values are calculated across 5 different CMIP6 models
    • Annual 5-year multi-model mean values of surface PM2.5 from 5 different CMIP6 models projected for 7 different future scenarios covering the period 2015 to 2100
    • Standard deviation values of surface PM2.5 from 5 different CMIP6 models projected for 7 different future scenarios covering 5-year mean periods from 2015 to 2100
    • Annual 10-year mean values of surface PM2.5 from the TM5-FASST model projected for 5 different future scenarios covering the period 2015 to 2100

    Data provided in relation to figure

    All the data files provided are used to create the time series plots for each region. The numbers in each panel for each region are obtained from 'Surf_PM2pt5_data_05_14_mean_for_IPCC_figure_V1_5mods.csv', with the time series line for each scenario from 'Surf_PM2pt5_data_fut_mean_for_IPCC_figure_V1_5mods.csv' and the shading obtained by using the values in 'Surf_PM2pt5_SD_data_fut_mean_for_IPCC_figure_V1_5mods.csv'. The TM5-FASST data is included on the figure by reading in pre-computed regional mean values from the 'Regional_annual_mean_surface_PM2pt5_resp_values_CMIP6_Fut_Scens_from_TM5_FASST_on_AR6_reg_receptors_INCL_GLOB_2015_2100.txt' file.

    CMIP6 is the sixth phase of the Coupled Model Intercomparison Project. PM2.5 refers to fine particulate matter air pollution with diameter of less than 2.5 microns. SSP stands for Shared Socioeconomic Pathway.

    Notes on reproducing the figure from the provided data

    The plotting code provided with this dataset should be able to read in each of the precomputed regional mean .csv files and then reproduce the time series figures.

    Sources of additional information

    The following weblinks are provided in the Related Documents section of this catalogue record:

    • Link to the report component related to the figure (Chapter 6)
    • Link to the Supplementary Material for Chapter 6, which contains details on the input data used in Table 6.SM.3
    • Link to the code for the figure, archived on Zenodo.

  20. Study Hours vs Grades Dataset

    • kaggle.com
    zip
    Updated Oct 12, 2025
    Cite
    Andrey Silva (2025). Study Hours vs Grades Dataset [Dataset]. https://www.kaggle.com/datasets/andreylss/study-hours-vs-grades-dataset
    Explore at:
    Available download formats: zip (33964 bytes)
    Dataset updated
    Oct 12, 2025
    Authors
    Andrey Silva
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This synthetic dataset contains 5,000 student records exploring the relationship between study hours and academic performance.

    Dataset Features

    • student_id: Unique identifier for each student (1-5000)
    • study_hours: Hours spent studying (0-12 hours, continuous)
    • grade: Final exam score (0-100 points, continuous)

    Potential Use Cases

    • Linear regression modeling and practice
    • Data visualization exercises
    • Statistical analysis tutorials
    • Machine learning for beginners
    • Educational research simulations

    Data Quality

    • No missing values
    • Normally distributed residuals
    • Realistic educational scenario
    • Ready for immediate analysis

    Data Generation Code

    This dataset was generated using R.

    R Code

    # Set seed for reproducibility
    set.seed(42)
    
    # Define number of observations (students)
    n <- 5000
    
    # Generate study hours (independent variable)
    # Uniform distribution between 0 and 12 hours
    study_hours <- runif(n, min = 0, max = 12)
    
    # Create relationship between study hours and grade
    # Base grade: 40 points
    # Each study hour adds an average of 5 points
    # Add normal noise (standard deviation = 10)
    theoretical_grade <- 40 + 5 * study_hours
    
    # Add normal noise to make it realistic
    noise <- rnorm(n, mean = 0, sd = 10)
    
    # Calculate final grade
    grade <- theoretical_grade + noise
    
    # Limit grades between 0 and 100
    grade <- pmin(pmax(grade, 0), 100)
    
    # Create the dataframe
    dataset <- data.frame(
     student_id = 1:n,
     study_hours = round(study_hours, 2),
     grade = round(grade, 2)
    )
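
    One way to use the generated data, in line with the linear regression use case listed above, is to fit a simple model and compare the estimates with the generating values (intercept 40, slope 5). The sketch below is an illustration and not part of the dataset author's code; it regenerates the data with the same steps so that it runs on its own. Because grades are capped at 100, the fitted slope can be expected to come out somewhat below 5, and the residuals near the cap are not exactly normal.

```r
# Regenerate the data with the same steps as above so this block is self-contained.
set.seed(42)
n <- 5000
study_hours <- runif(n, min = 0, max = 12)
grade <- 40 + 5 * study_hours + rnorm(n, mean = 0, sd = 10)
grade <- pmin(pmax(grade, 0), 100)
dataset <- data.frame(
  student_id  = 1:n,
  study_hours = round(study_hours, 2),
  grade       = round(grade, 2)
)

# Ordinary least squares fit of grade on study hours.
fit <- lm(grade ~ study_hours, data = dataset)
coefs <- coef(fit)
print(round(coefs, 2))

# Share of records sitting exactly at the upper cap of 100 points.
share_at_cap <- mean(dataset$grade == 100)
print(share_at_cap)
```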
    