Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Compositional data, which is data consisting of fractions or probabilities, is common in many fields including ecology, economics, physical science and political science. If these data would otherwise be normally distributed, their spread can be conveniently represented by a multivariate normal distribution truncated to the non-negative space under a unit simplex. Here this distribution is called the simplex-truncated multivariate normal distribution. For calculations on truncated distributions, it is often useful to obtain rapid estimates of their integral, mean and covariance; these quantities characterising the truncated distribution will generally possess different values to the corresponding non-truncated distribution.
In the paper Adams, Matthew (2022) Integral, mean and covariance of the simplex-truncated multivariate normal distribution. PLoS One, 17(7), Article number: e0272014. https://eprints.qut.edu.au/233964/, three different approaches that can estimate the integral, mean and covariance of any simplex-truncated multivariate normal distribution are described and compared. These three approaches are (1) naive rejection sampling, (2) a method described by Gessner et al. that unifies subset simulation and the Holmes-Diaconis-Ross algorithm with an analytical version of elliptical slice sampling, and (3) a semi-analytical method that expresses the integral, mean and covariance in terms of integrals of hyperrectangularly-truncated multivariate normal distributions, the latter of which are readily computed in modern mathematical and statistical packages. Strong agreement is demonstrated between all three approaches, but the most computationally efficient approach depends strongly both on implementation details and the dimension of the simplex-truncated multivariate normal distribution.
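As an illustration of approach (1), the sketch below implements naive rejection sampling for a simplex-truncated multivariate normal distribution; the mean vector and covariance used are placeholders, and this is not the code released with the article.

```python
import numpy as np

def simplex_truncated_mvn_moments(mu, Sigma, n_draws=100_000, seed=0):
    """Naive rejection sampling: draw from N(mu, Sigma) and keep only the
    draws that fall inside the unit simplex {x : x_i >= 0, sum(x) <= 1}.

    Returns estimates of the truncation integral (acceptance probability)
    and the mean and covariance of the truncated distribution.
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mu, Sigma, size=n_draws)
    inside = np.all(draws >= 0.0, axis=1) & (draws.sum(axis=1) <= 1.0)
    accepted = draws[inside]
    integral = inside.mean()              # estimated P(X in simplex)
    mean = accepted.mean(axis=0)          # truncated mean
    cov = np.cov(accepted, rowvar=False)  # truncated covariance
    return integral, mean, cov

# Example with placeholder parameters for a 3-dimensional distribution.
mu = np.array([0.3, 0.3, 0.2])
Sigma = 0.05 * np.eye(3)
integral, mean, cov = simplex_truncated_mvn_moments(mu, Sigma)
```

Rejection sampling of this kind becomes inefficient as the dimension grows, which is the motivation for approaches (2) and (3) in the paper.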
This dataset consists of all code and results for the associated article.
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. This catalogue entry provides post-processed ERA5 hourly single-level data aggregated to daily time steps. In addition to the data selection options found on the hourly page, the following options can be selected for the daily statistic calculation:
(1) the daily aggregation statistic (daily mean, daily max, daily min, daily sum*); (2) the sub-daily frequency at which the original data are sampled (1 hour, 3 hours, 6 hours); and (3) the option to shift the aggregation to any local time zone, expressed as an offset from UTC (no shift means the statistic is computed from UTC+00:00).
*The daily sum is only available for the accumulated variables (see ERA5 documentation for more details). Users should be aware that the daily aggregation is calculated during the retrieval process and is not part of a permanently archived dataset. For more details on how the daily statistics are calculated, including demonstrative code, please see the documentation. For more details on the hourly data used to calculate the daily statistics, please refer to the ERA5 hourly single-level data catalogue entry and the documentation found therein.
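Purely as an illustration of the kind of daily aggregation described above (this is not the CDS implementation), a small pandas sketch with a placeholder hourly series:

```python
import numpy as np
import pandas as pd

# Hourly values for a single grid point (synthetic placeholder data).
times = pd.date_range("2020-01-01", periods=31 * 24, freq="h", tz="UTC")
hourly = pd.Series(15 + 10 * np.sin(2 * np.pi * times.hour / 24), index=times)

def daily_statistic(series, stat="mean", freq_hours=1, utc_offset_hours=0):
    """Aggregate an hourly series to daily values.

    freq_hours: sub-daily sampling of the original data (1, 3 or 6 hours).
    utc_offset_hours: shift applied so that 'days' follow a local time zone.
    """
    sampled = series[::freq_hours]                    # sub-daily sampling
    shifted = sampled.copy()
    shifted.index = shifted.index + pd.Timedelta(hours=utc_offset_hours)
    return getattr(shifted.resample("D"), stat)()     # mean / max / min / sum

daily_mean_local = daily_statistic(hourly, stat="mean", freq_hours=3,
                                   utc_offset_hours=7)
```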
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the anonymised transcripts of the interviews conducted between November and December 2021 at the department of Classical Philology and Italian Studies (FICLIT) at the University of Bologna. It further includes the qualitative data analysis of the interviews, carried out using a grounded theory approach and the open source software QualCoder version 2.9.
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.
This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.
Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description
“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Required R packages:
• For running “CWVS_LMC.txt”:
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
• plotrix: Plotting the posterior means and credible intervals
Instructions for Use
Reproducibility
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”
Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set.
Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.
Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
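As an illustration of the weekly standardization described above, a minimal NumPy sketch with placeholder exposures (the matrix z in the provided workspace is already standardized, and the released analysis code is the R in “CWVS_LMC.txt”):

```python
import numpy as np

def standardize_exposures(z):
    """Standardize an (n individuals x m weeks) exposure matrix column-wise:
    subtract each week's median and divide by that week's interquartile range,
    mirroring the standardization described for the simulated dataset."""
    median = np.median(z, axis=0)
    q75, q25 = np.percentile(z, [75, 25], axis=0)
    return (z - median) / (q75 - q25)

# Placeholder exposures: 100 individuals, 36 weeks of pregnancy.
rng = np.random.default_rng(1)
z_raw = rng.gamma(shape=2.0, scale=5.0, size=(100, 36))
z_std = standardize_exposures(z_raw)
```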
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Q: What average temperatures are projected for the future if we reduce and stabilize global emissions of heat-trapping gases within the next two decades? A: Colors show projected daily average temperature for each month from the 2020s through the 2090s, based on a stabilized-emissions future. In this case, the stabilized-emissions future represents a specific Representative Concentration Pathway (RCP) called RCP 4.5. Q: Where do these measurements come from? A: Temperature projections in these images represent output from 32 global climate models that are all part of the Coupled Model Intercomparison Project Phase 5 (CMIP5). Projections labeled as “Stabilized emissions” represent a potential future in which global emissions peak around 2040, and then are reduced and stabilized. By 2100, the result of this pathway is climate forcing of 4.5 Watts per square meter at the top of the atmosphere. Based on the energy imbalance along this pathway, global climate models calculate temperature across Earth’s surface for future periods. The RCP 4.5 scenario is associated with warming of approximately 2°C above the modern climate normal. To produce regionally relevant projections, results from the global models were statistically downscaled using a method called Localized Constructed Analogs (LOCA). This technique uses observed local-scale weather and climate information to increase the spatial resolution of global-scale projections, and corrects for bias in the model simulations. Images of long-term averages from 1981 to 2010 (PRISM normals) show recent conditions; these maps provide a baseline for comparison with future projections. To produce the normals data, the PRISM group at Oregon State University gathered temperature and precipitation records from a range of federal, state, and international weather station networks, and then mapped them to a grid. To fill map areas between observation stations, the group used a digital elevation model as a predictor grid, and refined the model to account for local effects of mountains, distance from coasts, and other factors that affect climate in complex terrains. Q: What do the colors mean? A: Shades of blue show where average maximum temperature for the month was, or is projected to be, below 60°F during the period indicated. The darker the shade of blue, the lower the temperature. Areas shown in shades of orange and red had, or are projected to have, average maximum temperatures over 60°F. The darker the shade of orange or red, the higher the temperature. White or very light colors show where the average maximum temperature was, or is projected to be, near 60°F. Q: Why do these data matter? A: In order to meet future needs for energy, food, and public health, planners and other decision makers need to understand how temperatures are projected to change over the coming decades. As the climate system continues responding to the heat-trapping gases we have added to the atmosphere, temperatures will change at different rates in different regions. These images can help people get a sense of how much warming their region will experience each decade so they can plan ahead for new conditions. These data also provide people with a way to compare conditions projected for stabilized emissions with conditions projected for high emissions. Comparing the two potential futures may encourage people to take actions to reduce emissions. Q: How did you produce these snapshots?
A: We used a suite of Python scripts to process and visualize LOCA (Localized Constructed Analogs) data. The processing scripts averaged the daily values for each month in a given decade from all 32 global climate models that comprise the LOCA dataset. We then calculated the median of all models in each month of the decade. The visualization scripts produced maps of the results within the contiguous United States. For further information, see the README file or access the scripts on GitHub.
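A minimal NumPy sketch of that two-step reduction (per-model monthly averages over a decade, then the median across models), using placeholder arrays rather than the actual LOCA files; the released scripts on GitHub are the authoritative implementation:

```python
import numpy as np

# Placeholder array of daily values: (model, day, lat, lon). In the real
# workflow these would come from the 32 LOCA-downscaled CMIP5 models.
n_models, n_days, n_lat, n_lon = 32, 3650, 10, 20   # roughly one decade
rng = np.random.default_rng(0)
daily = rng.normal(60, 15, size=(n_models, n_days, n_lat, n_lon))
month_of_day = rng.integers(1, 13, size=n_days)     # placeholder month labels

decadal_monthly = []
for month in range(1, 13):
    sel = month_of_day == month
    # Average all daily values for this calendar month across the decade,
    # separately for each model ...
    per_model = daily[:, sel].mean(axis=1)           # (model, lat, lon)
    # ... then take the median across the 32 models.
    decadal_monthly.append(np.median(per_model, axis=0))
decadal_monthly = np.stack(decadal_monthly)          # (12, lat, lon)
```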
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A common descriptive statistic in cluster analysis is the $R^2$ that measures the overall proportion of variance explained by the cluster means. This note highlights properties of the $R^2$ for clustering. In particular, we show that generally the $R^2$ can be artificially inflated by linearly transforming the data by ``stretching'' and by projecting. Also, the $R^2$ for clustering will often be a poor measure of clustering quality in high-dimensional settings. We also investigate the $R^2$ for clustering for misspecified models. Several simulation illustrations are provided highlighting weaknesses in the clustering $R^2$, especially in high-dimensional settings. A functional data example is given showing how that $R^2$ for clustering can vary dramatically depending on how the curves are estimated.
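For reference, the clustering $R^2$ discussed above is commonly written as the ratio of the between-cluster to the total sum of squares (stated here in a generic form; the note itself should be consulted for its exact definition):

```latex
% R^2 for a clustering of observations x_1,...,x_n into clusters C_1,...,C_K,
% with cluster sizes n_k, cluster means \bar{x}_k and overall mean \bar{x}:
R^2 \;=\; \frac{\sum_{k=1}^{K} n_k \,\lVert \bar{x}_k - \bar{x} \rVert^2}
               {\sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2}
      \;=\; 1 \;-\; \frac{\sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \bar{x}_k \rVert^2}
                         {\sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2}.
```

Because any linear transformation of the data changes both sums of squares, stretching or projecting the data can inflate this ratio without improving the clustering itself, which is the behaviour the note highlights.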
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
Summary of various climate variables for all 15 subregions based on Bureau of Meteorology Australian Water Availability Project (BAWAP) climate grids, including:
Time series mean annual BAWAP rainfall from 1900 - 2012.
Long term average BAWAP rainfall and Penman Potential Evapotranspiration (PET) from Jan 1981 - Dec 2012 for each month
Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P (precipitation); (ii) Penman ETp; (iii) Tavg (average temperature); (iv) Tmax (maximum temperature); (v) Tmin (minimum temperature); (vi) VPD (Vapour Pressure Deficit); (vii) Rn (net radiation); and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables we calculated the: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.
Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009).
As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
There are 4 csv files here:
BAWAP_P_annual_BA_SYB_GLO.csv
Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.
Source data: annual BILO rainfall
P_PET_monthly_BA_SYB_GLO.csv
long term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month
Climatology_Trend_BA_SYB_GLO.csv
Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables we calculated the: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend (a sketch of these calculations is given after this file listing).
Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv
Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
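As an illustration of the per-period summary statistics listed for Climatology_Trend_BA_SYB_GLO.csv, a small Python sketch on a placeholder monthly series; the trend here is a simple linear least-squares slope per year, which is an assumption and may differ from the method actually used:

```python
import numpy as np
import pandas as pd

def period_summary(monthly, months):
    """Summary statistics for one variable and one time period (e.g. annual,
    a season, or a single month): average, maximum, minimum, average +/- one
    standard deviation, the standard deviation itself, and a linear trend."""
    sub = monthly[monthly.index.month.isin(months)]
    annual = sub.groupby(sub.index.year).mean()          # one value per year
    avg, std = annual.mean(), annual.std()
    slope = np.polyfit(annual.index, annual.values, 1)[0]  # trend per year
    return {"average": avg, "maximum": annual.max(), "minimum": annual.min(),
            "avg_plus_std": avg + std, "avg_minus_std": avg - std,
            "std": std, "trend": slope}

# Placeholder monthly series for 1981-2012 (the real inputs are BAWAP grids).
idx = pd.date_range("1981-01-01", "2012-12-01", freq="MS")
rain = pd.Series(np.random.default_rng(2).gamma(2, 30, len(idx)), index=idx)
print(period_summary(rain, months=[12, 1, 2]))           # DJF season
```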
Dataset was created from various BAWAP source data, including monthly BAWAP rainfall, Tmax, Tmin, VPD, etc., and other source data including monthly Penman PET and correlation coefficient data. Data were extracted from national datasets for the GLO subregion.
Bioregional Assessment Programme (2014) GLO climate data stats summary. Bioregional Assessment Derived Dataset. Viewed 18 July 2018, http://data.bioregionalassessments.gov.au/dataset/afed85e0-7819-493d-a847-ec00a318e657.
Derived From Natural Resource Management (NRM) Regions 2010
Derived From Bioregional Assessment areas v03
Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
Derived From Bioregional Assessment areas v01
Derived From Bioregional Assessment areas v02
Derived From GEODATA TOPO 250K Series 3
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From Geological Provinces - Full Extent
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and code associated with "The Observed Availability of Data and Code in Earth Science
and Artificial Intelligence" by Erin A. Jones, Brandon McClung, Hadi Fawad, and Amy McGovern.
Instructions: To reproduce figures, download all associated Python and CSV files and place
in a single directory.
Run BAMS_plot.py as you would run Python code on your system.
Code:
BAMS_plot.py: Python code for categorizing data availability statements based on given data
documented below and creating figures 1-3.
Code was originally developed for Python 3.11.7 and run in the Spyder
(version 5.4.3) IDE.
Libraries utilized:
numpy (version 1.26.4)
pandas (version 2.1.4)
matplotlib (version 3.8.0)
For additional documentation, please see code file.
Data:
ASDC_AIES.csv: CSV file containing relevant availability statement data for Artificial
Intelligence for the Earth Systems (AIES)
ASDC_AI_in_Geo.csv: CSV file containing relevant availability statement data for Artificial
Intelligence in Geosciences (AI in Geo.)
ASDC_AIJ.csv: CSV file containing relevant availability statement data for Artificial
Intelligence (AIJ)
ASDC_MWR.csv: CSV file containing relevant availability statement data for Monthly
Weather Review (MWR)
Data documentation:
All CSV files contain the same format of information for each journal. The CSV files above are
needed for the BAMS_plot.py code attached.
Records were analyzed based on the criteria below.
Records:
1) Title of paper
The title of the examined journal article.
2) Article DOI (or URL)
A link to the examined journal article. For AIES, AI in Geo., and MWR, the DOI is
generally given. For AIJ, the URL is given.
3) Journal name
The name of the journal where the examined article is published. Either a full
journal name (e.g., Monthly Weather Review), or the acronym used in the
associated paper (e.g., AIES) is used.
4) Year of publication
The year the article was posted online/in print.
5) Is there an ASDC?
If the article contains an availability statement in any form, "yes" is
recorded. Otherwise, "no" is recorded.
6) Justification for non-open data?
If an availability statement contains some justification for why data is not
openly available, the justification is summarized and recorded as one of the
following options: 1) Dataset too large, 2) Licensing/Proprietary, 3) Can be
obtained from other entities, 4) Sensitive information, 5) Available at later
date. If the statement indicates any data is not openly available and no
justification is provided, or if no statement is provided, "None"
is recorded. If the statement indicates openly available data or no data
produced, "N/A" is recorded.
7) All data available
If there is an availability statement and data is produced, "y" is recorded
if means to access data associated with the article are given and there is no
indication that any data is not openly available; "n" is recorded if no means
to access data are given or there is some indication that some or all data is
not openly available. If there is no availability statement or no data is
produced, the record is left blank.
8) At least some data available
If there is an availability statement and data is produced, "y" is recorded
if any means to access data associated with the article are given; "n" is
recorded if no means to access data are given. If there is no availability
statement or no data is produced, the record is left blank.
9) All code available
If there is an availability statement and data is produced, "y" is recorded
if means to access code associated with the article are given and there is no
indication that any code is not openly available; "n" is recorded if no means
to access code are given or there is some indication that some or all code is
not openly available. If there is no availability statement or no data is
produced, the record is left blank.
10) At least some code available
If there is an availability statement and data is produced, "y" is recorded
if any means to access code associated with the article are given; "n" is
recorded if no means to access code are given. If there is no availability
statement or no data is produced, the record is left blank.
11) All data available upon request
If there is an availability statement indicating data is produced and no data
is openly available, "y" is recorded if any data is available upon request to
the authors of the examined journal article (not a request to any other
entity); "n" is recorded if no data is available upon request to the authors
of the examined journal article. If there is no availability statement, any
data is openly available, or no data is produced, the record is left blank.
12) At least some data available upon request
If there is an availability statement indicating data is produced and not all
data is openly available, "y" is recorded if all data is available upon
request to the authors of the examined journal article (not a request to any
other entity); "n" is recorded if not all data is available upon request to
the authors of the examined journal article. If there is no availability
statement, all data is openly available, or no data is produced, the record
is left blank.
13) no data produced
If there is an availability statement that indicates that no data was
produced for the examined journal article, "y" is recorded. Otherwise, the
record is left blank.
14) links work
If the availability statement contains one or more links to a data or code
repository, "y" is recorded if all links work; "n" is recorded if one or more
links do not work. If there is no availability statement or the statement
does not contain any links to a data or code repository, the record is left
blank.
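As an illustration of how these records might be tallied, a short pandas sketch follows; the column names used here are assumptions for illustration only, and the authoritative names are those in the CSV headers and in BAMS_plot.py.

```python
import pandas as pd

# Hypothetical column names -- adjust these to match the actual CSV headers.
files = {"AIES": "ASDC_AIES.csv", "AI in Geo.": "ASDC_AI_in_Geo.csv",
         "AIJ": "ASDC_AIJ.csv", "MWR": "ASDC_MWR.csv"}

for journal, path in files.items():
    df = pd.read_csv(path)
    has_statement = df["Is there an ASDC?"].str.lower().eq("yes")
    some_data = df["At least some data available"].str.lower().eq("y")
    print(journal,
          f"availability statements: {has_statement.mean():.0%},",
          f"some data available: {some_data.mean():.0%}")
```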
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the models for interpretable Word Sense Disambiguation (WSD) that were employed in Panchenko et al. (2017; the paper can be accessed at https://www.lt.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_LangTech/publications/EACL_Interpretability_FINAL_1_.pdf).
The files were computed on a 2015 dump from the English Wikipedia. Their contents:
Induced Sense Inventories: wp_stanford_sense_inventories.tar.gz This file contains 3 inventories (coarse, medium, fine)
Language Model (3-gram): wiki_text.3.arpa.gz This file contains all n-grams up to n=3 and can be loaded into an index
Weighted Dependency Features: wp_stanford_lemma_LMI_s0.0_w2_f2_wf2_wpfmax1000_wpfmin2_p1000.gz This file contains weighted word--context-feature combinations and includes their count and an LMI significance score
Distributional Thesaurus (DT) of Dependency Features: wp_stanford_lemma_BIM_LMI_s0.0_w2_f2_wf2_wpfmax1000_wpfmin2_p1000_simsortlimit200_feature expansion.gz This file contains a DT of context features. The context feature similarities can be used for context expansion
For further information, consult the paper and the companion page: http://jobimtext.org/wsd/
Panchenko A., Ruppert E., Faralli S., Ponzetto S. P., and Biemann C. (2017): Unsupervised Does Not Mean Uninterpretable: The Case for Word Sense Induction and Disambiguation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL'2017). Valencia, Spain. Association for Computational Linguistics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Q: What's the temperature of water at the ocean's surface? A: Colors on the map show the temperature of water right at the ocean’s surface. The darkest blue shows the coldest water: floating sea ice is usually present in these areas. Lighter shades of blue show temperatures of up to 80°F. White and orange areas show where surface temperatures are higher than 80°F, warm enough to fuel tropical cyclones or hurricanes. Q: Where do these measurements come from? A: Satellite instruments measure sea surface temperature—often abbreviated as SST—by checking how much energy comes off the ocean at different wavelengths. Computer programs merge sea surface temperatures from ships and buoys with the satellite data, and incorporate information from maps of sea ice. To produce the daily maps, programs invoke mathematical filters to combine and smooth data from all three sources. Q: What do the colors mean? A: The darkest blue areas show sea surface temperatures as low as 28°F. Sea ice, which can look like anything from a slushy mix of floating ice crystals to a solid surface of white, is usually present in these areas. Progressively lighter shades of blue show increasingly warmer temperatures, up to 80°F. White and orange areas on the map show where the surface temperature is above 80°F. Tropical storms that cross these areas can strengthen to form cyclones and hurricanes. Q: Why do these data matter? A: While heat energy is stored and mixed throughout the depth of the ocean, the temperature of water right at the sea's surface—where the ocean is in direct contact with the atmosphere—plays a significant role in weather and short-term climate. Where sea surface temperatures are high, relatively large amounts of heat energy and moisture enter the atmosphere, sometimes producing powerful, drenching storms downwind. Conversely, lower sea surface temperatures mean less evaporation. Global patterns of sea surface temperatures are an important factor for weather forecasts and climate outlooks. Q: How did you produce these snapshots? A: Data Snapshots are derivatives of existing data products: to meet the needs of a broad audience, we present the source data in a simplified visual style. NOAA's Climate Data Records Program produces the Optimum Interpolation Sea Surface Temperature files. To produce our images, we run a set of scripts that access the source files, re-project them into desired projections at various sizes, and output them with a custom color bar. Additional information Various scientific groups have produced datasets showing Sea Surface Temperature. The images in Data Snapshots represent the AVHRR-only 1/4° daily OISST dataset. Data Snapshots presents just one daily OISST image every seven days. References Optimum Interpolation Sea Surface Temperature Technical Notes [pdf] Climate Data Record (CDR) Program Climate Algorithm Theoretical Basis Document (C-ATBD) Daily 1/4° Optimum Interpolation Sea Surface Temperature (OISST) Richard W. Reynolds, Thomas M. Smith, Chunying Liu, Dudley B. Chelton, Kenneth S. Casey, and Michael G. Schlax, 2007: Daily High-Resolution-Blended Analyses for Sea Surface Temperature. J. Climate, 20, 5473–5496.
doi: http://dx.doi.org/10.1175/2007JCLI1824.1 Improvements of the Daily Optimum Interpolation Sea Surface Temperature (DOISST) Version 2.1 About Optimum Interpolation Sea Surface Temperature (OISST) v2.1 Source: https://www.climate.gov/maps-data/data-snapshots/data-source/sst-sea-surface-temperature This upload includes two additional files:* SST - Sea Surface Temperature _NOAA Climate.gov.pdf is a screenshot of the main Climate.gov site for these snapshots (https://www.climate.gov/maps-data/data-snapshots/data-source/sst-sea-surface-temperature)* Cimate_gov_ Data Snapshots.pdf is a screenshot of the data download page for the full-resolution files.
https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf
This dataset contains ERA5 surface level analysis parameter data ensemble means (see linked dataset for spreads). ERA5 is the 5th generation reanalysis project from the European Centre for Medium-Range Weather Forecasts (ECMWF) - see linked documentation for further details. The ensemble means and spreads are calculated from the ERA5 10 member ensemble, run at a reduced resolution compared with the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool linked to from this record.
Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and was thus calculated by dividing by 10 rather than by 9 (N-1). See linked datasets for ensemble member and ensemble mean data.
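A minimal NumPy illustration of the distinction (ddof=0 divides by N = 10, matching the ensemble spread definition above; ddof=1 would give the sample standard deviation):

```python
import numpy as np

members = np.random.default_rng(3).normal(size=10)  # 10 ensemble members (placeholder)

spread = np.std(members, ddof=0)   # divide by N = 10, as for the ERA5 ensemble spread
sample = np.std(members, ddof=1)   # divide by N - 1 = 9 (sample standard deviation)
```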
The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim reanalysis projects.
An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These will be subsequently reviewed ahead of being released by ECMWF as quality assured data within 3 months. CEDA holds a 6 month rolling copy of the latest ERA5t data. See related datasets linked to from this record. However, for the period 2000-2006 the initial ERA5 release was found to suffer from stratospheric temperature biases and so new runs to address this issue were performed resulting in the ERA5.1 release (see linked datasets). Note, though, that Simmons et al. 2020 (technical memo 859) report that "ERA5.1 is very close to ERA5 in the lower and middle troposphere." but users of data from this period should read the technical memo 859 for further details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper. The average SET scores were matched with the characteristics of the teacher for degree, seniority, gender, and SET scores in the past six semesters; the course characteristics for time of day, day of the week, course type, course breadth, class duration, and class size; the attributes of the SET survey responses as the percentage of students providing SET feedback; and the grades of the course for the mean, standard deviation, and percentage failed. Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section. The unit of observation or the single row in the data set is identified by three parameters: teacher unique id (j), course unique id (k) and the question number in the SET questionnaire (n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}). It means that for each pair (j,k), we have nine rows, one for each SET survey question, or sometimes fewer when students did not answer one of the SET questions at all. For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j = John Smith, k = Calculus, n = 2) is calculated as the average of all Likert-scale answers to question nr 2 in the SET survey distributed to all students that took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached file section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}.
Two attachments:
- Word file with variables description
- Rdata file with the data set (for R language)
Appendix 1. The SET questionnaire used for this paper.
Evaluation survey of the teaching staff of [university name]
Please, complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don’t agree; 1 - I strongly don’t agree. Each of the following statements is rated on this 1-5 scale:
1. I learnt a lot during the course.
2. I think that the knowledge acquired during the course is very useful.
3. The professor used activities to make the class more engaging.
4. If it was possible, I would enroll for the course conducted by this lecturer again.
5. The classes started on time.
6. The lecturer always used time efficiently.
7. The lecturer delivered the class content in an understandable and efficient way.
8. The lecturer was available when we had doubts.
9. The lecturer treated all students equally regardless of their race, background and ethnicity.
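For illustration only, a small pandas sketch of how the dependent variable SET_score_avg(j,k,n) is formed as the average of Likert-scale answers per (teacher, course, question) triplet; the column names here are hypothetical, and the released data are provided as an Rdata file rather than produced by this code:

```python
import pandas as pd

# Hypothetical raw responses: one row per student answer to one SET question.
responses = pd.DataFrame({
    "teacher_id": ["T1", "T1", "T1", "T2"],
    "course_id":  ["C10", "C10", "C10", "C11"],
    "question":   [2, 2, 2, 2],
    "answer":     [5, 4, 3, 5],          # Likert scale 1-5
})

# SET_score_avg(j, k, n): mean of all Likert answers for teacher j, course k,
# question n -- one row per (j, k, n) triplet, as in the described data set.
set_score_avg = (responses
                 .groupby(["teacher_id", "course_id", "question"])["answer"]
                 .mean()
                 .rename("SET_score_avg"))
```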
ACR3L2DM_1, the Active Cavity Radiometer Irradiance Monitor (ACRIM) III Level 2 Daily Mean Data version 1 product, consists of Level 2 total solar irradiance in the form of daily means gathered by the ACRIM III instrument on the ACRIMSAT satellite. The daily means are constructed from the shutter cycle results for each day.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Q: Where was the monthly temperature warmer or cooler than usual? A: Colors show where average monthly temperature was above or below its 1991-2020 average. Blue areas experienced cooler-than-usual temperatures while areas shown in red were warmer than usual. The darker the color, the larger the difference from the long-term average temperature. Q: Where do these measurements come from? A: Weather stations on every continent record temperatures over land, and ocean surface temperatures come from measurements made by ships and buoys. NOAA scientists merge the readings from land and ocean into a single dataset. To calculate difference-from-average temperatures—also called temperature anomalies—scientists calculate the average monthly temperature across hundreds of small regions, and then subtract each region’s 1991-2020 average for the same month. If the result is a positive number, the region was warmer than the long-term average. A negative result from the subtraction means the region was cooler than usual. To generate the source images, visualizers apply a mathematical filter to the results to produce a map that has smooth color transitions and no gaps. Q: What do the colors mean? A: Shades of red show where average monthly temperature was warmer than the 1991-2020 average for the same month. Shades of blue show where the monthly average was cooler than the long-term average. The darker the color, the larger the difference from average temperature. White and very light areas were close to their long-term average temperature. Gray areas near the North and South Poles show where no data are available. Q: Why do these data matter? A: Over time, these data give us a planet-wide picture of how climate varies over months and years and changes over decades. Each month, some areas are cooler than the long-term average and some areas are warmer. Though we don’t see an increase in temperature at every location every month, the long-term trend shows a growing portion of Earth’s surface is warmer than it was during the base period. Q: How did you produce these snapshots? A: Data Snapshots are derivatives of existing data products: to meet the needs of a broad audience, we present the source data in a simplified visual style. NOAA's Environmental Visualization Laboratory (NNVL) produces the source images for the Difference from Average Temperature – Monthly maps. To produce our images, we run a set of scripts that access the source images, re-project them into desired projections at various sizes, and output them with a custom color bar. Additional information Source images available through NOAA's Environmental Visualization Lab (NNVL) are interpolated from data originally provided by the National Center for Environmental Information (NCEI) - Weather and Climate. NNVL images are based on NOAA Merged Land Ocean Global Surface Temperature Analysis data (NOAAGlobalTemp, formerly known as MLOST). 
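The difference-from-average calculation described above amounts to subtracting a per-month, per-region 1991-2020 climatology; a minimal NumPy sketch with placeholder data (the operational product is produced by NOAA from NOAAGlobalTemp, not by this code):

```python
import numpy as np

# Placeholder monthly-mean temperatures: (year, month, lat, lon).
rng = np.random.default_rng(4)
temps = rng.normal(14, 5, size=(40, 12, 36, 72))   # e.g. years 1985-2024
base = slice(6, 36)                                 # rows for 1991-2020

# Difference from average: subtract each region's 1991-2020 mean for the same
# calendar month, leaving a positive value where it was warmer than usual.
climatology = temps[base].mean(axis=0)              # (12, lat, lon)
anomalies = temps - climatology                     # broadcast over years
```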
References NCEI Monthly Global Analysis NOAA View Temperature Anomaly Merged Land Ocean Global Surface Temperature Analysis Global Surface Temperature Anomalies Climate at a Glance - Data Information Source: https://www.climate.gov/maps-data/data-snapshots/data-source/temperature-global-monthly-difference-a...This upload includes two additional files:* Temperature - Global Monthly, Difference from Average _NOAA Climate.gov.pdf is a screenshot of the main Climate.gov site for these snapshots (https://www.climate.gov/maps-data/data-snapshots/data-source/temperature-global-monthly-difference-a...)* Cimate_gov_ Data Snapshots.pdf is a screenshot of the data download page for the full-resolution files.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset comprises monthly mean data from a global, transient simulation with the Whole Atmosphere Community Climate Model eXtension (WACCM-X) from 2015 to 2070. WACCM-X is a global atmosphere model covering altitudes from the surface up to ~500 km, i.e., including the troposphere, stratosphere, mesosphere and thermosphere. WACCM-X version 2.0 (Liu et al., 2018) was used, part of the Community Earth System Model (CESM) release 2.1.0 (http://www.cesm.ucar.edu/models/cesm2) made available by the National Center for Atmospheric Research. The model was run in free-running mode with a horizontal resolution of 1.9 degrees latitude and 2.5 degrees longitude (giving 96 latitude points and 144 longitude points) and 126 vertical levels. Further description of the model and simulation setup is provided by Cnossen (2022) and references therein. A large number of variables is included on standard monthly mean output files on the model grid, while selected variables are also offered interpolated to a constant height grid or vertically integrated in height (details below). Zonal mean and global mean output files are included as well.
The data are provided in NetCDF format and file names have the following structure:
f.e210.FXHIST.f19_f19.h1a.cam.h0.[YYYY]-[MM][DFT].nc
where [YYYY] gives the year with 4 digits, [MM] gives the month (2 digits) and [DFT] specifies the data file type. The following data file types are included:
1) Monthly mean output on the full grid for the full set of variables; [DFT] =
2) Zonal mean monthly mean output for the full set of variables; [DFT] = _zm
3) Global mean monthly mean output for the full set of variables; [DFT] = _gm
4) Height-interpolated/-integrated output on the full grid for selected variables; [DFT] = _ht
A cos(latitude) weighting was used when calculating the global means.
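A minimal NumPy sketch of a cos(latitude)-weighted global mean on a placeholder grid (illustrative only; the actual _gm files were produced during post-processing of the model output):

```python
import numpy as np

lat = np.linspace(-89.05, 89.05, 96)                      # placeholder latitude grid
field = np.random.default_rng(5).normal(size=(96, 144))   # one monthly-mean field

# Global mean with cos(latitude) weighting: average over longitude first,
# then weight each latitude band by the cosine of its latitude.
weights = np.cos(np.deg2rad(lat))
global_mean = np.average(field.mean(axis=1), weights=weights)
```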
Data were interpolated to a set of constant heights (61 levels in total) using the Z3GM variable (for variables output on midpoints, with 'lev' as the vertical coordinate) or the Z3GMI variable (for variables output on interfaces, with ilev as the vertical coordinate) stored on the original output files (type 1 above). Interpolation was done separately for each longitude, latitude and time.
Mass density (DEN [g/cm3]) was calculated from the M_dens, N2_vmr, O2, and O variables on the original data files before interpolation to constant height levels.
The Joule heating power QJ [W/m3] was calculated as QJ = sigma_P * B^2 * ((u_i - u_n)^2 + (v_i - v_n)^2 + (w_i - w_n)^2), with sigma_P = Pedersen conductivity [S/m], B = geomagnetic field strength [T], u_i, v_i, and w_i = zonal, meridional, and vertical ion velocities [m/s], and u_n, v_n, and w_n = the corresponding neutral wind velocities [m/s]. QJ was integrated vertically in height (using a 2.5 km height grid spacing rather than the 61 levels on output file type 4) to give the JHH variable on the type 4 data files. The QJOULE variable also given is the Joule heating rate [K/s] at each of the 61 height levels.
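As an illustration of the expression above, a short NumPy sketch with placeholder inputs (the actual JHH variable was computed from the model fields during post-processing, not by this code):

```python
import numpy as np

def joule_heating_power(sigma_p, b, ui, vi, wi, un, vn, wn):
    """QJ [W/m3]: Pedersen conductivity times B^2 times the squared
    ion-neutral velocity difference, as in the expression above."""
    return sigma_p * b**2 * ((ui - un)**2 + (vi - vn)**2 + (wi - wn)**2)

# Vertical integration to a column-integrated heating [W/m2] on a uniform
# height grid (the dataset uses a 2.5 km spacing for this step).
dz = 2.5e3                                      # grid spacing [m]
heights_m = np.arange(0, 500e3, dz)
qj_profile = np.full(heights_m.size, 1e-9)      # placeholder QJ values [W/m3]
jhh = np.sum(qj_profile * dz)                   # column-integrated value [W/m2]
```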
All data are provided as monthly mean files with one time record per file, giving 672 files for each data file type for the period 2015-2070 (56 years).
References:
Cnossen, I. (2022), A realistic projection of climate change in the upper atmosphere into the 21st century, in preparation.
Liu, H.-L., C.G. Bardeen, B.T. Foster, et al. (2018), Development and validation of the Whole Atmosphere Community Climate Model with thermosphere and ionosphere extension (WACCM-X 2.0), Journal of Advances in Modeling Earth Systems, 10(2), 381-402, doi:10.1002/2017ms001232.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Q: How has the age of Arctic Sea Ice changed over time? A: Since the late 1900s, Arctic sea ice has thinned, and less sea ice has persisted in the Arctic over multiple melt seasons. The trend toward younger, thinner sea ice over time reflects warming temperatures in the Arctic. As older ice is thicker than younger ice, the reduced area of old ice also indicates a reduction in the total volume of ice. Q: Where do these measurements come from? A: Scientists estimate the age of sea ice by combining satellite observations of ice locations and extent with buoy data on winds and motion. Q: What do the colors mean? A: Colors show the age of sea ice floating in the Arctic Ocean. The darkest blue areas on the map show seasonal or first-year ice, which formed during the most recent winter. White areas show where ice is more than four years old. Ice thickness is strongly correlated with ice age. First year ice ranges from 4 to 12 inches (10 to 30 centimeters) thick, while multiyear ice ranges from 6 to 12 feet (2 to 4 meters) thick. This correlation means that in general, the brighter the color, the thicker the ice. Q: Why do these data matter? A: In the mid-to-late 1900s, a core of thick, old year-round sea ice covered much of the Arctic Ocean. Around that core, seasonal ice formed each winter and melted each summer. North of Alaska, a looping current called the Beaufort Gyre historically acted as a nursery for young sea ice where ice could persist and thicken. Ice growth in the gyre roughly offset the steady transport of ice out of the Arctic Ocean through the Fram Strait east of Greenland. Since the year 2000, warmer summers have caused ice to melt in the southern stretch of the Beaufort Gyre, so less multiyear ice has persisted. The result is younger, thinner sea ice than in decades past. Today, the amount of thick, old ice in the Arctic is a small fraction of what it was in the 1980s. Because young, thin ice melts more easily than old, thick ice, the trend toward thinner ice is self-reinforcing. Q: How did you produce these snapshots? A: Data Snapshots are derivatives of existing data products: to meet the needs of a broad audience, we present the source data in a simplified visual style. Additional information These Arctic Sea Ice Age maps use NSIDC Quicklook Arctic Weekly EASE-Grid Sea Ice Age, Version 1 data from 2020 to now, while maps from 2019 and earlier use NSIDC EASE-Grid Sea Ice Age, Version 4 data. Both datasets are available as PNGs (.png) and NetCDF (.nc) files. References Perovich, D., Meier, W., Tschudi, M., Farrell, S., Hendricks, S., Gerland, S., Kaleschke, L., Ricker, R., Tian-Kunze, X., Webster, M., Woods, K. (2019). Sea ice. 2019 Arctic Report Card. Source: https://www.climate.gov/maps-data/data-snapshots/data-source/arctic-sea-ice-age This upload includes two additional files:* Arctic Sea Ice Age _NOAA Climate.gov.pdf is a screenshot of the main Climate.gov site for these snapshots (https://www.climate.gov/maps-data/data-snapshots/data-source/arctic-sea-ice-age )* Cimate_gov_ Data Snapshots.pdf is a screenshot of the data download page for the full-resolution files.
The National Energy Efficiency Data-Framework (NEED) was set up to provide a better understanding of energy use and energy efficiency in domestic and non-domestic buildings in Great Britain. The data framework matches data about a property together - including energy consumption and energy efficiency measures installed - at household level.
We identified 2 processing errors in this edition of the Domestic NEED Annual report and corrected them. The changes are small and do not affect the overall findings of the report, only the domestic energy consumption estimates. The impact of energy efficiency measures analysis remains unchanged. The revisions are summarised here:
Error 2: Some properties incorrectly excluded from the Scotland multiple attributes tables
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the consumer expenditure survey (ce) with r the consumer expenditure survey (ce) is the primo data source to understand how americans spend money. participating households keep a running diary about every little purchase over the year. those diaries are then summed up into precise expenditure categories. how else are you gonna know that the average american household spent $34 (±2) on bacon, $826 (±17) on cellular phones, and $13 (±2) on digital e-readers in 2011? an integral component of the market basket calculation in the consumer price index, this survey recently became available as public-use microdata and they're slowly releasing historical files back to 1996. hooray! for a taste of what's possible with ce data, look at the quick tables listed on their main page - these tables contain approximately a bazillion different expenditure categories broken down by demographic groups. guess what? i just learned that americans living in households with $5,000 to $9,999 of annual income spent an average of $283 (±90) on pets, toys, hobbies, and playground equipment (pdf page 3). you can often get close to your statistic of interest from these web tables. but say you wanted to look at domestic pet expenditure among only households with children between 12 and 17 years old. another one of the thirteen web tables - the consumer unit composition table - shows a few different breakouts of households with kids, but none matching that exact population of interest. the bureau of labor statistics (bls) (the survey's designers) and the census bureau (the survey's administrators) have provided plenty of the major statistics and breakouts for you, but they're not psychic. if you want to comb through this data for specific expenditure categories broken out by a you-defined segment of the united states' population, then let a little r into your life. fun starts now. fair warning: only analyze the consumer expenditure survey if you are a nerd to the core. the microdata ship with two different survey types (interview and diary), each containing five or six quarterly table formats that need to be stacked, merged, and manipulated prior to a methodologically-correct analysis. the scripts in this repository contain examples to prepare 'em all, just be advised that magnificent data like this will never be no-assembly-required. the folks at bls have posted an excellent summary of what's available - read it before anything else. after that, read the getting started guide. don't skim. a few of the descriptions below refer to sas programs provided by the bureau of labor statistics. you'll find these in the C:\My Directory\CES\2011\docs directory after you run the download program.
this new github repository contains three scripts:

2010-2011 - download all microdata.R
- loop through every year and download every file hosted on the bls's ce ftp site
- import each of the comma-separated value files into r with read.csv
- depending on user-settings, save each table as an r data file (.rda) or stata-readable file (.dta)

2011 fmly intrvw - analysis examples.R
- load the r data files (.rda) necessary to create the 'fmly' table shown in the ce macros program documentation.doc file
- construct that 'fmly' table, using five quarters of interviews (q1 2011 thru q1 2012)
- initiate a replicate-weighted survey design object
- perform some lovely li'l analysis examples
- replicate the %mean_variance() macro found in "ce macros.sas" and provide some examples of calculating descriptive statistics using unimputed variables
- replicate the %compare_groups() macro found in "ce macros.sas" and provide some examples of performing t-tests using unimputed variables
- create an rsqlite database (to minimize ram usage) containing the five imputed variable files, after identifying which variables were imputed based on pdf page 3 of the user's guide to income imputation
- initiate a replicate-weighted, database-backed, multiply-imputed survey design object
- perform a few additional analyses that highlight the modified syntax required for multiply-imputed survey designs
- replicate the %mean_variance() macro found in "ce macros.sas" and provide some examples of calculating descriptive statistics using imputed variables
- replicate the %compare_groups() macro found in "ce macros.sas" and provide some examples of performing t-tests using imputed variables
- replicate the %proc_reg() and %proc_logistic() macros found in "ce macros.sas" and provide some examples of regressions and logistic regressions using both unimputed and imputed variables

replicate integrated mean and se.R
- match each step in the bls-provided sas program "integrated mean and se.sas" but with r instead of sas
- create an rsqlite database when the expenditure table gets too large for older computers to handle in ram
- export a table "2011 integrated mean and se.csv" that exactly matches the contents of the sas-produced "2011 integrated mean and se.lst" text file

click here to view these three scripts for...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
***Starting on March 7th, 2024, the Los Angeles Police Department (LAPD) will adopt a new Records Management System for reporting crimes and arrests. This new system is being implemented to comply with the FBI's mandate to collect NIBRS-only data (NIBRS — FBI - https://www.fbi.gov/how-we-can-help-you/more-fbi-services-and-information/ucr/nibrs). During this transition, users will temporarily see only incidents reported in the retiring system. However, the LAPD is actively working on generating new NIBRS datasets to ensure a smoother and more efficient reporting system. ***
******Update 1/18/2024 - LAPD is facing issues with posting the Crime data, but we are taking immediate action to resolve the problem. We understand the importance of providing reliable and up-to-date information and are committed to delivering it.
As we work through the issues, we have temporarily reduced our updates from weekly to bi-weekly to ensure that we provide accurate information. Our team is actively working to identify and resolve these issues promptly.
We apologize for any inconvenience this may cause and appreciate your understanding. Rest assured, we are doing everything we can to fix the problem and get back to providing weekly updates as soon as possible. ******
This dataset reflects incidents of crime in the City of Los Angeles dating back to 2020. This data is transcribed from original crime reports that are typed on paper and therefore there may be some inaccuracies within the data. Some location fields with missing data are noted as (0°, 0°). Address fields are only provided to the nearest hundred block in order to maintain privacy. This data is as accurate as the data in the database. Please note questions or concerns in the comments.