This dataset consists of mathematical question-and-answer pairs covering a range of question types at roughly school-level difficulty. It is designed to test the mathematical learning and algebraic reasoning skills of learning models.
## Example questions
Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.
Answer: 4
Question: Calculate -841880142.544 + 411127.
Answer: -841469015.544
Question: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)).
Answer: 54*a - 30
It contains 2 million (question, answer) pairs per module, with questions limited to 160 characters in length and answers to 30 characters. Note that the training data for each question type is split into "train-easy", "train-medium", and "train-hard", which allows models to be trained via a curriculum. The data from these training splits can also be mixed together uniformly to reproduce the results reported in the paper.
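For illustration only, here is a minimal sketch of mixing the three difficulty splits uniformly; it assumes each module is stored as a plain-text file of alternating question and answer lines, and the directory and file names below are placeholders rather than details confirmed by this description:

```python
import random
from pathlib import Path

def load_pairs(path):
    """Read a module file assumed to contain alternating question/answer lines."""
    lines = Path(path).read_text().splitlines()
    return list(zip(lines[0::2], lines[1::2]))  # (question, answer) tuples

# Pool the three difficulty splits for one module into a single shuffled set.
splits = ["train-easy", "train-medium", "train-hard"]
pairs = []
for split in splits:
    pairs.extend(load_pairs(Path(split) / "algebra__linear_2d.txt"))  # placeholder file name

random.shuffle(pairs)
question, answer = pairs[0]
```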
There are several datasets available for the simple linear regression algorithm, but most of them contain purely random values. There is nothing wrong with that, but I think the data you work with, no matter how small or simple the problem or algorithm, should be meaningful. Hence, here is a somewhat sensible dataset of random numbers and their corresponding log base 10 values. You can use this dataset to practice and experiment with the linear regression algorithm.
The dataset consists of two CSVs corresponding to the training and testing sets. The training dataset was created in Google Sheets using the RANDBETWEEN(1,1000) function to generate pseudo-random values. The LOG10() function was then used to calculate the log base 10 value of each of these numbers, and these log values were truncated to 6 decimal places using the TRUNC() formula. The testing dataset was created in the same way as the training dataset, but with numbers ranging between 1001 and 2000.
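As a quick illustration, here is a minimal sketch that regenerates similar data and fits scikit-learn's LinearRegression to it; the column names here are placeholders rather than the actual CSV headers:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Recreate a similar training set: random integers and their truncated log10 values.
rng = np.random.default_rng(0)
x = rng.integers(1, 1001, size=500)
y = np.trunc(np.log10(x) * 1e6) / 1e6  # truncate to 6 decimal places

train = pd.DataFrame({"x": x, "y": y})

# Fit a simple linear regression (note: log10 is not truly linear in x,
# so the fit is only a rough approximation over this range).
model = LinearRegression().fit(train[["x"]], train["y"])
print(model.coef_, model.intercept_, model.score(train[["x"]], train["y"]))
```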
I would like to thank Dr. Andrew Ng for creating an amazing beginner-friendly ML course.
I hope this dataset helps machine learning beginners practice and learn about linear regression.
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey, which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to ask all members of their unit to respond, or to collate responses from their unit themselves before reporting in the survey. Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated a total current data volume of 10 to 100 TB or over 100 TB (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data were considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.
Resources in this dataset:
Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf
Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv
Resource Description: CSV file that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).
Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
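A minimal sketch of the per-person storage calculation described above; the function and example values are illustrative, not taken from the survey data:

```python
def per_person_storage(range_high_tb, group_size=1, reported_tb=None):
    """Per-person storage need: the high end of the reported range divided by
    the number of respondents covered (1 for an individual response, G for a
    group response). For Big Data users, an actual or estimated value is used
    instead of the range (all values here are illustrative)."""
    if reported_tb is not None:          # Big Data user with follow-up detail
        return reported_tb / group_size
    return range_high_tb / group_size    # Small Data user

# Example: a group response covering 5 scientists reporting a range topping out at 10 TB.
print(per_person_storage(range_high_tb=10, group_size=5))  # 2.0 TB per person
```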
The following exercise contains questions based on the housing dataset.
How many houses have a waterfront? a. 21000 b. 21450 c. 163 d. 173
How many houses have 2 floors? a. 2692 b. 8241 c. 10680 d. 161
How many houses built before 1960 have a waterfront? a. 80 b. 7309 c. 90 d. 92
What is the price of the most expensive house having more than 4 bathrooms? a. 7700000 b. 187000 c. 290000 d. 399000
For instance, if the ‘price’ column contains outliers, how can you clean the data and remove them? a. Calculate the IQR range and drop the values outside the range. b. Calculate the p-value and remove the values less than 0.05. c. Calculate the correlation coefficient of the price column and remove the values less than the correlation coefficient. d. Calculate the Z-score of the price column and remove the values less than the z-score. (A code sketch of this technique follows these questions.)
What are the various parameters that can be used to determine the dependent variables in the housing data to determine the price of the house? a. Correlation coefficients b. Z-score c. IQR Range d. Range of the Features
If we get the r2 score as 0.38, what inferences can we make about the model and its efficiency? a. The model is 38% accurate, and shows poor efficiency. b. The model is showing 0.38% discrepancies in the outcomes. c. Low difference between observed and fitted values. d. High difference between observed and fitted values.
If the metrics show that the p-value for the grade column is 0.092, what inferences can we make about the grade column? a. Significant in the presence of other variables. b. Highly significant in the presence of other variables. c. Insignificant in the presence of other variables. d. None of the above.
If the Variance Inflation Factor (VIF) value for a feature is considerably higher than for the other features, what can we say about that column/feature? a. High multicollinearity b. Low multicollinearity c. Both A and B d. None of the above
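A minimal sketch of the two techniques referenced in the questions above (IQR-based outlier removal and the variance inflation factor), assuming a pandas DataFrame with a numeric `price` column and a few numeric feature columns; the file name and column names are placeholders, not the actual dataset schema:

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("housing.csv")  # placeholder file name

# IQR-based outlier removal on the 'price' column.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df_clean = df[mask]

# Variance inflation factor for each candidate feature:
# a value much higher than the others indicates high multicollinearity.
features = df_clean[["bedrooms", "bathrooms", "sqft_living", "floors"]]  # placeholder columns
vif = pd.Series(
    [variance_inflation_factor(features.values, i) for i in range(features.shape[1])],
    index=features.columns,
)
print(vif)
```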
https://spdx.org/licenses/CC0-1.0.html
Aim: Despite the wide distribution of many parasites around the globe, the range of individual species varies significantly even among phylogenetically related taxa. Since parasites need suitable hosts to complete their development, parasite geographical and environmental ranges should be limited to communities where their hosts are found. Parasites may also suffer from a trade-off between being locally abundant or widely dispersed. We hypothesize that the geographical and environmental ranges of parasites are negatively associated with their host specificity and their local abundance. Location: Worldwide Time period: 2009 to 2021 Major taxa studied: Avian haemosporidian parasites Methods: We tested these hypotheses using a global database which comprises data on avian haemosporidian parasites from across the world. For each parasite lineage, we computed five metrics: phylogenetic host-range, environmental range, geographical range, and their mean local and total number of observations in the database. Phylogenetic generalized least squares models were run to evaluate the influence of phylogenetic host-range and total and local abundances on geographical and environmental range. In addition, we analysed separately the two regions with the largest amount of available data: Europe and South America. Results: We evaluated 401 lineages from 757 localities and observed that generalism (i.e. phylogenetic host range) is positively associated with both the parasites’ geographical and environmental ranges at the global and European scales. For South America, generalism is only associated with geographical range. Finally, mean local abundance (mean local number of parasite occurrences) was negatively related to geographical and environmental range. This pattern was detected worldwide and in South America, but not in Europe. Main Conclusions: We demonstrate that parasite specificity is linked to both their geographical and environmental ranges. The fact that locally abundant parasites have restricted ranges indicates a trade-off between these two traits. This trade-off, however, only becomes evident when sufficiently heterogeneous host communities are considered. Methods We compiled data on haemosporidian lineages from the MalAvi database (http://130.235.244.92/Malavi/, Bensch et al. 2009), including all the data available from the “Grand Lineage Summary” representing the Plasmodium and Haemoproteus genera from wild birds that contained location information. After checking for duplicated sequences, this dataset comprised a total of ~6200 sequenced parasites representing 1602 distinct lineages (775 Plasmodium and 827 Haemoproteus) collected from 1139 different host species and 757 localities from all continents except Antarctica (Supplementary figure 1, Supplementary Table 1). The parasite lineages deposited in MalAvi are based on a cyt b fragment of 478 bp. This dataset was used to calculate the parasites’ geographical, environmental and phylogenetic ranges. Geographical range All analyses in this study were performed using R version 4.02. In order to estimate the geographical range of each parasite lineage, we applied the R package “GeoRange” (Boyle, 2017) and chose the variable minimum spanning tree distance (i.e., the shortest total distance of all lines connecting each locality where a particular lineage has been found).
Using the function “create.matrix” from the “fossil” package, we created a matrix of lineages and coordinates and employed the function “GeoRange_MultiTaxa” to calculate the minimum spanning tree distance for each parasite lineage (i.e., the shortest total distance in kilometers of all lines connecting each locality where it has been found). Because at least two distinct sites are necessary to calculate this distance, parasites observed in a single locality could not have their geographical range estimated. For this reason, only parasites observed in two or more localities were considered in our phylogenetic generalized least squares (PGLS) models. Host and Environmental diversity Traditionally, ecologists use Shannon entropy to measure diversity in ecological assemblages (Pielou, 1966). The Shannon entropy of a set of elements is related to the degree of uncertainty someone would have about the identity of a randomly selected element of that set (Jost, 2006). Thus, Shannon entropy matches our intuitive notion of biodiversity, as the more diverse an assemblage is, the greater the uncertainty regarding which species a randomly selected individual belongs to. Shannon diversity increases with both the assemblage richness (e.g., the number of species) and evenness (e.g., uniformity in abundance among species). To compare the diversity of assemblages that vary in richness and evenness in a more intuitive manner, we can normalize diversities by Hill numbers (Chao et al., 2014b). The Hill number of an assemblage represents the effective number of species in the assemblage, i.e., the number of equally abundant species that would be needed to give the same value of the diversity metric in that assemblage. Hill numbers can be extended to incorporate phylogenetic information. In that case, instead of species, we are measuring the effective number of phylogenetic entities in the assemblage. Here, we computed phylogenetic host-range as the phylogenetic Hill number associated with the assemblage of hosts found infected by a given parasite. Analyses were performed using the function “hill_phylo” from the “hillr” package (Chao et al., 2014a). Hill numbers are parameterized by an order “q” that determines the sensitivity of the metric to relative species abundance. Different “q” values produce Hill numbers associated with different diversity metrics. We set q = 1 to compute the Hill number associated with Shannon diversity. Here, low Hill numbers indicate specialization on a narrow phylogenetic range of hosts, whereas a higher Hill number indicates generalism across a broader phylogenetic spectrum of hosts. We also used Hill numbers to compute the environmental range of sites occupied by each parasite lineage. Firstly, we collected the 19 bioclimatic variables from WorldClim version 2 (http://www.worldclim.com/version2) for all sites used in this study (N = 713). Then, we standardized the 19 variables by centering and scaling them by their respective mean and standard deviation. Thereafter, we computed the pairwise Euclidean environmental distance among all sites and used this distance to compute a dissimilarity cluster. Finally, as for the phylogenetic Hill number, we used this dissimilarity cluster to compute the environmental Hill number of the assemblage of sites occupied by each parasite lineage. The environmental Hill number for each parasite can be interpreted as the effective number of environmental conditions in which a parasite lineage occurs.
Thus, the higher the environmental Hill number, the more generalist the parasite is with respect to the environmental conditions in which it can occur. Parasite phylogenetic tree A Bayesian phylogenetic reconstruction was performed. We built a tree for all parasite sequences for which we were able to estimate the parasite’s geographical, environmental and phylogenetic ranges (see above); this represented 401 distinct parasite lineages. This inference was produced using MrBayes 3.2.2 (Ronquist & Huelsenbeck, 2003) with the GTR + I + G model of nucleotide evolution, as recommended by ModelTest (Posada & Crandall, 1998), which selects the best-fit nucleotide substitution model for a set of genetic sequences. We ran four Markov chains simultaneously for a total of 7.5 million generations that were sampled every 1000 generations. The first 25% of sampled trees were discarded as a burn-in step, and the remaining trees were used to calculate the posterior probabilities of each estimated node in the final consensus tree. Our final tree obtained a cumulative posterior probability of 0.999. Leucocytozoon caulleryi was used as the outgroup to root the phylogenetic tree, as Leucocytozoon spp. represents a basal group within avian haemosporidians (Pacheco et al., 2020).
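As a minimal illustration of the metric used here (not the authors' code, which relied on the R packages named above), the Hill number of order q = 1 is the exponential of Shannon entropy, i.e., the effective number of equally abundant categories:

```python
import numpy as np

def hill_q1(abundances):
    """Hill number of order q = 1: exp(Shannon entropy) of the relative abundances."""
    p = np.asarray(abundances, dtype=float)
    p = p[p > 0] / p.sum()
    return float(np.exp(-np.sum(p * np.log(p))))

# A parasite recorded on three host species with equal frequency behaves like
# an assemblage of 3 "effective" hosts; a skewed distribution yields fewer.
print(hill_q1([5, 5, 5]))   # 3.0
print(hill_q1([12, 2, 1]))  # about 1.87
```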
https://www.usa.gov/government-works/
Earthquake data from USGS (6 June 2023 to 6 July 2023): worldwide earthquakes reported by seismic stations, from 2023-06-06 12:31:23.716000 to 2023-07-06 12:17:32.110000.
time : time when event is reported
depth : The depth in km below the earth's surface at which the earthquake begins to rupture. Typical range : [0, 1000]
mag : the magnitude for the event. Typical range : [-1.0, 10.0]
magType : The method or algorithm used to calculate the preferred magnitude for the event. Possible values are “Md”, “Ml”, “Ms”, “Mw”, “Me”, “Mi”, “Mb”, “MLg”.
nst : The total number of seismic stations used to determine earthquake location.
gap : largest angle between two neighboring seismic monitoring stations around an earthquake. It measures how well the earthquake's horizontal position can be determined. Typical value range is [0.0, 180.0]
dmin : Horizontal distance from the epicenter to the nearest station (in degrees). 1 degree is approximately 111.2 kilometers. In general, the smaller this number, the more reliable is the calculated depth of the earthquake. Typical Values are in [0.4, 7.1].
rms : Root-mean-square (RMS) travel-time residual. Smaller values indicate a better fit of the model to the observed arrival times of the event.
net : The network identified as the source of information for this event. Typical values are ak, at, ci, hv, ld, mb, nc, nm, nn, pr, pt, se, us, uu, uw.
id : A unique identifier for the event.
updated: Time when event is most recently updated.
type : Type of seismic event. Typical values are “earthquake”, “quarry”.
HorizontalError : Uncertainty of reported location of the event in kilometers. Typical values are in [0, 100].
depthError : Uncertainty of reported depth of the event in kilometers. Typical values are in [0, 100].
magError : Uncertainty of reported magnitude of an event. Typical values are in [0, 100].
magNst : The total number of seismic stations used to calculate the magnitude for this earthquake. (Not to be confused with nst.)
status : Indicates whether the event has been reviewed by a human or only processed automatically. Typical values are reviewed, automatic, deleted.
locationSource : The network that originally authored the reported location of this event. Typical values are ak, at, ci, hv, ld, mb, nc, nm, nn, pr, pt, se, us, uu, uw.
magSource : Network that originally authored the reported magnitude for this event.
Field descriptions noted from (and in places copied from): USGS.
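A minimal usage sketch, assuming the catalog has been exported to a CSV with the columns listed above (the file name is a placeholder):

```python
import pandas as pd

quakes = pd.read_csv("usgs_earthquakes_jun_jul_2023.csv", parse_dates=["time", "updated"])

# Keep human-reviewed earthquakes of magnitude 5 or greater.
strong = quakes[(quakes["type"] == "earthquake")
                & (quakes["status"] == "reviewed")
                & (quakes["mag"] >= 5.0)]

# Events whose location is comparatively well constrained: small gap and nearby stations.
well_located = strong[(strong["gap"] < 90.0) & (strong["dmin"] < 1.0)]
print(well_located[["time", "mag", "magType", "depth", "net", "id"]].head())
```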
PHREEQCI is a widely used geochemical computer program that can be used to calculate chemical speciation and the specific conductance of a natural water sample from its chemical composition (Charlton and Parkhurst, 2002; Parkhurst and Appelo, 1999). The specific conductance of a natural water calculated with PHREEQCI (Appelo, 2010) is reliable for pH greater than 4 and temperatures less than 35 °C (McCleskey and others, 2012b). An alternative method for calculating the specific conductance of natural waters is accurate over a large range of ionic strength (0.0004–0.7 mol/kg), pH (1–10), temperature (0–95 °C), and specific conductance (30–70,000 μS/cm) (McCleskey and others, 2012a). PHREEQCI input files for calculating the specific conductance of natural waters using the method described by McCleskey and others (2012a) have been created and are presented in this ScienceBase software release. The input files also incorporate three commonly used temperature compensation factors which can be used to determine the specific conductance at 25 °C: the constant factor (0.019), the non-linear factor (ISO-7888), and the temperature compensation factor described by McCleskey (2013), which is the most accurate for acidic waters (pH < 4). The specific conductance imbalance (SCI), which can be used along with charge balance as a quality-control check (McCleskey and others, 2012a), is also calculated:
SCI (%) = 100 × (SC25 calculated − SC25 measured) / (SC25 measured)
where SC25 calculated is the calculated specific conductance at 25 °C and SC25 measured is the measured specific conductance at 25 °C. Finally, the transport number (t), which is the relative contribution of a given ion to the overall electrical conductivity, is also calculated for 30 ions. Transport numbers are useful for interpreting specific conductance data and identifying the ions that contribute substantially to the specific conductance.
References Cited
Appelo, C.A.J., 2017, Specific conductance: how to calculate, to use, and the pitfalls. [http://www.hydrochemistry.eu/exmpls/sc.html]
Ball, J.W., and Nordstrom, D.K., 1991, User's manual for WATEQ4F, with revised thermodynamic data base and test cases for calculating speciation of major, trace, and redox elements in natural waters: U.S. Geological Survey Open-File Report 91-0183, 193 p.
Charlton, S.R., and Parkhurst, D.L., 2002, PhreeqcI--A graphical user interface to the geochemical model PHREEQC: U.S. Geological Survey Fact Sheet FS-031-02, 2 p.
McCleskey, R.B., Nordstrom, D.K., Ryan, J.N., and Ball, J.W., 2012a, A new method of calculating electrical conductivity with applications to natural waters: Geochimica et Cosmochimica Acta, v. 77, p. 369-382. [http://www.sciencedirect.com/science/article/pii/S0016703711006181]
McCleskey, R.B., Nordstrom, D.K., and Ryan, J.N., 2012b, Comparison of electrical conductivity calculation methods for natural waters: Limnology and Oceanography: Methods, v. 10, p. 952-967. [http://aslo.org/lomethods/free/2012/0952.html]
McCleskey, R.B., 2013, New method for electrical conductivity temperature compensation: Environmental Science & Technology, v. 47, p. 9874-9881. [http://dx.doi.org/10.1021/es402188r]
Parkhurst, D.L., and Appelo, C.A.J., 1999, User's guide to PHREEQC (Version 2)--a computer program for speciation, batch-reaction, one-dimensional transport, and inverse geochemical calculations: U.S. Geological Survey Water-Resources Investigations Report 99-4259, 312 p.
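A minimal sketch of the SCI quality-control check defined above, together with the constant-factor (0.019) temperature compensation written in its usual linear form; the linear form and the numbers are assumptions for illustration, not values taken from the release itself:

```python
def sc25_constant_factor(sc_measured, temp_c, alpha=0.019):
    """Convert a field specific-conductance reading to 25 degrees C using the
    constant linear temperature compensation factor (assumed linear form)."""
    return sc_measured / (1.0 + alpha * (temp_c - 25.0))

def sci_percent(sc25_calculated, sc25_measured):
    """Specific conductance imbalance: SCI (%) = 100 * (calc - meas) / meas."""
    return 100.0 * (sc25_calculated - sc25_measured) / sc25_measured

# Example with made-up readings: 480 uS/cm measured at 18 degrees C.
sc25_meas = sc25_constant_factor(sc_measured=480.0, temp_c=18.0)
print(sci_percent(sc25_calculated=505.0, sc25_measured=sc25_meas))
```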
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In a previous version of this archive, geometry data and tables of opacity calculations were given that could be used to calculate the radiative pressure and absorption on fractal dust grains under Asymptotic Giant Branch (AGB) conditions (with a peak stellar wavelength of ~ 1 micron) for aggregates containing up to 256 primary particles. Because the focus of that work was on radiative pressure from a stellar spectrum peaking at approximately 1 micron, these data only covered the wavelength range from 0.3 to 30 microns. In this updated archive the wavelength range of the data has been expanded to allow calculation of the emission of the grains at longer wavelengths. Data are calculated for three common dust materials: forsterite (Mg2SiO4), olivine (Mg_(2x)Fe_(2(1-x))SiO4 with x=0.5), and 'astronomical silicate' (B.T. Draine and H.M. Lee, Optical Properties of Interstellar Graphite and Silicate Grains, Astrophysical Journal, 1984). In this updated version the range of aggregate sizes (number of primary particles in the aggregate) for some of these materials has also been increased from a maximum of 256 to 1024 constituent particles.
Example fractal aggregates were generated using the Diffusion Limited Aggregation (DLA) code as described in Wozniak M., Onofri F.R.A., Barbosa S., Yon J., Mroczka J., Comparison of methods to derive morphological parameters of multi-fractal samples of particle aggregates from TEM images, Journal of Aerosol Science 47: 12–26 (2012) and Onofri F.R.A., M. Wozniak, S. Barbosa, On the Optical Characterization of Nanoparticle and their Aggregates in Plasma Systems, Contributions to Plasma Physics 51(2-3):228-236 (2011). Aggregates were generated with a constant prefactor, kf=1.3, and two fractal dimensions (Df), representing open, porous (Df=1.8) aggregates and more compact (Df=2.8) aggregates.
The geometry files were produced with the DLA software. An example run using this software is shown for aggregates with 256 primary particles and a fractal dimension of 2.8 in the file 'dla_example.png'.
For the fractal dimension=1.8 data, the number of primary particles in the aggregate, N, was increased up to 1024 from the previous maximum of 256 for all three dust materials investigated. In addition, the data for MgFeSiO4 with a fractal dimension of 2.8 was increased from 256 to 1024. As in the previous archive, 12 instances of each aggregate size were generated with primary particles having a radius of 0.5. These geometry data are given in:
aggregates_kf1.3_df1.8.zip --> Geometry for a prefactor of 1.3 and fractal dimension 1.8
aggregates_kf1.3_df2.8.zip --> Geometry for a prefactor of 1.3 and fractal dimension 2.8
An example file name for an aggregate is 'N_00000032_Agg_00000008.dat' where the first number is the number of primary particles in the aggregate (N=32) and the second number is the instance number (e.g. 8 of 12). The radius of each primary particle in an aggregate is 0.5. The geometry files have 4 columns: the x, y and z coordinates of each primary particle followed by the primary particle radius. In each zip file there is also a pdf document that describes the geometry data and shows an image of each geometry file.
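A minimal sketch of reading one of these geometry files with numpy, based on the four-column layout described above; the file name follows the naming convention given but is only an example:

```python
import numpy as np

# Columns: x, y, z coordinates of each primary particle, then its radius (0.5).
geom = np.loadtxt("N_00000032_Agg_00000008.dat")
coords, radii = geom[:, :3], geom[:, 3]

print(coords.shape[0], "primary particles, radius", radii[0])

# Center of mass and radius of gyration of the aggregate (handy for sanity checks).
com = coords.mean(axis=0)
r_gyr = np.sqrt(((coords - com) ** 2).sum(axis=1).mean())
print("radius of gyration:", r_gyr)
```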
These geometry data were then used to calculate the opacity of the aggregates using the Multiple Sphere T-Matrix code (MSTM v 3.0) developed by Daniel Mackowski (D.W. Mackowski, M.I. Mishchenko, A multiple sphere T-matrix Fortran code for use on parallel computer clusters, Journal of Quantitative Spectroscopy and Radiative Transfer, Volume 112, Issue 13, 2011). Data were generated using the first 10 instances of each aggregate size, and the geometry data were appropriately scaled to calculate the opacity data for primary particle radii ranging from 0.001 - 1.0 microns. As noted earlier, an earlier version of this archive was focused on radiative pressure on these aggregates and only covered the spectrum of a typical AGB star (0.3 to 30 microns wavelength). In this updated version this wavelength range has been increased to the longer wavelength limits of the optical data. By default, MSTM calculations are made along the z-axis of the geometry data. Additional calculations were made along the x and y axes for each aggregate. Therefore the final data set is the average of 30 values (10 instances each in the x,y,z directions).
The opacity data files are given in:
astronomical_silicate_df1.8.zip --> astronomical silicate aggregates with fractal dimension 1.8
astronomical_silicate_df2.8.zip --> astronomical silicate aggregates with fractal dimension 2.8
forsterite_df1.8.zip --> forsterite aggregates with fractal dimension 1.8
forsterite_df2.8.zip --> forsterite aggregates with fractal dimension 2.8
olivine_df1.8.zip --> olivine aggregates with fractal dimension 1.8
olivine_df2.8.zip --> olivine aggregates with fractal dimension 2.8
In the previous version of this archive, only the table files with the averages of the 10 instances were provided. In this updated version each of the individual opacity files used to create these tables is now also provided. These opacity files are numbered similarly to the geometry files. For example, the opacity calculations for N=32, instance=5, angle=3 are given by
'opacity_results_N000032_I05_A03_file.dat.' Each file begins with a short header describing the data. For example, the astronomical silicate header for this N=32, instance=5, angle=3 file is:
#############################################################################################
# Number of primary particles in aggregate: 32
# Geometry Instance Number: 5
# Geometry File Name: N_00000032_Agg_00000005.dat
# Rotation Angles: 90.000 90.000 0.000
# Number of radius values: 30
# Minimum and maximum radius values in microns: 1.00000e-003 1.00000e+000
# Number of wavelength values: 92
# Minimum and maximum wavelength values in microns: 3.00000e-001 1.00000e+004
#############################################################################################
Afterwards the columns list the line number, the primary particle radius (microns), the wavelength (microns), the extinction efficiency factor, the absorption efficiency factor, the scattering efficiency factor, the asymmetry factor and the radiation pressure efficiency factor. These efficiency factors are based on the effective radius of the aggregate described later in this document.
Within each of these zipped folders is a file that contains the averages of these individual opacity files. For example 'astronomical_silicate_df1.8.dat' is the averaged data for the astronomical silicate aggregates with a fractal dimension 1.8. As in the previous archive, the first lines of these table files are a header starting with the '#' character describing the table and the source of the optical data used.
After the header, the first line of data in the table has the following nine values giving the range for the data table and the number of samples in N (aggregate size), primary particle radius (microns) and wavelength (microns). These are:
Minimum aggregate size
Maximum aggregate size
Number of Aggregate samples
Primary Particle Minimum Radius (microns)
Primary Particle Maximum Radius (microns)
Number of Primary Particle radii samples
Wavelength minimum (microns)
Wavelength maximum (microns)
Number of Wavelength samples
Subsequent lines contain 13 columns. These columns give the efficiency factors and asymmetry factor for aggregates. These efficiency factors are based on the effective radius of the aggregate given by:
a_eff = a_primary*N^(1/3)
where a_primary is the primary particle radius and N is the number of primary particles in the aggregate.
For example, the absorption opacity of an aggregate would then be pi*a_eff^2 * Q_abs.
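A short sketch of applying these relations to a single table row; the numerical values are made up for illustration, not taken from the tables:

```python
import numpy as np

def effective_radius(a_primary_um, n_primary):
    """Effective radius of an aggregate: a_eff = a_primary * N^(1/3)."""
    return a_primary_um * n_primary ** (1.0 / 3.0)

def absorption_opacity(a_primary_um, n_primary, q_abs):
    """pi * a_eff^2 * Q_abs, following the relation quoted above (micron^2)."""
    a_eff = effective_radius(a_primary_um, n_primary)
    return np.pi * a_eff ** 2 * q_abs

# Example: an N = 32 aggregate of 0.1-micron primary particles with Q_abs = 0.8.
print(absorption_opacity(0.1, 32, 0.8))
```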
The values in each column are:
Column 1: Primary particle radius in microns
Column 2: Wavelength in microns
Column 3: Number of primary particles in aggregate
Column 4: Mean Q_ext, mean extinction efficiency factor
Column 5: Standard Deviation of Mean Q_ext
Column 6: Mean Q_abs, mean absorption efficiency factor
Column 7: Standard Deviation of Mean Q_abs
Column 8: Mean Q_sca, mean scattering efficiency factor
Column 9: Standard Deviation of mean Q_sca
Column 10: Mean g_cos, mean asymmetry factor
Column 11: Standard Deviation of mean asymmetry factor
Column 12: Mean Q_pr, mean radiation pressure efficiency factor
Column 13: Standard Deviation of mean Q_pr
Please note, this dataset has been superseded by a newer version (see below). Users should not use this version except in rare cases (e.g., when reproducing previous studies that used this version). USCRN "Processed" Data (labeled as "uscrn-processed") are interpreted values and derived geophysical parameters, with other quality indicators, processed from raw data (both Datalogger files and/or Raw Data from GOES and NOAAPort) by the USCRN Team. Climate variable types include air temperature, precipitation, soil moisture, soil temperature, surface temperature, wetness, global solar radiation, relative humidity, and wind at 1.5 m above the ground. Many additional engineering variables are also available. These data have been decoded, quality-flagged, and processed into level 1 hourly data (the only applied quality control is rounding some values as they enter the database), and include additional calculated values such as precipitation (5-minute and hourly), hourly maximum temperature, hourly minimum temperature, average temperature (5-minute and hourly), soil moisture (volumetric water content; 5-minute values at the 5 cm depth and hourly values at all depths) for all dielectric values in range, layer-average soil moisture (5-minute and hourly), and layer-average soil temperature (5-minute and hourly). It is the general practice of USCRN to not calculate derived variables if the input data to these calculations are flagged. These data records are versioned based on the processing methods and algorithms used for the derivations (versions are noted within the data netCDF file), and data are updated when the higher quality raw data become available from stations' datalogger storage (Datalogger Files).
Individual percentages, median fluorescent intensities and concentrations for each horse that were used to generate figure graphs are compiled in labeled data tables. (A) Percentage of IgE+ monocytes out of total cells in unsorted, MACS-sorted and MACS+FACS-sorted samples from 18 different horses in Fig 2D. (B) Percentage of CD23- cells out of total IgE+ monocytes in Fig 3D. (C) Clinical scores of allergic horses in Fig 4A. (D) Percentage of IgE+ monocytes out of total monocytes in Fig 4C. (E) Percentage of CD16+ cells out of total IgE+ monocytes in Fig 4D. (F) Serum total IgE (ng/ml) measured by bead-based assay in Fig 5A. (G) IgE median fluorescent intensity (MFI) of IgE mAb 176 (Alexa Fluor 488) on IgE+ monocytes in Fig 5B. (H) Combined serum total IgE and IgE MFI on IgE+ monocytes in Fig 5C. (I) Percentage of monocytes out of total IgE+ cells in Fig 6A. (J) Secreted concentration of IL-10 (pg/ml), IL-4 (pg/ml), IFN-γ (MFI) and IL-17A (MFI) as measured by bead-based assay in Fig 6B. (K) Percentage of CD16+ cells out of total IgE- CD14+ monocytes. B-H and K show allergic (n = 7) and nonallergic (n = 7) horses; J shows allergic (n = 8) and nonallergic (n = 8) horses in October 2019. C-H and K show data points collected from April 2018 to March 2019. (XLSX)
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming level, but the average difference between the 'lower' values before and after this update is 0.13°C.]
What does the data show?
This dataset shows the change in annual temperature for a range of global warming levels, including the recent past (2001-2020), compared to the 1981-2000 baseline period. Note that, as the values in this dataset are averaged over a year, they do not represent possible extreme conditions. The dataset uses projections of daily average air temperature from UKCP18, which are averaged to give values for the 1981-2000 baseline, the recent past (2001-2020) and the global warming levels. The warming levels available are 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. The recent past value and global warming level values are stated as a change (in °C) relative to the 1981-2000 value. This enables users to compare annual average temperature trends for the different periods. In addition to the change values, values for the 1981-2000 baseline (corresponding to 0.51°C warming) and the recent past (2001-2020, corresponding to 0.87°C warming) are also provided. This is summarised in the table below.
| Period | Description |
| --- | --- |
| 1981-2000 baseline | Average temperature (°C) for the period |
| 2001-2020 (recent past) | Average temperature (°C) for the period |
| 2001-2020 (recent past) change | Temperature change (°C) relative to 1981-2000 |
| 1.5°C global warming level change | Temperature change (°C) relative to 1981-2000 |
| 2°C global warming level change | Temperature change (°C) relative to 1981-2000 |
| 2.5°C global warming level change | Temperature change (°C) relative to 1981-2000 |
| 3°C global warming level change | Temperature change (°C) relative to 1981-2000 |
| 4°C global warming level change | Temperature change (°C) relative to 1981-2000 |

What is a global warming level?
The Annual Average Temperature Change is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5), in which greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850-1900 and 2011-2020), whilst this dataset allows for the exploration of greater levels of warming. The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level were calculated using a 21-year period. These 21-year periods are found by taking 10 years either side of the first year at which the global warming level is reached; this year will be different for different model ensemble members. To calculate the value for the Annual Average Temperature Change, an average is taken across the 21-year period. We cannot provide a precise likelihood for particular emission scenarios being followed in the real-world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate; it will depend on future greenhouse gas emission choices and on the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could be either higher or lower than this level.
What are the naming conventions and how do I explore the data?
This data contains a field for the 1981-2000 baseline, the 2001-2020 period and each warming level. Fields are named 'tas annual change' (change in air 'temperature at surface'), the warming level or historic time period, and 'upper', 'median' or 'lower' as described below; e.g. 'tas annual change 2.0 median' is the median value for the 2.0°C warming level. Decimal points are included in field aliases but not in field names, e.g. 'tas annual change 2.0 median' is named 'tas_annual_change_20_median'. To understand how to explore the data, refer to the New Users ESRI Storymap. Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘tas annual change 2.0°C median’ values.
What do the 'median', 'upper', and 'lower' values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models is run. Each ensemble member has slightly different starting conditions or model set-ups.
Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future. For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Annual Average Temperature Change was calculated for each ensemble member, and the members were then ranked in order from lowest to highest for each location. The ‘lower’ fields are the second lowest ranked ensemble member. The ‘higher’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble. This gives a median value and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections: the larger the difference between the lower and higher fields, the greater the uncertainty. ‘Lower’, ‘median’ and ‘upper’ values are also given for the baseline period, as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and the recent past.
Useful links
For further information on the UK Climate Projections (UKCP).
Further information on understanding climate data within the Met Office Climate Data Portal.
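A minimal sketch of the warming-level calculation described above (a 21-year window centred on the first year a global warming level is crossed, expressed as a change relative to the 1981-2000 baseline); the series names are placeholders, not the UKCP18 data themselves:

```python
import pandas as pd

def warming_level_change(global_tas, local_tas, level_c, baseline=(1981, 2000)):
    """Average local change at a global warming level, for one ensemble member.

    global_tas: annual global mean warming relative to 1850-1900, indexed by year.
    local_tas: annual local mean temperature, indexed by year (illustrative only).
    """
    crossing_years = global_tas.index[global_tas >= level_c]
    if len(crossing_years) == 0:
        return None  # this member never reaches the warming level
    first = crossing_years[0]
    window = local_tas.loc[first - 10 : first + 10]          # 21-year window
    base = local_tas.loc[baseline[0] : baseline[1]].mean()   # 1981-2000 baseline
    return window.mean() - base
```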
https://spdx.org/licenses/CC0-1.0.html
Policies requiring biodiversity no net loss or net gain as an outcome of environmental planning have become more prominent worldwide, catalysing interest in biodiversity offsetting as a mechanism to compensate for development impacts on nature. Offsets rely on credible and evidence-based methods to quantify biodiversity losses and gains. Following the introduction of the United Kingdom’s Environment Act in November 2021, all new developments requiring planning permission in England are expected to demonstrate a 10% biodiversity net gain from 2024, calculated using the statutory biodiversity metric framework (Defra, 2023). The metric is used to calculate both baseline and proposed post-development biodiversity units, and is set to play an increasingly prominent role in nature conservation nationwide. The metric has so far received limited scientific scrutiny. This dataset comprises a database of statutory biodiversity metric unit values for terrestrial habitat samples across England. For each habitat sample, we present biodiversity units alongside five long-established single-attribute proxies for biodiversity (species richness, individual abundance, number of threatened species, mean species range or population, mean species range or population change). Data were compiled for species from three taxa (vascular plants, butterflies, birds), from sites across England. The dataset includes 24 sites within grassland, wetland, woodland and forest, sparsely vegetated land, cropland, heathland and shrub, i.e. all terrestrial broad habitats except urban and individual trees. Species data were reused from long-term ecological change monitoring datasets (mostly in the public domain), whilst biodiversity units were calculated following field visits. Fieldwork was carried out in April-October 2022 to calculate biodiversity units for the samples. Sites were initially assessed using metric version 3.1, which was current at the time of survey, and were subsequently updated to the statutory metric for analysis using field notes and species data. Species data were derived from 24 long-term ecological change monitoring sites across the Environmental Change Network (ECN), Long Term Monitoring Network (LTMN) and Ecological Continuity Trust (ECT), collected between 2010 and 2020. Methods Study sites We studied 24 sites across the Environmental Change Network (ECN), Long Term Monitoring Network (LTMN) and Ecological Continuity Trust (ECT). Biodiversity units were calculated following field visits by the authors, whilst species data (response variables) were derived from long-term ecological change monitoring datasets collected by the sites and mostly held in the public domain (Table S1). We used all seven ECN sites in England. We selected a complementary 13 LTMN sites to give good geographic and habitat representation across England. We included four datasets from sites supported by the ECT where 2 x 2m vascular plant quadrat data were available for reuse. The 24 sites included samples from all terrestrial broad habitats (sensu Defra 2023) in England, except urban and individual trees: grassland (8), wetland (6), woodland and forest (5), sparsely vegetated land (2), cropland (2), heathland and shrub (1). Non-terrestrial broad habitats (rivers and lakes, marine inlets and transitional waters) were excluded. Our samples ranged in biodiversity unit scores from 2 to 24, the full range of the metric. 
Not all 24 sites had long-term datasets from all taxa: 23 had vascular plant data, 8 had bird data, and 13 had butterfly data. We chose these three taxa as they are the most comprehensively surveyed taxa in England’s long-term biological datasets. Together they represent a taxonomically broad, although by no means representative, sample of English nature. Biodiversity unit calculation Baseline biodiversity units were attributed to each vegetation quadrat using the statutory biodiversity metric (Defra, 2023) (Equation 1). Sites were visited by the authors between April and October 2022, i.e. within the optimal survey period indicated in the metric guidance. Sites were assessed initially using metric version 3.1 (Panks et al., 2022), which was current at the time of survey, and were subsequently updated to the statutory metric for analysis using field notes and species data. Following the biodiversity metric guidance, we calculated biodiversity units at the habitat parcel scale, such that polygons with consistent habitat type and condition are the unit of assessment. We assigned habitat type and condition score to all quadrats falling within the parcel. Where the current site conditions (2022) and quadrat data (2010 to 2020) differed from each other in habitat or condition, e.g. the % bracken cover, we deferred to the quadrat data in order to match our response and explanatory variables more fairly. Across all samples, area was set to 1 ha arbitrarily, and strategic significance set to 1 (no strategic significance), to allow comparison between sites. To assign biodiversity units to the bird and butterfly transects, we averaged the biodiversity units of plant quadrats within the transect routes plus a buffer of 500 m (birds) or 100 m (butterflies). Quadrats were positioned to represent the habitats present at each site proportionally, and transect routes were also positioned to represent the habitats present across each site. Although units have been calculated as precisely as possible for all taxa, we recognize that biodiversity units are calculated more precisely for the plant dataset than the bird and butterfly datasets: the size of transect buffer is subjective, and some transects run adjacent to offsite habitat that could not be accessed. Further detail about biodiversity unit calculation can be found in the Supporting Information. Equation 1. Biodiversity unit calculation following the statutory biodiversity metric (Defra, 2023): Size of habitat parcel × Distinctiveness × Condition × Strategic Significance = Biodiversity Units. Species response variable calculation We reused species datasets for plants, birds and butterflies recorded by the sites to calculate our response variables (Table S1). Plant species presence data were recorded using 2 x 2 m quadrats of all vascular plant species at approximately 50 sample locations per site (mean 48.1, sd 3.7), stratified to represent all habitat types on site. If the quadrat fell within woodland or scrub, trees and shrubs rooted within a 10 x 10 m plot centred on the quadrat were also counted and added to the quadrat species records, with any duplicate species records removed. We treated each quadrat as a sample point, and the most recent census year was analysed (ranging between 2011-2021). Bird data were collected annually using the Breeding Birds Survey method of the British Trust for Ornithology: two approximately parallel 1 km long transects were routed through representative habitat on each site.
The five most recent census years were analysed (all fell between 2006-2019), treating each year as a sample point (Bateman et al., 2013). Butterfly data were collected annually using the Pollard Walk method of the UK Butterfly Monitoring Scheme: a fixed transect route taking 30 to 90 minutes to walk (c. 1-2 km) was established through representative habitat on each site. The five most recent census years were analysed (all fell between 2006-2019), treating each year as a sample point. Full detail of how these datasets were originally collected in the field can be found in the Supporting Information. For species richness estimates we omitted any records with vague taxon names not resolved to species level. Subspecies records were aggregated to the species level, as infraspecific taxa were recorded inconsistently across sites. Species synonyms were standardised across all sites prior to analysis. For bird abundance we used the maximum count of individuals recorded per site per year for each species, as per the standard approach (Bateman et al., 2013). For butterfly abundance we used the summed abundance over 26 weekly visits each year for each species at each site, using a GAM to interpolate missing weekly values (Dennis et al., 2013). Designated taxa were identified using the Great Britain Red List data held by JNCC (2022); species with any Red List designation other than Data Deficient or Least Concern were summed. Plant species range and range change index data followed PLANTATT (Hill et al., 2004). Range was measured as the number of 10x10 km cells across Great Britain in which a species is found. The change index measures the relative magnitude of range size change in standardised residuals, comparing 1930-1960 with 1987-1999. For birds, species mean population size across Great Britain followed Musgrove et al., 2013. We used the breeding season population size estimates to match field surveys. Bird long-term population percentage change (generally 1970-2014) followed Defra (2017). For butterflies, range and change data followed Fox et al., 2015. Range data were occupancy of UK 10 km squares 2010-2014. Change was percent abundance change 1976-2014. For all taxa, mean range and mean change were averaged from all the species present in the sample, not weighted by the species’ abundance in the sample.
· Bateman, I. J., Harwood, A. R., Mace, G. M., Watson, R. T., Abson, D. J., Andrews, B., et al. (2013). Bringing ecosystem services into economic decision-making: Land use in the United Kingdom. Science, 341, 45–50. doi: 10.1126/science.1234379.
· British Trust for Ornithology (BTO), 2022. Breeding Bird methodology and survey design. Available online at https://www.bto.org/our-science/projects/breeding-bird-survey/research-conservation/methodology-and-survey-design
· Defra, 2023. Statutory biodiversity metric tools and guides. https://www.gov.uk/government/publications/statutory-biodiversity-metric-tools-and-guides.
· Dennis, E. B., Freeman, S. N., Brereton, T., and
https://spdx.org/licenses/CC0-1.0.html
Despite extensive research into the behavioral ecology of free-ranging animal groups, questions remain about how group members integrate information about their physical and social surroundings. This is because a) tracking of multiple group members is limited to a few easily manageable species; and b) the tools to simultaneously quantify physical and social influences on an individual’s movement remain challenging, especially across large geographic scales. A relevant example of a widely-ranging species with a complex social structure and of conservation concern is the African savanna elephant. We evaluate highly synchronized GPS tracks from five male elephants in Etosha National Park in Namibia by incorporating their dynamic social landscape into an established resource selection model. The fitted model predicts movement patterns based simultaneously on the physical landscape (e.g., repeated visitation of waterholes) and the social landscape (e.g., avoidance of a dominant male). Combining the fitted models for multiple focal individuals produces landscape-dependent social networks that vary over space (e.g., with distance from a waterhole) and time (e.g., as the seasons change). The networks, especially around waterholes, are consistent with dominance patterns determined from previous behavioral studies. Models that combine physical landscape and social effects based on remote tracking can augment traditional methods for determining social structure from intensive behavioral observations. More broadly, these models will be essential to effective, in-situ conservation and management of wide-ranging social species in the face of anthropogenic disruptions to their physical surroundings and social connections. Methods Study subjects and the social landscape: The five individuals considered in this study belong to a large elephant subpopulation residing in the northeastern region of Etosha National Park, Namibia. As a part of a different research effort, these individuals were classified into several age, dominance, social, and reproductive categories (O’Connell-Rodwell et al. 2011; O’Connell et al. 2024a). The age structure in this population was determined on the basis of several morphological features and can be found in the original publication (O’Connell-Rodwell et al. 2011; O’Connell et al. 2022). The dominance categories are reported from a population-level, ordinal dominance hierarchy based on the frequency of agonistic dyadic interactions (i.e., displacement) observed during all-occurrence sampling, over multiple field seasons (O’Connell-Rodwell et al. 2024a). The social categories were approximated using social network analysis (i.e., eigenvector centrality, an index expressing how influential an individual is based on the frequency of associating with other influential conspecifics) (O’Connell-Rodwell et al. 2024a; O’Connell-Rodwell et al. 2024b). The reproductive category expresses whether an elephant was in musth at the time of behavioral data collection. Tracking data: In September 2009, ENP personnel fitted five elephants with Global Positioning System (GPS) and satellite Global System for Mobile Communication (GSM) tracking devices. The trackers recorded positional data (i.e., longitude, latitude) every 15 minutes over approximately 24 months. Prior to analysis, we converted tracking data to Cartesian units (i.e., meters) using the Universal Transverse Mercator (UTM) coordinate system projection.
We also filtered the data to remove outlier movements as follows: we kept only movements (pairs of GPS fixes) in which 1) the interval was 15 minutes, 2) the focal individual moved ≤ 300 m in that time, and 3) all four of the other tracked elephants were within 20 km. Criterion 1 eliminates missed fixes; criterion 2 eliminates a small number of unusually fast movements which could represent startle responses to rare stimuli; and criterion 3 ensures that there is at least the potential for social interactions between all five elephants. The resulting datasets (one for each focal individual) had between 27,397 and 30,584 movements. The physical landscape: To evaluate tracking data in the context of the physical landscape, we constructed a map of vegetation productivity using data from the 16-day 250 m Normalized Difference Vegetation Index (NDVI) MODIS imagery. We also created a map of the perennial waterholes by extracting coordinate information from existing geospatial records generated by ENP personnel. Finally, we compiled a map of ‘frequently visited areas’ (FVAs) as the centroids of the top 20 clusters of large turning angles (>90 degrees) in the movement data. These locations broadly correlate with the presence of shade and proximity to fruiting trees (Kilian, W. personal communication), which in other populations affect elephant movement. The Social Resource Selection Function (SRSF) model: Our approach extends the existing Resource Selection Function framework, in which an individual’s location, when fixed (by a GPS device or other tracking methodology), is considered a choice made from a set of possible locations. This set of locations is bounded in space by how far from its previously known location the individual could reasonably be expected to move in the time between the two fixes. The relative probability of ending up at different destinations, relative to one’s current location, is modeled using conditional logistic regression (CLR) as a function of various environmental parameters that differ between locations (e.g., ‘vegetation density’, ‘distance to water’, distance to the previous location). The SRSF model adds to the RSF framework by considering the locations of other individuals in a moving group as time-varying point features of the landscape. One individual (the focal individual) is modeled, and the locations of the others (nonfocal individuals) are incorporated as ‘distance to neighbor’ values that can be calculated for all the possible locations in the CLR. Assuming that each elephant responds differently to different conspecifics, we calculate a set of social predictors by determining the distance to each neighbor separately. For any given movement m, the ‘choice’ is a binary response, where a potential location i is either the endpoint at which the individual was recorded (yi = 1) or one of the alternatives (yi = 0). For convenience, we have labeled the chosen location with the subscript j (j ∈ i). The probability of a movement is modeled as p_m = s*(1/c) + (1 − s)*exp(X_j β) / Σ_{i=1..c} exp(X_i β), where X is a matrix of k predictors derived from the landscape data; β is a k by 1 matrix of parameters to be estimated; c is the total number of locations considered (1 being the actual endpoint and c – 1 being randomly sampled within a circle of fixed radius); and s is the probability of a stochastic, ‘non-choice-type movement’ for which the endpoint is independent of any of the included predictors. One example might be a sudden scare that causes a flight response.
In this case, we assign all possible endpoints the same probability 1/c. Including the possibility of non-choice movements is a novel addition to the standard CLR model; we found that for these data it stabilized the parameter estimates (meaning that we obtained similar results with different random subsets of the data when it was included, and disparate estimates when it was not). Overall, pm is the predicted ‘preference value’ for the chosen location divided by (and therefore conditional on) the sum of the preference values for the random sample of possible locations. In practice, depending on the resolution of the landscape and the boundary of possible distances reached, the denominator could include hundreds or even thousands of random locations. This can make computation of the expression, which is repeated for every movement in a dataset, time-consuming—a challenge that then translates into the model fitting. It is thus standard practice to randomly select a fixed number of non-chosen alternative locations on the assumption that they will comprise a representative sample of the landscape variation available to the individual. Given that our landscape features — various distance measures plus an interpolated array of NDVI values — vary smoothly and continuously within our sampling radius, we used 30 random locations (so c = 31). We fit the CLR by maximizing L, the log-likelihood of the entire set of n movements, using quasi-Newton nonlinear maximization. We performed variable selection by first fitting models with all possible subsets of physical and social landscape variables in their quadratic forms, except for distance to the previous location, which was always included as a linear function as an established proxy for the effort required to move to a new location. We ranked the models using Akaike’s Information Criterion (AIC) and calculated importance scores for each variable as the cumulative Akaike weight of the models in which it appeared. Interpretation of the SRSF model outputs depends on the functional form of each variable over the range of its values and its importance score. Because a linear cost-of-movement function is in every model by design, we exclude it from further reporting and discussion. The functional forms of the remaining variables can be divided into five categories: 1) monotonically increasing or 2) decreasing (indicating a preference for larger or smaller values of the variable in question); 3) convex with the maximum within the data range (a preference for intermediate values); 4) concave with the minimum within the data range (a preference for large and small values indicating a back-and-forth movement between the locations containing the variable in question); or 5) constant over the data range (lack of preference for a specific value) (Mashintonio et al. 2014). The SRSF model outputs are expressed as the relative preference for movement towards locations defined by the physical and social landscape variables included in the model.
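The conditional-logistic structure above lends itself to a compact implementation. Below is a minimal sketch, not the authors' code, of how the SRSF movement probability and its log-likelihood could be evaluated and maximized with a quasi-Newton optimizer; the array shapes, predictor count, and toy data are illustrative assumptions.

```python
# Minimal sketch of the SRSF movement probability and its log-likelihood.
# Shapes and toy data are assumptions for illustration only.
import numpy as np
from scipy.optimize import minimize

def movement_prob(X, beta, s):
    """Probability of one observed movement under the SRSF model.

    X    : (c, k) predictors for the c candidate endpoints; row 0 is the
           endpoint actually chosen, rows 1..c-1 are the random alternatives
           sampled within the fixed-radius circle.
    beta : (k,) selection coefficients.
    s    : probability of a stochastic 'non-choice-type' movement.
    """
    c = X.shape[0]
    pref = np.exp(X @ beta)                     # preference value of each endpoint
    return s * (1.0 / c) + (1.0 - s) * pref[0] / pref.sum()

def neg_log_likelihood(params, movements):
    """movements: list of (c, k) predictor matrices, one per movement."""
    beta, s_logit = np.asarray(params[:-1]), params[-1]
    s = 1.0 / (1.0 + np.exp(-s_logit))          # keep the mixture weight in (0, 1)
    return -sum(np.log(movement_prob(X, beta, s)) for X in movements)

# Quasi-Newton fit on toy data (c = 31 candidate endpoints, k = 3 predictors).
rng = np.random.default_rng(0)
movements = [rng.normal(size=(31, 3)) for _ in range(200)]
fit = minimize(neg_log_likelihood, x0=np.full(4, 0.1),
               args=(movements,), method="BFGS")
print(fit.x)   # estimated beta (first 3 entries) and the logit of s (last entry)
```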
USCRN Processed data are interpreted values and derived geophysical parameters processed from raw data by the USCRN Team. Data were interpreted and ingested into a database, but are also available as netCDF files exported from the database. "Version 2" of the dataset denotes the application of a new precipitation algorithm to calculate values (see documentation for more information). Climate variable types include air temperature, precipitation, soil moisture, soil temperature, surface temperature, wetness, global solar radiation, relative humidity, and wind at 1.5 m above the ground. Many additional engineering variables are also available. These data have been decoded from the raw data streams, quality-flagged, and processed into level 1 hourly data (the only applied quality control is rounding some values as they enter the database), and include additional calculated values such as precipitation (5-minute and hourly), hourly max/min temperature, average temperature (5-minute and hourly), soil moisture (volumetric water content; 5-minute values at the 5 cm depth and hourly values at all depths) for all dielectric values in range, layer average soil moisture (5-minute and hourly), and layer average soil temperature (5-minute and hourly). It is the general practice of USCRN to not calculate derived variables if the input data to these calculations are flagged.
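As a rough illustration of the stated practice of not deriving values from flagged inputs, the following sketch, with hypothetical column names rather than the actual USCRN schema, computes an hourly mean from 5-minute values only when none of them carries a quality flag.

```python
# Illustrative only: derive an hourly mean from 5-minute values, skipping the
# calculation when any input is quality-flagged. Column names are hypothetical.
import pandas as pd

def hourly_mean(five_minute: pd.DataFrame):
    """five_minute: one hour of 5-minute records with 'value' and 'flag' columns."""
    if (five_minute["flag"] != 0).any():   # any flagged input -> no derived value
        return None
    return five_minute["value"].mean()
```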
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In this work, a generalized quantitative structure–property relationship (QSPR) model is developed for predicting kp by using norm index (NI)-based descriptors, which is the so-called kp (T, NI)-QSPR model. The as-developed model enables the use of one unified formula to calculate kp values for a wide range of monomers, including linear and branched (meth)acrylates, nitrogen-containing methacrylates, hydroxyl-containing (meth)acrylates, and so forth. Importantly, the model exhibits excellent performance when compared with the benchmark kp values from the literature, and model validation proves the reasonable goodness-of-fit, robustness, predictivity, and reliability of the as-developed model. Meanwhile, the Arrhenius parameters show a clear kinetic behavior, indicating that acrylates have smaller Ea values than methacrylates, which results in higher kp values and activities for acrylates in free-radical polymerization. Notably, the model allows the prediction of kp values of monomer mixtures and new monomers. In view of the satisfactory accuracy in determining kp values, it is expected that our proposed method will contribute to the determination of kinetic parameters beyond propagation kinetics for a wide monomer range, and the obtained Arrhenius parameters can further improve the fundamental understanding of radical polymerization kinetics.
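For readers unfamiliar with how Arrhenius parameters relate to kp, the sketch below evaluates the standard Arrhenius relation; the A and Ea values are placeholders chosen only to illustrate that a smaller Ea gives a larger kp at the same temperature, and are not results from the kp (T, NI)-QSPR model or the paper.

```python
# Standard Arrhenius relation linking the propagation rate coefficient kp to
# temperature. The A and Ea values below are placeholders, not model results.
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def kp(A: float, Ea: float, T: float) -> float:
    """kp = A * exp(-Ea / (R * T)); A in L mol^-1 s^-1, Ea in J mol^-1, T in K."""
    return A * math.exp(-Ea / (R * T))

# Hypothetical illustration: a smaller Ea yields a larger kp at the same T.
print(kp(A=1.0e7, Ea=18_000, T=323.15))   # "acrylate-like" (lower Ea)
print(kp(A=1.0e7, Ea=23_000, T=323.15))   # "methacrylate-like" (higher Ea)
```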
This is the Baltic and North Sea Climatology (BNSC) for the Baltic Sea and the North Sea in the range 47°N to 66°N and 15°W to 30°E. It is the follow-up project to the KNSC climatology. The climatology was first made available to the public in March 2018 by ICDC and is published here in a slightly revised version 2. It contains monthly averages of mean sea level air pressure, air temperature and dew point temperature at 2 m height. It is available on a 1° x 1° grid for the period from 1950 to 2015. For the calculation of the mean values, all available quality-controlled ship observations and buoy measurements from the DWD (German Meteorological Service) were taken into account for this period. Additional dew point values were calculated from relative humidity and air temperature where available. Climatologies were calculated for the WMO standard periods 1951-1980, 1961-1990, 1971-2000 and 1981-2010 (monthly mean values). As a prerequisite for the calculation of the 30-year climatologies, at least 25 out of 30 (five-sixths) valid monthly means had to be present in the respective grid box. For the long-term climatology from 1950 to 2015, at least four-fifths of the monthly means had to be valid. Two methods were used in combination to calculate the monthly averages, to account for the small number of measurements per grid box and their uneven spatial and temporal distribution:
1. For parameters with a detectable annual cycle in the data (air temperature, dew point temperature), a 2nd order polynomial was fitted to the data to reduce the variation within a month and reduce the uncertainty of the calculated averages. In addition, for the mean value of air temperature, the daily temperature cycle was removed from the data. For air pressure, which has no annual cycle, version 2 allows no data gaps longer than 14 days per month and grid box when calculating a monthly mean and standard deviation. This method differs from KNSC and BNSC version 1, where mean and standard deviation were calculated from 6-day window means.
2. If the number of observations fell below a certain threshold (20 observations per grid box and month for air temperature and dew point temperature, and 500 per box and month for air pressure), data from the adjacent boxes were used for the calculation. The neighbouring boxes were used in two steps (the nearest 8 boxes, and if the number was still below the threshold, the next surrounding 16 boxes) to calculate the mean value of the center box. Thus, the spatial resolution of the parameters is reduced at certain points: when neighbouring values are taken into account, data from an area of up to 5° x 5°, rather than 1° x 1°, may be averaged into a single grid box value. This was especially the case for air pressure, where the 24 values of the neighbouring boxes were included in the averaging for most grid boxes.
The mean value, the number of measurements, the standard deviation and the number of grid boxes used to calculate the mean values are available as parameters in the products. The calculated monthly and annual means were allocated to the centers of the grid boxes: Latitudes: 47.5, 48.5, ... Longitudes: -14.5, -13.5, … In order to remove any existing values over land, a land-sea mask was used, which is also provided at 1° x 1° resolution.
In version 2 of the BNSC, a slightly different database was used than for the KNSC, which resulted in small changes (less than 1 K) in the means and standard deviations of the 2-meter air temperature and dew point temperature. The changes in the mean sea level pressure values and the associated standard deviations are in the range of a few hPa compared to the KNSC. The parameter names and units have been adjusted to meet the CF 1.6 standard.
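A minimal sketch of the neighbour-box fallback described above is given below. It assumes a hypothetical mapping from grid-box indices to observation lists; the thresholds follow the text (20 observations per box and month for temperature and dew point, 500 for air pressure), but everything else is illustrative rather than the actual BNSC processing code.

```python
# Sketch of the neighbour-box fallback: if a 1° x 1° grid box has fewer
# observations than the threshold, pull in the 8 surrounding boxes (3x3 block)
# and, if still short, the next 16 boxes (5x5 block). Purely illustrative.
import numpy as np

def box_mean(obs, i, j, threshold):
    """obs maps (lat_index, lon_index) -> list of observations for one month."""
    values = []
    for ring in (0, 1, 2):                         # centre box, 3x3 block, 5x5 block
        values = []
        for di in range(-ring, ring + 1):
            for dj in range(-ring, ring + 1):
                values.extend(obs.get((i + di, j + dj), []))
        if len(values) >= threshold:
            return np.mean(values), np.std(values, ddof=1), len(values)
    return np.nan, np.nan, len(values)             # still too few observations

# e.g. threshold = 20 for air/dew point temperature, 500 for air pressure
```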
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average difference between the 'lower' values before and after this update is 0.09°C.]
What does the data show?
This dataset shows the change in summer average temperature for a range of global warming levels, including the recent past (2001-2020), compared to the 1981-2000 baseline period. Here, summer is defined as June-July-August. Note, as the values in this dataset are averaged over a season they do not represent possible extreme conditions. The dataset uses projections of daily average air temperature from UKCP18 which are averaged over the summer period to give values for the 1981-2000 baseline, the recent past (2001-2020) and global warming levels. The warming levels available are 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. The recent past value and global warming level values are stated as a change (in °C) relative to the 1981-2000 value. This enables users to compare summer average temperature trends for the different periods. In addition to the change values, values for the 1981-2000 baseline (corresponding to 0.51°C warming) and recent past (2001-2020, corresponding to 0.87°C warming) are also provided. This is summarised in the table below.

|Period|Description|
|--|--|
|1981-2000 baseline|Average temperature (°C) for the period|
|2001-2020 (recent past)|Average temperature (°C) for the period|
|2001-2020 (recent past) change|Temperature change (°C) relative to 1981-2000|
|1.5°C global warming level change|Temperature change (°C) relative to 1981-2000|
|2°C global warming level change|Temperature change (°C) relative to 1981-2000|
|2.5°C global warming level change|Temperature change (°C) relative to 1981-2000|
|3°C global warming level change|Temperature change (°C) relative to 1981-2000|
|4°C global warming level change|Temperature change (°C) relative to 1981-2000|

What is a global warming level?
The Summer Average Temperature Change is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), whilst this dataset allows for the exploration of greater levels of warming. The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level was calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Summer Average Temperature Change, an average is taken across the 21 year period. We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements.
The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.
What are the naming conventions and how do I explore the data?
These data contain a field for each warming level and the 1981-2000 baseline. They are named 'tas summer change' (change in air 'temperature at surface'), the warming level or baseline, and 'upper', 'median' or 'lower' as per the description below. E.g. 'tas summer change 2.0 median' is the median value for summer for the 2.0°C warming level. Decimal points are included in field aliases but not in field names, e.g. 'tas summer change 2.0 median' is named 'tas_summer_change_20_median'. To understand how to explore the data, refer to the New Users ESRI Storymap. Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘tas summer change 2.0°C median’ values.
What do the 'median', 'upper', and 'lower' values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future. For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Summer Average Temperature Change was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location. The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble. This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty. ‘Lower’, ‘median’ and ‘upper’ are also given for the baseline period as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.
Useful links
For further information on the UK Climate Projections (UKCP).
Further information on understanding climate data within the Met Office Climate Data Portal.
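The warming-level calculation described above can be expressed compactly. The sketch below assumes a hypothetical pandas Series of summer-mean temperatures indexed by year for one ensemble member and grid cell, and a known first year at which that member reaches the warming level; it is an illustration of the stated procedure, not the Met Office processing code.

```python
# Sketch of the warming-level change calculation: average the summer-mean
# temperature over the 21-year window centred on the first year the warming
# level is reached, then subtract the 1981-2000 baseline mean.
import pandas as pd

def warming_level_change(summer_tas: pd.Series, gwl_year: int) -> float:
    baseline = summer_tas.loc[1981:2000].mean()                  # 1981-2000 baseline
    window = summer_tas.loc[gwl_year - 10:gwl_year + 10].mean()  # 21-year window
    return window - baseline                                     # change in degC
```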
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a de-identified summary table of vision and eye health data indicators from NHIS, stratified by all available combinations of age group, race/ethnicity, gender, and risk factor. NHIS is an annual household survey conducted by the National Center for Health Statistics at CDC that monitors trends in illness, disabilities, and progress towards national health objectives. Approximate sample size is 35,000 households and 87,500 persons annually. NHIS data for VEHSS include questions related to Visual Function. Data were suppressed for cell sizes of less than 30 persons, or where the relative standard error was more than 30% of the mean. Data will be updated as they become available. Detailed information on VEHSS NHIS analyses can be found on the VEHSS NHIS webpage (link). Additional information about NHIS can be found on the NHIS website (http://www.cdc.gov/nchs/nhis/about_nhis.htm). The VEHSS NHIS dataset was last updated in November 2019.
|Column Name|Description|Type|
|--|--|--|
|YearStart|Starting year for year range|Number|
|YearEnd|Ending year for year range. Same as starting year if single year used in evaluation.|Number|
|LocationAbbr|Location abbreviation|Plain Text|
|LocationDesc|Location full name|Plain Text|
|DataSource|Abbreviation of Data Source|Plain Text|
|Topic|Topic description|Plain Text|
|Category|Category description|Plain Text|
|Question|Question description (e.g., Percentage of adults with diabetic retinopathy)|Plain Text|
|Response|Optional column to hold the response value that was evaluated.|Plain Text|
|Age|Stratification value for age group (e.g., All ages, 0-17 years, 18-39 years, 40-64 years, 65-84 years, or 85 years and older)|Plain Text|
|Gender|Stratification value for gender (e.g., Total, Male, or Female)|Plain Text|
|RaceEthnicity|Stratification value for race (e.g., All races, Asian, Black, non-hispanic, Hispanic, any race, North American Native, White, non-hispanic, or Other)|Plain Text|
|RiskFactor|Stratification value for major risk factor (e.g., All Participants, Diabetes, Hypertension, Smoking)|Plain Text|
|RiskFactorResponse|Column holding the response for the risk factor that was evaluated (e.g., All Participants, Borderline, Current Smoker, Former Smoker, Never Smoker, Yes, or No)|Plain Text|
|Data_Value_Unit|The unit, such as "%" for percent|Plain Text|
|Data_Value_Type|The data value type, such as age-adjusted prevalence or crude prevalence|Plain Text|
|Data_Value|A numeric data value greater than or equal to 0, or no value when footnote symbol and text are present|Number|
|Data_Value_Footnote_Symbol|Footnote symbol|Plain Text|
|Data_Value_Footnote|Footnote text|Plain Text|
|Low_Confidence_Limit|95% confidence interval lower bound|Number|
|High_Confidence_Limit|95% confidence interval higher bound|Number|
|Numerator|The prediction of the number of people who may have this condition in the state/country (n)|Number|
|Sample_Size|Sample size used to calculate the data value|Number|
|LocationID|Lookup identifier value for the location|Plain Text|
|TopicID|Lookup identifier for the Topic|Plain Text|
|CategoryID|Lookup identifier for the Category|Plain Text|
|QuestionID|Lookup identifier for the Question|Plain Text|
|ResponseID|Lookup identifier for the Response|Plain Text|
|DataValueTypeID|Lookup identifier for the data value type|Plain Text|
|AgeID|Lookup identifier for the Age stratification|Plain Text|
|GenderID|Lookup identifier for the Gender stratification|Plain Text|
|RaceEthnicityID|Lookup identifier for the Race/Ethnicity stratification|Plain Text|
|RiskFactorID|Lookup identifier for the Major Risk Factor|Plain Text|
|RiskFactorResponseID|Lookup identifier for the Major Risk Factor Response|Plain Text|
|GeoLocation|No Geolocation is provided for national data|Location|
|Geographic Level||Plain Text|
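As a simple illustration of the suppression rule stated above, the sketch below flags a cell for suppression when the sample is under 30 persons or the relative standard error exceeds 30% of the mean. The inputs are passed in directly, since the published table exposes confidence limits and sample size rather than a standard error column; this is an assumption for illustration only.

```python
# Sketch of the stated suppression rule; inputs are hypothetical, not columns
# read directly from the published table.
def suppress(sample_size: int, standard_error: float, estimate: float) -> bool:
    """Return True if the cell should be suppressed."""
    if sample_size < 30:
        return True
    if estimate != 0 and (standard_error / abs(estimate)) > 0.30:
        return True
    return False
```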
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average difference between the 'lower' values before and after this update is 0.2.]
What does the data show?
The Annual Count of Hot Summer Days is the number of days per year where the maximum daily temperature is above 30°C. It measures how many times the threshold is exceeded (not by how much) in a year. Note, the term ‘hot summer days’ is used to refer to the threshold, and temperatures above 30°C outside the summer months also contribute to the annual count. The results should be interpreted as an approximation of the projected number of days when the threshold is exceeded as there will be many factors such as natural variability and local scale processes that the climate model is unable to represent. The Annual Count of Hot Summer Days is calculated for two baseline (historical) periods 1981-2000 (corresponding to 0.51°C warming) and 2001-2020 (corresponding to 0.87°C warming) and for global warming levels of 1.5°C, 2.0°C, 2.5°C, 3.0°C, 4.0°C above the pre-industrial (1850-1900) period. This enables users to compare the future number of hot summer days to previous values.
What are the possible societal impacts?
The Annual Count of Hot Summer Days indicates increased health risks, transport disruption and damage to infrastructure from high temperatures. It is based on exceeding a maximum daily temperature of 30°C. Impacts include:
- Increased heat related illnesses, hospital admissions or death.
- Transport disruption due to overheating of railway infrastructure. Overhead power lines also become less efficient.
Other metrics such as the Annual Count of Summer Days (days above 25°C), Annual Count of Extreme Summer Days (days above 35°C) and the Annual Count of Tropical Nights (where the minimum temperature does not fall below 20°C) also indicate impacts from high temperatures, however they use different temperature thresholds.
What is a global warming level?
The Annual Count of Hot Summer Days is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), whilst this dataset allows for the exploration of greater levels of warming. The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level was calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Annual Count of Hot Summer Days, an average is taken across the 21 year period. Therefore, the Annual Count of Hot Summer Days shows the number of hot summer days that could occur each year, for each given level of warming. We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future.
However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.
What are the naming conventions and how do I explore the data?
This data contains a field for each global warming level and two baselines. They are named ‘HSD’ (where HSD means Hot Summer Days), the warming level or baseline, and ‘upper’ ‘median’ or ‘lower’ as per the description below. E.g. ‘Hot Summer Days 2.5 median’ is the median value for the 2.5°C warming level. Decimal points are included in field aliases but not field names e.g. ‘Hot Summer Days 2.5 median’ is ‘HotSummerDays_25_median’. To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578
Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘HSD 2.0°C median’ values.
What do the ‘median’, ‘upper’, and ‘lower’ values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future. For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Annual Count of Hot Summer Days was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location. The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble. This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty. ‘Lower’, ‘median’ and ‘upper’ are also given for the baseline periods as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.
Useful links
This dataset was calculated following the methodology in the ‘Future Changes to high impact weather in the UK’ report and uses the same temperature thresholds as the 'State of the UK Climate' report.
Further information on the UK Climate Projections (UKCP).
Further information on understanding climate data within the Met Office Climate Data Portal.
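The counting and averaging steps described above can be illustrated with a short sketch. It assumes a hypothetical pandas Series of daily maximum temperature with a DatetimeIndex for one ensemble member and location; it is not the Met Office processing code.

```python
# Sketch of the Hot Summer Days metric: count days per year with daily maximum
# temperature above 30 degC, then average the annual counts over the 21-year
# window for a warming level. Illustrative only.
import pandas as pd

def hot_summer_days_per_year(tmax: pd.Series) -> pd.Series:
    return (tmax > 30.0).groupby(tmax.index.year).sum()   # exceedance count per year

def warming_level_value(tmax: pd.Series, gwl_year: int) -> float:
    annual = hot_summer_days_per_year(tmax)
    window = annual.loc[gwl_year - 10:gwl_year + 10]       # 21-year window
    return window.mean()
```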
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average difference between the 'lower' values before and after this update is 1.2.]
What does the data show?
The Annual Count of Frost Days is the number of days per year where the minimum daily temperature is below 0°C. It measures how many times the threshold is exceeded (not by how much) in a year. The results should be interpreted as an approximation of the projected number of days when the threshold is exceeded as there will be many factors such as natural variability and local scale processes that the climate model is unable to represent. The Annual Count of Frost Days is calculated for two baseline (historical) periods 1981-2000 (corresponding to 0.51°C warming) and 2001-2020 (corresponding to 0.87°C warming) and for global warming levels of 1.5°C, 2.0°C, 2.5°C, 3.0°C, 4.0°C above the pre-industrial (1850-1900) period. This enables users to compare the future number of frost days to previous values.
What are the possible societal impacts?
The Annual Count of Frost Days indicates increased cold weather disruption due to a higher than normal chance of ice and snow. It is based on the minimum daily temperature being below 0°C. Impacts include:
- Damage to crops.
- Transport disruption.
- Increased energy demand.
The Annual Count of Icing Days is a similar metric measuring impacts from cold temperatures; it indicates more severe cold weather impacts.
What is a global warming level?
The Annual Count of Frost Days is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), whilst this dataset allows for the exploration of greater levels of warming. The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level was calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Annual Count of Frost Days, an average is taken across the 21 year period. Therefore, the Annual Count of Frost Days shows the number of frost days that could occur each year, for each given level of warming. We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain.
Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.
What are the naming conventions and how do I explore the data?
This data contains a field for each global warming level and two baselines. They are named ‘Frost Days’, the warming level or baseline, and ‘upper’ ‘median’ or ‘lower’ as per the description below. E.g. ‘Frost Days 2.5 median’ is the median value for the 2.5°C warming level. Decimal points are included in field aliases but not field names e.g. ‘Frost Days 2.5 median’ is ‘FrostDays_25_median’. To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578
Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘Frost Days 2.0°C median’ values.
What do the ‘median’, ‘upper’, and ‘lower’ values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future. For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Annual Count of Frost Days was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location. The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble. This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty. ‘Lower’, ‘median’ and ‘upper’ are also given for the baseline periods as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.
Useful links
This dataset was calculated following the methodology in the ‘Future Changes to high impact weather in the UK’ report and uses the same temperature thresholds as the 'State of the UK Climate' report.
Further information on the UK Climate Projections (UKCP).
Further information on understanding climate data within the Met Office Climate Data Portal.
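The ‘lower’, ‘median’ and ‘upper’ fields described above can be illustrated with a short sketch: rank the 12 ensemble-member values at one location and take the second lowest, the central value, and the second highest. Taking the central value as the numpy median is an assumption made here for even-sized ensembles; the rest follows the description in the text.

```python
# Sketch of the 'lower'/'median'/'upper' selection from 12 ranked ensemble
# members at one location. Purely illustrative.
import numpy as np

def ensemble_summary(member_values):
    """member_values: 12 values of the metric, one per ensemble member."""
    ranked = np.sort(np.asarray(member_values, dtype=float))
    lower = ranked[1]                       # second lowest ranked member
    upper = ranked[-2]                      # second highest ranked member
    median = float(np.median(ranked))       # central value of the ensemble
    return lower, median, upper
```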