Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Study information
The sample included in this dataset represents five children who participated in a number line intervention study. Originally six children were included in the study, but one of them fulfilled the criterion for exclusion after missing several consecutive sessions; thus, their data are not included in the dataset. All participants were attending Year 1 of primary school at an independent school in New South Wales, Australia. To be eligible to participate, children had to present with low mathematics achievement, performing at or below the 25th percentile in the Maths Problem Solving and/or Numerical Operations subtests from the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Participants were excluded if, as reported by their parents, they had any other diagnosed disorder such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy or uncorrected sensory disorders.
The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase varied between two and three measurement points, the treatment phase varied between four and seven measurement points, and all participants had one post-treatment measurement point. The number of measurement points was distributed across participants as follows:
Participant 1 – 3 baseline, 6 treatment, 1 post-treatment
Participant 3 – 2 baseline, 7 treatment, 1 post-treatment
Participant 5 – 2 baseline, 5 treatment, 1 post-treatment
Participant 6 – 3 baseline, 4 treatment, 1 post-treatment
Participant 7 – 2 baseline, 5 treatment, 1 post-treatment
In each session across all three phases children were assessed on their performance on a number line estimation task, a single-digit computation task, a multi-digit computation task, a dot comparison task and a number comparison task. Furthermore, during the treatment phase, all children completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.
Measures
Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line is presented in the middle of the screen, and the target number is presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers included two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed in all phases of the study. Trained items were assessed independently of the intervention during the baseline and post-treatment phases, and performance on the intervention was used to index performance on the trained set during the treatment phase. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error (PAE): PAE = (|estimated number − target number| / scale of the number line) × 100.
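As a minimal illustration of the PAE formula above (not part of the released materials), a single trial on a 0-100 bounded line could be scored as follows:

```python
def percent_absolute_error(estimate, target, scale=100):
    """Percent absolute error for one number line trial on a bounded line."""
    return abs(estimate - target) / scale * 100

# Example: a child places 42 when the target was 36 on a 0-100 line.
print(percent_absolute_error(42, 36))  # 6.0
```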
Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions presented the lowest addend first (e.g., 3 + 5) and half presented the highest addend first (e.g., 6 + 3). The task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen, accompanied by a sound, and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
Multi-digit computational estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options represented the correct result; participants were asked to select the option closest to the correct result. In half of the items the calculation involved two double-digit numbers, and in the other half one double-digit and one single-digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below. The calculation remained on the screen until participants responded by clicking on one of the options. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. Both quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the larger one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison), the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots were kept constant following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.
The Number Line Intervention
During the intervention sessions, participants estimated the position of 30 Arabic numbers on a 0-100 bounded number line. As a form of feedback, within each item, the participant’s estimate remained visible and the correct position of the target number appeared on the number line. When the estimate’s PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”; when PAE was between 2.5 and 5 the message read “Well done, so close!”; and when PAE was higher than 5 the message read “Good try!” Numbers were presented in random order.
Variables in the dataset
Age = age in ‘years, months’ at the start of the study
Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)
Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Num_Ops_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed by three sections. The first one refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point on the treatment phase, and post1 to the first measurement point on the post-treatment phase.
The second part of the variable name refers to the task, as follows:
DC = dot comparison
SDC = single-digit computation
NLE_UT = number line estimation (untrained set)
NLE_T = number line estimation (trained set)
CE = multidigit computational estimation
NC = number comparison
The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).
Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task during the second measurement point of the baseline phase and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the Treatment phase.
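For scripted workflows, this naming scheme can be decomposed programmatically. The helper below is illustrative only (it is not part of the dataset) and simply splits a variable name into its phase/session, task and measure parts:

```python
def parse_variable(name):
    """Split a variable name such as 'Treat3_NLE_UT_pae' into its parts."""
    parts = name.split("_")
    phase_session = parts[0]      # e.g. 'Treat3' -> treatment phase, session 3
    measure = parts[-1]           # 'acc' or 'pae'
    task = "_".join(parts[1:-1])  # e.g. 'NLE_UT'
    return phase_session, task, measure

print(parse_variable("Treat3_NLE_UT_pae"))  # ('Treat3', 'NLE_UT', 'pae')
```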
This dataset consists of mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. It is designed to test the mathematical learning and algebraic reasoning skills of learning models.
## Example questions
Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.
Answer: 4
Question: Calculate -841880142.544 + 411127.
Answer: -841469015.544
Question: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)).
Answer: 54*a - 30
It contains 2 million (question, answer) pairs per module, with questions limited to 160 characters in length, and answers to 30 characters in length. Note the training data for each question type is split into "train-easy", "train-medium", and "train-hard". This allows training models via a curriculum. The data can also be mixed together uniformly from these training datasets to obtain the results reported in the paper.
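If the pairs are stored as plain text with each question on one line and its answer on the next (the layout the examples above imply), they can be read with a short loop. This is a sketch under that assumption; the module file name is hypothetical:

```python
def load_pairs(path):
    """Read (question, answer) pairs from a file with alternating lines."""
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]
    return list(zip(lines[0::2], lines[1::2]))

pairs = load_pairs("train-easy/algebra__linear_2d.txt")  # hypothetical module file
print(pairs[0])  # (question, answer)
```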
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey, which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.
From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," the 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB of total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months; all other data would be considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.
Resources in this dataset:
Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).
Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
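The per-person calculation described above amounts to dividing the high end of a reported storage range by the number of respondents the response covers. A minimal sketch with made-up values (not taken from the survey data):

```python
def per_person_storage_tb(range_high_tb, group_size=1):
    """High end of a reported storage range divided by the number of
    individuals covered: 1 for an individual response, G for a group response."""
    return range_high_tb / group_size

# Hypothetical examples: an individual reporting a "1-10 TB" range,
# and a 6-person unit reporting the same range.
print(per_person_storage_tb(10))     # 10.0 TB per person
print(per_person_storage_tb(10, 6))  # ~1.67 TB per person
```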
There are several datasets for the simple linear regression algorithm, but most of them are random datasets. There is no problem with that, but I think it is very important that the data you are working on, no matter how small or simple the problem or algorithm, should be meaningful. Hence, here is a somewhat sensible dataset of random numbers and their corresponding log base 10 values. You can use this dataset to practice and play around with the Linear Regression Algorithm.
The dataset consists of two CSVs corresponding to the training and testing datasets. The training dataset was created in Google Spreadsheet using the RANDBETWEEN(1,1000) function to generate pseudo-random values. Then, the LOG10() function was used to calculate the log base 10 value of each of these numbers. Afterward, these log values were truncated to 6 decimal points using the TRUNC() formula. The testing dataset was created in the same way as the training dataset, but the range of numbers was between 1001 and 2000.
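A rough Python equivalent of the spreadsheet recipe is sketched below. It is not how the published CSVs were produced, and the number of rows is not stated in the description, so n here is arbitrary:

```python
import csv
import math
import random

def make_split(path, low, high, n=100):
    """Mimic the spreadsheet recipe: random integers and truncated log10 values."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "log10_x"])
        for _ in range(n):
            x = random.randint(low, high)              # like RANDBETWEEN(low, high)
            y = math.trunc(math.log10(x) * 1e6) / 1e6  # LOG10() truncated to 6 decimals
            writer.writerow([x, y])

make_split("train.csv", 1, 1000)    # training range used in the description
make_split("test.csv", 1001, 2000)  # testing range used in the description
```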
I would like to thank Dr. Andrew Ng for creating an amazing beginner-friendly ML course.
I hope this dataset helps Machine Learning beginners and newbies to practice and learn about Linear Regression.
The exercise after this contains questions that are based on the housing dataset.
How many houses have a waterfront? a. 21000 b. 21450 c. 163 d. 173
How many houses have 2 floors? a. 2692 b. 8241 c. 10680 d. 161
How many houses built before 1960 have a waterfront? a. 80 b. 7309 c. 90 d. 92
What is the price of the most expensive house having more than 4 bathrooms? a. 7700000 b. 187000 c. 290000 d. 399000
For instance, if the ‘price’ column contains outliers, how can you clean the data and remove the redundancies? a. Calculate the IQR range and drop the values outside the range. b. Calculate the p-value and remove the values less than 0.05. c. Calculate the correlation coefficient of the price column and remove the values less than the correlation coefficient. d. Calculate the Z-score of the price column and remove the values less than the z-score.
What are the various parameters that can be used to determine the dependent variables in the housing data to determine the price of the house? a. Correlation coefficients b. Z-score c. IQR Range d. Range of the Features
If we get an R² score of 0.38, what inferences can we make about the model and its efficiency? a. The model is 38% accurate, and shows poor efficiency. b. The model is showing 0.38% discrepancies in the outcomes. c. Low difference between observed and fitted values. d. High difference between observed and fitted values.
If the metrics show that the p-value for the grade column is 0.092, what inferences can we make about the grade column? a. Significant in presence of other variables. b. Highly significant in presence of other variables. c. Insignificant in presence of other variables. d. None of the above
If the Variance Inflation Factor value for a feature is considerably higher than the other features, what can we say about that column/feature? a. High multicollinearity b. Low multicollinearity c. Both A and B d. None of the above
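Questions like the counting ones above are typically answered with a few pandas filters. The sketch below assumes the commonly used King County house sales CSV with columns such as 'waterfront', 'floors', 'yr_built', 'bathrooms' and 'price'; the file name and column names are assumptions and should be adjusted to the copy of the dataset actually provided with the exercise:

```python
import pandas as pd

df = pd.read_csv("housing.csv")  # hypothetical file name

print((df["waterfront"] == 1).sum())                              # houses with a waterfront
print((df["floors"] == 2).sum())                                  # houses with exactly 2 floors
print(((df["yr_built"] < 1960) & (df["waterfront"] == 1)).sum())  # pre-1960 waterfront houses
print(df.loc[df["bathrooms"] > 4, "price"].max())                 # priciest house with >4 bathrooms
```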
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the calculation folder: each file contains a matrix called “MATR”. Each row of the matrix “MATR” is a trial.
The columns contain the following information:
1st: Number of trial
2nd: Subject response
4th: Response time
5th: first number
6th: math symbol (1=*; 2= +; 3= –)
7th: second number
8th: third number
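As an illustration of how the columns listed above could be pulled out of one of these files, the sketch below uses SciPy to load the “MATR” matrix; the file name is hypothetical, and note that the column numbering above is 1-based while NumPy indexing is 0-based:

```python
from scipy.io import loadmat

data = loadmat("calculation/subject01.mat")  # hypothetical file name
matr = data["MATR"]

trial_number     = matr[:, 0]  # 1st column: number of trial
subject_response = matr[:, 1]  # 2nd column: subject response
response_time    = matr[:, 3]  # 4th column: response time
```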
In the calculation folder: each file contains a matrix called “matr”. Each row of the matrix “matr” is a trial.
The columns contain the following information:
1st: subject response in the numerosity task
2nd: the presented numerosity
3rd: subject response in the numerosity task
4th: zero
5th: stimulus duration
6th: Response time in the numerosity task
7th: Grouped (1) or random (2) presentation
8th: 1
9th: 1
10th: Number of items of the upper-left quadrant
11th: Number of items of the lower-left quadrant
12th: Number of items of the upper-right quadrant
13th: Number of items of the lower - right quadrant
14th: odd shape presented (1=diamond; 2=triangle; 3=circle)
15th: subject response in the shape task
16th: 0.2 in the single task; response time in the shape task in the dual task
17th: single (0) or dual (1) task
18th: time stimulus on
19th: time stimulus off
https://spdx.org/licenses/CC0-1.0.html
Aim: Despite the wide distribution of many parasites around the globe, the range of individual species varies significantly even among phylogenetically related taxa. Since parasites need suitable hosts to complete their development, parasite geographical and environmental ranges should be limited to communities where their hosts are found. Parasites may also suffer from a trade-off between being locally abundant or widely dispersed. We hypothesize that the geographical and environmental ranges of parasites are negatively associated with their host specificity and their local abundance.
Location: Worldwide
Time period: 2009 to 2021
Major taxa studied: Avian haemosporidian parasites
Methods: We tested these hypotheses using a global database which comprises data on avian haemosporidian parasites from across the world. For each parasite lineage, we computed five metrics: phylogenetic host-range, environmental range, geographical range, and their mean local and total number of observations in the database. Phylogenetic generalized least squares models were run to evaluate the influence of phylogenetic host-range and total and local abundances on geographical and environmental range. In addition, we analysed separately the two regions with the largest amount of available data: Europe and South America.
Results: We evaluated 401 lineages from 757 localities and observed that generalism (i.e. phylogenetic host range) associates positively with both the parasites’ geographical and environmental ranges at the global and Europe scales. For South America, generalism only associates with geographical range. Finally, mean local abundance (mean local number of parasite occurrences) was negatively related to geographical and environmental range. This pattern was detected worldwide and in South America, but not in Europe.
Main Conclusions: We demonstrate that parasite specificity is linked to both their geographical and environmental ranges. The fact that locally abundant parasites present restricted ranges indicates a trade-off between these two traits. This trade-off, however, only becomes evident when sufficiently heterogeneous host communities are considered.
Methods
We compiled data on haemosporidian lineages from the MalAvi database (http://130.235.244.92/Malavi/, Bensch et al. 2009), including all the data available from the “Grand Lineage Summary” representing the Plasmodium and Haemoproteus genera from wild birds that contained information regarding location. After checking for duplicated sequences, this dataset comprised a total of ~6200 sequenced parasites representing 1602 distinct lineages (775 Plasmodium and 827 Haemoproteus) collected from 1139 different host species and 757 localities from all continents except Antarctica (Supplementary Figure 1, Supplementary Table 1). The parasite lineages deposited in MalAvi are based on a cyt b fragment of 478 bp. This dataset was used to calculate the parasites’ geographical, environmental and phylogenetic ranges.
Geographical range
All analyses in this study were performed using R version 4.02. In order to estimate the geographical range of each parasite lineage, we applied the R package “GeoRange” (Boyle, 2017) and chose the variable minimum spanning tree distance (i.e., the shortest total distance of all lines connecting each locality where a particular lineage has been found).
Using the function “create.matrix” from the “fossil” package, we created a matrix of lineages and coordinates and employed the function “GeoRange_MultiTaxa” to calculate the minimum spanning tree distance for each parasite lineage (i.e. the shortest total distance in kilometres of all lines connecting each locality). As at least two distinct sites are necessary to calculate this distance, parasites observed in a single locality could not have their geographical range estimated. For this reason, only parasites observed in two or more localities were considered in our phylogenetically controlled least squares (PGLS) models.
Host and Environmental diversity
Traditionally, ecologists use Shannon entropy to measure diversity in ecological assemblages (Pielou, 1966). The Shannon entropy of a set of elements is related to the degree of uncertainty someone would have about the identity of a randomly selected element of that set (Jost, 2006). Thus, Shannon entropy matches our intuitive notion of biodiversity: the more diverse an assemblage is, the greater the uncertainty about which species a randomly selected individual belongs to. Shannon diversity increases with both the assemblage richness (e.g., the number of species) and evenness (e.g., uniformity in abundance among species). To compare the diversity of assemblages that vary in richness and evenness in a more intuitive manner, we can normalize diversities by Hill numbers (Chao et al., 2014b). The Hill number of an assemblage represents the effective number of species in the assemblage, i.e., the number of equally abundant species that are needed to give the same value of the diversity metric in that assemblage. Hill numbers can be extended to incorporate phylogenetic information; in that case, instead of species, we are measuring the effective number of phylogenetic entities in the assemblage. Here, we computed phylogenetic host-range as the phylogenetic Hill number associated with the assemblage of hosts found infected by a given parasite. Analyses were performed using the function “hill_phylo” from the “hillr” package (Chao et al., 2014a). Hill numbers are parameterized by a parameter “q” that determines the sensitivity of the metric to relative species abundance; different “q” values produce Hill numbers associated with different diversity metrics. We set q = 1 to compute the Hill number associated with Shannon diversity. Here, low Hill numbers indicate specialization on a narrow phylogenetic range of hosts, whereas higher Hill numbers indicate generalism across a broader phylogenetic spectrum of hosts.
We also used Hill numbers to compute the environmental range of the sites occupied by each parasite lineage. First, we collected the 19 bioclimatic variables from WorldClim version 2 (http://www.worldclim.com/version2) for all sites used in this study (N = 713). Then, we standardized the 19 variables by centering and scaling them by their respective mean and standard deviation. Thereafter, we computed the pairwise Euclidean environmental distance among all sites and used this distance to compute a dissimilarity cluster. Finally, as for the phylogenetic Hill number, we used this dissimilarity cluster to compute the environmental Hill number of the assemblage of sites occupied by each parasite lineage. The environmental Hill number for each parasite can be interpreted as the effective number of environmental conditions in which a parasite lineage occurs.
Thus, the higher the environmental Hill number, the more generalist the parasite is regarding the environmental conditions in which it can occur.
Parasite phylogenetic tree
A Bayesian phylogenetic reconstruction was performed. We built a tree for all parasite sequences for which we were able to estimate the parasite’s geographical, environmental and phylogenetic ranges (see above); this represented 401 distinct parasite lineages. This inference was produced using MrBayes 3.2.2 (Ronquist & Huelsenbeck, 2003) with the GTR + I + G model of nucleotide evolution, as recommended by ModelTest (Posada & Crandall, 1998), which selects the best-fit nucleotide substitution model for a set of genetic sequences. We ran four Markov chains simultaneously for a total of 7.5 million generations, sampled every 1000 generations. The first 25% of sampled trees were discarded as a burn-in step and the remaining trees were used to calculate the posterior probabilities of each estimated node in the final consensus tree. Our final tree obtained a cumulative posterior probability of 0.999. Leucocytozoon caulleryi was used as the outgroup to root the phylogenetic tree, as Leucocytozoon spp. represent a basal group within avian haemosporidians (Pacheco et al., 2020).
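For intuition about the Hill number with q = 1 used above, the ordinary (non-phylogenetic) version is simply the exponential of Shannon entropy, i.e. the effective number of equally abundant species. The study itself used the R packages GeoRange and hillr; the snippet below is only an illustrative Python sketch of the concept:

```python
import math

def hill_number_q1(abundances):
    """Hill number of order q = 1: the exponential of Shannon entropy,
    i.e. the effective number of equally abundant species."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    shannon = -sum(p * math.log(p) for p in props)
    return math.exp(shannon)

# A perfectly even assemblage of 4 host species -> effective number 4.
print(hill_number_q1([5, 5, 5, 5]))   # 4.0
# A skewed assemblage -> fewer "effective" hosts (more specialised).
print(hill_number_q1([17, 1, 1, 1]))  # ~1.8
```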
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GeoMAD is the Digital Earth Africa (DE Africa) surface reflectance geomedian and triple Median Absolute Deviation data service. It is a cloud-free composite of satellite data compiled over specific timeframes. This service is ideal for longer-term time series analysis, cloudless imagery and statistical accuracy.
GeoMAD has two main components: Geomedian and Median Absolute Deviations (MADs)
The geomedian component combines measurements collected over the specified timeframe to produce one representative, multispectral measurement for every pixel unit of the African continent. The end result is a comprehensive dataset that can be used to generate true-colour images for visual inspection of anthropogenic or natural landmarks. The full spectral dataset can be used to develop more complex algorithms.
For each pixel, invalid data are discarded and the remaining observations are mathematically summarised using the geomedian statistic. Because observations are collected across many flyovers over a period of time, the composite also provides coverage of intermittently cloudy areas.
Variations between the geomedian and the individual measurements are captured by the three Median Absolute Deviation (MAD) layers. These are higher-order statistical measurements calculating variation relative to the geomedian. The MAD layers can be used on their own or together with the geomedian to gain insights about the land surface and understand change over time.
Key Properties
Geographic Coverage: Continental Africa - approximately 37° North to 35° South
Temporal Coverage: 2017 – 2022*
Spatial Resolution: 10 x 10 meter
Update Frequency: Annual from 2017 - 2022
Product Type: Surface Reflectance (SR)
Product Level: Analysis Ready (ARD)
Number of Bands: 14 Bands
Parent Dataset: Sentinel-2 Level-2A Surface Reflectance
Source Data Coordinate System: WGS 84 / NSIDC EASE-Grid 2.0 Global (EPSG:6933)
Service Coordinate System: WGS 84 / NSIDC EASE-Grid 2.0 Global (EPSG:6933)
*Time is enabled on this service using UTC – Coordinated Universal Time. To ensure you are seeing the correct year for each annual slice of data, the time zone must be set specifically to UTC in the Map Viewer settings each time this layer is opened in a new map. More information on this setting can be found here: Set the map time zone.
Applications
GeoMAD is the Digital Earth Africa (DE Africa) surface reflectance geomedian and triple Median Absolute Deviation data service. It is a cloud-free composite of satellite data compiled over specific timeframes. This service is ideal for:
- Longer-term time series analysis
- Cloud-free imagery
- Statistical accuracy
Available Bands

| Band ID | Description | Value range | Data type | No data value |
|---|---|---|---|---|
| B02 | Geomedian B02 (Blue) | 1 - 10000 | uint16 | 0 |
| B03 | Geomedian B03 (Green) | 1 - 10000 | uint16 | 0 |
| B04 | Geomedian B04 (Red) | 1 - 10000 | uint16 | 0 |
| B05 | Geomedian B05 (Red edge 1) | 1 - 10000 | uint16 | 0 |
| B06 | Geomedian B06 (Red edge 2) | 1 - 10000 | uint16 | 0 |
| B07 | Geomedian B07 (Red edge 3) | 1 - 10000 | uint16 | 0 |
| B08 | Geomedian B08 (Near infrared (NIR) 1) | 1 - 10000 | uint16 | 0 |
| B8A | Geomedian B8A (NIR 2) | 1 - 10000 | uint16 | 0 |
| B11 | Geomedian B11 (Short-wave infrared (SWIR) 1) | 1 - 10000 | uint16 | 0 |
| B12 | Geomedian B12 (SWIR 2) | 1 - 10000 | uint16 | 0 |
| SMAD | Spectral Median Absolute Deviation | 0 - 1 | float32 | NaN |
| EMAD | Euclidean Median Absolute Deviation | 0 - 31623 | float32 | NaN |
| BCMAD | Bray-Curtis Median Absolute Deviation | 0 - 1 | float32 | NaN |
| COUNT | Number of clear observations | 1 - 65535 | uint16 | 0 |

Bands can be subdivided as follows:
Geomedian — 10 bands: The geomedian is calculated using the spectral bands of data collected during the specified time period. Surface reflectance values have been scaled between 1 and 10000 to allow for more efficient data storage as unsigned 16-bit integers (uint16). Note parent datasets often contain more bands, some of which are not used in GeoMAD. The geomedian band IDs correspond to bands in the parent Sentinel-2 Level-2A data. For example, the Annual GeoMAD band B02 contains the annual geomedian of the Sentinel-2 B02 band.
Median Absolute Deviations (MADs) — 3 bands: Deviations from the geomedian are quantified through median absolute deviation calculations. The GeoMAD service utilises three MADs, each stored in a separate band: Euclidean MAD (EMAD), spectral MAD (SMAD), and Bray-Curtis MAD (BCMAD). Each MAD is calculated using the same ten bands as in the geomedian. SMAD and BCMAD are normalised ratios, therefore they are unitless and their values always fall between 0 and 1. EMAD is a function of surface reflectance but is neither a ratio nor normalised, therefore its valid value range depends on the number of bands used in the geomedian calculation.
Count — 1 band: The number of clear satellite measurements of a pixel for that calendar year. This is around 60 annually, but doubles at areas of overlap between scenes. “Count” is not incorporated in either the geomedian or MADs calculations. It is intended for metadata analysis and data validation.
Processing
All clear observations for the given time period are collated from the parent dataset. Cloudy pixels are identified and excluded. The geomedian and MADs calculations are then performed by the hdstats package. Annual GeoMAD datasets for the period use hdstats version 0.2.
More details on this dataset can be found here.
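Because the geomedian bands are stored as scaled uint16 values (1-10000, no data = 0), a common first step is to convert them back to 0-1 reflectance before visualisation or analysis. The sketch below is a small NumPy illustration of that scaling, not DE Africa tooling, and uses made-up pixel values:

```python
import numpy as np

def to_reflectance(band, nodata=0):
    """Convert a GeoMAD geomedian band (uint16, 1-10000) to 0-1 reflectance,
    masking the no-data value."""
    band = band.astype("float32")
    band[band == nodata] = np.nan
    return band / 10000.0

# Illustrative true-colour stack from tiny hypothetical red/green/blue arrays.
red, green, blue = (np.array([[2500, 0]], dtype="uint16") for _ in range(3))
rgb = np.dstack([to_reflectance(b) for b in (red, green, blue)])
print(rgb.shape, np.nanmax(rgb))  # (1, 2, 3) 0.25
```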
PHREEQCI is a widely-used geochemical computer program that can be used to calculate chemical speciation and specific conductance of a natural water sample from its chemical composition (Charlton and Parkhurst, 2002; Parkhurst and Appelo, 1999). The specific conductance of a natural water calculated with PHREEQCI (Appelo, 2010) is reliable for pH greater than 4 and temperatures less than 35 °C (McCleskey and others, 2012b). An alternative method for calculating the specific conductance of natural waters is accurate over a large range of ionic strength (0.0004–0.7 mol/kg), pH (1–10), temperature (0–95 °C), and specific conductance (30–70,000 μS/cm) (McCleskey and others, 2012a). PHREEQCI input files for calculating the specific conductance of natural waters using the method described by McCleskey and others (2012a) have been created and are presented in this ScienceBase software release. The input files also incorporate three commonly used temperature compensation factors which can be used to determine the specific conductance at 25 °C: the constant (0.019), the non-linear (ISO-7888), and the temperature compensation factor described by McCleskey (2013), which is the most accurate for acidic waters (pH < 4). The specific conductance imbalance (SCI), which can be used along with charge balance as a quality-control check (McCleskey and others, 2012a), is also calculated:
SCI (%) = 100 × (SC25 calculated – SC25 measured) / (SC25 measured)
where SC25 calculated is the calculated specific conductance at 25 °C and SC25 measured is the measured specific conductance at 25 °C. Finally, the transport number (t), which is the relative contribution of a given ion to the overall electrical conductivity, is also calculated for 30 ions. Transport numbers are useful for interpreting specific conductance data and for identifying the ions that substantially contribute to the specific conductance.
References Cited
Appelo, C.A.J., 2017, Specific conductance: how to calculate, to use, and the pitfalls. [http://www.hydrochemistry.eu/exmpls/sc.html]
Ball, J.W., and Nordstrom, D.K., 1991, User's manual for WATEQ4F, with revised thermodynamic data base and test cases for calculating speciation of major, trace, and redox elements in natural waters: U.S. Geological Survey Open-File Report 91-0183, 193 p.
Charlton, S.R., and Parkhurst, D.L., 2002, PhreeqcI--A graphical user interface to the geochemical model PHREEQC: U.S. Geological Survey Fact Sheet FS-031-02, 2 p.
McCleskey, R.B., Nordstrom, D.K., Ryan, J.N., and Ball, J.W., 2012a, A new method of calculating electrical conductivity with applications to natural waters: Geochimica et Cosmochimica Acta, v. 77, p. 369-382. [http://www.sciencedirect.com/science/article/pii/S0016703711006181]
McCleskey, R.B., Nordstrom, D.K., and Ryan, J.N., 2012b, Comparison of electrical conductivity calculation methods for natural waters: Limnology and Oceanography: Methods, v. 10, p. 952-967. [http://aslo.org/lomethods/free/2012/0952.html]
McCleskey, R.B., 2013, New method for electrical conductivity temperature compensation: Environmental Science & Technology, v. 47, p. 9874-9881. [http://dx.doi.org/10.1021/es402188r]
Parkhurst, D.L., and Appelo, C.A.J., 1999, User's guide to PHREEQC (Version 2)--a computer program for speciation, batch-reaction, one-dimensional transport, and inverse geochemical calculations: U.S. Geological Survey Water-Resources Investigations Report 99-4259, 312 p.
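The SCI formula above is straightforward to apply outside PHREEQCI as well; a minimal sketch with made-up values:

```python
def sci_percent(sc25_calculated, sc25_measured):
    """Specific conductance imbalance, SCI (%), as defined above."""
    return 100.0 * (sc25_calculated - sc25_measured) / sc25_measured

# Hypothetical check: calculated 415 uS/cm vs measured 400 uS/cm -> +3.75 %
print(sci_percent(415.0, 400.0))
```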
Please note, this dataset has been superseded by a newer version (see below). Users should not use this version except in rare cases (e.g., when reproducing previous studies that used this version). USCRN "Processed" Data (labeled as "uscrn-processed") are interpreted values and derived geophysical parameters with other quality indicators processed from raw data (both Datalogger files and/or Raw Data from GOES and NOAAPort) by the USCRN Team. Climate variable types include air temperature, precipitation, soil moisture, soil temperature, surface temperature, wetness, global solar radiation, relative humidity, and wind at 1.5 m above the ground. Many additional engineering variables are also available. These data have been decoded, quality-flagged, and processed into level 1 hourly data (the only applied quality control is rounding some values as they enter the database), and include additional calculated values such as precipitation (5-minute and hourly), hourly maximum temperature, hourly minimum temperature, average temperature (5-minute and hourly), soil moisture (volumetric water content, 5-minute values at the 5 cm depth and hourly values at all depths) for all dielectric values in range, layer average soil moisture (5-minute and hourly), and layer average soil temperature (5-minute and hourly). It is the general practice of USCRN not to calculate derived variables if the input data to these calculations are flagged. These data records are versioned based on the processing methods and algorithms used for the derivations (versions are noted within the data netCDF file), and data are updated when the higher quality raw data become available from stations' datalogger storage (Datalogger Files).
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average difference between the 'lower' values before and after this update is 0.6.]
What does the data show?
The Annual Count of Summer Days is the number of days per year where the maximum daily temperature (the hottest point in the day) is above 25°C. It measures how many times the threshold is exceeded (not by how much) in a year. Note, the term ‘summer days’ refers only to the threshold: temperatures above 25°C outside the summer months also contribute to the annual count. The results should be interpreted as an approximation of the projected number of days when the threshold is exceeded, as there will be many factors, such as natural variability and local-scale processes, that the climate model is unable to represent.
The Annual Count of Summer Days is calculated for two baseline (historical) periods, 1981-2000 (corresponding to 0.51°C warming) and 2001-2020 (corresponding to 0.87°C warming), and for global warming levels of 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. This enables users to compare the future number of summer days to previous values.
What are the possible societal impacts?
An increase in the Annual Count of Summer Days indicates increased health risks from high temperatures. Impacts include:
- Increased heat-related illnesses, hospital admissions or death for vulnerable people.
- Transport disruption due to overheating of railway infrastructure.
- Periods of increased water demand.
Other metrics such as the Annual Count of Hot Summer Days (days above 30°C), Annual Count of Extreme Summer Days (days above 35°C) and the Annual Count of Tropical Nights (where the minimum temperature does not fall below 20°C) also indicate impacts from high temperatures; however, they use different temperature thresholds.
What is a global warming level?
The Annual Count of Summer Days is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), whilst this dataset allows for the exploration of greater levels of warming.
The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level was calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Annual Count of Summer Days, an average is taken across the 21 year period. Therefore, the Annual Count of Summer Days shows the number of summer days that could occur each year, for each given level of warming.
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements.
The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.
What are the naming conventions and how do I explore the data?
This data contains a field for each global warming level and two baselines. They are named ‘Summer Days’, the warming level or baseline, and ‘upper’, ‘median’ or ‘lower’ as per the description below. E.g. ‘Summer Days 2.5 median’ is the median value for the 2.5°C warming level. Decimal points are included in field aliases but not field names, e.g. ‘Summer Days 2.5 median’ is ‘SummerDays_25_median’. To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578
Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘Summer Days 2.0°C median’ values.
What do the ‘median’, ‘upper’, and ‘lower’ values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future. For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Annual Count of Summer Days was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location. The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble.
This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty. ‘Lower’, ‘median’ and ‘upper’ are also given for the baseline periods as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.
Useful links
This dataset was calculated following the methodology in the ‘Future Changes to high impact weather in the UK’ report and uses the same temperature thresholds as the 'State of the UK Climate' report.
Further information on the UK Climate Projections (UKCP).
Further information on understanding climate data within the Met Office Climate Data Portal
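As a toy illustration of how the three fields relate to the 12 ranked ensemble values described above (illustrative only, with made-up counts, not Met Office code):

```python
import numpy as np

def ensemble_summary(values):
    """Summarise 12 ensemble-member values for one location as the dataset
    describes: 'lower' = 2nd lowest, 'median' = central value, 'upper' = 2nd highest."""
    ranked = np.sort(np.asarray(values))
    return {"lower": int(ranked[1]),
            "median": float(np.median(ranked)),
            "upper": int(ranked[-2])}

# Hypothetical annual summer-day counts from 12 ensemble members at one grid cell.
members = [14, 18, 19, 21, 22, 22, 23, 25, 26, 28, 31, 35]
print(ensemble_summary(members))  # {'lower': 18, 'median': 22.5, 'upper': 31}
```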
USCRN Processed data are interpreted values and derived geophysical parameters processed from raw data by the USCRN Team. Data were interpreted and ingested into a database, but are also available as netCDF files exported from the database. The "Version 2" of the dataset denotes the application of a new precipitation algorithm to calculate values (see documentation for more information). Climate variable types include air temperature, precipitation, soil moisture, soil temperature, surface temperature, wetness, global solar radiation, relative humidity, and wind at 1.5 m above the ground. Many additional engineering variables are also available. These data have been decoded from the raw data streams, quality-flagged, and processed into level 1 hourly data (the only applied quality control is rounding some values as they enter the database), and include additional calculated values such as precipitation (5-minute and hourly), hourly max/min temperature, average temperature (5-minute and hourly), soil moisture (volumetric water content, 5-minute values at the 5 cm depth and hourly values at all depths) for all dielectric values in range, layer average soil moisture (5-minute and hourly), and layer average soil temperature (5-minute and hourly). It is the general practice of USCRN not to calculate derived variables if the input data to these calculations are flagged.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The Low, Slow, and Small Target Detection Dataset for Digital Array Surveillance Radar (LSS-DAUR-1.0) includes a total of 154 items of Range-Doppler (RD) complex data and Track (TR) point data collected from 6 types of targets (passenger ships, speedboats, helicopters, rotary-wing UAVs, birds, fixed-wing UAVs). It can support research on detection, classification and recognition of typical maritime targets by digital array radar.
1. Data Collection Process
The data collection process mainly includes: set radar parameters → detect targets → collect echo signal data → record target information → determine the range bin where the target is located → extract target Doppler data → extract target track data.
2. Target Situation
The collected typical sea-air targets include 6 categories: passenger ships, speedboats, helicopters, rotary-wing UAVs, birds and fixed-wing UAVs.
3. Range-Doppler (RD) Complex Data
By calculating the target range, the echo data of the range bin where the target is located is intercepted. Based on the collected measured data, the Low, Slow, and Small Target RD Dataset for Digital Array Surveillance Radar is constructed. It includes 10 groups of passenger ship data, 11 groups of speedboat data, 10 groups of helicopter data, 18 groups of rotary-wing UAV data, 17 groups of bird data, and 11 groups of fixed-wing UAV data, totaling 77 groups. Each group of data includes the target's Doppler, GPS time, frame count, etc.
The naming convention for target RD data is: Start Collection Time_DAUR_RD_Target Type_Serial Number_Target Batch Number.mat. For example, in the file name "20231207093748_DAUR_RD_Passenger Ship_01_2619.mat", "20231207" is the date of data collection, "093748" is the start time of collection (09:37:48), "DAUR" stands for Digital Array Surveillance Radar, "RD" stands for Range-Doppler spectrum complex data, "Passenger Ship_01" indicates that the target type is passenger ship with serial number 01, and "2619" is the target track batch number.
4. Track (TR) Data
The track data within the time period of the echo data are extracted to construct the Low, Slow, and Small Target TR Dataset for Digital Array Surveillance Radar. It includes 10 groups of passenger ship data, 11 groups of speedboat data, 10 groups of helicopter data, 18 groups of rotary-wing UAV data, 17 groups of bird data, and 11 groups of fixed-wing UAV data, totaling 77 groups. Each group of data includes target range, target azimuth, elevation angle, target speed, GPS time, signal-to-noise ratio (SNR), etc. The TR data and RD data share the same time and batch number; they are different dimensions of data for the same target in the same time period.
The naming convention for target TR data is: Start Collection Time_DAUR_TR_Target Type_Serial Number_Target Batch Number.mat. For example, in the file name "20231207093748_DAUR_TR_Passenger Ship_01_2619.mat", "20231207" is the date of data collection, "093748" is the start time of collection (09:37:48), "DAUR" stands for Digital Array Surveillance Radar, "TR" stands for track data, "Passenger Ship_01" indicates that the target type is passenger ship with serial number 01, and "2619" is the target track batch number.
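The file-naming convention above can be split into its fields with a small helper. This is a sketch under the stated convention only (the parsing function and its output keys are not part of the dataset):

```python
import re

def parse_lss_daur_name(filename):
    """Split an LSS-DAUR file name of the form
    '<yyyymmddHHMMSS>_DAUR_<RD|TR>_<target type>_<serial>_<batch>.mat'."""
    stem = filename[:-4] if filename.lower().endswith(".mat") else filename
    m = re.match(r"(\d{8})(\d{6})_DAUR_(RD|TR)_(.+)_(\d+)_(\d+)$", stem)
    if not m:
        raise ValueError(f"unexpected file name: {filename}")
    date, time, kind, target, serial, batch = m.groups()
    return {"date": date, "time": time, "data": kind,
            "target": target, "serial": serial, "batch": batch}

print(parse_lss_daur_name("20231207093748_DAUR_RD_Passenger Ship_01_2619.mat"))
```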
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information
The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting the benefit of a consensus dataset. Therefore, we have combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1144803 compounds with 10915362 bioactivities on 5613 targets (including defined macromolecular targets as well as cell-lines and phenotypic readouts). It also provides simplified information on the assay types underlying the bioactivity data and on bioactivity confidence by comparing data from different sources. We have unified the source databases, brought them into a common format and combined them, enabling straightforward generic use in multiple applications such as chemogenomics and data-driven drug design.
The consensus dataset provides increased target coverage and contains a higher number of molecules compared to the source databases which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks with flags for divergent data from different sources may help data selection and further accurate curation.
Structure and content of the dataset
| ChEMBL ID | PubChem ID | IUPHAR ID | Target | Activity type | Assay type | Unit | Mean C (0) | ... | Mean PC (0) | ... | Mean B (0) | ... | Mean I (0) | ... | Mean PD (0) | ... | Activity check annotation | Ligand names | Canonical SMILES C | ... | Structure check | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV-file and a compressed CSV-file.
Except for the canonical SMILES columns, all columns are of datatype ‘string’. The datatype of the canonical SMILES columns is the SMILES format. We recommend the File Reader node for using the dataset in KNIME; with this node the data types of the columns can be adjusted exactly. In addition, only this node can read the compressed format.
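Outside KNIME, the exported CSV can also be read directly, for example with pandas. This is only an illustrative sketch; the file name is hypothetical and should be replaced with the actual download:

```python
import pandas as pd

# Read the consensus dataset CSV (hypothetical file name), keeping every
# column as a string, mirroring the datatype recommendation above.
df = pd.read_csv("consensus_dataset.csv.gz", dtype=str)
print(df.shape)
print(df.columns.tolist()[:8])  # e.g. the ID, Target, Activity type and Assay type columns
```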
Column content:
This is the Baltic and North Sea Climatology (BNSC) for the Baltic Sea and the North Sea in the range 47°N to 66°N and 15°W to 30°E. It is the follow-up project to the KNSC climatology. The climatology was first made available to the public in March 2018 by ICDC and is published here in a slightly revised version 2. It contains the monthly averages of mean air pressure at sea level, air temperature, and dew point temperature at 2 meter height. It is available on a 1° x 1° grid for the period from 1950 to 2015. For the calculation of the mean values, all available quality-controlled data of the DWD (German Meteorological Service) from ship observations and buoy measurements during this period were taken into account. Additional dew point values were calculated from relative humidity and air temperature where available.
Climatologies were calculated for the WMO standard periods 1951-1980, 1961-1990, 1971-2000 and 1981-2010 (monthly mean values). As a prerequisite for the calculation of a 30-year climatology, at least 25 out of 30 (five-sixths) valid monthly means had to be present in the respective grid box. For the long-term climatology from 1950 to 2015, at least four-fifths valid monthly means had to be available.
Two methods were used (in combination) to calculate the monthly averages, to account for the small number of measurements per grid box and their uneven spatial and temporal distribution:
1. For parameters with a detectable annual cycle in the data (air temperature, dew point temperature), a 2nd order polynomial was fitted to the data to reduce the variation within a month and reduce the uncertainty of the calculated averages. In addition, for the mean value of air temperature, the daily temperature cycle was removed from the data. In the case of air pressure, which has no annual cycle, in version 2 no data gaps longer than 14 days per month and grid box were allowed for the calculation of a monthly mean and standard deviation. This method differs from KNSC and BNSC version 1, where mean and standard deviation were calculated from the means of 6-day windows.
2. If the number of observations fell below a certain threshold (20 observations per grid box and month for air temperature and dew point temperature, and 500 per box and month for air pressure), data from the adjacent boxes were used for the calculation. The neighbouring boxes were used in two steps (the nearest 8 boxes, and if the number was still below the threshold, the next surrounding 16 boxes) to calculate the mean value of the center box. Thus, the spatial resolution of the parameters is reduced at certain points: instead of 1° x 1°, if neighbouring values are taken into account, data from an area of up to 5° x 5° can contribute and are then averaged into a single grid box value. This was especially used for air pressure, where for most grid boxes the 24 values of the neighbouring boxes were included in the averaging.
The mean value, the number of measurements, the standard deviation and the number of grid boxes used to calculate the mean values are available as parameters in the products. The calculated monthly and annual means were allocated to the centers of the grid boxes: Latitudes: 47.5, 48.5, ... Longitudes: -14.5, -13.5, ... In order to remove any existing values over land, a land-sea mask was used, which is also provided in 1° x 1° resolution.
In this version 2 of the BNSC, a slightly different database was used than for the KNSC, which resulted in small changes (less than 1 K) in the means and standard deviations of the 2-meter air temperature and dew point temperature. The changes in mean sea level pressure values and the associated standard deviations are in the range of a few hPa compared to the KNSC. The parameter names and units have been adjusted to meet the CF 1.6 standard.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results of a computational fluid dynamics (CFD) simulation of a pipette run in COMSOL Multiphysics.
Unprocessed data.
This dataset is available for download: Pipette_Simulation_Data.zip (300 MB)
Figure 2. Reynolds number: Particle positions for capture regions calculated at a range of Reynolds numbers.
Capture regions for a CFD simulation of flow into a pipette.
Re: 0.5, 1, 2, 4, 8, 16, 32, 33, 64, 128, 256, 384, 512, 768, 1024
T: selected to give equal volume for each Re (T* = 20).
Cylinder geometry in pipe inner diameters
Rad: 2
Bot: 16
Figure 3a. Cylinder geometry 1/2: Particle positions for capture regions calculated for different cylinder geometries.
Capture regions for a CFD simulation of flow into a pipette.
Re: 200
T: 2889/290 s (T* = 20)
Cylinder geometry in pipe inner diameters
Rad: 2, 2.25, 2.5, 3, 3.25, 3.5, 3.75, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
Bot: 2, 16
Figure 3b. Cylinder geometry 2/2: Particle positions for capture regions calculated for different cylinder geometries.
Capture regions for a CFD simulation of flow into a pipette.
Re: 200
T: 2889/290 s (T* = 20)
Cylinder geometry in pipe inner diameters
Rad: 2, 2.25, 2.5, 3, 3.25, 3.5, 3.75, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
Bot: 2, 16
Figure 4. Time: Particle positions for capture regions calculated over time
Capture regions for a CFD simulation of flow into a pipette.
Re: 200
T: 0:1:60 s
Rad: 2, 16
Bot: 2, 16
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average difference between the 'lower' values before and after this update is 0.13°C.]
What does the data show?
This dataset shows the change in annual temperature for a range of global warming levels, including the recent past (2001-2020), compared to the 1981-2000 baseline period. Note, as the values in this dataset are averaged over a year they do not represent possible extreme conditions.
The dataset uses projections of daily average air temperature from UKCP18 which are averaged to give values for the 1981-2000 baseline, the recent past (2001-2020) and global warming levels. The warming levels available are 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. The recent past value and global warming level values are stated as a change (in °C) relative to the 1981-2000 value. This enables users to compare annual average temperature trends for the different periods. In addition to the change values, values for the 1981-2000 baseline (corresponding to 0.51°C warming) and recent past (2001-2020, corresponding to 0.87°C warming) are also provided. This is summarised in the table below.
PeriodDescription 1981-2000 baselineAverage temperature (°C) for the period 2001-2020 (recent past)Average temperature (°C) for the period 2001-2020 (recent past) changeTemperature change (°C) relative to 1981-2000 1.5°C global warming level changeTemperature change (°C) relative to 1981-2000 2°C global warming level changeTemperature change (°C) relative to 1981-20002.5°C global warming level changeTemperature change (°C) relative to 1981-2000 3°C global warming level changeTemperature change (°C) relative to 1981-2000 4°C global warming level changeTemperature change (°C) relative to 1981-2000What is a global warming level?The Annual Average Temperature Change is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), whilst this dataset allows for the exploration of greater levels of warming. The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level was calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Annual Average Temperature Change, an average is taken across the 21 year period.We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.What are the naming conventions and how do I explore the data?This data contains a field for the 1981-2000 baseline, 2001-2020 period and each warming level. They are named 'tas annual change' (change in air 'temperature at surface'), the warming level or historic time period, and 'upper' 'median' or 'lower' as per the description below. e.g. 'tas annual change 2.0 median' is the median value for the 2.0°C warming level. Decimal points are included in field aliases but not in field names, e.g. 'tas annual change 2.0 median' is named 'tas_annual_change_20_median'. To understand how to explore the data, refer to the New Users ESRI Storymap. Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘tas annual change 2.0°C median’ values.What do the 'median', 'upper', and 'lower' values mean?Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. 
Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future.For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Annual Average Temperature Change was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location.The ‘lower’ fields are the second lowest ranked ensemble member. The ‘higher’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble.This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and higher fields, the greater the uncertainty.‘Lower’, ‘median’ and ‘upper’ are also given for the baseline period as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past. Useful linksFor further information on the UK Climate Projections (UKCP).Further information on understanding climate data within the Met Office Climate Data Portal.
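The ranking of ensemble members described above can be illustrated with a short Python sketch; it does not reproduce the UKCP18 processing, and the example values are hypothetical.

```python
import numpy as np

# Minimal sketch of how the 'lower', 'median' and 'higher' fields are derived
# from the 12 ensemble members for a single grid cell: second lowest ranked
# member, central value, and second highest ranked member.
def summarise_ensemble(changes_degC):
    ranked = np.sort(np.asarray(changes_degC, dtype=float))
    return {
        "lower": ranked[1],                  # second lowest ranked member
        "median": float(np.median(ranked)),  # central value of the ensemble
        "higher": ranked[-2],                # second highest ranked member
    }

# Example with 12 hypothetical temperature changes (degC) for one location:
print(summarise_ensemble([1.6, 1.9, 2.0, 2.1, 2.2, 2.3,
                          2.4, 2.5, 2.6, 2.8, 3.0, 3.3]))
```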
Twitter"Vector Ruggedness Measure (VRM) measures terrain ruggedness as the variation in three-dimensional orientation of grid cells within a neighborhood. Vector analysis is used to calculate the dispersion of vectors normal (orthogonal) to grid cells within the specified neighborhood. This method effectively captures variability in slope and aspect into a single measure. Ruggedness values in the output raster can range from 0 (no terrain variation) to 1 (complete terrain variation). Typical values for natural terrains range between 0 and about 0.4. VRM was adapted from a method first proposed by Hobson (1972). VRM appears to decouple terrain ruggedness from slope better than current ruggedness indices, such as TRI or LSRI. See Sappington et al. 2007, for further details" (Sappington 2012). After calculating raw VRM values we, binned the values into 6 classes using Jenks Natural Breaks for use in the Land Treatment Exploration Tool Similarity Index. Classes range from very low to very high.
This dataset provides monthly summaries of evapotranspiration (ET) data from OpenET v2.0 image collections for the period 2008-2023 for all National Watershed Boundary Dataset subwatersheds (12-digit hydrologic unit codes [HUC12s]) in the US that overlap the spatial extent of OpenET datasets. For each HUC12, this dataset contains spatial aggregation statistics (minimum, mean, median, and maximum) for each of the ET variables from each of the publicly available image collections from OpenET for the six available models (DisALEXI, eeMETRIC, geeSEBAL, PT-JPL, SIMS, SSEBop) and the Ensemble image collection, which is a pixel-wise ensemble of all 6 individual models after filtering and removal of outliers according to the median absolute deviation approach (Melton and others, 2022). Data are available in this data release in two different formats: comma-separated values (CSV) and parquet, a high-performance format that is optimized for storage and processing of columnar data. CSV files containing data for each 4-digit HUC are grouped by 2-digit HUCs for easier access of regional data, and the single parquet file provides convenient access to the entire dataset.
For each of the ET models (DisALEXI, eeMETRIC, geeSEBAL, PT-JPL, SIMS, SSEBop), variables in the model-specific CSV data files include:
- huc12: The 12-digit hydrologic unit code
- ET: Actual evapotranspiration (in millimeters) over the HUC12 area in the month, calculated as the sum of daily ET interpolated between Landsat overpasses
- statistic: Max, mean, median, or min; the statistic used in the spatial aggregation within each HUC12. For example, maximum ET is the maximum monthly pixel ET value occurring within the HUC12 boundary after summing daily ET in the month
- year: 4-digit year
- month: 2-digit month
- count: Number of Landsat overpasses included in the ET calculation in the month
- et_coverage_pct: Integer percentage of the HUC12 with ET data, which can be used to determine how representative the ET statistic is of the entire HUC12
- count_coverage_pct: Integer percentage of the HUC12 with count data, which can be different than the et_coverage_pct value because the "count" band in the source image collection extends beyond the "et" band in the eastern portion of the image collection extent
For the Ensemble data, these additional variables are included in the CSV files:
- et_mad: Ensemble ET value, computed as the mean of the ensemble after filtering outliers using the median absolute deviation (MAD)
- et_mad_count: The number of models used to compute the ensemble ET value after filtering for outliers using the MAD
- et_mad_max: The maximum value in the ensemble range, after filtering for outliers using the MAD
- et_mad_min: The minimum value in the ensemble range, after filtering for outliers using the MAD
- et_sam: A simple arithmetic mean (across the 6 models) of actual ET average without outlier removal
Below are the locations of each OpenET image collection used in this summary:
DisALEXI: https://developers.google.com/earth-engine/datasets/catalog/OpenET_DISALEXI_CONUS_GRIDMET_MONTHLY_v2_0
eeMETRIC: https://developers.google.com/earth-engine/datasets/catalog/OpenET_EEMETRIC_CONUS_GRIDMET_MONTHLY_v2_0
geeSEBAL: https://developers.google.com/earth-engine/datasets/catalog/OpenET_GEESEBAL_CONUS_GRIDMET_MONTHLY_v2_0
PT-JPL: https://developers.google.com/earth-engine/datasets/catalog/OpenET_PTJPL_CONUS_GRIDMET_MONTHLY_v2_0
SIMS: https://developers.google.com/earth-engine/datasets/catalog/OpenET_SIMS_CONUS_GRIDMET_MONTHLY_v2_0
SSEBop: https://developers.google.com/earth-engine/datasets/catalog/OpenET_SSEBOP_CONUS_GRIDMET_MONTHLY_v2_0
Ensemble: https://developers.google.com/earth-engine/datasets/catalog/OpenET_ENSEMBLE_CONUS_GRIDMET_MONTHLY_v2_0
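As a minimal sketch of working with the tabular summaries described above: the parquet file name and the example HUC12 code below are illustrative only, and the sketch assumes the columns listed above (huc12, ET, statistic, year, month, et_coverage_pct) are present in the parquet file.

```python
import pandas as pd

# Read the single parquet file from the data release (file name assumed here).
df = pd.read_parquet("openet_huc12_monthly_et.parquet")

# Monthly mean ET (mm) for one example subwatershed, keeping only months
# where at least 80 percent of the HUC12 area had ET data.
subset = df[
    (df["huc12"] == "101900050101")      # hypothetical 12-digit HUC code
    & (df["statistic"] == "mean")
    & (df["et_coverage_pct"] >= 80)
]
print(subset[["year", "month", "ET"]].sort_values(["year", "month"]))
```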
[Updated 28/01/25 to fix an issue in the 'Lower' values, which were not fully representing the range of uncertainty. 'Median' and 'Higher' values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels, but the average difference between the 'lower' values before and after this update is 0.09°C.]
What does the data show?
This dataset shows the change in summer average temperature for a range of global warming levels, including the recent past (2001-2020), compared to the 1981-2000 baseline period. Here, summer is defined as June-July-August. Note that, as the values in this dataset are averaged over a season, they do not represent possible extreme conditions.
The dataset uses projections of daily average air temperature from UKCP18 which are averaged over the summer period to give values for the 1981-2000 baseline, the recent past (2001-2020) and global warming levels. The warming levels available are 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. The recent past value and global warming level values are stated as a change (in °C) relative to the 1981-2000 value. This enables users to compare summer average temperature trends for the different periods. In addition to the change values, values for the 1981-2000 baseline (corresponding to 0.51°C warming) and recent past (2001-2020, corresponding to 0.87°C warming) are also provided. This is summarised in the table below.
1981-2000 baseline: Average temperature (°C) for the period
2001-2020 (recent past): Average temperature (°C) for the period
2001-2020 (recent past) change: Temperature change (°C) relative to 1981-2000
1.5°C global warming level change: Temperature change (°C) relative to 1981-2000
2°C global warming level change: Temperature change (°C) relative to 1981-2000
2.5°C global warming level change: Temperature change (°C) relative to 1981-2000
3°C global warming level change: Temperature change (°C) relative to 1981-2000
4°C global warming level change: Temperature change (°C) relative to 1981-2000
What is a global warming level?
The Summer Average Temperature Change is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5), in which greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850-1900 and 2011-2020), whilst this dataset allows for the exploration of greater levels of warming. The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level were calculated using a 21-year period, obtained by taking 10 years either side of the first year at which the global warming level is reached. This year will be different for different model ensemble members. To calculate the value for the Summer Average Temperature Change, an average is taken across the 21-year period.
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate: it will depend on future greenhouse gas emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could be either higher or lower than this level.
What are the naming conventions and how do I explore the data?
These data contain a field for each warming level and the 1981-2000 baseline. They are named 'tas summer change' (change in air 'temperature at surface'), the warming level or baseline, and 'upper', 'median' or 'lower' as per the description below; e.g. 'tas summer change 2.0 median' is the median value for summer for the 2.0°C warming level. Decimal points are included in field aliases but not in field names, e.g. 'tas summer change 2.0 median' is named 'tas_summer_change_20_median'. To understand how to explore the data, refer to the New Users ESRI Storymap. Please note, if viewing in ArcGIS Map Viewer, the map will default to 'tas summer change 2.0°C median' values.
What do the 'median', 'upper', and 'lower' values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models is run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future.
For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Summer Average Temperature Change was calculated for each ensemble member, and the members were then ranked in order from lowest to highest for each location. The 'lower' fields are the second lowest ranked ensemble member. The 'higher' fields are the second highest ranked ensemble member. The 'median' field is the central value of the ensemble. This gives a median value and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections: the larger the difference between the lower and higher fields, the greater the uncertainty.
'Lower', 'median' and 'upper' are also given for the baseline period as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.
Useful links
For further information on the UK Climate Projections (UKCP).
Further information on understanding climate data within the Met Office Climate Data Portal.
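The field-naming convention described above (aliases with decimal points, field names without) can be sketched in a few lines of Python; the helper name is hypothetical.

```python
# Minimal sketch of the naming convention described above: aliases such as
# 'tas summer change 2.0 median' drop the decimal point and use underscores
# in the underlying field name.
def field_name_from_alias(alias: str) -> str:
    return alias.replace(".", "").replace(" ", "_")

print(field_name_from_alias("tas summer change 2.0 median"))
# -> tas_summer_change_20_median
```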
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Study information The sample included in this dataset represents five children who participated in a number line intervention study. Originally six children were included in the study, but one of them fulfilled the criterion for exclusion after missing several consecutive sessions. Thus, their data is not included in the dataset. All participants were currently attending Year 1 of primary school at an independent school in New South Wales, Australia. For children to be able to eligible to participate they had to present with low mathematics achievement by performing at or below the 25th percentile in the Maths Problem Solving and/or Numerical Operations subtests from the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Participants were excluded from participating if, as reported by their parents, they have any other diagnosed disorders such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy or uncorrected sensory disorders. The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase varied between two and three measurement points, the treatment phase varied between four and seven measurement points, and all participants had 1 post-treatment measurement point. The number of measurement points were distributed across participants as follows: Participant 1 – 3 baseline, 6 treatment, 1 post-treatment Participant 3 – 2 baseline, 7 treatment, 1 post-treatment Participant 5 – 2 baseline, 5 treatment, 1 post-treatment Participant 6 – 3 baseline, 4 treatment, 1 post-treatment Participant 7 – 2 baseline, 5 treatment, 1 post-treatment In each session across all three phases children were assessed in their performance on a number line estimation task, a single-digit computation task, a multi-digit computation task, a dot comparison task and a number comparison task. Furthermore, during the treatment phase, all children completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.
Measures Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line is presented in the middle of the screen, and the target number is presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers included two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed on all phases of the study. Trained items were assessed independent of the intervention during baseline and post-treatment phases, and performance on the intervention is used to index performance on the trained set during the treatment phase. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error (PAE) [(number estimated - target number)/ scale of number line] x100.
Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions present the lowest addend first (e.g., 3 + 5) and half of the additions present the highest addend first (e.g., 6 + 3). This task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen accompanied by a sound and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
Multi-digit computational estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options represented the correct result. Participants were asked to select the option that was closest to the correct result. In half of the items the calculation involved two double-digit numbers, and in the other half one double-digit and one single-digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below. The calculation remained on the screen until participants responded by clicking on one of the options on the screen. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. Both quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the larger one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison) the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots were kept constant following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.
The Number Line Intervention During the intervention sessions, participants estimated the position of 30 Arabic numbers in a 0-100 bounded number line. As a form of feedback, within each item, the participants’ estimate remained visible, and the correct position of the target number appeared on the number line. When the estimate’s PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”, when PAE was between 2.5 and 5 the message read “Well done, so close!”, and when PAE was higher than 5 the message read “Good try!” Numbers were presented in random order.
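The feedback rule can be summarised in a short Python sketch; how the original task handled PAE values of exactly 2.5 or 5 is not stated, so the boundary handling below is an assumption.

```python
# Minimal sketch of the intervention feedback rule described above
# (PAE thresholds in percent; exact boundary behaviour assumed).
def feedback_message(pae: float) -> str:
    if pae < 2.5:
        return "Excellent job"
    elif pae <= 5:
        return "Well done, so close!"
    return "Good try!"

print(feedback_message(3.2))  # -> Well done, so close!
```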
Variables in the dataset
Age = age in ‘years, months’ at the start of the study
Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)
Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Math_Problem_Solving_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed of three sections. The first one refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point of the treatment phase, and post1 to the first measurement point of the post-treatment phase.
The second part of the variable name refers to the task, as follows:
DC = dot comparison
SDC = single-digit computation
NLE_UT = number line estimation (untrained set)
NLE_T = number line estimation (trained set)
CE = multidigit computational estimation
NC = number comparison
The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).
Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task during the second measurement point of the baseline phase and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the Treatment phase.
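The naming convention can be parsed programmatically; the following minimal Python sketch assumes every variable name follows the phase+session, task, measure pattern described above.

```python
import re

# Minimal sketch of splitting a variable name into its phase, session, task
# and measure components; task codes may themselves contain an underscore
# (e.g. NLE_UT), so an explicit pattern is used.
PATTERN = re.compile(
    r"^(Base|Treat|Post)(\d+)_(DC|SDC|NLE_UT|NLE_T|CE|NC)_(acc|pae)$",
    re.IGNORECASE,
)

def parse_variable(name: str) -> dict:
    phase, session, task, measure = PATTERN.match(name).groups()
    return {"phase": phase, "session": int(session),
            "task": task, "measure": measure}

print(parse_variable("Treat3_NLE_UT_pae"))
# -> {'phase': 'Treat', 'session': 3, 'task': 'NLE_UT', 'measure': 'pae'}
```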