The sample included in this dataset represents five children who participated in a number line intervention study. Originally six children were included in the study, but one of them fulfilled the criterion for exclusion after missing several consecutive sessions. Thus, their data is not included in the dataset.
All participants were attending Year 1 of primary school at an independent school in New South Wales, Australia. To be eligible to participate, children had to present with low mathematics achievement, performing at or below the 25th percentile on the Maths Problem Solving and/or Numerical Operations subtests of the Wechsler Individual Achievement Test III (WIAT III A & NZ; Wechsler, 2016). Children were excluded if, as reported by their parents, they had any other diagnosed disorder such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy or an uncorrected sensory disorder.
The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase comprised two or three measurement points, the treatment phase between four and seven measurement points, and all participants had one post-treatment measurement point.
The measurement points were distributed across participants as follows:
Participant 1 – 3 baseline, 6 treatment, 1 post-treatment
Participant 3 – 2 baseline, 7 treatment, 1 post-treatment
Participant 5 – 2 baseline, 5 treatment, 1 post-treatment
Participant 6 – 3 baseline, 4 treatment, 1 post-treatment
Participant 7 – 2 baseline, 5 treatment, 1 post-treatment
In each session across all three phases, children's performance was assessed on a number line estimation task, a single-digit computation task, a multi-digit computation task, a dot comparison task and a number comparison task. In addition, during the treatment phase, all children completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.
Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line was presented in the middle of the screen, and the target number was presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers comprised two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed in all phases of the study. Trained items were assessed independently of the intervention during the baseline and post-treatment phases, and performance on the intervention task was used to index performance on the trained set during the treatment phase. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error (PAE): [|estimated number - target number| / scale of the number line] x 100.
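The PAE computation is simple enough to express directly; the snippet below is an illustrative Python sketch of the formula above (the function name is ours, not part of the dataset).

```python
# Illustrative sketch of the PAE formula described above (not code shipped with
# the dataset): PAE = |estimated number - target number| / scale x 100.
def percent_absolute_error(estimate: float, target: float, scale: float = 100.0) -> float:
    return abs(estimate - target) / scale * 100.0

# Example: placing the target 42 at position 37 on a 0-100 line gives PAE = 5.0
print(percent_absolute_error(estimate=37, target=42))  # 5.0
```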
Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions presented the lowest addend first (e.g., 3 + 5) and half presented the highest addend first (e.g., 6 + 3). The task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen, accompanied by a sound, and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
Multi-digit computational estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options represented the exact result. Participants were asked to select the option that was closest to the correct result. In half of the items the calculation involved two double-digit numbers, and in the other half one double-digit and one single-digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below. The calculations remained on the screen until participants responded by clicking on one of the options. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. Both quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the larger one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison) the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots were kept constant following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.
During the intervention sessions, participants estimated the position of 30 Arabic numbers on a 0-100 bounded number line. As a form of feedback, within each item, the participant’s estimate remained visible and the correct position of the target number appeared on the number line. When the estimate’s PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”; when PAE was between 2.5 and 5, the message read “Well done, so close!”; and when PAE was higher than 5, the message read “Good try!” Numbers were presented in random order.
Age = age in ‘years, months’ at the start of the study
Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)
Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Num_Ops_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed of three parts. The first refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point of the treatment phase, and post1 to the first measurement point of the post-treatment phase.
The second part of the variable name refers to the task, as follows:
DC = dot comparison
SDC = single-digit computation
NLE_UT = number line estimation (untrained set)
NLE_T = number line estimation (trained set)
CE = multidigit computational estimation
NC = number comparison
The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).
Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task during the second measurement point of the baseline phase and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the Treatment phase.
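Because the naming convention is strictly positional (phase plus session, task, measure), the variable names can be parsed mechanically. The short Python sketch below is illustrative only and is not part of the dataset.

```python
import re

# Illustrative sketch: split a variable name such as "Treat3_NLE_UT_pae" into
# phase, session, task and measure, following the convention described above.
def parse_variable(name: str) -> dict:
    phase_session, rest = name.split("_", 1)
    match = re.fullmatch(r"(Base|Treat|Post)(\d+)", phase_session, flags=re.IGNORECASE)
    task, measure = rest.rsplit("_", 1)
    return {
        "phase": match.group(1),      # Base / Treat / Post
        "session": int(match.group(2)),
        "task": task,                 # DC, SDC, NLE_UT, NLE_T, CE or NC
        "measure": measure,           # "acc" or "pae"
    }

print(parse_variable("Base2_NC_acc"))
print(parse_variable("Treat3_NLE_UT_pae"))
```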
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Dataset and codes for "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023"
The MATLAB codes and related datasets are used for generating the figures for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023".
Files and variables
File 1: Data_and_Code.zip
Directory: Main_function
Description: Includes MATLAB scripts and functions. Each script includes a description that guides the user on how to use it and how to find the dataset used for processing.
MATLAB Main Scripts: Include all the steps to process the data, output the figures, and output the videos.
Script_1_Ice_velocity_process_flow.m
Script_2_strain_rate_process_flow.m
Script_3_DROT_grounding_line_extraction.m
Script_4_Read_ICESat2_h5_files.m
Script_5_Extraction_results.m
MATLAB functions: Folders of MATLAB functions that support the main scripts:
1_Ice_velocity_code: Includes MATLAB functions for ice velocity post-processing, including outlier removal, filtering, correction for atmospheric and tidal effects, inverse weighted averaging, and error estimation.
2_strain_rate: Includes MATLAB functions for strain rate calculation.
3_DROT_extract_grounding_line_code: Includes MATLAB functions that convert the range offset results output from GAMMA to differential vertical displacement and use the result to extract the grounding line.
4_Extract_data_from_2D_result: Includes MATLAB functions used to extract profiles from 2D data.
5_NeRD_Damage_detection: Modified code from Izeboud et al. 2023. When applying this code, please also cite Izeboud et al. 2023 (https://www.sciencedirect.com/science/article/pii/S0034425722004655).
6_Figure_plotting_code: Includes MATLAB functions for the figures in the paper and the supporting information.
Directory: data_and_result
Description: Includes directories that store the results output from MATLAB. Users only need to change the paths in the MATLAB scripts to their own paths.
1_origin: Sample data ("PS-20180323-20180329", "PS-20180329-20180404", "PS-20180404-20180410") output from the GAMMA software in GeoTIFF format that can be used to calculate DROT and velocity. Includes displacement, theta, phi, and ccp.
2_maskccpN: Outliers removed where ccp < 0.05; displacement converted to velocity (m/day).
3_rockpoint: Velocities extracted at non-moving regions.
4_constant_detrend: Orbit error removed.
5_Tidal_correction: Atmospheric and tidally induced errors removed.
6_rockpoint: Non-aggregated velocities extracted at non-moving regions.
6_vx_vy_v: Velocities transformed from va/vr to vx/vy.
7_rockpoint: Aggregated velocities extracted at non-moving regions.
7_vx_vy_v_aggregate_and_error_estimate: Inverse weighted average of the three ice velocity maps and calculation of the error maps.
8_strain_rate: Strain rate calculated from the aggregated ice velocity.
9_compare: Stores the results before and after tidal correction and aggregation.
10_Block_result: Time series results extracted from the 2D data.
11_MALAB_output_png_result: Stores .png files and time series results.
12_DROT: Differential Range Offset Tracking results.
13_ICESat_2: ICESat-2 .h5 files and .mat files can be put here (this folder only includes the samples from tracks 0965 and 1094).
14_MODIS_images: MODIS images can be stored here.
shp: grounding line, rock region, ice front, and other shape files.
File 2: PIG_front_1947_2023.zip
Includes ice front position shapefiles from 1947 to 2023, which are used for plotting Figure 1 in the paper.
File 3: PIG_DROT_GL_2016_2021.zip
Includes grounding line position shapefiles from 2016 to 2021, which are used for plotting Figure 1 in the paper.
Data was derived from the following sources:
Those links can be found in the MATLAB scripts or in the paper's "Open Research" section.
Our mission with this project is to provide an always up-to-date and freely accessible map of the cloud landscape for every major cloud service provider.
We've decided to kick things off with collecting SSL certificate data of AWS EC2 machines, considering the value of this data to security researchers. However, we plan to expand the project to include more data and providers in the near future. Your input and suggestions are incredibly valuable to us, so please don't hesitate to reach out on Twitter or Discord and let us know what areas you think we should prioritize next!
For example, you can find the origin IP for instacart.com: just search the dataset for instacart.com.
You can also use the command line if you are on Linux. Download the dataset using curl or wget, cd into the folder, and then run: find . -type f -iname "*.csv" -print0 | xargs -0 grep "word"
For example: find . -type f -iname "*.csv" -print0 | xargs -0 grep "instacart.com"
You will then see the matching output.
How can SSL certificate data benefit you? The SSL data is organized into CSV files, with the following properties collected for every found certificate:
IP Address, Common Name, Organization, Country, Locality, Province, Subject Alternative DNS Name, Subject Alternative IP Address, Self-signed (boolean)

| IP Address | Common Name | Organization | Country | Locality | Province | Subject Alternative DNS Name | Subject Alternative IP Address | Self-signed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.2.3.4 | example.com | Example, Inc. | US | San Francisco | California | example.com | 1.2.3.4 | false |
| 5.6.7.8 | acme.net | Acme, Inc. | US | Seattle | Washington | *.acme.net | 5.6.7.8 | false |

So what can you do with this data?
Enumerate subdomains of your target domains Search for your target's domain names (e.g. example.com) and find hits in the Common Name and Subject Alternative Name fields of the collected certificates. All IP ranges are scanned daily and the dataset gets updated accordingly, so you are very likely to find ephemeral hosts before they are taken down.
Enumerate domains of your target companies Search for your target's company name (e.g. Example, Inc.), find hits in the Organization field, and explore the associated Common Name and Subject Alternative Name fields. The results will probably include subdomains of the domains you're familiar with and if you're in luck you might find new root domains expanding the scope.
Enumerate possible sub-subdomain enumeration targets If the certificate is issued for a wildcard (e.g. *.foo.example.com), chances are there are other subdomains you can find by brute-forcing there. And you know how effective this technique can be. Here are some wordlists to help you with that!
💡 Note: Remember to monitor the dataset for daily updates to get notified whenever a new asset comes up!
Perform IP lookups Search for an IP address (e.g. 3.122.37.147) to find host names associated with it, and explore the Common Name, Subject Alternative Name, and Organization fields to find more information about that address.
Discover origin IP addresses to bypass proxy services When a website is hidden behind security proxy services like Cloudflare, Akamai, Incapsula, and others, it is possible to search for the host name (e.g., example.com) in the dataset. This search may uncover the origin IP address, allowing you to bypass the proxy. We've discussed a similar technique on our blog which you can find here!
Get a fresh dataset of live web servers Each IP address in the dataset corresponds to an HTTPS server running on port 443. You can use this data for large-scale research without needing to spend time collecting it yourself.
Whatever else you can think of If you use this data for a cool project or research, we would love to hear about it!
Additionally, below you will find a detailed explanation of our data collection process and how you can implement the same technique to gather information from your own IP ranges.
TB; DZ (Too big; didn't zoom):
We kick off the workflow with a simple bash script that retrieves AWS's IP ranges. Using a JQ query, we extract the IP ranges of EC2 machines by filtering for .prefixes[] | select(.service=="EC2") | .ip_prefix. Other services are excluded from this workflow since they don't support custom SSL certificates, making their data irrelevant for our dataset.
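For readers who prefer Python to bash, here is a minimal sketch of that first filtering step. It assumes AWS's publicly documented ip-ranges.json endpoint and mirrors the jq filter quoted above; it is not the project's actual tooling.

```python
import json
import urllib.request

# Sketch of step 1: fetch AWS's published IP ranges and keep only the EC2 IPv4
# prefixes, mirroring: .prefixes[] | select(.service=="EC2") | .ip_prefix
URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"  # assumed standard endpoint

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

ec2_prefixes = [p["ip_prefix"] for p in data["prefixes"] if p["service"] == "EC2"]
print(f"{len(ec2_prefixes)} EC2 IPv4 prefixes")
```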
Then, we use mapcidr to divide the IP ranges obtained in step 1 into smaller ranges, each containing up to 100k hosts (thanks, ProjectDiscovery team!). This comes in handy in the next step, when we run the parallel scanning process.
At the time of writing, the EC2 IP ranges include over 57 million IP addresses, so scanning them all on a single machine would be impractical, which is where our file-splitter node comes into play.
This node iterates through the input from mapcidr and triggers individual jobs for each range. When executing this w...
License: MIT (https://api.github.com/licenses/mit)
This dataset contains Python numerical computation code for studying the phenomena of acoustic superluminescence and Hawking radiation in a specific rotating acoustic black hole model. The code is based on the radial wave equation of a scalar field (acoustic disturbance) on the analytically derived effective acoustic metric background.

Dataset generation process and processing methods: The core code is written in Python using the standard scientific computing libraries NumPy and SciPy. The main steps are: (1) define the model parameters (such as $A$, $B$, $m$) and the calculation range (frequency $\omega$ from 0.01 to 2.0, tortoise coordinate $r^*$ from -20 to 20); (2) implement the conversion functions between the radial coordinate $r$ and the tortoise coordinate $r^*$, where the inversion of $r^*(r)$ is solved numerically using SciPy's `optimize.root_scalar` function (e.g., Brent's method), with special attention paid to calculations near the horizon $r_H = |A|/c$ to ensure stability; (3) calculate the effective potential $V_0(r^*, \omega)$, which depends on $r(r^*)$; (4) convert the second-order radial wave equation into a system of four first-order real-valued ordinary differential equations; (5) solve the ODE system using SciPy's `integrate.solve_ivp` function (an adaptive step size RK45 method with relative and absolute error tolerances set to $10^{-8}$), applying pure inward boundary conditions (normalised unit transmission) at the horizon and the asymptotic behaviour at infinity; (6) extract the reflection coefficient $\mathcal{R}$ and transmission coefficient $\mathcal{T}$ from the numerical solution; (7) calculate the Hawking radiation power spectrum $P_\omega$ from the derived Hawking temperature $T_H$, the event horizon angular velocity $\Omega_H$, Bose-Einstein statistics, and the grey-body factor $|\mathcal{T}|^2$. The calculation adopts natural units ($\hbar = k_B = c = 1$) and sets the characteristic length $r_0 = 1$.

Dataset content: This dataset mainly includes a Python script file (code for numerical research on superluminescence and Hawking radiation of rotating acoustic black holes.py) and a README documentation file (README.md). The Python script implements the complete calculation process described above. The README file explains the code's functionality, the dependency libraries required to run it (Python 3, NumPy, SciPy), how to run it, and the meaning of the parameters. This dataset does not contain any raw experimental data; it is theoretical calculation code only.

Data accuracy and validation: The reliability of the code has been validated through two key indicators: (1) the flux conservation relation $|\mathcal{R}|^2 + [(\omega - m\Omega_H)/\omega]|\mathcal{T}|^2 = 1$ holds numerically within the calculated frequency range (with a deviation typically on the order of $10^{-8}$ or less); (2) under the superluminescence (superradiance) condition $0 < \omega < m\Omega_H$, the reflection coefficient satisfies $|\mathcal{R}|^2 > 1$, which is consistent with theoretical expectations.

File format and software: The code is in standard Python 3 (.py) format and can run in any standard Python 3 environment with the NumPy and SciPy libraries installed. The README file is in Markdown (.md) format and can be opened with any text editor or Markdown viewer. No special or niche software is required.
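As an illustration of step (5) above, the following is a minimal, self-contained sketch of integrating a second-order radial equation as a four-component real-valued first-order system with SciPy's solve_ivp; the potential and parameter values are placeholders and are not those used in the dataset's code.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch only: integrate d^2 psi/dr*^2 + [omega^2 - V0(r*, omega)] psi = 0 with
# RK45 at rtol = atol = 1e-8.  V0 below is a placeholder barrier; the dataset's
# code builds the true effective potential from the acoustic metric (A, B, m).
def V0(r_star, omega):
    return 0.1 * omega**2 / np.cosh(r_star) ** 2

def rhs(r_star, y, omega):
    re, im, dre, dim = y             # y = [Re psi, Im psi, Re psi', Im psi']
    k2 = omega**2 - V0(r_star, omega)
    return [dre, dim, -k2 * re, -k2 * im]

omega, r_min, r_max = 0.5, -20.0, 20.0
psi0 = np.exp(-1j * omega * r_min)   # pure ingoing wave (unit transmission) at the inner boundary
dpsi0 = -1j * omega * psi0
y0 = [psi0.real, psi0.imag, dpsi0.real, dpsi0.imag]

sol = solve_ivp(rhs, (r_min, r_max), y0, args=(omega,),
                method="RK45", rtol=1e-8, atol=1e-8)
psi_end = sol.y[0, -1] + 1j * sol.y[1, -1]
print("psi at r* = 20:", psi_end)    # R and T follow from matching to exp(±i*omega*r*) here
```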
The following exercise contains questions based on the housing dataset.
How many houses have a waterfront? a. 21000 b. 21450 c. 163 d. 173
How many houses have 2 floors? a. 2692 b. 8241 c. 10680 d. 161
How many houses built before 1960 have a waterfront? a. 80 b. 7309 c. 90 d. 92
What is the price of the most expensive house having more than 4 bathrooms? a. 7700000 b. 187000 c. 290000 d. 399000
For instance, if the ‘price’ column contains outliers, how can you clean the data and remove them? a. Calculate the IQR range and drop the values outside the range. b. Calculate the p-value and remove the values less than 0.05. c. Calculate the correlation coefficient of the price column and remove the values less than the correlation coefficient. d. Calculate the Z-score of the price column and remove the values less than the z-score.
What are the various parameters that can be used to determine the dependent variables in the housing data to determine the price of the house? a. Correlation coefficients b. Z-score c. IQR Range d. Range of the Features
If we get the r2 score as 0.38, what inferences can we make about the model and its efficiency? a. The model is 38% accurate, and shows poor efficiency. b. The model is showing 0.38% discrepancies in the outcomes. c. Low difference between observed and fitted values. d. High difference between observed and fitted values.
If the metrics show that the p-value for the grade column is 0.092, what inference can we make about the grade column? a. Significant in the presence of other variables. b. Highly significant in the presence of other variables. c. Insignificant in the presence of other variables. d. None of the above.
If the Variance Inflation Factor value for a feature is considerably higher than the other features, what can we say about that column/feature? a. High multicollinearity b. Low multicollinearity c. Both A and B d. None of the above
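For reference on the IQR approach mentioned in the outlier question above, here is a minimal pandas sketch; the file name and the 'price' column are illustrative, and the 1.5 x IQR fences are the conventional rule rather than something specified by the exercise.

```python
import pandas as pd

# Sketch of IQR-based outlier removal on a hypothetical 'price' column.
df = pd.read_csv("housing.csv")

q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # conventional 1.5*IQR fences

clean = df[df["price"].between(lower, upper)]
print(f"Dropped {len(df) - len(clean)} rows outside [{lower:.0f}, {upper:.0f}]")
```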
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is open-source software and an object-oriented programming language, with a development environment (IDE) called RStudio, for computing statistics and producing graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analysing data. Unlike much existing statistical software, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allow users to define their own (customised) functions for how they expect the program to behave while handling the data, which can also be stored in the simple object system. For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, targeted to help inform and guide the work of R users and statisticians. It provides information about the different types of statistical data analysis and methods, and the best scenarios for the use of each case in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for the reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples, ranging from how to import and store datasets in R as objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, it brings together statistics and computer programming for research.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Statistics of datasets used in the experiments.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Context
The dataset tabulates the population of South Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for South Range. The dataset can be utilized to understand the population distribution of South Range by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in South Range. Additionally, it can be used to see how the gender ratio changes from birth to the most senior age group, and how the male to female ratio varies across each age group for South Range.
Key observations
Largest age group (population): Male # 20-24 years (49) | Female # 20-24 years (50). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for South Range Population by Gender. You can refer to the same here.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
Methods for solving the Schrödinger equation without approximation are in high demand but are notoriously computationally expensive. In practical terms, there are just three primary factors that currently limit what can be achieved: 1) system size/dimensionality; 2) energy level excitation; and 3) numerical convergence accuracy. Broadly speaking, current methods can deliver on any two of these three goals, but achieving all three at once remains an enormous challenge. In this paper, we shall demonstrate how to “hit the trifecta” in the context of molecular vibrational spectroscopy calculations. In particular, we compute the lowest 1000 vibrational states for the six-atom acetonitrile molecule (CH3CN), to a numerical convergence accuracy of 10^-2 cm^-1 or better. These calculations encompass all vibrational states throughout most of the dynamically relevant range (i.e., up to ∼4250 cm^-1 above the ground state), computed in full quantum dimensionality (12 dimensions), to near spectroscopic accuracy. To our knowledge, no such vibrational spectroscopy calculation has ever previously been performed.
Code: calculation of PRSL for both ‘coral’ and ‘range’ type vertical uncertainties, as well as U-series and radiocarbon ages [pdf of Matlab code file]. Supplement to Hibbert et al., 2018 (Scientific Data).
Deliverable D2.5 Datasets for each Pilot.
This deliverable is the result of joint efforts from the PHOENIX consortium. This executive report describes the purpose and content of the deliverable.
Deliverable D2.5 Datasets for each Pilot is exactly what the title suggests: the deliverable consists of a vast collection of environmental and socio-economic datasets from the EU at the national and local levels of the territories where local partners and pilots are operating. Please note that this deliverable consists of data only; no interpretations or analyses are performed with these data, as that step will be carried out in other deliverables.
What data does the deliverable consist of? First, there is a dataset comprising over 100 indicators that give insight into a wide range of themes relevant to PHOENIX, including population structures, socio-economic conditions, information about marginalised groups, energy poverty and environmental/ecological conditions in the respective territories. These data come mostly from large international databases such as Eurostat or from national statistical databases. Second, the deliverable includes various relevant secondary datasets with information on current opinions, social attitudes, values and ‘green’ behaviours that are the product of international collaborations and initiatives such as the European Social Survey (ESS) and Eurobarometer. Third, the deliverable compiles data at the local level (cities and regions) collected from population censuses and digital boundary data sources, in order to understand dynamics at a much more local scale and to provide input data for some of the next steps in the PHOENIX project (including spatial microsimulation, agent-based modelling and geo-visualisations).
In particular, Deliverable D2.5 was originally developed to support WP3, and specifically task 3.3, in developing the Tangram's methodologies and tools to investigate cornerstone democratic innovations and estimate their success in citizens' readiness to change in the face of climate change, and to tailor and test these across pilots. Yet these data are expected to be of interest across all work packages within the consortium.
In this report, the first chapter outlines a short description with instructions on how the different datasets are organised and stored and how one can find, obtain and use the data. Chapters two to eight describe the available data per territory where the various pilots will be operating. It will be apparent that there is overlap between these chapters, differing only in some details about the data formats.
PHOENIX adheres to up-to-date data management standards and regulations such as the GDPR; for details on our data management organisation and policies, please consult our dedicated deliverable D1.2 Data Management Plan.
Overall, this deliverable provides the basis for the following steps in PHOENIX and, at the same time, it is formatted in a way that can support all partners in their search for relevant information for the work packages, tasks and pilot area case studies they work on.
This dataset contains additional "small" habitat cores that had a minimum size of 1 female marten home range (300ha), but were too small to meet the minimum size threshold of 5 female home ranges (1500ha) used to define cores in the Primary Model. The description following this paragraph was adapted from the metadata description for developing cores in the Primary Model. These methods are identical to those used in developing cores in the Primary Model, with three exceptions: (1) The minimum habitat core size parameter used in the Core Mapper tool was set to 300ha instead of 1500ha, (2) the cores that were included in the Primary Model (i.e. cores ≥ 1500ha) were removed from this dataset, and (3) there were no manual modifications to the habitat cores as was described in the metadata on developing cores in the Primary Model, as they were not applicable for this supplementary core dataset. Thus, this dataset is a true supplement to the Habitat Cores dataset presented in the Primary Model, as there are no redundant cores included. It should be noted that a single core in this dataset actually slightly exceeded the 1500ha threshold for its final area calculation but was not present in the Primary Model set of habitat cores. We determined that this was because the "1500ha cutoff" in the tool was actually applied before the core was expanded by 977m to fill in interior holes and then subsequently trimmed back (in the Core Mapper tool, this is controlled by the "Expand cores by this CWD value" and "Trim back expanded cores" parameters). We derived the habitat cores using a tool within Gnarly Landscape Utilities called Core Mapper (Shirk and McRae 2015). To develop a Habitat Surface for input into Core Mapper, we started by assigning each 30m pixel on the modeled landscape a habitat value equal to its GNN OGSI (range = 0-100). In areas with serpentine soils that support habitat potentially suitable for coastal marten (see report for details), we assigned a minimum habitat value of 31, which is equivalent to the 33rd percentile of OGSI 80 pixels in the marten’s historical range. Pixels with higher OGSI retained their normal habitat value. Our intention was to allow the modified serpentine pixels to be more easily incorporated into habitat cores if there were higher value OGSI pixels in the vicinity, but not to have them form the entire basis of a core. We also excluded pixels with a habitat value <1.0 from inclusion in habitat cores. We then used a moving window to calculate the average habitat value within a 977m radius around each pixel (derived from the estimated average size of a female marten’s home range of 300 ha). Pixels with an average habitat value ≥36.0 were then incorporated into habitat cores. After conducting a sensitivity analysis by running a set of Core Mapper trials using a broad range of habitat values, we chose ≥36.0 as the average habitat value because it is the median OGSI of pixels within the marten’s historical range classified by the GNN as “OGSI 80” (Davis et al. 2015). It generated a set of habitat cores that were not overly generous (depicting most of the landscape as habitat core) or strict (only mapping cores in a few locations with very high OGSI). We then set Core Mapper to expand the habitat cores by 977 cost-weighted meters, a step intended to consolidate smaller cores that were probably relatively close together from a marten’s perspective.
This was followed by a “trimming” step that removed pixels from the expansion that did not meet the moving-window average, so the net result was a rather small change in the size of the habitat cores while filling in many individual isolated pixels with a habitat value of 0. This is an abbreviated and incomplete description of the dataset. Please refer to the spatial metadata for a more thorough description of the methods used to produce this dataset, and a discussion of any assumptions or caveats that should be taken into consideration.
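As a rough illustration of the moving-window step described above (and not a substitute for the Core Mapper tool itself), the sketch below computes a focal mean within a 977 m radius on a 30 m grid and applies the ≥36.0 threshold; the input array is a random stand-in for the OGSI habitat surface.

```python
import numpy as np
from scipy.ndimage import convolve

CELL = 30.0      # 30 m pixels
RADIUS = 977.0   # moving-window radius in metres

# Circular mean filter covering the 977 m radius
r = int(RADIUS // CELL)
yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
kernel = ((xx * CELL) ** 2 + (yy * CELL) ** 2 <= RADIUS ** 2).astype(float)
kernel /= kernel.sum()

habitat = np.random.uniform(0, 100, (500, 500))         # stand-in for the OGSI surface
focal_mean = convolve(habitat, kernel, mode="nearest")  # average habitat value around each pixel

# Pixels eligible for habitat cores: focal mean >= 36.0 and habitat value >= 1.0
core_pixels = (focal_mean >= 36.0) & (habitat >= 1.0)
print(core_pixels.sum(), "candidate core pixels")
```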
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
By US Open Data Portal, data.gov [source]
This Kaggle dataset showcases the groundbreaking research undertaken by the GRACEnet program, which aims to better understand and minimize greenhouse gas (GHG) emissions from agro-ecosystems in order to create a healthier world for all. Through multi-location field studies that utilize standardized protocols, combined with models, producers, and policy makers, GRACEnet seeks to typify existing production practices, maximize C sequestration, minimize net GHG emissions, and meet sustainable production goals. This Kaggle dataset allows us to evaluate the impact of different management systems on factors such as carbon dioxide and nitrous oxide emissions, C sequestration levels, and crop/forest yield levels, plus additional environmental effects like air quality. With this data we can start getting an idea of the ways that agricultural policies may be influencing our planet's ever-evolving climate dilemma.
For more datasets, click here.
Step 1: Familiarize yourself with the columns in this dataset. In particular, pay attention to Spreadsheet tab description (brief description of each spreadsheet tab), Element or value display name (name of each element or value being measured), Description (detailed description), Data type (type of data being measured) Unit (unit of measurement for the data) Calculation (calculation used to determine a value or percentage) Format (format required for submitting values), Low Value and High Value (range for acceptable entries).
Step 2: Familiarize yourself with any additional information related to calculations. Most calculations made use of accepted best estimates based on standard protocols defined by GRACEnet. Every calculation was described in detail and included post-processing steps such as quality assurance/quality control changes as well as measurement uncertainty assessment, as available sources permitted. Relevant calculations were discussed collaboratively between all participating partners at every level where they felt it necessary, and all terms were rigorously reviewed before the partners agreed on any decision. A range was established when several assumptions were needed, or when there was a high possibility that samples might fall outside the previously accepted ranges associated with the standard protocol conditions set up at GRACEnet Headquarters laboratories due to other external factors like soil type, climate, etc.
Step 3: Determine what types of operations are allowed within each spreadsheet tab (.csv file). For example, on some tabs operations like adding an entire row may be permitted, but using formulas is not, since non-standard manipulations often introduce errors into an analysis. Users are therefore encouraged to add new rows/columns only where it fits their specific analysis. Operations like filling blank cells with zeros, or deleting rows/columns made redundant after the standard filtering process (which have already been removed from other tabs), should be avoided, since these non-standard changes create unverified extra noise that can bias your results later during robustness testing and self-verification, thereby producing erroneous output.
- Analyzing and comparing the environmental benefits of different agricultural management practices, such as crop yields and carbon sequestration rates.
- Developing an app or other mobile platform to help farmers find management practices that maximize carbon sequestration and minimize GHG emissions in their area, based on their specific soil condition and climate data.
- Building an AI-driven model to predict net greenhouse gas emissions and C sequestration from potential weekly/monthly production plans across different regions in the world, based on optimal allocation of resources such as fertilizers, equipment, water etc
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The files contain the simulation results for the ECOC 2023 submission "Analysis of the Scalar and Vector Random Coupling Models For a Four Coupled-Core Fiber". The "4CCF_eigenvectorsPol" file is the Mathematica code which calculates the supermodes (eigenvectors of M(w)) and their propagation constants for a 4-coupled-core fiber (4CCF). These results are loaded into the Python notebook "4CCF_modelingECOC" in order to plot them and produce Fig. 2 in the paper. "TransferMatrix" is the Python file with the functions used for modelling, simulation and plotting. It is also loaded in the Python notebook "4CCF_modelingECOC", where all the calculations for the figures in the paper are presented.
! UPD 25.09.2023: There is an error in the formula for the birefringence calculation. It is in the function "CouplingCoefficients" in the "TransferMatrix" file. There, the variable "birefringence" has to be calculated according to formula (19) of [A. Ankiewicz, A. Snyder, and X.-H. Zheng, "Coupling between parallel optical fiber cores–critical examination", Journal of Lightwave Technology, vol. 4, no. 9, pp. 1317–1323, 1986]: (4*U**2*W*spec.k0(W)*spec.kn(2, W_)/(spec.k1(W)*V**4))*((spec.iv(1, W)/spec.k1(W))-(spec.iv(2, W)/spec.k0(W))) The correct formula gives almost the same result (the difference is 10^-5), but one has to use the correct formula anyway. ! UPD 9.12.2023: I have noticed that in the published version of the code I forgot to change the wavelength range for the impulse response calculation. So instead of seeing the nice shape as in the paper, you will see a resolution-limited shape. To solve that, just change the range of wavelengths: you can add "wl = [1545e-9, 1548e-9]" in the first cell after "Total power impulse response". P.S. In case of any questions or suggestions you are welcome to write me an email: ekader@chalmers.se
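For convenience, the corrected expression quoted in the first update can be wrapped as a standalone function. This is only a sketch that assumes `spec` is `scipy.special` (as in the original code) and keeps `W_` as a separate argument exactly as it appears in the note.

```python
import scipy.special as spec

# Sketch: the corrected birefringence expression from the update note, wrapped
# as a function.  W_ is kept distinct from W exactly as written in the note.
def birefringence_correction(U, W, W_, V):
    return (4 * U**2 * W * spec.k0(W) * spec.kn(2, W_) / (spec.k1(W) * V**4)) * (
        (spec.iv(1, W) / spec.k1(W)) - (spec.iv(2, W) / spec.k0(W))
    )

# Example call with purely illustrative mode parameters
print(birefringence_correction(U=1.6, W=1.2, W_=1.2, V=2.0))
```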
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Aim: Despite the wide distribution of many parasites around the globe, the range of individual species varies significantly even among phylogenetically related taxa. Since parasites need suitable hosts to complete their development, parasite geographical and environmental ranges should be limited to communities where their hosts are found. Parasites may also suffer from a trade-off between being locally abundant or widely dispersed. We hypothesize that the geographical and environmental ranges of parasites are negatively associated with their host specificity and their local abundance. Location: Worldwide. Time period: 2009 to 2021. Major taxa studied: Avian haemosporidian parasites. Methods: We tested these hypotheses using a global database which comprises data on avian haemosporidian parasites from across the world. For each parasite lineage, we computed five metrics: phylogenetic host-range, environmental range, geographical range, and their mean local and total number of observations in the database. Phylogenetic generalized least squares models were run to evaluate the influence of phylogenetic host-range and total and local abundances on geographical and environmental range. In addition, we analysed separately the two regions with the largest amount of available data: Europe and South America. Results: We evaluated 401 lineages from 757 localities and observed that generalism (i.e. phylogenetic host range) associates positively with both the parasites’ geographical and environmental ranges at the global and European scales. For South America, generalism only associates with geographical range. Finally, mean local abundance (mean local number of parasite occurrences) was negatively related to geographical and environmental range. This pattern was detected worldwide and in South America, but not in Europe. Main Conclusions: We demonstrate that parasite specificity is linked to both their geographical and environmental ranges. The fact that locally abundant parasites present restricted ranges indicates a trade-off between these two traits. This trade-off, however, only becomes evident when sufficiently heterogeneous host communities are considered.

Methods

We compiled data on haemosporidian lineages from the MalAvi database (http://130.235.244.92/Malavi/, Bensch et al. 2009), including all the data available from the “Grand Lineage Summary” representing the Plasmodium and Haemoproteus genera from wild birds that contained information regarding location. After checking for duplicated sequences, this dataset comprised a total of ~6200 sequenced parasites representing 1602 distinct lineages (775 Plasmodium and 827 Haemoproteus) collected from 1139 different host species and 757 localities from all continents except Antarctica (Supplementary figure 1, Supplementary Table 1). The parasite lineages deposited in MalAvi are based on a cyt b fragment of 478 bp. This dataset was used to calculate the parasites’ geographical, environmental and phylogenetic ranges.

Geographical range

All analyses in this study were performed using R version 4.02. In order to estimate the geographical range of each parasite lineage, we applied the R package “GeoRange” (Boyle, 2017) and chose the variable minimum spanning tree distance (i.e., the shortest total distance of all lines connecting each locality where a particular lineage has been found).
Using the function “create.matrix” from the “fossil” package, we created a matrix of lineages and coordinates and employed the function “GeoRange_MultiTaxa” to calculate the minimum spanning tree distance for each parasite lineage (i.e. the shortest total distance in kilometers of all lines connecting each locality). Therefore, as at least two distinct sites are necessary to calculate this distance, parasites observed in a single locality could not have their geographical range estimated. For this reason, only parasites observed in two or more localities were considered in our phylogenetic generalized least squares (PGLS) models.

Host and Environmental diversity

Traditionally, ecologists use Shannon entropy to measure diversity in ecological assemblages (Pielou, 1966). The Shannon entropy of a set of elements is related to the degree of uncertainty someone would have about the identity of a randomly selected element of that set (Jost, 2006). Thus, Shannon entropy matches our intuitive notion of biodiversity, as the more diverse an assemblage is, the more uncertainty there is regarding which species a randomly selected individual belongs to. Shannon diversity increases with both the assemblage's richness (e.g., the number of species) and its evenness (e.g., uniformity in abundance among species). To compare the diversity of assemblages that vary in richness and evenness in a more intuitive manner, we can normalize diversities by Hill numbers (Chao et al., 2014b). The Hill number of an assemblage represents the effective number of species in the assemblage, i.e., the number of equally abundant species that would be needed to give the same value of the diversity metric in that assemblage. Hill numbers can be extended to incorporate phylogenetic information. In that case, instead of species, we are measuring the effective number of phylogenetic entities in the assemblage. Here, we computed phylogenetic host-range as the phylogenetic Hill number associated with the assemblage of hosts found infected by a given parasite. Analyses were performed using the function “hill_phylo” from the “hillr” package (Chao et al., 2014a). Hill numbers are parameterized by a parameter “q” that determines the sensitivity of the metric to relative species abundance. Different “q” values produce Hill numbers associated with different diversity metrics. We set q = 1 to compute the Hill number associated with Shannon diversity. Here, low Hill numbers indicate specialization on a narrow phylogenetic range of hosts, whereas a higher Hill number indicates generalism across a broader phylogenetic spectrum of hosts. We also used Hill numbers to compute the environmental range of sites occupied by each parasite lineage. Firstly, we collected the 19 bioclimatic variables from WorldClim version 2 (http://www.worldclim.com/version2) for all sites used in this study (N = 713). Then, we standardized the 19 variables by centering and scaling them by their respective mean and standard deviation. Thereafter, we computed the pairwise Euclidean environmental distance among all sites and used this distance to compute a dissimilarity cluster. Finally, as for the phylogenetic Hill number, we used this dissimilarity cluster to compute the environmental Hill number of the assemblage of sites occupied by each parasite lineage. The environmental Hill number for each parasite can be interpreted as the effective number of environmental conditions in which a parasite lineage occurs.
Thus, the higher the environmental Hill number, the more generalist the parasite is regarding the environmental conditions in which it can occur.

Parasite phylogenetic tree

A Bayesian phylogenetic reconstruction was performed. We built a tree for all parasite sequences for which we were able to estimate the parasite's geographical, environmental and phylogenetic ranges (see above); this represented 401 distinct parasite lineages. This inference was produced using MrBayes 3.2.2 (Ronquist & Huelsenbeck, 2003) with the GTR + I + G model of nucleotide evolution, as recommended by ModelTest (Posada & Crandall, 1998), which selects the best-fit nucleotide substitution model for a set of genetic sequences. We ran four Markov chains simultaneously for a total of 7.5 million generations, sampled every 1000 generations. The first 25% of sampled trees were discarded as a burn-in step and the remaining trees were used to calculate the posterior probabilities of each estimated node in the final consensus tree. Our final tree obtained a cumulative posterior probability of 0.999. Leucocytozoon caulleryi was used as the outgroup to root the phylogenetic tree, as Leucocytozoon spp. represent a basal group within avian haemosporidians (Pacheco et al., 2020).
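The analyses above rely on the R packages GeoRange and hillr; purely as a reference for the abundance-based case, a Hill number of order q = 1 is simply the exponential of Shannon entropy, as in this short illustrative sketch (it does not reproduce the phylogenetic or environmental versions used in the study).

```python
import numpy as np

# Hill number of order q = 1: the effective number of equally abundant
# categories, computed as exp(Shannon entropy) of the relative abundances.
def hill_q1(counts):
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(np.exp(-(p * np.log(p)).sum()))

print(hill_q1([10, 10, 10, 10]))  # 4.0 -> four equally abundant host species
print(hill_q1([37, 1, 1, 1]))     # ~1.4 -> assemblage dominated by one host
```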
By data.world's Admin [source]
The data was obtained from multiple sources. Data from 1985-2002 were downloaded from the National Bureau of Economic Research through the National Center for Health Statistics' National Vital Statistics System. Data from 2003-2015 were sourced using aggregators provided by CDC's WONDER tool, utilizing Year, Month, State, and County filters. It is worth noting that geolocation information for individual babies born after 2005 is not released due to privacy concerns; therefore, all data has been aggregated by month.
The spatial applicability of this dataset is limited to the United States at the county level. It covers a temporal range spanning January 1, 1985 - December 31, 2015. Each row in the dataset represents aggregated birth counts within a specific county for a particular month and year.
Additional notes highlight that this dataset expands on data presented in an essay called The Timing of Baby Making published by The Pudding website in May 2017. While only data from 1995-2015 were displayed in the essay itself, this dataset includes an extra ten years of birth data. Furthermore, any non-US residents have been excluded from this dataset.
The provided metadata gives a detailed breakdown of the columns in the dataset, including their descriptions and data types. The included variables allow researchers to analyze births at both individual county and state levels over time. Finally, the dataset is available under the MIT License for public use
Here is a guide on how to effectively use this dataset:
Step 1: Understanding the Columns
The dataset consists of several columns that provide specific information about each birth record. Let's understand what each column represents:
- State: The state (including District of Columbia) where the mother lives.
- County: The county where the mother lives, coded using the FIPS County Code.
- Month: The month in which the birth took place (1 = January, 2 = February, etc.).
- Year: The four-digit year of the birth.
- countyBirths: The calculated sum of births that occurred to mothers living in a county for a given month. If the sum was less than 9, it is listed as NA as per NCHS reporting guidelines.
- stateBirths: The calculated sum of births that occurred to mothers living in a state for a given month. It includes all birth counts, even those from counties with fewer than 9 births.
Step 2: Exploring Birth Trends by State and County
You can analyze birth trends by focusing on specific states or counties within specific time frames. Here's how you can do it:
Filter by State or County:
- Select rows based on your chosen state using the State column. Each number corresponds to a specific state (e.g., 01 = Alabama).
- Further narrow down your analysis by selecting specific counties using their respective FIPS codes mentioned in the County column.
Analyze Monthly Variation:
- Calculate monthly total births within your desired location(s) by grouping data based on the Month column.
- Compare the number of births between different months to identify any seasonal trends or patterns.
Visualize Birth Trends:
- Create line charts or bar plots to visualize how the number of births changes over time.
- Plot a line or bar for each month across multiple years to identify any significant changes in birth rates.
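The filtering and grouping described in Step 2 map directly onto a few pandas operations; the sketch below is illustrative (the CSV file name is ours, not part of the dataset) and assumes the columns described above.

```python
import pandas as pd

# Sketch of Step 2: filter one state, then total county births by month to look
# for seasonal patterns.  The file name is illustrative.
births = pd.read_csv("us_births_by_county.csv", dtype={"State": str, "County": str})

alabama = births[births["State"].str.zfill(2) == "01"]          # e.g. 01 = Alabama
monthly = alabama.groupby("Month")["countyBirths"].sum().sort_index()
print(monthly)

# A quick bar chart helps reveal seasonal variation across months.
monthly.plot(kind="bar", title="Total births by month (state 01)")
```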
Step 3: Comparison and Calculation
You can utilize this dataset to compare birth rates between states, counties, and regions. Here are a few techniques you can try:
- State vs. County Comparison:
- Calculate the total births within each state by aggregating
- Analyzing birth trends: This dataset can be used to analyze and understand the trends in birth rates across different states and counties over the period of 1985 to 2015. Researchers can study factors that may influence these trends, such as socioeconomic factors, healthcare access, or cultural changes.
- Identifying seasonal variations: The dataset includes information on the month of birth for each entry. This data can be utilized to identify any seasonal variations in births across different locations in the US. Understanding these variations can help in planning resources and healthcare services accordingly.
- Studying geographical patterns: By analyzing the county-level data, researchers can explore geographical patterns of childbirth throughout the United States. They can identify regions with high or low birth rates and...
By Andy Kriebel [source]
This dataset provides a comprehensive look at the changing trends in marriage and divorce over the years in the United States. It includes data on gender, age range, and year for those who have never been married – examining who is deciding to forgo tying the knot in today’s society. Diving into this data may offer insight into how life-changing decisions are being made as customs shift along with our times. This could be especially interesting when examined by generation or other trends within our population. Are young adults embracing or avoiding marriage? Has divorce become more or less common within certain social groups? Can recent economic challenges be related to changes in marital status trends? Take a look at this dataset and let us know what stories you find!
For more datasets, click here.
This dataset contains surveys which explore the number of never married people in the United States, separated by gender, age range and year. You can use this dataset to analyze the trends in never married people throughout the years and see how it is affected by different demographics.
To make the most out of this dataset you could start by exploring the changes across different age ranges and genders. Plotting how they differ over time might unveil interesting patterns that can help you uncover why certain groups are more or less likely to remain single throughout time. Understanding these trends could also help people looking for a life partner better understand their own context as compared to others around them, enabling them to make informed decisions about when is a good time for them to find someone special.
In addition, this dataset can be used to examine what acts as an enabler or deterrent to staying single across different combinations of age range and gender in different states. Does marriage look more attractive in any particular state? Are there differences between genders? Knowing all these factors can give us economic and social insights about society, as well as about overall lifestyle choices that tend towards being single or married during one's life cycle in different regions of the United States of America.
Finally, with this information policymakers can construct efficient policies that better fit our country's priorities by providing programs designed around the specific characteristics of each group, helping ensure they can form the relationships they prefer while concentrating resources on actions already taken towards promoting the wellbeing of our citizens regarding relationships, such as marriage counseling services or family support centers!
- Examine the differences in trends between ever-married and never-married people across different age ranges and genders.
- Explore the correlation between life-decision changes and economic conditions for ever-married and never-married people over time.
- Analyze how marriage trends differ by region, socio-economic status, or religious beliefs to understand how these factors influence decisions about marriage.
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors.
You are free to:
- Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt - remix, transform, and build upon the material for any purpose, even commercially.
You must:
- Give appropriate credit - provide a link to the license, and indicate if changes were made.
- ShareAlike - distribute your contributions under the same license as the original.
- Keep intact - all notices that refer to this license, including copyright notices.
File: Never Married.csv

| Column name   | Description                                             |
|:--------------|:--------------------------------------------------------|
| Gender        | Gender of the individual. (String)                      |
| Age Range     | Age range of the individual. (String)                   |
| Year          | Year of the data. (Integer)                             |
| Never Married | Number of people who have never been married. (Integer) |
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The elevation range measures the full range of elevations within a circular window and can be used as a representation of local relief.
The 300 m elevation range product was derived from the Smoothed Digital Elevation Model (DEM-S; ANZCW0703014016), which was derived from the 1 arc-second resolution SRTM data acquired by NASA in February 2000.
This collection includes data at 1 arc-second and 3 arc-second resolutions.
The 3 arc-second resolution product was generated from the 1 arc-second 300 m elevation range product and masked by the 3” water and ocean mask datasets.
Lineage: Source data
1. 1 arc-second SRTM-derived Smoothed Digital Elevation Model (DEM-S; ANZCW0703014016)
2. 1 arc-second 300 m elevation range product
3. 3 arc-second resolution SRTM water body and ocean mask datasets
300 m focal range elevation calculation. Elevation range is the full range of elevation within a circular window (Gallant and Wilson, 2000). Focal range using a 300 m window was calculated for each grid point from DEM-S using a 300 m kernel. The different spacing in the E-W and N-S directions due to the geographic projection of the data was accounted for by using the actual spacing in metres of the grid points, and recalculating the grid points included within the kernel extent for each 1° change in latitude.
The 300 m focal range elevation calculation was performed on 1° x 1° tiles, with overlaps to ensure correct values at tile edges.
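The focal range idea can be sketched with NumPy and SciPy as below. This is a simplified, flat-grid version that ignores the latitude-dependent spacing and tiling described above; the window radius in cells is an assumption, not the product's actual kernel.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def focal_range(dem, radius_cells):
    """Elevation range (max minus min) within a circular window.

    radius_cells is the window radius in grid cells; the real product derives
    the kernel extent from the 300 m window and the local grid spacing.
    """
    # Boolean disk used as the circular footprint of the moving window.
    y, x = np.ogrid[-radius_cells:radius_cells + 1, -radius_cells:radius_cells + 1]
    footprint = x**2 + y**2 <= radius_cells**2
    return maximum_filter(dem, footprint=footprint) - minimum_filter(dem, footprint=footprint)

# Toy elevation grid standing in for a DEM-S tile.
dem = np.random.rand(200, 200) * 150.0
relief_300m = focal_range(dem, radius_cells=5)  # nominal radius at ~30 m spacing
```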
The 3 arc-second resolution version was generated from the 1 arc-second 300 m elevation range product. This was done by aggregating the 1” data over a 3 x 3 grid cell window and taking the maximum of the nine values that contributed to each 3” output grid cell. The 3” 300 m elevation range data were then masked using the SRTM 3” ocean and water body datasets.
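The 3 x 3 maximum aggregation itself can be sketched in NumPy roughly as follows (masking by the water-body datasets is omitted):

```python
import numpy as np

def block_max(arr, factor=3):
    """Aggregate a 2-D array by taking the maximum of each factor x factor block."""
    h, w = arr.shape
    h, w = h - h % factor, w - w % factor          # trim so both dimensions divide evenly
    blocks = arr[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.max(axis=(1, 3))

one_sec = np.random.rand(9, 12)    # stand-in for the 1" elevation range grid
three_sec = block_max(one_sec)     # each 3" cell is the max of the nine contributing 1" cells
```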
References: Gallant, J.C. and Wilson, J.P. (2000). Primary topographic attributes. Chapter 3 in Wilson, J.P. and Gallant, J.C. (eds), Terrain Analysis: Principles and Applications. John Wiley and Sons, New York.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The elevation range measures the full range of elevations within a circular window and can be used as a representation of local relief.
The 1000 m elevation range product was derived from the Smoothed Digital Elevation Model (DEM-S; ANZCW0703014016), which was derived from the 1 arc-second resolution SRTM data acquired by NASA in February 2000.
This collection includes Relief data at 1 arc-second and 3 arc-second resolutions.
The 3 arc-second resolution product was generated from the 1 arc-second 1000 m elevation range product and masked by the 3” water and ocean mask datasets.
Lineage: Source data
1. 1 arc-second SRTM-derived Smoothed Digital Elevation Model (DEM-S; ANZCW0703014016)
2. 1 arc-second 1000 m elevation range product
3. 3 arc-second resolution SRTM water body and ocean mask datasets
1000 m focal range elevation calculation. Elevation range is the full range of elevation within a circular window (Gallant and Wilson, 2000). Focal range using a 1000 m window was calculated for each grid point from DEM-S using a 1000 m kernel. The different spacing in the E-W and N-S directions due to the geographic projection of the data was accounted for by using the actual spacing in metres of the grid points, and recalculating the grid points included within the kernel extent for each 1° change in latitude.
The 1000 m focal range elevation calculation was performed on 1° x 1° tiles, with overlaps to ensure correct values at tile edges.
The 3 arc-second resolution version was generated from the 1 arc-second 1000 m elevation range product. This was done by aggregating the 1” data over a 3 x 3 grid cell window and taking the maximum of the nine values that contributed to each 3” output grid cell. The 3” 1000 m elevation range data were then masked using the SRTM 3” ocean and water body datasets.
References: Gallant, J.C. and Wilson, J.P. (2000). Primary topographic attributes. Chapter 3 in Wilson, J.P. and Gallant, J.C. (eds), Terrain Analysis: Principles and Applications. John Wiley and Sons, New York.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
To analyze the salaries of company employees using Pandas, NumPy, and other tools, you can structure the analysis process into several steps:
Case Study: Employee Salary Analysis
In this case study, we aim to analyze the salaries of employees across different departments and levels within a company. Our goal is to uncover key patterns, identify outliers, and provide insights that can support decisions related to compensation and workforce management.
Step 1: Data Collection and Preparation
- Data Sources: The dataset typically includes employee ID, name, department, position, years of experience, salary, and additional compensation (bonuses, stock options, etc.).
- Data Cleaning: We use Pandas to handle missing or incomplete data, remove duplicates, and standardize formats. Example: df.dropna() to handle missing salary information, and df.drop_duplicates() to eliminate duplicate entries.
Step 2: Data Exploration and Descriptive Statistics
- Exploratory Data Analysis (EDA): Using Pandas to calculate basic statistics such as mean, median, mode, and standard deviation for employee salaries. Example: df['salary'].describe() provides an overview of the distribution of salaries.
- Data Visualization: Leveraging tools like Matplotlib or Seaborn for visualizing salary distributions, box plots to detect outliers, and bar charts for department-wise salary breakdowns. Example: sns.boxplot(x='department', y='salary', data=df) provides a visual representation of salary variations by department.
Step 3: Analysis Using NumPy
- Calculating Salary Ranges: NumPy can be used to calculate the range, variance, and percentiles of salary data to identify the spread and skewness of the salary distribution. Example: np.percentile(df['salary'], [25, 50, 75]) helps identify salary quartiles.
- Correlation Analysis: Identify the relationship between variables such as experience and salary using NumPy to compute correlation coefficients. Example: np.corrcoef(df['years_of_experience'], df['salary']) reveals whether experience is a significant factor in salary determination.
Step 4: Grouping and Aggregation
- Salary by Department and Position: Using Pandas' groupby function, we can summarize salary information for different departments and job titles to identify trends or inequalities. Example: df.groupby('department')['salary'].mean() calculates the average salary per department.
Step 5: Salary Forecasting (Optional)
- Predictive Analysis: Using tools such as Scikit-learn, we could build a regression model to predict future salary increases based on factors like experience, education level, and performance ratings.
Step 6: Insights and Recommendations
- Outlier Identification: Detect any employees earning significantly more or less than the average, which could signal inequities or high performers.
- Salary Discrepancies: Highlight any salary discrepancies between departments or genders that may require further investigation.
- Compensation Planning: Based on the analysis, suggest potential changes to the salary structure or bonus allocations to ensure fair compensation across the organization.
Tools Used:
- Pandas: for data manipulation, grouping, and descriptive analysis.
- NumPy: for numerical operations such as percentiles and correlations.
- Matplotlib/Seaborn: for data visualization to highlight key patterns and trends.
- Scikit-learn (optional): for building predictive models if salary forecasting is included in the analysis.
This approach ensures a comprehensive analysis of employee salaries, providing actionable insights for human resource planning and compensation strategy.
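Pulling those snippets together, a minimal end-to-end sketch might look as follows. The DataFrame contents are illustrative; only the column names used in the snippets above (department, years_of_experience, salary) are assumed.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the employee table described above.
df = pd.DataFrame({
    "department": ["Engineering", "Engineering", "Sales", "Sales", "HR", "HR"],
    "years_of_experience": [2, 8, 1, 5, 3, 10],
    "salary": [70_000, 115_000, 48_000, 67_000, 52_000, 80_000],
})

# Steps 1-2: cleaning and descriptive statistics.
df = df.dropna(subset=["salary"]).drop_duplicates()
print(df["salary"].describe())

# Step 3: quartiles and the experience/salary correlation with NumPy.
print(np.percentile(df["salary"], [25, 50, 75]))
print(np.corrcoef(df["years_of_experience"], df["salary"])[0, 1])

# Step 4: average salary per department.
print(df.groupby("department")["salary"].mean())
```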
The sample included in this dataset represents five children who participated in a number line intervention study. Originally six children were included in the study, but one of them fulfilled the criterion for exclusion after missing several consecutive sessions. Thus, their data is not included in the dataset.
All participants were attending Year 1 of primary school at an independent school in New South Wales, Australia. To be eligible to participate, children had to present with low mathematics achievement, performing at or below the 25th percentile on the Maths Problem Solving and/or Numerical Operations subtests of the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Children were excluded if, as reported by their parents, they had any other diagnosed disorder such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy or uncorrected sensory disorders.
The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase varied between two and three measurement points, the treatment phase varied between four and seven measurement points, and all participants had 1 post-treatment measurement point.
The number of measurement points were distributed across participants as follows:
Participant 1 – 3 baseline, 6 treatment, 1 post-treatment
Participant 3 – 2 baseline, 7 treatment, 1 post-treatment
Participant 5 – 2 baseline, 5 treatment, 1 post-treatment
Participant 6 – 3 baseline, 4 treatment, 1 post-treatment
Participant 7 – 2 baseline, 5 treatment, 1 post-treatment
In each session across all three phases children were assessed in their performance on a number line estimation task, a single-digit computation task, a multi-digit computation task, a dot comparison task and a number comparison task. Furthermore, during the treatment phase, all children completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.
Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line is presented in the middle of the screen, and the target number is presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers included two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed on all phases of the study. Trained items were assessed independent of the intervention during baseline and post-treatment phases, and performance on the intervention is used to index performance on the trained set during the treatment phase. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error (PAE) [(number estimated - target number)/ scale of number line] x100.
Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions present the lowest addend first (e.g., 3 + 5) and half of the additions present the highest addend first (e.g., 6 + 3). This task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen accompanied by a sound and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
Multi-Digit Computational Estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options represented the correct result; participants were asked to select the option that was closest to it. In half of the items the calculation involved two double-digit numbers, and in the other half one double-digit and one single-digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below it, and it remained on the screen until participants responded by clicking on one of the options. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. The two quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J) as quickly as possible to indicate the larger one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison), the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting); overall area and convex hull of both dot sets were kept constant following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.
During the intervention sessions, participants estimated the position of 30 Arabic numbers on a 0-100 bounded number line. As a form of feedback, within each item the participant's estimate remained visible and the correct position of the target number appeared on the number line. When the estimate's PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”; when PAE was between 2.5 and 5, the message read “Well done, so close!”; and when PAE was higher than 5, the message read “Good try!” Numbers were presented in random order.
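This feedback rule amounts to a simple threshold function on PAE. A minimal sketch, with the handling of the exact 2.5 and 5 boundaries assumed, since the description does not specify it:

```python
def feedback_message(pae):
    """Map percent absolute error (PAE) to the on-screen feedback message."""
    if pae < 2.5:
        return "Excellent job"
    elif pae <= 5:   # boundary handling assumed; the text only says "between 2.5 and 5"
        return "Well done, so close!"
    else:
        return "Good try!"

print(feedback_message(1.8))   # Excellent job
print(feedback_message(4.0))   # Well done, so close!
print(feedback_message(7.5))   # Good try!
```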
Age = age in ‘years, months’ at the start of the study
Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)
Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Num_Ops_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
The remaining variables refer to participants' performance on the study tasks. Each variable name is composed of three parts. The first part refers to the phase and session: for example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point of the treatment phase, and post1 to the first measurement point of the post-treatment phase.
The second part of the variable name refers to the task, as follows:
DC = dot comparison
SDC = single-digit computation
NLE_UT = number line estimation (untrained set)
NLE_T = number line estimation (trained set)
CE = multidigit computational estimation
NC = number comparison
The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).
Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task during the second measurement point of the baseline phase and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the Treatment phase.
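For scripted analyses, this naming convention can be unpacked programmatically. A minimal sketch, assuming only the scheme described above; it does not cover the demographic variables such as Age or Sex.

```python
import re

def parse_variable(name):
    """Split a task variable such as 'Treat3_NLE_UT_pae' into its three parts."""
    parts = name.split("_")
    phase_session = parts[0]               # e.g. 'Treat3'
    measure = parts[-1]                    # 'acc' or 'pae'
    task = "_".join(parts[1:-1])           # e.g. 'NLE_UT' (task codes may contain '_')
    phase, session = re.match(r"([A-Za-z]+)(\d+)", phase_session).groups()
    return {"phase": phase, "session": int(session), "task": task, "measure": measure}

print(parse_variable("Base2_NC_acc"))
# {'phase': 'Base', 'session': 2, 'task': 'NC', 'measure': 'acc'}
print(parse_variable("Treat3_NLE_UT_pae"))
# {'phase': 'Treat', 'session': 3, 'task': 'NLE_UT', 'measure': 'pae'}
```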