95 datasets found
  1. Mathematics Dataset

    • github.com
    • opendatalab.com
    • +1more
    Updated Apr 3, 2019
    Cite
    DeepMind (2019). Mathematics Dataset [Dataset]. https://github.com/Wikidepia/mathematics_dataset_id
    Explore at:
    Dataset updated
    Apr 3, 2019
    Dataset provided by
    DeepMind (http://deepmind.com/)
    Description

    This dataset consists of mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

    ## Example questions

     Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.
     Answer: 4
     
     Question: Calculate -841880142.544 + 411127.
     Answer: -841469015.544
     
     Question: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)).
     Answer: 54*a - 30
    

    It contains 2 million (question, answer) pairs per module, with questions limited to 160 characters in length, and answers to 30 characters in length. Note the training data for each question type is split into "train-easy", "train-medium", and "train-hard". This allows training models via a curriculum. The data can also be mixed together uniformly from these training datasets to obtain the results reported in the paper. Categories:

    • algebra (linear equations, polynomial roots, sequences)
    • arithmetic (pairwise operations and mixed expressions, surds)
    • calculus (differentiation)
    • comparison (closest numbers, pairwise comparisons, sorting)
    • measurement (conversion, working with time)
    • numbers (base conversion, remainders, common divisors and multiples, primality, place value, rounding numbers)
    • polynomials (addition, simplification, composition, evaluating, expansion)
    • probability (sampling without replacement)
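
    A minimal loading sketch for one module file is given below. It assumes the released plain-text layout in which each question line is immediately followed by its answer line; the file path is only an example.

    def load_pairs(path):
        # Read a module file where questions and answers alternate line by line.
        with open(path, encoding="utf-8") as f:
            lines = [line.rstrip("\n") for line in f]
        pairs = list(zip(lines[0::2], lines[1::2]))
        for question, answer in pairs:
            # Length limits stated in the dataset description.
            assert len(question) <= 160 and len(answer) <= 30
        return pairs

    pairs = load_pairs("train-easy/algebra__linear_1d.txt")  # hypothetical path
    print(pairs[0])
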
  2. GLAS/ICESat L1B Global Waveform-based Range Corrections Data (HDF5) V034 -...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). GLAS/ICESat L1B Global Waveform-based Range Corrections Data (HDF5) V034 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/glas-icesat-l1b-global-waveform-based-range-corrections-data-hdf5-v034
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    GLAH05 Level-1B waveform parameterization data include output parameters from the waveform characterization procedure and other parameters required to calculate surface slope and relief characteristics. GLAH05 contains parameterizations of both the transmitted and received pulses and other characteristics from which elevation and footprint-scale roughness and slope are calculated. The received pulse characterization uses two implementations of the retracking algorithms: one tuned for ice sheets, called the standard parameterization, used to calculate surface elevation for ice sheets, oceans, and sea ice; and another for land (the alternative parameterization). Each data granule has an associated browse product.

  3. housing

    • kaggle.com
    zip
    Updated Sep 22, 2023
    Cite
    HappyRautela (2023). housing [Dataset]. https://www.kaggle.com/datasets/happyrautela/housing
    Explore at:
    Available download formats: zip (809785 bytes)
    Dataset updated
    Sep 22, 2023
    Authors
    HappyRautela
    Description

    The following exercise contains questions based on the housing dataset.

    1. How many houses have a waterfront? a. 21000 b. 21450 c. 163 d. 173

    2. How many houses have 2 floors? a. 2692 b. 8241 c. 10680 d. 161

    3. How many houses built before 1960 have a waterfront? a. 80 b. 7309 c. 90 d. 92

    4. What is the price of the most expensive house having more than 4 bathrooms? a. 7700000 b. 187000 c. 290000 d. 399000

    5. If the ‘price’ column contains outliers, how can you clean the data and remove them? a. Calculate the IQR range and drop the values outside the range. b. Calculate the p-value and remove the values less than 0.05. c. Calculate the correlation coefficient of the price column and remove the values less than the correlation coefficient. d. Calculate the Z-score of the price column and remove the values less than the z-score.

    6. What are the various parameters that can be used to determine the dependent variables in the housing data to determine the price of the house? a. Correlation coefficients b. Z-score c. IQR Range d. Range of the Features

    7. If we get the r2 score as 0.38, what inferences can we make about the model and its efficiency? a. The model is 38% accurate, and shows poor efficiency. b. The model is showing 0.38% discrepancies in the outcomes. c. Low difference between observed and fitted values. d. High difference between observed and fitted values.

    8. If the metrics show that the p-value for the grade column is 0.092, what all inferences can we make about the grade column? a. Significant in presence of other variables. b. Highly significant in presence of other variables c. insignificance in presence of other variables d. None of the above

    9. If the Variance Inflation Factor value for a feature is considerably higher than the other features, what can we say about that column/feature? a. High multicollinearity b. Low multicollinearity c. Both A and B d. None of the above
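
    As a worked illustration of the IQR-based cleaning referenced in question 5, a minimal pandas sketch might look like this; the file name and the 'price' column are assumptions based on the description above.

    import pandas as pd

    df = pd.read_csv("housing.csv")                    # hypothetical path
    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1                                      # interquartile range
    inside = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    df_clean = df[inside]                              # drop values outside the IQR fence
    print(len(df), len(df_clean))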

  4. Salaries case study

    • kaggle.com
    zip
    Updated Oct 2, 2024
    Cite
    Shobhit Chauhan (2024). Salaries case study [Dataset]. https://www.kaggle.com/datasets/satyam0123/salaries-case-study
    Explore at:
    Available download formats: zip (13105509 bytes)
    Dataset updated
    Oct 2, 2024
    Authors
    Shobhit Chauhan
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    To analyze the salaries of company employees using Pandas, NumPy, and other tools, you can structure the analysis process into several steps:

    Case Study: Employee Salary Analysis In this case study, we aim to analyze the salaries of employees across different departments and levels within a company. Our goal is to uncover key patterns, identify outliers, and provide insights that can support decisions related to compensation and workforce management.

    Step 1: Data Collection and Preparation Data Sources: The dataset typically includes employee ID, name, department, position, years of experience, salary, and additional compensation (bonuses, stock options, etc.). Data Cleaning: We use Pandas to handle missing or incomplete data, remove duplicates, and standardize formats. Example: df.dropna() to handle missing salary information, and df.drop_duplicates() to eliminate duplicate entries. Step 2: Data Exploration and Descriptive Statistics Exploratory Data Analysis (EDA): Using Pandas to calculate basic statistics such as mean, median, mode, and standard deviation for employee salaries. Example: df['salary'].describe() provides an overview of the distribution of salaries. Data Visualization: Leveraging tools like Matplotlib or Seaborn for visualizing salary distributions, box plots to detect outliers, and bar charts for department-wise salary breakdowns. Example: sns.boxplot(x='department', y='salary', data=df) provides a visual representation of salary variations by department. Step 3: Analysis Using NumPy Calculating Salary Ranges: NumPy can be used to calculate the range, variance, and percentiles of salary data to identify the spread and skewness of the salary distribution. Example: np.percentile(df['salary'], [25, 50, 75]) helps identify salary quartiles. Correlation Analysis: Identify the relationship between variables such as experience and salary using NumPy to compute correlation coefficients. Example: np.corrcoef(df['years_of_experience'], df['salary']) reveals if experience is a significant factor in salary determination. Step 4: Grouping and Aggregation Salary by Department and Position: Using Pandas' groupby function, we can summarize salary information for different departments and job titles to identify trends or inequalities. Example: df.groupby('department')['salary'].mean() calculates the average salary per department. Step 5: Salary Forecasting (Optional) Predictive Analysis: Using tools such as Scikit-learn, we could build a regression model to predict future salary increases based on factors like experience, education level, and performance ratings. Step 6: Insights and Recommendations Outlier Identification: Detect any employees earning significantly more or less than the average, which could signal inequities or high performers. Salary Discrepancies: Highlight any salary discrepancies between departments or gender that may require further investigation. Compensation Planning: Based on the analysis, suggest potential changes to the salary structure or bonus allocations to ensure fair compensation across the organization. Tools Used: Pandas: For data manipulation, grouping, and descriptive analysis. NumPy: For numerical operations such as percentiles and correlations. Matplotlib/Seaborn: For data visualization to highlight key patterns and trends. Scikit-learn (Optional): For building predictive models if salary forecasting is included in the analysis. This approach ensures a comprehensive analysis of employee salaries, providing actionable insights for human resource planning and compensation strategy.

  5. Data from: Current and projected research data storage needs of Agricultural...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.

    The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey, which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

    From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.

    Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

    Resources in this dataset:

    • Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
    • Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv. Resource Description: CSV file that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).
    • Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

  6. NIST Stopping-Power & Range Tables for Electrons, Protons, and Helium Ions -...

    • data.amerigeoss.org
    • catalog.data.gov
    • +1more
    html
    Updated Jul 27, 2019
    Cite
    United States (2019). NIST Stopping-Power & Range Tables for Electrons, Protons, and Helium Ions - SRD 124 [Dataset]. https://data.amerigeoss.org/de/dataset/nist-stopping-power-and-range-tables-for-electrons-protons-and-helium-ions-srd-124
    Explore at:
    Available download formats: html
    Dataset updated
    Jul 27, 2019
    Dataset provided by
    United States
    License

    https://www.nist.gov/open/license

    Description

    The databases ESTAR, PSTAR, and ASTAR calculate stopping-power and range tables for electrons, protons, or helium ions. Stopping-power and range tables can be calculated for electrons in any user-specified material and for protons and helium ions in 74 materials.

  7. Data from: Analysis of the Scalar and Vector Random Coupling Models For a...

    • researchdata.se
    • demo.researchdata.se
    • +2more
    Updated Dec 9, 2023
    Cite
    Ekaterina Deriushkina (2023). Analysis of the Scalar and Vector Random Coupling Models For a Four Coupled-Core Fiber [Dataset]. http://doi.org/10.5281/zenodo.7895952
    Explore at:
    Dataset updated
    Dec 9, 2023
    Dataset provided by
    Chalmers University of Technology
    Authors
    Ekaterina Deriushkina
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The files contain simulation results for the ECOC 2023 submission "Analysis of the Scalar and Vector Random Coupling Models For a Four Coupled-Core Fiber". The "4CCF_eigenvectorsPol" file is the Mathematica code that calculates the supermodes (eigenvectors of M(w)) and their propagation constants for the 4-coupled-core fiber (4CCF). These results are loaded into the Python notebook "4CCF_modelingECOC" in order to plot them and obtain Fig. 2 in the paper. "TransferMatrix" is the Python file with the functions used for modeling, simulation and plotting. It is also loaded in the Python notebook "4CCF_modelingECOC", where all the calculations for the figures in the paper are presented.

    ! UPD 25.09.2023: There is an error in the birefringence calculation. It is in the function "CouplingCoefficients" in the "TransferMatrix" file. There, the variable "birefringence" has to be calculated according to formula (19) of [A. Ankiewicz, A. Snyder, and X.-H. Zheng, "Coupling between parallel optical fiber cores–critical examination", Journal of Lightwave Technology, vol. 4, no. 9, pp. 1317–1323, 1986]: (4*U**2*W*spec.k0(W)*spec.kn(2, W_)/(spec.k1(W)*V**4))*((spec.iv(1, W)/spec.k1(W))-(spec.iv(2, W)/spec.k0(W))). The corrected formula gives almost the same result (the difference is on the order of 10^-5), but the correct formula should be used anyway.

    ! UPD 9.12.2023: I have noticed that in the published version of the code I forgot to change the wavelength range for the impulse response calculation. So instead of the nice shape shown in the paper you will see a resolution-limited shape. To solve that, just change the range of wavelengths: you can add "wl = [1545e-9, 1548e-9]" in the first cell after "Total power impulse response".

    P.S. In case of any questions or suggestions, you are welcome to write me an email: ekader@chalmers.se
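
    For reference, a minimal transcription of the corrected birefringence formula from the first update note into a standalone function could look like this; U, V, W and W_ follow the notation of the quoted expression and their exact definitions should be taken from the "TransferMatrix" file (this is not the dataset's own code).

    import scipy.special as spec

    def birefringence(U, V, W, W_):
        # Formula (19) of Ankiewicz, Snyder and Zheng (1986), as quoted in the update note.
        return (4 * U**2 * W * spec.k0(W) * spec.kn(2, W_) / (spec.k1(W) * V**4)) * \
               ((spec.iv(1, W) / spec.k1(W)) - (spec.iv(2, W) / spec.k0(W)))

    print(birefringence(1.8, 2.4, 1.6, 1.6))  # illustrative values only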

  8. Data from: Haploids adapt faster than diploids across a range of...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Dec 7, 2010
    + more versions
    Cite
    Aleeza C Gerstein; Lesley A Cleathero; Mohammad A Mandegar; Sarah P. Otto (2010). Haploids adapt faster than diploids across a range of environments [Dataset]. http://doi.org/10.5061/dryad.8048
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 7, 2010
    Dataset provided by
    Dryad
    Authors
    Aleeza C Gerstein; Lesley A Cleathero; Mohammad A Mandegar; Sarah P. Otto
    Time period covered
    Dec 7, 2010
    Description

    • Raw data to calculate rate of adaptation: raw dataset for rate of adaptation calculations (Figure 1) and related statistics (dataall.csv)
    • R code to analyze raw data for rate of adaptation (Competition Analysis.R)
    • Raw data to calculate effective population sizes (datacount.csv)
    • R code to analyze effective population sizes; used for Figure 2 (Cell Count Ne.R)
    • R code to determine our best estimate of the dominance coefficient in each environment: produces Figures 3, S4 and S5 (what is the best estimate of dominance?). Note: the competition and effective population size R code must be run first in the same session. (what is h.R)

  9. Python numerical computation code for the article of "Numerical study of...

    • scidb.cn
    Updated Jun 6, 2025
    Cite
    Lu Kun (2025). Python numerical computation code for the article of "Numerical study of superradiance and Hawking radiation of rotating acoustic black holes" [Dataset]. http://doi.org/10.57760/sciencedb.24506
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Lu Kun
    License

    https://api.github.com/licenses/mit

    Description

    This dataset contains Python numerical computation code for studying acoustic superradiance and Hawking radiation in specific rotating acoustic black hole models. The code is based on the radial wave equation of a scalar field (acoustic disturbance) on the effective acoustic metric background derived in the analysis.

    Dataset generation process and processing methods: The core code is written in Python, using the standard scientific computing libraries NumPy and SciPy. The main steps are: (1) define the model parameters (such as A, B, m) and the calculation ranges (frequency $\omega$ from 0.01 to 2.0, tortoise coordinate $r^*$ from -20 to 20); (2) implement the conversion functions between the radial coordinate $r$ and the tortoise coordinate $r^*$, where the inversion of $r^*(r)$ is solved numerically using SciPy's optimize.root_scalar function (e.g. Brent's method), with special attention paid to calculations near the horizon $r_H=|A|/c$ to ensure stability; (3) calculate the effective potential $V_0(r^*, \omega)$, which depends on $r(r^*)$; (4) convert the second-order radial wave equation into a system of four first-order real-valued ordinary differential equations; (5) solve the ODE system with SciPy's integrate.solve_ivp function (adaptive step size RK45 method with relative and absolute error tolerances set to $10^{-8}$), applying pure ingoing boundary conditions (normalized unit transmission) at the horizon and the appropriate asymptotic behavior at infinity; (6) extract the reflection coefficient $\mathcal{R}$ and transmission coefficient $\mathcal{T}$ from the numerical solution; (7) calculate the Hawking radiation power spectrum $P_\omega$ from the derived Hawking temperature $T_H$, the event horizon angular velocity $\Omega_H$, Bose-Einstein statistics, and the grey-body factor $|\mathcal{T}|^2$. The calculation adopts natural units ($\hbar=k_B=c=1$) and sets the characteristic length $r_0=1$.

    Dataset content: This dataset mainly includes a Python script file (the code for the numerical study of superradiance and Hawking radiation of rotating acoustic black holes, .py) and a README documentation file (README.md). The Python script implements the complete calculation process described above. The README file explains the code's functionality, the required dependencies (Python 3, NumPy, SciPy), how to run it, and the meaning of the parameters. This dataset does not contain any raw experimental data; it is theoretical calculation code only.

    Data accuracy and validation: The reliability of the code has been validated through two key checks: (1) the flux conservation relationship $|\mathcal{R}|^2 + [(\omega-m\Omega_H)/\omega]|\mathcal{T}|^2 = 1$ holds numerically within the calculated frequency range (with a deviation typically on the order of $10^{-8}$ or less); (2) under the superradiance condition $0<\omega<m\Omega_H$ the reflection coefficient satisfies $|\mathcal{R}|^2>1$, which is consistent with theoretical expectations.

    File format and software: The code is in standard Python 3 (.py) format and can run in any standard Python 3 environment with the NumPy and SciPy libraries installed. The README file is in Markdown (.md) format and can be opened with any text editor or Markdown viewer. No special or niche software is required.
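
    A minimal sketch of the flux-conservation check quoted above (not the dataset's own script) could look like this; the numbers are purely illustrative.

    import numpy as np

    def flux_residual(R, T, omega, m, Omega_H):
        # Deviation of |R|^2 + [(omega - m*Omega_H)/omega] * |T|^2 from 1.
        return abs(R)**2 + ((omega - m * Omega_H) / omega) * abs(T)**2 - 1.0

    # In the superradiant regime 0 < omega < m*Omega_H the bracket is negative,
    # so flux conservation forces |R|^2 > 1 (amplified reflection).
    omega, m, Omega_H, T = 0.3, 1, 0.5, 0.1
    R = np.sqrt(1.0 - ((omega - m * Omega_H) / omega) * abs(T)**2)
    print(R > 1, flux_residual(R, T, omega, m, Omega_H))  # True, ~0.0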

  10. Dataset for the paper "Observation of Acceleration and Deceleration Periods...

    • zenodo.org
    Updated Mar 26, 2025
    Cite
    Yide Qian; Yide Qian (2025). Dataset for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023 " [Dataset]. http://doi.org/10.5281/zenodo.15022854
    Explore at:
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yide Qian; Yide Qian
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Pine Island Glacier
    Description

    Dataset and codes for "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023 "

    • Description of the data and file structure

    The MATLAB codes and related datasets are used for generating the figures for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023".

    Files and variables

    File 1: Data_and_Code.zip

    Directory: Main_function

    Description: Includes MATLAB scripts and functions. Each script includes descriptions that guide the user on how to use it and how to find the dataset used for processing.

    MATLAB main scripts: include all the steps to process the data, output figures, and output videos.

    Script_1_Ice_velocity_process_flow.m

    Script_2_strain_rate_process_flow.m

    Script_3_DROT_grounding_line_extraction.m

    Script_4_Read_ICESat2_h5_files.m

    Script_5_Extraction_results.m

    MATLAB functions: files that include MATLAB functions supporting the main scripts:

    1_Ice_velocity_code: Includes MATLAB functions related to ice velocity post-processing, including outlier removal, filtering, correction for atmospheric and tidal effects, inverse-weighted averaging, and error estimation.

    2_strain_rate: Includes MATLAB functions related to strain rate calculation.

    3_DROT_extract_grounding_line_code: Includes MATLAB functions that convert range offset results output from GAMMA to differential vertical displacement and use the result to extract the grounding line.

    4_Extract_data_from_2D_result: Includes MATLAB functions used to extract profiles from 2D data.

    5_NeRD_Damage_detection: Modified code from Izeboud et al. 2023. When applying this code please also cite Izeboud et al. 2023 (https://www.sciencedirect.com/science/article/pii/S0034425722004655).

    6_Figure_plotting_code: Includes MATLAB functions related to the figures in the paper and supporting information.

    Directory: data_and_result

    Description: Includes directories that store the results output from MATLAB. Users only need to modify the paths in the MATLAB scripts to their own paths.

    1_origin: Sample data ("PS-20180323-20180329", "PS-20180329-20180404", "PS-20180404-20180410") output from the GAMMA software in GeoTIFF format that can be used to calculate DROT and velocity. Includes displacement, theta, phi, and ccp.

    2_maskccpN: Remove outliers by ccp < 0.05 and change displacement to velocity (m/day).

    3_rockpoint: Extract velocities at non-moving region

    4_constant_detrend: removed orbit error

    5_Tidal_correction: remove atmospheric and tidal induced error

    6_rockpoint: Extract non-aggregated velocities at non-moving region

    6_vx_vy_v: transform velocities from va/vr to vx/vy

    7_rockpoint: Extract aggregated velocities at non-moving region

    7_vx_vy_v_aggregate_and_error_estimate: inverse weighted average of three ice velocity maps and calculate the error maps

    8_strain_rate: strain rate calculated from the aggregated ice velocity

    9_compare: store the results before and after tidal correction and aggregation.

    10_Block_result: time series results extracted from 2D data.

    11_MALAB_output_png_result: Stores .png files and time series results

    12_DROT: Differential Range Offset Tracking results

    13_ICESat_2: ICESat_2 .h5 and .mat files can be put here (this folder only includes the samples from tracks 0965 and 1094)

    14_MODIS_images: you can store MODIS images here

    shp: grounding line, rock region, ice front, and other shape files.

    File 2 : PIG_front_1947_2023.zip

    Includes ice front position shapefiles from 1947 to 2023, which are used for plotting Figure 1 in the paper.

    File 3 : PIG_DROT_GL_2016_2021.zip

    Includes grounding line position shapefiles from 2016 to 2021, which are used for plotting Figure 1 in the paper.

    Data was derived from the following sources:
    The links can be found in the MATLAB scripts or in the "Open Research" section of the paper.

  11. Monthly time series of spatially enhanced relative humidity for Europe at...

    • data.opendatascience.eu
    • data.mundialis.de
    • +1more
    Updated Dec 16, 2023
    + more versions
    Cite
    (2023). Monthly time series of spatially enhanced relative humidity for Europe at 1000 m resolution (2000 - 2022) derived from ERA5-Land data [Dataset]. https://data.opendatascience.eu/geonetwork/srv/search?keyword=TBE
    Explore at:
    Dataset updated
    Dec 16, 2023
    Area covered
    Europe
    Description

    Overview: ERA5-Land is a reanalysis dataset providing a consistent view of the evolution of land variables over several decades at an enhanced resolution compared to ERA5. ERA5-Land has been produced by replaying the land component of the ECMWF ERA5 climate reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. Reanalysis produces data that goes several decades back in time, providing an accurate description of the climate of the past.

    Processing steps: The original hourly ERA5-Land air temperature 2 m above ground and dewpoint temperature 2 m data has been spatially enhanced from 0.1 degree to 30 arc seconds (approx. 1000 m) spatial resolution by image fusion with CHELSA data (V1.2) (https://chelsa-climate.org/). For each day we used the corresponding monthly long-term average of CHELSA. The aim was to use the fine spatial detail of CHELSA and at the same time preserve the general regional pattern and fine temporal detail of ERA5-Land. The steps included aggregation and enhancement, specifically:

    1. spatially aggregate CHELSA to the resolution of ERA5-Land
    2. calculate the difference of ERA5-Land minus aggregated CHELSA
    3. interpolate the differences with a Gaussian filter to 30 arc seconds
    4. add the interpolated differences to CHELSA

    Subsequently, the temperature time series have been aggregated on a daily basis. From these, daily relative humidity has been calculated for the time period 01/2000 - 12/2023. Relative humidity (rh2m) has been calculated from air temperature 2 m above ground (Ta) and dewpoint temperature 2 m above ground (Td) using the formula for saturated water pressure from Wright (1997):

    maximum water pressure = 611.21 * exp(17.502 * Ta / (240.97 + Ta))
    actual water pressure = 611.21 * exp(17.502 * Td / (240.97 + Td))
    relative humidity = actual water pressure / maximum water pressure

    The resulting relative humidity has been aggregated to monthly averages. Resultant values have been converted to represent percent * 10, thus covering a theoretical range of [0, 1000]. The data have been reprojected to EU LAEA.

    File naming scheme (YYYY = year; MM = month): ERA5_land_rh2m_avg_monthly_YYYY_MM.tif
    Projection + EPSG code: EU LAEA (EPSG: 3035)
    Spatial extent: north 6874000; south -485000; west 869000; east 8712000
    Spatial resolution: 1000 m
    Temporal resolution: monthly
    Pixel values: percent * 10 (scaled to integer; example: value 738 = 73.8 %)
    Software used: GDAL 3.2.2 and GRASS GIS 8.0.0/8.3.2
    Original ERA5-Land dataset license: https://apps.ecmwf.int/datasets/licences/copernicus/

    CHELSA climatologies (V1.2), data used: Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, H.P., Kessler, M. (2018): Data from: Climatologies at high resolution for the earth's land surface areas. Dryad digital repository. http://dx.doi.org/doi:10.5061/dryad.kd1d4
    Original peer-reviewed publication: Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, P., Kessler, M. (2017): Climatologies at high resolution for the Earth land surface areas. Scientific Data. 4 170122. https://doi.org/10.1038/sdata.2017.122
    Processed by: mundialis GmbH & Co. KG, Germany (https://www.mundialis.de/)
    Reference: Wright, J.M. (1997): Federal meteorological handbook no. 3 (FCM-H3-1997). Office of Federal Coordinator for Meteorological Services and Supporting Research. Washington, DC
    Data is also available in Latitude-Longitude/WGS84 (EPSG: 4326) projection: https://data.mundialis.de/geonetwork/srv/eng/catalog.search#/metadata/b9ce7dba-4130-428d-96f0-9089d8b9f4a5
    Acknowledgements: This study was partially funded by EU grant 874850 MOOD. The contents of this publication are the sole responsibility of the authors and don't necessarily reflect the views of the European Commission.
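
    A minimal Python sketch of the relative-humidity formula above (Wright 1997), including the percent * 10 scaling used for the pixel values, might look like this; Ta and Td are in degrees Celsius.

    import numpy as np

    def rh2m_scaled(Ta, Td):
        # Saturated and actual water pressure from the Wright (1997) formula.
        e_max = 611.21 * np.exp(17.502 * Ta / (240.97 + Ta))
        e_act = 611.21 * np.exp(17.502 * Td / (240.97 + Td))
        # Relative humidity scaled to percent * 10, e.g. 738 = 73.8 %.
        return np.round(1000.0 * e_act / e_max).astype(int)

    print(rh2m_scaled(np.array([20.0]), np.array([15.0])))  # roughly [730]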

  12. Google Analytics data of an E-commerce Company

    • kaggle.com
    zip
    Updated Oct 19, 2024
    Cite
    fehu.zone (2024). Google Analytics data of an E-commerce Company [Dataset]. https://www.kaggle.com/datasets/fehu94/google-analytics-data-of-an-e-commerce-company
    Explore at:
    Available download formats: zip (3156 bytes)
    Dataset updated
    Oct 19, 2024
    Authors
    fehu.zone
    Description

    📊 Dataset Title: Daily Active Users Dataset

    📝 Description

    This dataset provides detailed insights into daily active users (DAU) of a platform or service, captured over a defined period of time. The dataset includes information such as the number of active users per day, allowing data analysts and business intelligence teams to track usage trends, monitor platform engagement, and identify patterns in user activity over time.

    The data is ideal for performing time series analysis, statistical analysis, and trend forecasting. You can utilize this dataset to measure the success of platform initiatives, evaluate user behavior, or predict future trends in engagement. It is also suitable for training machine learning models that focus on user activity prediction or anomaly detection.

    📂 Dataset Structure

    The dataset is structured in a simple and easy-to-use format, containing the following columns:

    • Date: The date on which the data was recorded, formatted as YYYYMMDD.
    • Number of Active Users: The number of users who were active on the platform on the corresponding date.

    Each row in the dataset represents a unique date and its corresponding number of active users. This allows for time-based analysis, such as calculating the moving average of active users, detecting seasonality, or spotting sudden spikes or drops in engagement.

    🧐 Key Use Cases

    This dataset can be used for a wide range of purposes, including:

    1. Time Series Analysis: Analyze trends and seasonality of user engagement.
    2. Trend Detection: Discover peaks and valleys in user activity.
    3. Anomaly Detection: Use statistical methods or machine learning algorithms to detect anomalies in user behavior.
    4. Forecasting User Growth: Build forecasting models to predict future platform usage.
    5. Seasonality Insights: Identify patterns like increased activity on weekends or holidays.

    📈 Potential Analysis

    Here are some specific analyses you can perform using this dataset:

    • Moving Average and Smoothing: Calculate the moving average over a 7-day or 30-day period.
    • Correlation with External Factors: Correlate daily active users with other datasets.
    • Statistical Hypothesis Testing: Perform t-tests or ANOVA to determine significant differences in user activity.
    • Machine Learning for Prediction: Train machine learning models to predict user engagement.

    🚀 Getting Started

    To get started with this dataset, you can load it into your preferred analysis tool. Here's how to do it using Python's pandas library:

    import pandas as pd
    
    # Load the dataset
    data = pd.read_csv('path_to_dataset.csv')
    
    # Display the first few rows
    print(data.head())
    
    # Basic statistics
    print(data.describe())
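
    Building on the snippet above, the 7-day moving average mentioned under Potential Analysis could be sketched as follows; the column names 'Date' (YYYYMMDD) and 'Number of Active Users' are taken from the dataset structure described above.

    # Parse the YYYYMMDD dates and compute a 7-day moving average of active users
    data["Date"] = pd.to_datetime(data["Date"], format="%Y%m%d")
    data = data.sort_values("Date").set_index("Date")
    data["7d_avg"] = data["Number of Active Users"].rolling(window=7).mean()
    print(data[["Number of Active Users", "7d_avg"]].tail())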
    
  13. Data from: U.S. Geological Survey calculated half interpercentile range...

    • s.cnmilf.com
    • search.dataone.org
    • +1more
    Updated Oct 1, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). U.S. Geological Survey calculated half interpercentile range (half of the difference between the 16th and 84th percentiles) of wave-current bottom shear stress in the South Atlantic Bight from May 2010 to May 2011 (SAB_hIPR.shp, polygon shapefile, Geographic, WGS84) [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/u-s-geological-survey-calculated-half-interpercentile-range-half-of-the-difference-between
    Explore at:
    Dataset updated
    Oct 1, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    The U.S. Geological Survey has been characterizing the regional variation in shear stress on the sea floor and sediment mobility through statistical descriptors. The purpose of this project is to identify patterns in stress in order to inform habitat delineation or decisions for anthropogenic use of the continental shelf. The statistical characterization spans the continental shelf from the coast to approximately 120 m water depth, at approximately 5 km resolution. Time-series of wave and circulation are created using numerical models, and near-bottom output of steady and oscillatory velocities and an estimate of bottom roughness are used to calculate a time-series of bottom shear stress at 1-hour intervals. Statistical descriptions such as the median and 95th percentile, which are the output included with this database, are then calculated to create a two-dimensional picture of the regional patterns in shear stress. In addition, time-series of stress are compared to critical stress values at select points calculated from observed surface sediment texture data to determine estimates of sea floor mobility.

  14. Data from: Variable Terrestrial GPS Telemetry Detection Rates: Parts 1 -...

    • s.cnmilf.com
    • data.usgs.gov
    • +2more
    Updated Oct 2, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Variable Terrestrial GPS Telemetry Detection Rates: Parts 1 - 7—Data [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/variable-terrestrial-gps-telemetry-detection-rates-parts-1-7data
    Explore at:
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    Studies utilizing Global Positioning System (GPS) telemetry rarely result in 100% fix success rates (FSR). Many assessments of wildlife resource use do not account for missing data, either assuming data loss is random or because of a lack of practical treatment for systematic data loss. Several studies have explored how the environment, technological features, and animal behavior influence rates of missing data in GPS telemetry, but previous spatially explicit models developed to correct for sampling bias have been specified for small study areas, over a small range of data loss, or to be species-specific, limiting their general utility. Here we explore environmental effects on GPS fix acquisition rates across a wide range of environmental conditions and detection rates for bias correction of terrestrial GPS-derived, large mammal habitat use. We also evaluate patterns in missing data that relate to potential animal activities that change the orientation of the antennae, and characterize home-range probability of GPS detection for 4 focal species: cougars (Puma concolor), desert bighorn sheep (Ovis canadensis nelsoni), Rocky Mountain elk (Cervus elaphus ssp. nelsoni) and mule deer (Odocoileus hemionus).

    Part 1, Positive Openness Raster (raster dataset): Openness is an angular measure of the relationship between surface relief and horizontal distance. For angles less than 90 degrees it is equivalent to the internal angle of a cone with its apex at a DEM location, and is constrained by neighboring elevations within a specified radial distance. A 480 meter search radius was used for this calculation of positive openness. Openness incorporates the terrain line-of-sight or viewshed concept and is calculated from multiple zenith and nadir angles, here along eight azimuths. Positive openness measures openness above the surface, with high values for convex forms and low values for concave forms (Yokoyama et al. 2002). We calculated positive openness using a custom Python script, following the methods of Yokoyama et al. (2002), using a USGS National Elevation Dataset as input.

    Part 2, Northern Arizona GPS Test Collar (csv): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall over-story vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and showed little bias in predictions. The model's predictive ability was evaluated using two independent datasets from stationary test collars of different make/model, fix interval programming, and placed at different study sites. No statistically significant differences (95% CI) between predicted and observed FSRs suggest changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US. The model training data are provided here for fix attempts by hour. This table can be linked with the site location shapefile using the site field.

    Part 3, Probability Raster (raster dataset): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall over-story vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and showed little bias in predictions. The model's predictive ability was evaluated using two independent datasets from stationary test collars of different make/model, fix interval programming, and placed at different study sites. No statistically significant differences (95% CI) between predicted and observed FSRs suggest changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US. We evaluated GPS telemetry datasets by comparing the mean probability of a successful GPS fix across study animals' home-ranges to the actual observed FSR of GPS-downloaded deployed collars on cougars (Puma concolor), desert bighorn sheep (Ovis canadensis nelsoni), Rocky Mountain elk (Cervus elaphus ssp. nelsoni) and mule deer (Odocoileus hemionus). Comparing the mean probability of acquisition within study animals' home-ranges and observed FSRs of GPS-downloaded collars resulted in an approximately 1:1 linear relationship with an r-squared of 0.68.

    Part 4, GPS Test Collar Sites (shapefile): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall over-story vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and showed little bias in predictions. The model's predictive ability was evaluated using two independent datasets from stationary test collars of different make/model, fix interval programming, and placed at different study sites. No statistically significant differences (95% CI) between predicted and observed FSRs suggest changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US.

    Part 5, Cougar Home Ranges (shapefile): Cougar home-ranges were calculated to compare the mean probability of a GPS fix acquisition across the home-range to the actual fix success rate (FSR) of the collar, as a means for evaluating whether characteristics of an animal's home-range have an effect on observed FSR. We estimated home-ranges using the Local Convex Hull (LoCoH) method using the 90th isopleth. Only data obtained from GPS download of retrieved units were used. Satellite-delivered data was omitted from the analysis for animals where the collar was lost or damaged because satellite delivery tends to lose an additional 10% of data. Comparisons with home-range mean probability of fix were also used as a reference for assessing whether the frequency with which animals use areas of low GPS acquisition rates may play a role in observed FSRs.

    Part 6, Cougar Fix Success Rate by Hour (csv): Cougar GPS collar fix success varied by hour-of-day, suggesting circadian rhythms with bouts of rest during daylight hours may change the orientation of the GPS receiver, affecting the ability to acquire fixes. Raw data of overall fix success rates (FSR) and FSR by hour were used to predict relative reductions in FSR. Data only include direct GPS download datasets. Satellite-delivered data was omitted from the analysis for animals where the collar was lost or damaged because satellite delivery tends to lose approximately an additional 10% of data.

    Part 7, Openness Python Script version 2.0: This Python script was used to calculate positive openness using a 30 meter digital elevation model for a large geographic area in Arizona, California, Nevada and Utah. A scientific research project used the script to explore environmental effects on GPS fix acquisition rates across a wide range of environmental conditions and detection rates for bias correction of terrestrial GPS-derived, large mammal habitat use.

  15. Data from: Contrasting effects of host or local specialization: widespread...

    • data-staging.niaid.nih.gov
    • ourarchive.otago.ac.nz
    • +3more
    zip
    Updated Mar 13, 2024
    Cite
    Daniela de Angeli Dutra; Gabriel Moreira Félix; Robert Poulin (2024). Contrasting effects of host or local specialization: widespread haemosporidians are host generalist whereas local specialists are locally abundant [Dataset]. http://doi.org/10.5061/dryad.j3tx95xfb
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 13, 2024
    Dataset provided by
    University of Otago
    Universidade Estadual de Campinas (UNICAMP)
    Authors
    Daniela de Angeli Dutra; Gabriel Moreira Félix; Robert Poulin
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Aim: Despite the wide distribution of many parasites around the globe, the range of individual species varies significantly even among phylogenetically related taxa. Since parasites need suitable hosts to complete their development, parasite geographical and environmental ranges should be limited to communities where their hosts are found. Parasites may also suffer from a trade-off between being locally abundant or widely dispersed. We hypothesize that the geographical and environmental ranges of parasites are negatively associated with their host specificity and their local abundance.

    Location: Worldwide. Time period: 2009 to 2021. Major taxa studied: Avian haemosporidian parasites.

    Methods: We tested these hypotheses using a global database which comprises data on avian haemosporidian parasites from across the world. For each parasite lineage, we computed five metrics: phylogenetic host-range, environmental range, geographical range, and their mean local and total number of observations in the database. Phylogenetic generalized least squares models were run to evaluate the influence of phylogenetic host-range and total and local abundances on geographical and environmental range. In addition, we analysed separately the two regions with the largest amount of available data: Europe and South America.

    Results: We evaluated 401 lineages from 757 localities and observed that generalism (i.e. phylogenetic host range) is positively associated with both the parasites' geographical and environmental ranges at the global and European scales. For South America, generalism is only associated with geographical range. Finally, mean local abundance (mean local number of parasite occurrences) was negatively related to geographical and environmental range. This pattern was detected worldwide and in South America, but not in Europe.

    Main conclusions: We demonstrate that parasite specificity is linked to both their geographical and environmental ranges. The fact that locally abundant parasites present restricted ranges indicates a trade-off between these two traits. This trade-off, however, only becomes evident when sufficiently heterogeneous host communities are considered.

    Methods

    We compiled data on haemosporidian lineages from the MalAvi database (http://130.235.244.92/Malavi/, Bensch et al. 2009), including all the data available from the "Grand Lineage Summary" representing the Plasmodium and Haemoproteus genera from wild birds and containing information on location. After checking for duplicated sequences, this dataset comprised a total of ~6200 sequenced parasites representing 1602 distinct lineages (775 Plasmodium and 827 Haemoproteus) collected from 1139 different host species and 757 localities from all continents except Antarctica (Supplementary Figure 1, Supplementary Table 1). The parasite lineages deposited in MalAvi are based on a cyt b fragment of 478 bp. This dataset was used to calculate the parasites' geographical, environmental and phylogenetic ranges.

    Geographical range

    All analyses in this study were performed using R version 4.02. In order to estimate the geographical range of each parasite lineage, we applied the R package "GeoRange" (Boyle, 2017) and chose the variable minimum spanning tree distance (i.e., the shortest total distance of all lines connecting each locality where a particular lineage has been found). Using the function "create.matrix" from the "fossil" package, we created a matrix of lineages and coordinates and employed the function "GeoRange_MultiTaxa" to calculate the minimum spanning tree distance for each parasite lineage (i.e. the shortest total distance in kilometers of all lines connecting each locality). Therefore, as at least two distinct sites are necessary to calculate this distance, parasites observed in a single locality could not have their geographical range estimated. For this reason, only parasites observed in two or more localities were considered in our phylogenetically controlled least squares (PGLS) models.

    Host and environmental diversity

    Traditionally, ecologists use Shannon entropy to measure diversity in ecological assemblages (Pielou, 1966). The Shannon entropy of a set of elements is related to the degree of uncertainty someone would have about the identity of a randomly selected element of that set (Jost, 2006). Thus, Shannon entropy matches our intuitive notion of biodiversity, as the more diverse an assemblage is, the more uncertainty there is regarding which species a randomly selected individual belongs to. Shannon diversity increases with both the assemblage richness (e.g., the number of species) and evenness (e.g., uniformity in abundance among species). To compare the diversity of assemblages that vary in richness and evenness in a more intuitive manner, we can normalize diversities by Hill numbers (Chao et al., 2014b). The Hill number of an assemblage represents the effective number of species in the assemblage, i.e., the number of equally abundant species that are needed to give the same value of the diversity metric in that assemblage. Hill numbers can be extended to incorporate phylogenetic information. In such a case, instead of species, we are measuring the effective number of phylogenetic entities in the assemblage. Here, we computed phylogenetic host-range as the phylogenetic Hill number associated with the assemblage of hosts found infected by a given parasite. Analyses were performed using the function "hill_phylo" from the "hillr" package (Chao et al., 2014a). Hill numbers are parameterized by a parameter "q" that determines the sensitivity of the metric to relative species abundance. Different "q" values produce Hill numbers associated with different diversity metrics. We set q = 1 to compute the Hill number associated with Shannon diversity. Here, low Hill numbers indicate specialization on a narrow phylogenetic range of hosts, whereas higher Hill numbers indicate generalism across a broader phylogenetic spectrum of hosts.

    We also used Hill numbers to compute the environmental range of sites occupied by each parasite lineage. Firstly, we collected the 19 bioclimatic variables from WorldClim version 2 (http://www.worldclim.com/version2) for all sites used in this study (N = 713). Then, we standardized the 19 variables by centering and scaling them by their respective mean and standard deviation. Thereafter, we computed the pairwise Euclidean environmental distance among all sites and used this distance to compute a dissimilarity cluster. Finally, as for the phylogenetic Hill number, we used this dissimilarity cluster to compute the environmental Hill number of the assemblage of sites occupied by each parasite lineage. The environmental Hill number for each parasite can be interpreted as the effective number of environmental conditions in which a parasite lineage occurs. Thus, the higher the environmental Hill number, the more generalist the parasite is regarding the environmental conditions in which it can occur.

    Parasite phylogenetic tree

    A Bayesian phylogenetic reconstruction was performed. We built a tree for all parasite sequences for which we were able to estimate the parasite's geographical, environmental and phylogenetic ranges (see above); this represented 401 distinct parasite lineages. This inference was produced using MrBayes 3.2.2 (Ronquist & Huelsenbeck, 2003) with the GTR + I + G model of nucleotide evolution, as recommended by ModelTest (Posada & Crandall, 1998), which selects the best-fit nucleotide substitution model for a set of genetic sequences. We ran four Markov chains simultaneously for a total of 7.5 million generations, sampled every 1000 generations. The first 25% of sampled trees were discarded as a burn-in step and the remaining trees were used to calculate the posterior probabilities of each estimated node in the final consensus tree. Our final tree obtained a cumulative posterior probability of 0.999. Leucocytozoon caulleryi was used as the outgroup to root the phylogenetic tree, as Leucocytozoon spp. represent a basal group within avian haemosporidians (Pacheco et al., 2020).
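
    As an illustration of the q = 1 Hill number used for the host and environmental range metrics, a small Python sketch is given below; the dataset's own analyses used R (the "hillr" package and its phylogenetic extension), so this only shows the abundance-based quantity.

    import numpy as np

    def hill_q1(abundances):
        # Hill number of order q = 1: exp(Shannon entropy), i.e. the effective
        # number of equally abundant categories in an assemblage.
        p = np.asarray(abundances, dtype=float)
        p = p[p > 0] / p.sum()
        return np.exp(-np.sum(p * np.log(p)))

    print(hill_q1([10, 10, 10]))  # three equally abundant hosts -> 3.0
    print(hill_q1([28, 1, 1]))    # dominated by one host -> about 1.3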

  16. Annual Average Temperature Change - Projections (12km)

    • climatedataportal.metoffice.gov.uk
    • hub.arcgis.com
    • +1more
    Updated Jun 1, 2023
    + more versions
    Cite
    Met Office (2023). Annual Average Temperature Change - Projections (12km) [Dataset]. https://climatedataportal.metoffice.gov.uk/datasets/TheMetOffice::annual-average-temperature-change-projections-12km/explore?showTable=true
    Explore at:
    Dataset updated
    Jun 1, 2023
    Dataset authored and provided by
    Met Office (http://www.metoffice.gov.uk/)
    Area covered
    Description

    [Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average difference between the 'lower' values before and after this update is 0.13°C.]

    What does the data show?

    This dataset shows the change in annual temperature for a range of global warming levels, including the recent past (2001-2020), compared to the 1981-2000 baseline period. Note, as the values in this dataset are averaged over a year they do not represent possible extreme conditions. The dataset uses projections of daily average air temperature from UKCP18 which are averaged to give values for the 1981-2000 baseline, the recent past (2001-2020) and global warming levels. The warming levels available are 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. The recent past value and global warming level values are stated as a change (in °C) relative to the 1981-2000 value. This enables users to compare annual average temperature trends for the different periods. In addition to the change values, values for the 1981-2000 baseline (corresponding to 0.51°C warming) and recent past (2001-2020, corresponding to 0.87°C warming) are also provided. This is summarised in the table below.

    • 1981-2000 baseline: Average temperature (°C) for the period
    • 2001-2020 (recent past): Average temperature (°C) for the period
    • 2001-2020 (recent past) change: Temperature change (°C) relative to 1981-2000
    • 1.5°C global warming level change: Temperature change (°C) relative to 1981-2000
    • 2°C global warming level change: Temperature change (°C) relative to 1981-2000
    • 2.5°C global warming level change: Temperature change (°C) relative to 1981-2000
    • 3°C global warming level change: Temperature change (°C) relative to 1981-2000
    • 4°C global warming level change: Temperature change (°C) relative to 1981-2000

    What is a global warming level?

    The Annual Average Temperature Change is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850-1900 and 2011-2020), whilst this dataset allows for the exploration of greater levels of warming. The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level was calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Annual Average Temperature Change, an average is taken across the 21 year period. We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.

    What are the naming conventions and how do I explore the data?

    This data contains a field for the 1981-2000 baseline, 2001-2020 period and each warming level. They are named 'tas annual change' (change in air 'temperature at surface'), the warming level or historic time period, and 'upper', 'median' or 'lower' as per the description below, e.g. 'tas annual change 2.0 median' is the median value for the 2.0°C warming level. Decimal points are included in field aliases but not in field names, e.g. 'tas annual change 2.0 median' is named 'tas_annual_change_20_median'. To understand how to explore the data, refer to the New Users ESRI Storymap. Please note, if viewing in ArcGIS Map Viewer, the map will default to 'tas annual change 2.0°C median' values.

    What do the 'median', 'upper', and 'lower' values mean?

    Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future. For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Annual Average Temperature Change was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location. The ‘lower’ fields are the second lowest ranked ensemble member. The ‘higher’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble. This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and higher fields, the greater the uncertainty. ‘Lower’, ‘median’ and ‘upper’ are also given for the baseline period as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.

    Useful links

    • For further information on the UK Climate Projections (UKCP).
    • Further information on understanding climate data within the Met Office Climate Data Portal.
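As a minimal illustration of the ensemble ranking described above (not Met Office code), the sketch below takes 12 hypothetical ensemble values for one grid cell and returns the second-lowest, central and second-highest members, which correspond to the 'lower', 'median' and 'higher' fields.

```python
import numpy as np

def summarise_ensemble(values):
    """Rank ensemble members for one grid cell and return the second lowest
    ('lower'), the central value ('median') and the second highest ('higher')."""
    v = np.sort(np.asarray(values, dtype=float))
    return {"lower": v[1],
            "median": float(np.median(v)),  # mean of the two central members for 12 values
            "higher": v[-2]}

# Hypothetical annual temperature changes (°C) for one cell at the 2.0°C warming level
members = [1.6, 1.7, 1.8, 1.9, 1.9, 2.0, 2.1, 2.1, 2.2, 2.3, 2.4, 2.6]
print(summarise_ensemble(members))
```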

  17. Summary and methods used to calculate the physical characteristics used to compare the home range estimators

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 31, 2017
    Cite
    Nathan, Senthilvel K. S. S.; Saldivar, Diana A. Ramirez; Vaughan, Ian P.; Goossens, Benoit; Stark, Danica J. (2017). Summary and methods used to calculate the physical characteristics used to compare the home range estimators. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001743878
    Explore at:
    Dataset updated
    Mar 31, 2017
    Authors
    Nathan, Senthilvel K. S. S.; Saldivar, Diana A. Ramirez; Vaughan, Ian P.; Goossens, Benoit; Stark, Danica J.
    Description

    Summary and methods used to calculate the physical characteristics used to compare the home range estimators.

  18. US Monthly Birth Data

    • kaggle.com
    zip
    Updated Dec 4, 2023
    Cite
    The Devastator (2023). US Monthly Birth Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-monthly-birth-data
    Explore at:
    zip (3159011 bytes). Available download formats
    Dataset updated
    Dec 4, 2023
    Authors
    The Devastator
    Area covered
    United States
    Description

    US Monthly Birth Data

    US Monthly Birth Data by State and County

    By data.world's Admin [source]

    About this dataset

    The data was obtained from multiple sources. Data from 1985-2002 were downloaded from the National Bureau of Economic Research through the National Center for Health Statistics' National Vital Statistics System. Data from 2003-2015 were sourced using aggregators provided by CDC's WONDER tool, utilizing Year, Month, State, and County filters. It is worth noting that geolocation information for individual babies born after 2005 is not released due to privacy concerns; therefore, all data has been aggregated by month.

    The spatial applicability of this dataset is limited to the United States at the county level. It covers a temporal range spanning January 1, 1985 - December 31, 2015. Each row in the dataset represents aggregated birth counts within a specific county for a particular month and year.

    Additional notes highlight that this dataset expands on data presented in an essay called The Timing of Baby Making published by The Pudding website in May 2017. While only data ranging from 1995-2015 were displayed in the essay itself, this dataset includes an extra ten years of birth data. Furthermore, any non-US residents have been excluded from this dataset.

    The provided metadata gives a detailed breakdown of the columns in the dataset, including their descriptions and data types. The included variables allow researchers to analyze births at both individual county and state levels over time. Finally, the dataset is available under the MIT License for public use

    How to use the dataset

    Here is a guide on how to effectively use this dataset:

    Step 1: Understanding the Columns

    The dataset consists of several columns that provide specific information about each birth record. Let's understand what each column represents:

    • State: The state (including District of Columbia) where the mother lives.
    • County: The county where the mother lives, coded using the FIPS County Code.
    • Month: The month in which the birth took place (1 = January, 2 = February, etc.).
    • Year: The four-digit year of the birth.
    • countyBirths: The calculated sum of births that occurred to mothers living in a county for a given month. If the sum was less than 9, it is listed as NA as per NCHS reporting guidelines.
    • stateBirths: The calculated sum of births that occurred to mothers living in a state for a given month. It includes all birth counts, even those from counties with fewer than 9 births.

    Step 2: Exploring Birth Trends by State and County

    You can analyze birth trends by focusing on specific states or counties within specific time frames. Here's how you can do it:

    • Filter by State or County:

      • Select rows based on your chosen state using the State column. Each number corresponds to a specific state (e.g., 01 = Alabama).
      • Further narrow down your analysis by selecting specific counties using their respective FIPS codes mentioned in the County column.
    • Analyze Monthly Variation:

      • Calculate monthly total births within your desired location(s) by grouping data based on the Month column.
      • Compare the number of births between different months to identify any seasonal trends or patterns.
    • Visualize Birth Trends:

      • Create line charts or bar plots to visualize how the number of births changes over time.
      • Plot a line or bar for each month across multiple years to identify any significant changes in birth rates.

    Step 3: Comparison and Calculation

    You can utilize this dataset to compare birth rates between states, counties, and regions. Here are a few techniques you can try:

    • State vs. County Comparison:
      • Calculate the total births within each state by aggregating the countyBirths values for its counties (see the pandas sketch after this list).
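A minimal pandas sketch of the filtering and monthly aggregation described in Steps 2-3, assuming a CSV file with the columns listed in the metadata above; the file name is hypothetical.

```python
import pandas as pd

# Hypothetical file name; columns follow the metadata described above.
births = pd.read_csv("us_monthly_births.csv", dtype={"State": str, "County": str})

# Step 2: filter to one state (FIPS 01 = Alabama) and total the births per month.
alabama = births[births["State"] == "01"]
monthly = (alabama.groupby(["Year", "Month"])["countyBirths"]
                  .sum(min_count=1)            # stay NA when every county is NA
                  .reset_index())

# Step 3: average each calendar month across years to look for seasonal patterns.
seasonal = monthly.groupby("Month")["countyBirths"].mean()
print(seasonal.sort_values(ascending=False).head())
```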

    Research Ideas

    • Analyzing birth trends: This dataset can be used to analyze and understand the trends in birth rates across different states and counties over the period of 1985 to 2015. Researchers can study factors that may influence these trends, such as socioeconomic factors, healthcare access, or cultural changes.
    • Identifying seasonal variations: The dataset includes information on the month of birth for each entry. This data can be utilized to identify any seasonal variations in births across different locations in the US. Understanding these variations can help in planning resources and healthcare services accordingly.
    • Studying geographical patterns: By analyzing the county-level data, researchers can explore geographical patterns of childbirth throughout the United States. They can identify regions with high or low birth rates and...
  19. Data from: Half interpercentile range (half of the difference between the 16th and 84th percentiles) of wave-current bottom shear stress in the Middle Atlantic Bight for May, 2010 - May, 2011 (MAB_hIPR.SHP)

    • catalog.data.gov
    • data.usgs.gov
    • +5more
    Updated Nov 21, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Half interpercentile range (half of the difference between the 16th and 84th percentiles) of wave-current bottom shear stress in the Middle Atlantic Bight for May, 2010 - May, 2011 (MAB_hIPR.SHP) [Dataset]. https://catalog.data.gov/dataset/half-interpercentile-range-half-of-the-difference-between-the-16th-and-84th-percentiles-of
    Explore at:
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    The U.S. Geological Survey has been characterizing the regional variation in shear stress on the sea floor and sediment mobility through statistical descriptors. The purpose of this project is to identify patterns in stress in order to inform habitat delineation or decisions for anthropogenic use of the continental shelf. The statistical characterization spans the continental shelf from the coast to approximately 120 m water depth, at approximately 5 km resolution. Time-series of wave and circulation are created using numerical models, and near-bottom output of steady and oscillatory velocities and an estimate of bottom roughness are used to calculate a time-series of bottom shear stress at 1-hour intervals. Statistical descriptions such as the median and 95th percentile, which are the output included with this database, are then calculated to create a two-dimensional picture of the regional patterns in shear stress. In addition, time-series of stress are compared to critical stress values at select points calculated from observed surface sediment texture data to determine estimates of sea floor mobility.
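As an aside on the statistics named in this record, the sketch below computes, from a hypothetical hourly bottom-stress time series, the median and 95th percentile mentioned in the description as well as the half interpercentile range from the dataset title (half of the difference between the 16th and 84th percentiles). The data are random placeholders, not USGS output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical one year of hourly bottom shear stress values (Pa)
stress = rng.lognormal(mean=-2.0, sigma=0.8, size=24 * 365)

p16, p84 = np.percentile(stress, [16, 84])
half_ipr = (p84 - p16) / 2        # half interpercentile range, as in the dataset title
median = np.median(stress)
p95 = np.percentile(stress, 95)
print(half_ipr, median, p95)
```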

  20. Data from: Predicting spatial-temporal patterns of diet quality and large herbivore performance using satellite time series

    • agdatacommons.nal.usda.gov
    docx
    Updated Nov 21, 2025
    Cite
    Sean Kearney; Lauren M. Porensky; David J. Augustine; Justin D. Derner; Feng Gao (2025). Data from: Predicting spatial-temporal patterns of diet quality and large herbivore performance using satellite time series [Dataset]. http://doi.org/10.15482/USDA.ADC/1522609
    Explore at:
    docx. Available download formats
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Sean Kearney; Lauren M. Porensky; David J. Augustine; Justin D. Derner; Feng Gao
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Analysis-ready tabular data from "Predicting spatial-temporal patterns of diet quality and large herbivore performance using satellite time series" in Ecological Applications, Kearney et al., 2021. Data is tabular data only, summarized to the pasture scale. Weight gain data for individual cattle and the STARFM-derived Landsat-MODIS fusion imagery can be made available upon request.

    Resources in this dataset:

    Resource Title: Metadata - CSV column names, units and descriptions.
    File Name: Kearney_et_al_ECOLAPPL_Patterns of herbivore - metada.docx
    Resource Description: Column names, units and descriptions for all CSV files in this dataset.

    Resource Title: Fecal quality data.
    File Name: Kearney_etal2021_Patterns_of_herbivore_Data_FQ_cln.csv
    Resource Description: Field-sampled fecal quality (CP = crude protein; DOM = digestible organic matter) data and phenology-related APAR metrics derived from 30 m daily Landsat-MODIS fusion satellite imagery. All data are paddock-scale averages; the paddock is the spatial scale of replication and week is the temporal scale of replication. Fecal samples were collected by USDA-ARS staff from 3-5 animals per paddock (10% - 25% of animals in each herd) weekly during each grazing season from 2014 to 2019 across 10 different paddocks at the Central Plains Experimental Range (CPER) near Nunn, CO. Samples were analyzed at the Grazingland Animal Nutrition Lab (GANlab, https://cnrit.tamu.edu/index.php/ganlab/) using near infrared spectroscopy (see Lyons & Stuth, 1992; Lyons, Stuth, & Angerer, 1995). Not every herd was sampled every week or every year, resulting in a total of 199 samples. Samples represent all available data at the CPER during the study period and were collected for different research and adaptive management objectives, but following the basic protocol described above. APAR metrics were derived from the paddock-scale APAR daily time series (all paddock pixels averaged daily to create a single paddock-scale time series). All APAR metrics are calculated for the week that corresponds to the week that fecal quality samples were collected in the field. See Section 2.2.4 of the corresponding manuscript for a complete description of the APAR metrics.

    Resource Title: Monthly ADG.
    File Name: Kearney_etal2021_Patterns_of_herbivore_Data_ADG_monthly_cln.csv
    Resource Description: Monthly average daily gain (ADG) of cattle weights at the paddock scale and the three satellite-derived metrics used to build the regression model to predict ADG: crude protein (CP), digestible organic matter (DOM) and aboveground net herbaceous production (ANHP). The data table also includes stocking rate (animal units per hectare) used as an interaction term in the ADG regression model and all associated data to derive each of these variables (e.g., sampling start and end dates, 30 m daily Landsat-MODIS fusion satellite imagery-derived APAR metrics, cattle weights, etc.). We calculated paddock-scale average daily gain (ADG, kg hd-1 day-1) from 2000-2019 for yearlings weighed approximately every 28 days during the grazing season across 6 different paddocks with stocking densities of 0.08 – 0.27 animal units (AU) ha-1, where one AU is equivalent to a 454 kg animal. It is worth noting that AU's change as a function of both the number of cattle within a paddock and the size of individual animals, the latter of which changes within a single grazing season. This becomes important to consider when using sub-seasonal weight data for fast-growing yearlings.
    For paddock-scale ADG, we first calculated ADG for each individual yearling as the difference between the weights obtained at the end and beginning of each period, divided by the number of days in each period, and then averaged for all individuals in the paddock. We excluded data from 2013 due to data collection inconsistencies. We note that most of the monthly weight data (97%) is from 3 paddocks where cattle were weighed every year, whereas in the other 3 paddocks, monthly weights were only measured during 2017-2019. Apart from the 2013 data, which were not comparable to data from other years, the data represents all available weight gain data for CPER to maximize spatial-temporal coverage and avoid potential bias from subjective decisions to subset the data. Data may have been collected for different projects at different times, but was collected in a consistent way. This resulted in 269 paddock-scale estimates of monthly ADG, with robust temporal, but limited spatial, coverage. CP and DOM were estimated from a random forest model trained from the five APAR metrics: rAPAR, dAPAR, tPeak, iAPAR and iAPAR-dry (see manuscript Section 2.3 for description). APAR metrics were derived from the paddock-scale APAR daily time series (all paddock pixels averaged daily to create a single paddock-scale time series). All APAR metrics are calculated as the average of the approximately 28-day period that corresponds to the ADG calculation. See Section 2.2.4 of the manuscript for a complete description of the APAR metrics. ANHP was estimated from a linear regression model developed by Gaffney et al. (2018) to calculate net aboveground herbaceous productivity (ANHP; kg ha-1) from iAPAR. We averaged the coefficients of 4 spatial models (2013-2016) developed by Gaffney et al. (2018), resulting in the following equation: ANHP = -26.47 + 2.07 × iAPAR. We first calculated ANHP for each day of the grazing season at the paddock scale, and then took the average ANHP for the 28-day period.
    REFERENCES: Gaffney, R., Porensky, L. M., Gao, F., Irisarri, J. G., Durante, M., Derner, J. D., & Augustine, D. J. (2018). Using APAR to predict aboveground plant productivity in semi-arid rangelands: Spatial and temporal relationships differ. Remote Sensing, 10(9). doi: 10.3390/rs10091474

    Resource Title: Season-long ADG.
    File Name: Kearney_etal2021_Patterns_of_herbivore_Data_ADG_seasonal_cln.csv
    Resource Description: Season-long observed and model-predicted average daily gain (ADG) of cattle weights at the paddock scale. Also includes two variables used to analyze patterns in model residuals: percent sand content and season-long aboveground net herbaceous production (ANHP). We calculated observed paddock-scale ADG for the entire grazing season from 2010-2019 (excluding 2013 due to data collection inconsistencies) by averaging the seasonal ADG of each yearling, determined as the difference between the end and starting weights divided by the number of days in the grazing season. This dataset was available for 40 paddocks spanning a range of soil types, plant communities, and topographic positions. Data may have been collected for different projects at different times, but was collected in a consistent way. We note that there was spatial overlap among a small number of paddock boundaries across different years since some fence lines were moved in 2012 and 2014. Model-predicted paddock-scale ADG was derived using the monthly ADG regression model described in Sections 2.3.3 and 2.3.4 of the associated manuscript.
    In short, we predicted season-long cattle weight gains by first predicting daily weight gain for each day of the grazing season from the monthly regression model using a 28-day moving average of model inputs (CP, DOM and ANHP). We calculated the final ADG for the entire grazing season as the average predicted ADG, starting 28 days into the growing season. Percent sand content was obtained as the paddock-scale average of POLARIS sand content in the upper 0-30 cm. ANHP was calculated on the last day of the grazing season using a linear regression model developed by Gaffney et al. (2018) to calculate net aboveground herbaceous productivity (ANHP; kg ha-1) from satellite-derived integrated absorbed photosynthetically active radiation (iAPAR) (see Section 3.1.2 of the associated manuscript). We averaged the coefficients of 4 spatial models (2013-2016) developed by Gaffney et al. (2018), resulting in the following equation: ANHP = -26.47 + 2.07 × iAPAR.
    REFERENCES: Gaffney, R., Porensky, L. M., Gao, F., Irisarri, J. G., Durante, M., Derner, J. D., & Augustine, D. J. (2018). Using APAR to predict aboveground plant productivity in semi-arid rangelands: Spatial and temporal relationships differ. Remote Sensing, 10(9). doi: 10.3390/rs10091474
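A small Python sketch of two calculations described above, under stated assumptions: paddock-scale average daily gain as the per-animal weight difference divided by the number of days and then averaged over the herd, and ANHP from iAPAR using the averaged Gaffney et al. (2018) coefficients quoted in the description. The function names and example numbers are hypothetical, not the study's code.

```python
import numpy as np

def paddock_adg(start_weights_kg, end_weights_kg, days):
    """Paddock-scale average daily gain (kg/head/day): each animal's gain
    divided by the days in the weigh period, then averaged over the herd."""
    gains = (np.asarray(end_weights_kg, float) - np.asarray(start_weights_kg, float)) / days
    return float(gains.mean())

def anhp_from_iapar(iapar):
    """Aboveground net herbaceous production (kg/ha) from iAPAR, using the
    averaged coefficients quoted above: ANHP = -26.47 + 2.07 * iAPAR."""
    return -26.47 + 2.07 * np.asarray(iapar, dtype=float)

# Hypothetical 28-day weigh period for four yearlings in one paddock
print(paddock_adg([280, 295, 300, 310], [305, 322, 326, 338], days=28))

# ANHP for two hypothetical iAPAR values
print(anhp_from_iapar([150.0, 220.0]))
```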
