100+ datasets found

Mathematics Dataset
github.com
opendatalab.com
+1more
Updated Apr 3, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
DeepMind (2019). Mathematics Dataset [Dataset]. https://github.com/Wikidepia/mathematics_dataset_id
Explore at:
Dataset updated
Apr 3, 2019
Dataset provided by
DeepMindhttp://deepmind.com/
Description
This dataset consists of mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

## Example questions

Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r. Answer: 4 Question: Calculate -841880142.544 + 411127. Answer: -841469015.544 Question: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)). Answer: 54*a - 30

It contains 2 million (question, answer) pairs per module, with questions limited to 160 characters in length, and answers to 30 characters in length. Note the training data for each question type is split into "train-easy", "train-medium", and "train-hard". This allows training models via a curriculum. The data can also be mixed together uniformly from these training datasets to obtain the results reported in the paper. Categories:

algebra (linear equations, polynomial roots, sequences)

arithmetic (pairwise operations and mixed expressions, surds)

calculus (differentiation)

comparison (closest numbers, pairwise comparisons, sorting)

measurement (conversion, working with time)

numbers (base conversion, remainders, common divisors and multiples, primality, place value, rounding numbers)

polynomials (addition, simplification, composition, evaluating, expansion)

probability (sampling without replacement)
original : CIFAR 100
kaggle.com
zip
Updated Dec 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shashwat Pandey (2024). original : CIFAR 100 [Dataset]. https://www.kaggle.com/datasets/shashwat90/original-cifar-100
Explore at:
zip(168517945 bytes)Available download formats
Dataset updated
Dec 28, 2024
Authors
Shashwat Pandey
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. CIFAR-10 and CIFAR-100 were created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. (Sadly, the 80 million tiny images dataset has been thrown into the memory hole by its authors. Spotting the doublethink which was used to justify its erasure is left as an exercise for the reader.)

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.

Baseline results You can find some baseline replicable results on this dataset on the project page for cuda-convnet. These results were obtained with a convolutional neural network. Briefly, they are 18% test error without data augmentation and 11% with. Additionally, Jasper Snoek has a new paper in which he used Bayesian hyperparameter optimization to find nice settings of the weight decay and other hyperparameters, which allowed him to obtain a test error rate of 15% (without data augmentation) using the architecture of the net that got 18%.

Other results Rodrigo Benenson has collected results on CIFAR-10/100 and other datasets on his website; click here to view.

Dataset layout Python / Matlab versions I will describe the layout of the Python version of the dataset. The layout of the Matlab version is identical.

The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python "pickled" object produced with cPickle. Here is a python2 routine which will open such a file and return a dictionary: python def unpickle(file): import cPickle with open(file, 'rb') as fo: dict = cPickle.load(fo) return dict And a python3 version: def unpickle(file): import pickle with open(file, 'rb') as fo: dict = pickle.load(fo, encoding='bytes') return dict Loaded in this way, each of the batch files contains a dictionary with the following elements: data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image. labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.

The dataset contains another file, called batches.meta. It too contains a Python dictionary object. It has the following entries: label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", etc. Binary version The binary version contains the files data_batch_1.bin, data_batch_2.bin, ..., data_batch_5.bin, as well as test_batch.bin. Each of these files is formatted as follows: <1 x label><3072 x pixel> ... <1 x label><3072 x pixel> In other words, the first byte is the label of the first image, which is a number in the range 0-9. The next 3072 bytes are the values of the pixels of the image. The first 1024 bytes are the red channel values, the next 1024 the green, and the final 1024 the blue. The values are stored in row-major order, so the first 32 bytes are the red channel values of the first row of the image.

Each file contains 10000 such 3073-byte "rows" of images, although there is nothing delimiting the rows. Therefore each file should be exactly 30730000 bytes long.

There is another file, called batches.meta.txt. This is an ASCII file that maps numeric labels in the range 0-9 to meaningful class names. It is merely a list of the 10 class names, one per row. The class name on row i corresponds to numeric label i.

The CIFAR-100 dataset This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). Her...
Z
ANN development + final testing datasets
data.niaid.nih.gov
resodate.org
+1more
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Authors (2020). ANN development + final testing datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1445865
Explore at:
Dataset updated
Jan 24, 2020
Authors
Authors
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
File name definitions:

'...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s

'...v_175_250...' - dataset for velocity range [175, 250] m/s

'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected

'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart

Where to find the input (independent) and target (dependent) variable values for each dataset/excel ?

input values in 'IN' sheet

target values in 'TARGET' sheet

Where to find the results from the best ANN model (for each target/output variable and each velocity range)?

open the corresponding excel file and the expected (target) vs ANN (output) results are written in 'TARGET vs OUTPUT' sheet

Check reference below (to be added when the paper is published)

https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams
housing
kaggle.com
zip
Updated Sep 22, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HappyRautela (2023). housing [Dataset]. https://www.kaggle.com/datasets/happyrautela/housing
Explore at:
zip(809785 bytes)Available download formats
Dataset updated
Sep 22, 2023
Authors
HappyRautela
Description
The exercise after this contains questions that are based on the housing dataset.

How many houses have a waterfront? a. 21000 b. 21450 c. 163 d. 173

How many houses have 2 floors? a. 2692 b. 8241 c. 10680 d. 161

How many houses built before 1960 have a waterfront? a. 80 b. 7309 c. 90 d. 92

What is the price of the most expensive house having more than 4 bathrooms? a. 7700000 b. 187000 c. 290000 d. 399000

For instance, if the ‘price’ column consists of outliers, how can you make the data clean and remove the redundancies? a. Calculate the IQR range and drop the values outside the range. b. Calculate the p-value and remove the values less than 0.05. c. Calculate the correlation coefficient of the price column and remove the values less than the correlation coefficient. d. Calculate the Z-score of the price column and remove the values less than the z-score.

What are the various parameters that can be used to determine the dependent variables in the housing data to determine the price of the house? a. Correlation coefficients b. Z-score c. IQR Range d. Range of the Features

If we get the r2 score as 0.38, what inferences can we make about the model and its efficiency? a. The model is 38% accurate, and shows poor efficiency. b. The model is showing 0.38% discrepancies in the outcomes. c. Low difference between observed and fitted values. d. High difference between observed and fitted values.

If the metrics show that the p-value for the grade column is 0.092, what all inferences can we make about the grade column? a. Significant in presence of other variables. b. Highly significant in presence of other variables c. insignificance in presence of other variables d. None of the above

If the Variance Inflation Factor value for a feature is considerably higher than the other features, what can we say about that column/feature? a. High multicollinearity b. Low multicollinearity c. Both A and B d. None of the above
Prime Number Source Code with Dataset
figshare.com
zip
Updated Oct 12, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ayman Mostafa (2024). Prime Number Source Code with Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27215508.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.27215508.v1
Dataset updated
Oct 12, 2024
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Ayman Mostafa
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper addresses the computational methods and challenges associated with prime number generation, a critical component in encryption algorithms for ensuring data security. The generation of prime numbers efficiently is a critical challenge in various domains, including cryptography, number theory, and computer science. The quest to find more effective algorithms for prime number generation is driven by the increasing demand for secure communication and data storage and the need for efficient algorithms to solve complex mathematical problems. Our goal is to address this challenge by presenting two novel algorithms for generating prime numbers: one that generates primes up to a given limit and another that generates primes within a specified range. These innovative algorithms are founded on the formulas of odd-composed numbers, allowing them to achieve remarkable performance improvements compared to existing prime number generation algorithms. Our comprehensive experimental results reveal that our proposed algorithms outperform well-established prime number generation algorithms such as Miller-Rabin, Sieve of Atkin, Sieve of Eratosthenes, and Sieve of Sundaram regarding mean execution time. More notably, our algorithms exhibit the unique ability to provide prime numbers from range to range with a commendable performance. This substantial enhancement in performance and adaptability can significantly impact the effectiveness of various applications that depend on prime numbers, from cryptographic systems to distributed computing. By providing an efficient and flexible method for generating prime numbers, our proposed algorithms can develop more secure and reliable communication systems, enable faster computations in number theory, and support advanced computer science and mathematics research.
Data from: Current and projected research data storage needs of Agricultural...
catalog.data.gov
agdatacommons.nal.usda.gov
+2more
Updated Apr 21, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Servicehttps://www.ars.usda.gov/
Description
The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel
Meta data and supporting documentation
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
We include a description of the data sets in the meta-data as well as sample code and results from a simulated data set. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The R code is available on line here: https://github.com/warrenjl/SpGPCW. Format: Abstract The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women. Availability Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement. Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. File format: R workspace file. Metadata (including data dictionary) • y: Vector of binary responses (1: preterm birth, 0: control) • x: Matrix of covariates; one row for each simulated individual • z: Matrix of standardized pollution exposures • n: Number of simulated individuals • m: Number of exposure time periods (e.g., weeks of pregnancy) • p: Number of columns in the covariate design matrix • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate). This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
R
Data from: Rmai Dataset
universe.roboflow.com
zip
Updated Jun 27, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ayush Ranjan (2022). Rmai Dataset [Dataset]. https://universe.roboflow.com/ayush-ranjan-y2h6m/rmai/dataset/2
Explore at:
zipAvailable download formats
Dataset updated
Jun 27, 2022
Dataset authored and provided by
Ayush Ranjan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Letters Bounding Boxes
Description
Here are a few use cases for this project:

Educational Tools: The RMAI model can be used to develop applications or tools to teach children or adults about letters and numbers. By scanning real-life objects or text, it can identify the mentioned classes and further enhance the learning experience.

Identification of License Plate Numbers: The model can be employed in surveillance software to identify vehicle license plates. Despite the model not being explicitly trained for this purpose, the ability to recognize the mentioned numeral and letter classes may be sufficient for basic applications.

Robot Navigation: The reference image suggests potential for robot navigation use. Robots could use this model to read numbers and letters in their environment, which could be used in synchronizing tasks or following specified routes in a warehouse or factory setting.

Accessibility Tools: The model can be used to develop applications for visually impaired people to read and comprehend written material. This can range from reading books, recognizing signs, or identifying different objects that have numbers or letters on them.

Data Sorting: In an office or warehouse setting, this model could be used to sort packages, files or items based on numbers and letters. This will help in increasing efficiency and reducing potential errors in the process.

Credit Card Eligibility Data: Determining Factors

kaggle.com

zip

Updated May 18, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Rohit Sharma (2024). Credit Card Eligibility Data: Determining Factors [Dataset]. https://www.kaggle.com/datasets/rohit265/credit-card-eligibility-data-determining-factors

Explore at:

zip(303227 bytes)Available download formats

Dataset updated

May 18, 2024

Authors

Rohit Sharma

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Description of the Credit Card Eligibility Data: Determining Factors

The Credit Card Eligibility Dataset: Determining Factors is a comprehensive collection of variables aimed at understanding the factors that influence an individual's eligibility for a credit card. This dataset encompasses a wide range of demographic, financial, and personal attributes that are commonly considered by financial institutions when assessing an individual's suitability for credit.

Each row in the dataset represents a unique individual, identified by a unique ID, with associated attributes ranging from basic demographic information such as gender and age, to financial indicators like total income and employment status. Additionally, the dataset includes variables related to familial status, housing, education, and occupation, providing a holistic view of the individual's background and circumstances.

Variable	Description
ID	An identifier for each individual (customer).
Gender	The gender of the individual.
Own_car	A binary feature indicating whether the individual owns a car.
Own_property	A binary feature indicating whether the individual owns a property.
Work_phone	A binary feature indicating whether the individual has a work phone.
Phone	A binary feature indicating whether the individual has a phone.
Email	A binary feature indicating whether the individual has provided an email address.
Unemployed	A binary feature indicating whether the individual is unemployed.
Num_children	The number of children the individual has.
Num_family	The total number of family members.
Account_length	The length of the individual's account with a bank or financial institution.
Total_income	The total income of the individual.
Age	The age of the individual.
Years_employed	The number of years the individual has been employed.
Income_type	The type of income (e.g., employed, self-employed, etc.).
Education_type	The education level of the individual.
Family_status	The family status of the individual.
Housing_type	The type of housing the individual lives in.
Occupation_type	The type of occupation the individual is engaged in.
Target	The target variable for the classification task, indicating whether the individual is eligible for a credit card or not (e.g., Yes/No, 1/0).

Researchers, analysts, and financial institutions can leverage this dataset to gain insights into the key factors influencing credit card eligibility and to develop predictive models that assist in automating the credit assessment process. By understanding the relationship between various attributes and credit card eligibility, stakeholders can make more informed decisions, improve risk assessment strategies, and enhance customer targeting and segmentation efforts.

This dataset is valuable for a wide range of applications within the financial industry, including credit risk management, customer relationship management, and marketing analytics. Furthermore, it provides a valuable resource for academic research and educational purposes, enabling students and researchers to explore the intricate dynamics of credit card eligibility determination.

n
Data from: Contrasting effects of host or local specialization: widespread...
data-staging.niaid.nih.gov
ourarchive.otago.ac.nz
+3more
zip
Updated Mar 13, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniela de Angeli Dutra; Gabriel Moreira Félix; Robert Poulin (2024). Contrasting effects of host or local specialization: widespread haemosporidians are host generalist whereas local specialists are locally abundant [Dataset]. http://doi.org/10.5061/dryad.j3tx95xfb
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.j3tx95xfb
Dataset updated
Mar 13, 2024
Dataset provided by
University of Otago
Universidade Estadual de Campinas (UNICAMP)
Authors
Daniela de Angeli Dutra; Gabriel Moreira Félix; Robert Poulin
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Aim: Despite the wide distribution of many parasites around the globe, the range of individual species varies significantly even among phylogenetically related taxa. Since parasites need suitable hosts to complete their development, parasite geographical and environmental ranges should be limited to communities where their hosts are found. Parasites may also suffer from a trade-off between being locally abundant or widely dispersed. We hypothesize that the geographical and environmental ranges of parasites are negatively associated to their host specificity and their local abundance. Location: Worldwide Time period: 2009 to 2021 Major taxa studied: Avian haemosporidian parasites Methods: We tested these hypotheses using a global database which comprises data on avian haemosporidian parasites from across the world. For each parasite lineage, we computed five metrics: phylogenetic host-range, environmental range, geographical range, and their mean local and total number of observations in the database. Phylogenetic generalized least squares models were ran to evaluate the influence of phylogenetic host-range and total and local abundances on geographical and environmental range. In addition, we analysed separately the two regions with the largest amount of available data: Europe and South America. Results: We evaluated 401 lineages from 757 localities and observed that generalism (i.e. phylogenetic host range) associates positively to both the parasites’ geographical and environmental ranges at global and Europe scales. For South America, generalism only associates with geographical range. Finally, mean local abundance (mean local number of parasite occurrences) was negatively related to geographical and environmental range. This pattern was detected worldwide and in South America, but not in Europe. Main Conclusions: We demonstrate that parasite specificity is linked to both their geographical and environmental ranges. The fact that locally abundant parasites present restricted ranges, indicates a trade-off between these two traits. This trade-off, however, only becomes evident when sufficient heterogeneous host communities are considered. Methods We compiled data on haemosporidian lineages from the MalAvi database (http://130.235.244.92/Malavi/ , Bensch et al. 2009) including all the data available from the “Grand Lineage Summary” representing Plasmodium and Haemoproteus genera from wild birds and that contained information regarding location. After checking for duplicated sequences, this dataset comprised a total of ~6200 sequenced parasites representing 1602 distinct lineages (775 Plasmodium and 827 Haemoproteus) collected from 1139 different host species and 757 localities from all continents except Antarctica (Supplementary figure 1, Supplementary Table 1). The parasite lineages deposited in MalAvi are based on a cyt b fragment of 478 bp. This dataset was used to calculate the parasites’ geographical, environmental and phylogenetic ranges. Geographical range All analyses in this study were performed using R version 4.02. In order to estimate the geographical range of each parasite lineage, we applied the R package “GeoRange” (Boyle, 2017) and chose the variable minimum spanning tree distance (i.e., shortest total distance of all lines connecting each locality where a particular lineage has been found). Using the function “create.matrix” from the “fossil” package, we created a matrix of lineages and coordinates and employed the function “GeoRange_MultiTaxa” to calculate the minimum spanning tree distance for each parasite lineage distance (i.e. shortest total distance in kilometers of all lines connecting each locality). Therefore, as at least two distinct sites are necessary to calculate this distance, parasites observed in a single locality could not have their geographical range estimated. For this reason, only parasites observed in two or more localities were considered in our phylogenetically controlled least squares (PGLS) models. Host and Environmental diversity Traditionally, ecologists use Shannon entropy to measure diversity in ecological assemblages (Pielou, 1966). The Shannon entropy of a set of elements is related to the degree of uncertainty someone would have about the identity of a random selected element of that set (Jost, 2006). Thus, Shannon entropy matches our intuitive notion of biodiversity, as the more diverse an assemblage is, the more uncertainty regarding to which species a randomly selected individual belongs. Shannon diversity increases with both the assemblage richness (e.g., the number of species) and evenness (e.g., uniformity in abundance among species). To compare the diversity of assemblages that vary in richness and evenness in a more intuitive manner, we can normalize diversities by Hill numbers (Chao et al., 2014b). The Hill number of an assemblage represents the effective number of species in the assemblage, i.e., the number of equally abundant species that are needed to give the same value of the diversity metric in that assemblage. Hill numbers can be extended to incorporate phylogenetic information. In such case, instead of species, we are measuring the effective number of phylogenetic entities in the assemblage. Here, we computed phylogenetic host-range as the phylogenetic Hill number associated with the assemblage of hosts found infected by a given parasite. Analyses were performed using the function “hill_phylo” from the “hillr” package (Chao et al., 2014a). Hill numbers are parameterized by a parameter “q” that determines the sensitivity of the metric to relative species abundance. Different “q” values produce Hill numbers associated with different diversity metrics. We set q = 1 to compute the Hill number associated with Shannon diversity. Here, low Hill numbers indicate specialization on a narrow phylogenetic range of hosts, whereas a higher Hill number indicates generalism across a broader phylogenetic spectrum of hosts. We also used Hill numbers to compute the environmental range of sites occupied by each parasite lineage. Firstly, we collected the 19 bioclimatic variables from WorldClim version 2 (http://www.worldclim.com/version2) for all sites used in this study (N = 713). Then, we standardized the 19 variables by centering and scaling them by their respective mean and standard deviation. Thereafter, we computed the pairwise Euclidian environmental distance among all sites and used this distance to compute a dissimilarity cluster. Finally, as for the phylogenetic Hill number, we used this dissimilarity cluster to compute the environmental Hill number of the assemblage of sites occupied by each parasite lineage. The environmental Hill number for each parasite can be interpreted as the effective number of environmental conditions in which a parasite lineage occurs. Thus, the higher the environmental Hill number, the more generalist the parasite is regarding the environmental conditions in which it can occur. Parasite phylogenetic tree A Bayesian phylogenetic reconstruction was performed. We built a tree for all parasite sequences for which we were able to estimate the parasite’s geographical, environmental and phylogenetic ranges (see above); this represented 401 distinct parasite lineages. This inference was produced using MrBayes 3.2.2 (Ronquist & Huelsenbeck, 2003) with the GTR + I + G model of nucleotide evolution, as recommended by ModelTest (Posada & Crandall, 1998), which selects the best-fit nucleotide substitution model for a set of genetic sequences. We ran four Markov chains simultaneously for a total of 7.5 million generations that were sampled every 1000 generations. The first 1250 million trees (25%) were discarded as a burn-in step and the remaining trees were used to calculate the posterior probabilities of each estimated node in the final consensus tree. Our final tree obtained a cumulative posterior probability of 0.999. Leucocytozoon caulleryi was used as the outgroup to root the phylogenetic tree as Leucocytozoon spp. represents a basal group within avian haemosporidians (Pacheco et al., 2020).
Simulation Data Set
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”. Metadata (including data dictionary) • y: Vector of binary responses (1: adverse outcome, 0: control) • x: Matrix of covariates; one row for each simulated individual • z: Matrix of standardized pollution exposures • n: Number of simulated individuals • m: Number of exposure time periods (e.g., weeks of pregnancy) • p: Number of columns in the covariate design matrix • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate) Code Abstract We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities. Description “CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities. “Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript). Optional Information (complete as necessary) Required R packages: • For running “CWVS_LMC.txt”: • msm: Sampling from the truncated normal distribution • mnormt: Sampling from the multivariate normal distribution • BayesLogit: Sampling from the Polya-Gamma distribution • For running “Results_Summary.txt”: • plotrix: Plotting the posterior means and credible intervals Instructions for Use Reproducibility (Mandatory) What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study. How to use the information: • Load the “Simulated_Dataset.RData” workspace • Run the code contained in “CWVS_LMC.txt” • Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”. Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set: Data The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women. Availability Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publically available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement. Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
d
Data from: Chemical alteration index values and rare earth element data and...
catalog.data.gov
data.usgs.gov
+1more
Updated Oct 30, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). Chemical alteration index values and rare earth element data and expected ranges for regolith overlying the Stewartsville pluton, Virginia [Dataset]. https://catalog.data.gov/dataset/chemical-alteration-index-values-and-rare-earth-element-data-and-expected-ranges-for-regol
Explore at:
Dataset updated
Oct 30, 2025
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
Virginia, Stewartsville
Description
This dataset contains whole major element geochemical data used to calculate values of the chemical alteration index (CIA), data for Nd, Sm, Y, and total REE and expected ranges for total REEY for samples of regolith overlying the Stewartsville pluton, Virginia. The southeastern United States was first identified as prospective for regolith-hosted REE deposits based on the recognition that the region has been subjected to a long history of intense differential chemical weathering and saprolitization, comparable to that which formed the REE clay deposits of South China and Southeast Asia since the break-up of Pangea (Foley and Ayuso, 2013). Foley et al. (2014) established that due to their inherent high concentrations of REE, anorogenic (A-type) and highly fractionated igneous (I-type) granitic rocks of southeastern United States were highly prospective source rocks for deposits of this type. More recently, additional studies investigated accumulation processes resulting in high concentrations of REE in granite-derived regolith deposits related to the Stewartsville pluton and other plutons in Virginia. The Stewartsville pluton was emplaced along the flank of the Blue Ridge province during regional crustal extension related to the opening of the Iapetus Ocean and breakup of the supercontinent Rodinia. The studied rock samples consist of medium- to coarse-grained biotite granite and are mineralogically complex. They contain phenocrysts of quartz, sericitized and albitized k-feldspar, sodic plagioclase, and mafic clots and stringers that are composed primarily of biotite and stilpnomelane and, less typically, include magnetite and remnant cores of green and green-brown hornblende. Feldspar contains inclusions of synchysite and fergusonite; other accessory minerals include abundant and diagnostic allanite and fluorite, as well as apatite, epidote, garnet, Nb-rutile, fergusonite, monazite, titanite, xenotime, gadolinite, and zircon (Foley and Ayuso, 2015 and references therein). Granite outcrop exposures in the Piedmont and Blue Ridge areas of Virginia tend to be intensely weathered, with overlying regoliths ranging from thin and discontinuous to meters thick and laterally extensive, and often with overlying B-horizon type soils. Saprolite can extend down to depths of tens of meters below the B-horizon. In the case of the Stewartsville Pluton, regolith is well developed in multiple exposures. The sampled section described in this data release is >20 meters high by >60 meters long. The profile includes nearly fresh rock, partially to highly weathered saprolite, indurated gravels and sands, and poorly delineated layers of subsoil and topsoil. Granite at the base of the profile is iron stained (mostly goethite) and weathered on exposed surfaces and along cracks. Partially weathered sections of the outcrop display a range of rock textures throughout, rather than systematic changes from base to surface. For example, in the lower parts, cobble and boulder-sized relics of spheroidally weathered granite knobs retain distinctive primary textures but are surrounded by nearly disaggregated granite that crumbles to sand and gravel-sized fragments when sampled. Subsoils, mainly B-horizon, comprise the uppermost meter of the section and contain a higher proportion of clay minerals (i.e. kaolinite-nontronite-iron-oxide mixtures) than the underlying saprolite.
Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat...
datarade.ai
Updated Oct 25, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Success.ai (2024). Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat files available – 170M+, Verified Profiles - Best Price Guarantee [Dataset]. https://datarade.ai/data-products/success-ai-us-premium-b2b-emails-phone-numbers-dataset-success-ai
Explore at:
.bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
Dataset updated
Oct 25, 2024
Dataset provided by
Area covered
United States
Description
Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.

Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.

API Features:

Real-Time Updates: Our APIs deliver real-time updates, ensuring that the contact data your business relies on is always current and accurate.

High Volume Handling: Designed to support up to 860k API calls per day, our system is built for scalability and responsiveness, catering to enterprises of all sizes.

Flexible Integration: Easily integrate with CRM systems, marketing automation tools, and other enterprise applications to streamline your workflows and enhance productivity.

Key Categories Served: B2B sales leads – Identify decision-makers in key industries, B2B marketing data – Target professionals for your marketing campaigns, Recruitment data – Source top talent efficiently and reduce hiring times, CRM enrichment – Update and enhance your CRM with verified, updated data, Global reach – Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more.

Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available: 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry. 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data. GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.

Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.

Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including: B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare. Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions. Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location. Market Research: Gain insights into employment trends and company profiles to enrich market research. CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data. Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.

Use Cases for Success.ai's Contact Data - Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance. - Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles. - Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles. - CRM Enrichment: Keep your CRM current with verified, accurate employee data. - Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors. - Market Research: Gain insights into employment trends and company profiles for better business decisions. - Executive Search: Source senior executives and leaders for headhunting and recruitment. - Partnership Building: Find the right companies and key people to develop strategic partnerships.

Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why: Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces. Global Scale: With 150M+ employee profiles and 170M veri...
d
Major Basin Lines
catalog.data.gov
data.ct.gov
+3more
Updated Feb 12, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Department of Energy & Environmental Protection (2025). Major Basin Lines [Dataset]. https://catalog.data.gov/dataset/major-basin-lines-f1f41
Explore at:
Dataset updated
Feb 12, 2025
Dataset provided by
Department of Energy & Environmental Protection
Description
See full Data Guide here.Major Drainage Basin Set: Connecticut Major Drainage Basins is 1:24,000-scale, polygon and line feature data that define Major drainage basin areas in Connecticut. These large basins mostly range from 70 to 2,000 square miles in size. Connecticut Major Drainage Basins includes drainage areas for all Connecticut rivers, streams, brooks, lakes, reservoirs and ponds published on 1:24,000-scale 7.5 minute topographic quadrangle maps prepared by the USGS between 1969 and 1984. Data is compiled at 1:24,000 scale (1 inch = 2,000 feet). This information is not updated. Polygon and line features represent drainage basin areas and boundaries, respectively. Each basin area (polygon) feature is outlined by one or more major basin boundary (line) feature. These data include 10 major basin area (polygon) features and 284 major basin boundary (line) features. Major Basin area (polygon) attributes include major basin number and feature size in acres and square miles. The major basin number (MBAS_NO) uniquely identifies individual basins and is 1 character in length. There are 8 unique major basin numbers. Examples include 1, 4, and 6. Note there are more major basin polygon features (10) than unique major basin numbers (8) because two polygon features are necessary to represent both the entire South East Coast and Hudson Major basins in Connecticut. Major basin boundary (line) attributes include a drainage divide type attribute (DIVIDE) used to cartographically represent the hierarchical drainage basin system. This divide type attribute is used to assign different line symbology to different levels of drainage divides. For example, major basin drainage divides are more pronounced and shown with a wider line symbol than regional basin drainage divides. Connecticut Major Drainage Basin polygon and line feature data are derived from the geometry and attributes of the Connecticut Drainage Basins data. Connecticut Major Drainage Basins is 1:24,000-scale, polygon and line feature data that define Major drainage basin areas in Connecticut. These large basins mostly range from 70 to 2,000 square miles in size. Connecticut Major Drainage Basins includes drainage areas for all Connecticut rivers, streams, brooks, lakes, reservoirs and ponds published on 1:24,000-scale 7.5 minute topographic quadrangle maps prepared by the USGS between 1969 and 1984. Data is compiled at 1:24,000 scale (1 inch = 2,000 feet). This information is not updated. Polygon and line features represent drainage basin areas and boundaries, respectively. Each basin area (polygon) feature is outlined by one or more major basin boundary (line) feature. These data include 10 major basin area (polygon) features and 284 major basin boundary (line) features. Major Basin area (polygon) attributes include major basin number and feature size in acres and square miles. The major basin number (MBAS_NO) uniquely identifies individual basins and is 1 character in length. There are 8 unique major basin numbers. Examples include 1, 4, and 6. Note there are more major basin polygon features (10) than unique major basin numbers (8) because two polygon features are necessary to represent both the entire South East Coast and Hudson Major basins in Connecticut. Major basin boundary (line) attributes include a drainage divide type attribute (DIVIDE) used to cartographically represent the hierarchical drainage basin system. This divide type attribute is used to assign different line symbology to different levels of drainage divides. For example, major basin drainage divides are more pronounced and shown with a wider line symbol than regional basin drainage divides. Connecticut Major Drainage Basin polygon and line feature data are derived from the geometry and attributes of the Connecticut Drainage Basins data.
g
EDF SA's social report - Training
gimi9.com
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
EDF SA's social report - Training [Dataset]. https://gimi9.com/dataset/eu_https-opendata-edf-fr-explore-dataset-bilan-social-d-edf-sa-formation-/
Explore at:
Description
Summary: Here you will find the data from the social report of edf sa, relating to training. The main indicators of this dataset are: * Percentage of the salary base subject to social security contributions under the general scheme for continuing training * Amount spent on continuing training * Number of employees who have completed at least one traineeship * Number of employees having completed at least one traineeship per M3E range * Number of hours of training * Number of hours of training per M3E range * Number of employees who have benefited from paid individual training leave * Number of employees who received individual unpaid training leave * Number of alternating contracts concluded during the year #### Information note: The data available here are extracted from the social balance sheet of edf sa. If you need additional information or data, do not hesitate to consult it by referring to the source paragraphs referenced for each row of data published here.
j
Data from: Dataset supporting the publication: "Direct and indirect effects...
jyx.jyu.fi
Updated Sep 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Akshay Kalathil Parambil; Katja Räsänen; Blake Matthews (2024). Dataset supporting the publication: "Direct and indirect effects of chemical pollution: Fungicides alter growth, feeding, and pigmentation of the freshwater detritivore Asellus aquaticus" Aineisto julkaisuun: "Direct and indirect effects of chemical pollution: Fungicides alter growth, feeding, and pigmentation of the freshwater detritivore Asellus aquaticus" [Dataset]. http://doi.org/10.17011/jyx/dataset/97236
Explore at:
Unique identifier
https://doi.org/10.17011/jyx/dataset/97236
Dataset updated
Sep 25, 2024
Authors
Akshay Kalathil Parambil; Katja Räsänen; Blake Matthews
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
General information: Title of the Paper: "Direct and Indirect Effects of Fungicides on Growth, Feeding, and Pigmentation of the Freshwater Detritivore Asellus aquaticus" Authors: Akshay Mohan, Blake Matthews, Katja Räsänen Date of Data Collection: July 2022 - October 2022 Funder: University of Jyväskylä Publication Date: 15.10.2024 DOI of the Associated Paper: 10.1016/j.ecoenv.2024.117017 Overview of the Dataset: This dataset comprises two primary files used in the analysis of fungicide effects on Asellus aquaticus: Rangefinding.csv: The file contains results from a range-finding experiment where three fungicides were tested at different concentrations. Tebuconazole-Expt.csv: The file ncludes results from the direct and indirect exposure experiment, where Asellus aquaticus was exposed to Tebuconazole through both water and diet. Data Files and Structure: The file “Rangefinding.csv” contains the results of the range-finding study. It has the following columns: Number: Unique identifier for each entry. Image_name: Filename of the image taken for each isopod. Plate_number: Identifier for the experimental plate used. Fungicide: The type of fungicide used (e.g., Tebuconazole). Concentration: Fungicide concentration (µg/L). Initial_size, Final_size: Measured body length in cm (using FIJI v1.54d) of individuals before and after exposure. Growth_rate: Calculated growth over the study. Initial_leaf_area, Final_leaf_area: Measured area of leaf discs in mm2 (using FIJI v1.54d) before and after exposure. Feeding_rate: Rate of leaf consumption during the study. Survival: Binary (1 = alive, 0 = dead). The file “Tebuconazole final data mod.csv” contains results from the direct and indirect exposure study. It has the following columns: Number: Unique identifier for each entry. Image_name: Filename of the image taken for each isopod. Plate_number: Identifier for the experimental plate used. Treatment: Combination of diet and exposure. Diet: Specifies the diet treatment conditions. Exposure: Specifies the exposure conditions. Initial_area, Final_area: Measured body area in mm2 (using Phenopype v3.3.4) of individuals before and after exposure. Growth_rate: Calculated growth over the study period. Initial_leaf_area, Final_leaf_area: Measured area of leaf discs in mm2 (using FIJI v1.54d) before and after exposure. Feeding_rate: Rate of leaf consumption during the study. Pigmentation_Initial, Pigmentation_final: Measured pigmentation values (using Phenopype v3.3.4) before and after exposure. Pigmentation_rate: Change in pigmentation measured over the study period. Survival, Moulting: Binary indicators of survival and molting. Methodology: Data Collection: In both experiments, individual Asellus aquaticus were exposed to varying concentrations of fungicides, and the response variables were measured weekly using digital photographs and analyzed using image analysis softwares. Data Processing: Growth rates were calculated based on the differences between initial and final sizes, and feeding rates were based on leaf area consumed. Pigmentation was measured using grayscale values. Access and Licensing: The dataset is licensed under [CC BY 4.0], allowing reuse with proper attribution. DOI for the dataset will be available after publication on JYX. Contact Information: For further details or inquiries, contact: Akshay Mohan - akmohank@jyu.fi Blake Matthews - blake.matthews@eawag.ch Katja Räsänen - katja.j.rasanen@jyu.fi
d
Data from: ARS Water Database
catalog.data.gov
data.cnra.ca.gov
+2more
Updated Apr 21, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). ARS Water Database [Dataset]. https://catalog.data.gov/dataset/ars-water-database-82912
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service
Description
The ARS Water Data Base is a collection of precipitation and streamflow data from small agricultural watersheds in the United States. This national archive of variable time-series readings for precipitation and runoff contains sufficient detail to reconstruct storm hydrographs and hyetographs. There are currently about 14,000 station years of data stored in the data base. Watersheds used as study areas range from 0.2 hectare (0.5 acres) to 12,400 square kilometers (4,786 square miles). Raingage networks range from one station per watershed to over 200 stations. The period of record for individual watersheds vary from 1 to 50 years. Some watersheds have been in continuous operation since the mid 1930's. Resources in this dataset:Resource Title: FORMAT INFORMATION FOR VARIOUS RECORD TYPES. File Name: format.txtResource Description: Format information identifying fields and their length will be included in this file for all files except those ending with the extension .txt TYPES OF FILES As indicated in the previous section data has been stored by location number in the form, LXX where XX is the location number. In each subdirectory, there will be various files using the following naming conventions: Runoff data: WSXXX.zip where XXX is the watershed number assigned by the WDC. This number may or may not correspond to a naming convention used in common literature. Rainfall data: RGXXXXXX.zip where XXXXXX is the rain gage station identification. Maximum-minimum daily air temperature: MMTXXXXX.zip where XXXXX is the watershed number assigned by the WDC. Ancillary text files: NOTXXXXX.txt where XXXXX is the watershed number assigned by the WDC. These files will contain textual information including latitude-longitude, name commonly used in literature, acreage, most commonly-associated rain gage(s) (if known by the WDC), a list of all rain gages on or near the watershed. Land use, topography, and soils as known by the WDC. Topographic maps of the watersheds: MAPXXXXX.zip where XXXXX is the location/watershed number assigned by the WDC. Map files are binary TIF files. NOT ALL FILE TYPES MAY BE AVAILABLE FOR SPECIFIC WATERSHEDS. Data files are still being compiled and translated into a form viable for this archive. Please bear with us while we grow.Resource Title: Data Inventory - watersheds. File Name: inventor.txtResource Description: Watersheds at which records of runoff were being collected by the Agricultural Research Service. Variables: Study Location & Number of Rain Gages1; Name; Lat.; Long; Number; Pub. Code; Record Began; Land Use2; Area (Acres); Types of Data3Resource Title: Information about the ARS Water Database. File Name: README.txtResource Title: INDEX TO INFORMATION ON EXPERIMENTAL AGRICULTURAL WATERSHEDS. File Name: INDEX.TXTResource Description: This report includes identification information on all watersheds operated by the ARS. Only some of these are included in the ARS Water Data Base. They are so indicated in the column titled ARS Water Data Base. Other watersheds will not have data available here or through the Water Data Center. This index is particularly important since it relates watershed names with the indexing system used by the Water Data Center. Each location has been assigned a number. The data for that location will be stored in a sub-directory coded as LXX where XX is the location number. The index also indicates the watershed number used by the WDC. Data for a particular watershed will be stored in a compressed file named WSXXXXX.zip where XXXXX is the watershed number assigned by the WDC. Although not included in the index, rain gage information will be stored in compressed files named RGXXXXXX.zip where XXXXXX is a 6-character identification of the rain gage station. The Index also provides information such as latitude-longitude for each of the watersheds, acreage, the period-of-record for each acreage. Multiple entries for a particular watershed will either indicate that the acreage designated for the watershed changed or there was a break in operations of the watershed. Resource Title: ARS Water Database files. File Name: ars_water.zipResource Description: USING THIS SYSTEM Before downloading huge amounts of data from the ARS Water Data Base, you should first review the text files included in this directory. They include: INDEX OF ARS EXPERIMENTAL WATERSHEDS: index.txt This report includes identification information on all watersheds operated by the ARS. Only some of these are included in the ARS Water Data Base. They are so indicated in the column titled ARS Water Data Base. Other watersheds will not have data available here or through the Water Data Center. This index is particularly important since it relates watershed names with the indexing system used by the Water Data Center. Each location has been assigned a number. The data for that location will be stored in a sub-directory coded as LXX where XX is the location number. The index also indicates the watershed number used by the WDC. Data for a particular watershed will be stored in a compressed file named WSXXXXX.zip where XXXXX is the watershed number assigned by the WDC. Although not included in the index, rain gage information will be stored in compressed files named RGXXXXXX.zip where XXXXXX is a 6-character identification of the rain gage station. The Index also provides information such as latitude-longitude for each of the watersheds, acreage, the period-of-record for each acreage. Multiple entries for a particular watershed will either indicate that the acreage designated for the watershed changed or there was a break in operations of the watershed. STATION TABLE FOR THE ARS WATER DATA BASE: station.txt This report indicates the period of record for each recording station represented in the ARS Water Data Base. The data for a particular station will be stored in a single compressed file. FORMAT INFORMATION FOR VARIOUS RECORD TYPES: format.txt Format information identifying fields and their length will be included in this file for all files except those ending with the extension .txt TYPES OF FILES As indicated in the previous section data has been stored by location number in the form, LXX where XX is the location number. In each subdirectory, there will be various files using the following naming conventions: Runoff data: WSXXX.zip where XXX is the watershed number assigned by the WDC. This number may or may not correspond to a naming convention used in common literature. Rainfall data: RGXXXXXX.zip where XXXXXX is the rain gage station identification. Maximum-minimum daily air temperature: MMTXXXXX.zip where XXXXX is the watershed number assigned by the WDC. Ancillary text files: NOTXXXXX.txt where XXXXX is the watershed number assigned by the WDC. These files will contain textual information including latitude-longitude, name commonly used in literature, acreage, most commonly-associated rain gage(s) (if known by the WDC), a list of all rain gages on or near the watershed. Land use, topography, and soils as known by the WDC. Topographic maps of the watersheds: MAPXXXXX.zip where XXXXX is the location/watershed number assigned by the WDC. Map files are binary TIF files. NOT ALL FILE TYPES MAY BE AVAILABLE FOR SPECIFIC WATERSHEDS. Data files are still being compiled and translated into a form viable for this archive. Please bear with us while we grow.
OpenCitations Meta RDF dataset of page numbers metadata and its provenance...
nde-dev.biothings.io
data.niaid.nih.gov
Updated Apr 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
OpenCitations (2024). OpenCitations Meta RDF dataset of page numbers metadata and its provenance information [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_10936231
Explore at:
Dataset updated
Apr 6, 2024
Dataset authored and provided by
OpenCitationshttps://opencitations.net/
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset is a specialized subset of the OpenCitations Meta RDF data, focusing exclusively on data related to page numbers of bibliographic resources, known as manifestations (http://purl.org/spar/fabio/Manifestation). It contains all the bibliographic metadata and its provenance information, structured specifically around manifestations (page numbers), in JSON-LD format.

The inner folders are named through the supplier prefix of the contained entities. It is a prefix that allows you to recognize the entity membership index (e.g., OpenCitations Meta corresponds to 06*0).

After that, the folders have numeric names, which refer to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

At the same level, additional folders containing the provenance are named with the same criteria already seen. Then, the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

For example, data related to the entity is located in the folder /br/06250/10000/1000/1000.zip, while information about provenance in /br/06250/10000/1000/prov/se.zip

Additional information about OpenCitations Meta at the official webpage.
Number of Freshwater Turtle and Crocodilian Species by Freshwater Ecoregion...
opendata.rcmrd.org
Updated Jan 10, 2012
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Nature Conservancy (2012). Number of Freshwater Turtle and Crocodilian Species by Freshwater Ecoregion (Data Basin Dataset) [Dataset]. https://opendata.rcmrd.org/content/6680aa0a0fe64386abde7a12e5b31862
Explore at:
Dataset updated
Jan 10, 2012
Dataset authored and provided by
The Nature Conservancyhttp://www.nature.org/
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Area covered
Description
Number of turtle and crocodilian species, by freshwater ecoregion.

We generated the map of freshwater turtle and crocodilian species richness—the number of species present in each ecoregion—from species distribution maps, primarily drawing on the sources listed below.

Distribution maps for 260 freshwater turtle species were provided by Buhlmann et al. (2007). The original distribution maps represented coarse ranges of where species were thought to be present in the wild; however, they were not exact ranges. Buhlmann et al. compiled data from museum and literature records. They correlated verified locality points with GIS-defined hydrologic unit codes (HUCs) and subsequently created “projected” distribution maps for each species by selecting additional HUCs that were representative of similar habitats, elevations, and physiographic regions as the HUCs with the verified point localities. The amount of information available varied by species, as some species and regions are better studied than others. In addition, many species names, especially in the tropics, actually represent complexes of several turtle species that have not yet been disaggregated.

In developing our map, when a range overlapped several ecoregions, we counted species as present in all those ecoregions that had part of the range. Some ecoregions with a long and narrow shape may have an overestimation of species in our map given the way the range polygons were drawn. This is particularly true in the Amazonas High Andes ecoregion (312), where the mountain range has been used as a range boundary for hundreds of species.

For crocodilians, species range maps are from the IUCN-SSC Crocodile Specialist Group and Britton (2007). Species range maps were assessed visually, and species presence was assigned to ecoregions. When a range overlapped several ecoregions, we counted the species as present in all ecoregions of range overlap.

These data were derived by The Nature Conservancy and World Wildlife Fund, and were displayed in a map published in The Atlas of Global Conservation (Hoekstra et al., University of California Press, 2010). More information at http://nature.org/atlas.

The following were our primary data sources:

Britton, A. 2007. Information on crocodilian species distributions. Available at www.flmnh.ufl.edu/cnhc/csl.html.

Buhlmann, K. A., T. B. Akre, J. B. Iverson, D. Karapatakis, R. A. Mittermeier, A. Georges, G. J. Rhodin, P. P. van Dijk, and J. W. Gibbons. 2007. A global analysis of tortoise and freshwater turtle distributions. Data from the preliminary results of the Global Reptile Assessment. International Union for Conservation of Nature–Species Survival Commission (IUCN-SSC), Conservation International/Center for Applied Biological Science (CI/CABS), and Savannah River Ecology Laboratory, University of Georgia, Aiken, South Carolina, USA.

International Union for Conservation of Nature (IUCN)–SSC Crocodile Specialist Group Web site. 2008. Available at http://iucncsg.org/ph1/modules/Home/.

Citation:Hoekstra, J. M., J. L. Molnar, M. Jennings, C. Revenga, M. D. Spalding, T. M. Boucher, J. C. Robertson, T. J. Heibel, with K. Ellison. 2010. The Atlas of Global Conservation: Changes, Challenges, and Opportunities to Make a Difference. Ed. J. L. Molnar. Berkeley: University of California Press. This dataset was used in a scientifically peer-reviewed publication.

This layer package was loaded using Data Basin.Click here to go to the detail page for this layer package in Data Basin, where you can find out more information, such as full metadata, or use it to create a live web map.
f
Individual horse values used to calculate medians and ranges for Figs 2–6.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated May 22, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Babasyan, Susanna; Wagner, Bettina; Larson, Elisabeth M. (2020). Individual horse values used to calculate medians and ranges for Figs 2–6. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000553284
Explore at:
Dataset updated
May 22, 2020
Authors
Babasyan, Susanna; Wagner, Bettina; Larson, Elisabeth M.
Description
Individual percentages, median fluorescent intensities and concentrations for each horse that were used to generate figure graphs are compiled in labeled data tables. (A) Percentage of IgE+ monocytes out of total cells in unsorted, MACS sorted and MACS+FACS sorted samples from 18 different horses in Fig 2D. (B) Percentage of CD23- cells out of total IgE+ monocytes in Fig 3D. (C) Clinical scores of allergic in in Fig 4A. (D) Percentage of IgE+ monocytes out of total monocytes in Fig 4C. (E) Percentage of CD16+ cells out of total IgE+ monocytes in Fig 4D. (F) Serum total IgE (ng/ml) measured by bead-based assay in Fig 5A. (G) IgE median fluorescent intensity (MFI) of IgE mAb 176 (Alexa Fluor 488) on IgE+ monocytes in Fig 5B. (H) Combined serum total IgE and IgE MFI on IgE+ monocytes in Fig 5C. (I) Percentage of monocytes out of total IgE+ cells in Fig 6A. (J) Secreted concentration of IL-10 (pg/ml), IL-4 (pg/ml), IFN? (MFI) and IL-17A (MFI) as measured by bead-based assay in Fig 6B. (K) Percentage of CD16+ cells out of total IgE- CD14+ monocytes. B-H,K show allergic (n = 7) and nonallergic (n = 7) horses, J shows allergic (n = 8) and nonallergic (n = 8) horses in October 2019. C-H,K show data points collected from April 2018-March 2019. (XLSX)

Facebook

Twitter

Click to copy link

Link copied

Cite

DeepMind (2019). Mathematics Dataset [Dataset]. https://github.com/Wikidepia/mathematics_dataset_id

Mathematics Dataset

Explore at:

Dataset updated

Apr 3, 2019

Dataset provided by

DeepMindhttp://deepmind.com/

Description

This dataset consists of mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

## Example questions

 Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.
 Answer: 4
 
 Question: Calculate -841880142.544 + 411127.
 Answer: -841469015.544
 
 Question: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)).
 Answer: 54*a - 30

It contains 2 million (question, answer) pairs per module, with questions limited to 160 characters in length, and answers to 30 characters in length. Note the training data for each question type is split into "train-easy", "train-medium", and "train-hard". This allows training models via a curriculum. The data can also be mixed together uniformly from these training datasets to obtain the results reported in the paper. Categories:

algebra (linear equations, polynomial roots, sequences)
arithmetic (pairwise operations and mixed expressions, surds)
calculus (differentiation)
comparison (closest numbers, pairwise comparisons, sorting)
measurement (conversion, working with time)
numbers (base conversion, remainders, common divisors and multiples, primality, place value, rounding numbers)
polynomials (addition, simplification, composition, evaluating, expansion)
probability (sampling without replacement)

Clear search

Close search

Google apps

Main menu

Mathematics Dataset

original : CIFAR 100

ANN development + final testing datasets

housing

Prime Number Source Code with Dataset

Data from: Current and projected research data storage needs of Agricultural...

Meta data and supporting documentation

Data from: Rmai Dataset

Credit Card Eligibility Data: Determining Factors

Data from: Contrasting effects of host or local specialization: widespread...

Simulation Data Set

Data from: Chemical alteration index values and rare earth element data and...

Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat...

Major Basin Lines

EDF SA's social report - Training

Data from: Dataset supporting the publication: "Direct and indirect effects...

Data from: ARS Water Database

OpenCitations Meta RDF dataset of page numbers metadata and its provenance...

Number of Freshwater Turtle and Crocodilian Species by Freshwater Ecoregion...

Individual horse values used to calculate medians and ranges for Figs 2–6.

Mathematics Dataset