100+ datasets found
  1. Large Dataset of Generalization Patterns in the Number Game

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 10, 2018
    Cite
    Eric J. Bigelow; Steven T. Piantadosi (2018). Large Dataset of Generalization Patterns in the Number Game [Dataset]. http://doi.org/10.7910/DVN/A8ZWLF
    Dataset provided by
    Harvard Dataverse
    Authors
    Eric J. Bigelow; Steven T. Piantadosi
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    272,700 two-alternative forced choice responses in a simple numerical task modeled after Tenenbaum (1999, 2000), collected from 606 Amazon Mechanical Turk workers. Subjects were shown sets of numbers of length 1 to 4 drawn from the range 1 to 100 (e.g. {12, 16}) and asked what other numbers were likely to belong to that set (e.g. 1, 5, 2, 98). Their generalization patterns reflect both rule-like (e.g. “even numbers,” “powers of two”) and distance-based (e.g. numbers near 50) generalization. This data set is available for further analysis of these simple and intuitive inferences, development of hands-on modeling instruction, and attempts to understand how probability and rules interact in human cognition.
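
    To illustrate how rule-like and distance-based generalization can be modeled, here is a minimal sketch of a Bayesian concept-learning model in the spirit of the number game. The hypothesis list, the uniform prior, and the function names are assumptions made for this example; they are not taken from the dataset or from the authors' analysis.

```python
# Hypothetical hypothesis space over the integers 1..100 (illustrative only).
NUMBERS = range(1, 101)
HYPOTHESES = {
    "even numbers": {n for n in NUMBERS if n % 2 == 0},
    "powers of two": {2 ** k for k in range(7)},  # 1, 2, 4, ..., 64
    "multiples of four": {n for n in NUMBERS if n % 4 == 0},
    "numbers between 10 and 20": set(range(10, 21)),  # interval (distance-like) concept
}


def posterior(observed):
    """Posterior over hypotheses, assuming a uniform prior and the size principle."""
    scores = {}
    for name, members in HYPOTHESES.items():
        if all(x in members for x in observed):
            # Size principle: each example is drawn uniformly from the concept,
            # so smaller consistent concepts receive higher likelihood.
            scores[name] = (1.0 / len(members)) ** len(observed)
        else:
            scores[name] = 0.0
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the observed set")
    return {name: score / total for name, score in scores.items()}


def prob_in_concept(target, observed):
    """Probability that `target` belongs to the concept, averaged over hypotheses."""
    post = posterior(observed)
    return sum(p for name, p in post.items() if target in HYPOTHESES[name])
```

    With the example set {12, 16}, `prob_in_concept(14, [12, 16])` exceeds `prob_in_concept(98, [12, 16])` under these assumed hypotheses, because 14 is covered by both the "even numbers" and the interval concept while 98 is covered only by "even numbers".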

  2. Prime Number Source Code with Dataset

    • figshare.com
    zip
    Updated Oct 12, 2024
    + more versions
    Cite
    Ayman Mostafa (2024). Prime Number Source Code with Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27215508.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ayman Mostafa
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This paper addresses the computational methods and challenges associated with prime number generation, a critical component in encryption algorithms for ensuring data security. Efficient prime number generation is a critical challenge in various domains, including cryptography, number theory, and computer science. The quest to find more effective algorithms for prime number generation is driven by the increasing demand for secure communication and data storage and the need for efficient algorithms to solve complex mathematical problems. Our goal is to address this challenge by presenting two novel algorithms for generating prime numbers: one that generates primes up to a given limit and another that generates primes within a specified range. These algorithms are founded on the formulas of odd-composed numbers, allowing them to achieve remarkable performance improvements compared to existing prime number generation algorithms. Our experimental results show that the proposed algorithms outperform well-established prime number generation algorithms such as Miller-Rabin, Sieve of Atkin, Sieve of Eratosthenes, and Sieve of Sundaram in mean execution time. More notably, our algorithms can provide prime numbers from range to range with commendable performance. This enhancement in performance and adaptability can significantly impact the effectiveness of various applications that depend on prime numbers, from cryptographic systems to distributed computing. By providing an efficient and flexible method for generating prime numbers, our proposed algorithms can help develop more secure and reliable communication systems, enable faster computations in number theory, and support advanced computer science and mathematics research.
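
    For orientation, the two tasks named in the description (primes up to a limit, and primes within a range) can be sketched with the classical Sieve of Eratosthenes and its segmented variant. This is one of the baselines the description mentions, not the authors' odd-composed-number algorithm, which is not reproduced here; the function names are chosen for this example.

```python
from math import isqrt


def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p.
            count = len(range(p * p, limit + 1, p))
            is_prime[p * p :: p] = bytearray(count)
    return [n for n in range(limit + 1) if is_prime[n]]


def primes_in_range(low, high):
    """Segmented sieve: return all primes in the closed interval [low, high]."""
    low = max(low, 2)
    if high < low:
        return []
    base_primes = primes_up_to(isqrt(high))
    segment = bytearray([1]) * (high - low + 1)
    for p in base_primes:
        # First multiple of p inside the segment, but never p itself.
        start = max(p * p, ((low + p - 1) // p) * p)
        for multiple in range(start, high + 1, p):
            segment[multiple - low] = 0
    return [low + i for i, flag in enumerate(segment) if flag]
```

    The segmented version only needs the base primes up to the square root of the upper bound, which is why a range far from zero can be sieved without building the full table below it.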

  3. Data from: MNIST Handwritten Digits Dataset

    • kaggle.com
    zip
    Updated May 15, 2025
    Cite
    Ghanshyam Saini (2025). MNIST Handwritten Digits Dataset [Dataset]. https://www.kaggle.com/datasets/ghnshymsaini/mnist-handwritten-digits-dataset/versions/1
    Available download formats: zip (29605861 bytes)
    Authors
    Ghanshyam Saini
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    MNIST Handwritten Digits Dataset (Organized by Folder)

    This dataset provides the classic MNIST handwritten digits dataset, a foundational resource for image classification in machine learning. It contains a training set of 60,000 examples and a test set of 10,000 examples of grayscale images of handwritten digits (0 through 9).

    Dataset Structure:

    The uploaded data is organized within a main folder named mnist_png, which contains the following subfolders:

    • train: This folder contains the training set images. Upon navigating into the train folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders corresponds to a digit class (e.g., the folder named 0 contains images of the digit zero, the folder named 1 contains images of the digit one, and so on). The images within these subfolders are grayscale handwritten digit images in a common image format (e.g., PNG).

    • test: This folder contains the test set images. Similar to the train folder, upon navigating into the test folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders contains the corresponding test images for that digit class.

    Content of the Data:

    Each image in the MNIST dataset is a 28x28 pixel grayscale image of a handwritten digit (0-9). The pixel values typically range from 0 (black) to 255 (white).

    How to Use This Dataset:

    1. Download the main MNIST folder (or the archive containing it) and extract its contents.
    2. Navigate into the mnist_png folder.
    3. The train and test subfolders contain the image data, organized by digit class. You can directly use this folder structure with image data loaders that support directory-based organization. The name of the subfolder will correspond to the digit label.
    4. The train folder provides the images you can use to train your machine learning models.
    5. The test folder provides a separate set of images that you can use to evaluate the performance of your trained models on unseen data.
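
    The directory-based loading described in the steps above can be sketched with the standard library alone. This is a minimal illustration that assumes the images use a `.png` extension (the description only says "e.g., PNG"); the function name is hypothetical, and decoding the pixels would additionally need an image library.

```python
from pathlib import Path


def list_labeled_images(split_dir):
    """Return sorted (image_path, label) pairs from digit-named subfolders."""
    samples = []
    for class_dir in sorted(Path(split_dir).iterdir()):
        # Only folders named 0 through 9 (or other digits) are treated as classes.
        if class_dir.is_dir() and class_dir.name.isdigit():
            label = int(class_dir.name)  # the subfolder name is the digit label
            for image_path in sorted(class_dir.glob("*.png")):  # assumed extension
                samples.append((image_path, label))
    return samples
```

    For example, `list_labeled_images("mnist_png/train")` would yield the training pairs and `list_labeled_images("mnist_png/test")` the evaluation pairs, assuming the archive was extracted into the current directory.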

    Citation:

    The MNIST dataset is a well-established resource. It was assembled by Yann LeCun, Corinna Cortes, and Christopher J.C. Burges from NIST handwriting data, and it is commonly cited via LeCun et al. (1998), "Gradient-based learning applied to document recognition." There isn't a separate definitive reference for this image-folder organization, so cite the original dataset alongside the specific papers or implementations you are referencing.

    Data Contribution:

    Thank you for downloading this image-based organization of the MNIST dataset. By structuring the images into class-specific folders within the train and test directories, I aim to provide a user-friendly format for those working on handwritten digit recognition tasks. This structure aligns well with many image data loading utilities and workflows.

    If you find this folder structure clear, well-organized, and useful for your projects, please consider giving it an upvote after downloading. Your feedback and appreciation are valuable and encourage further contributions to the Kaggle community. Thank you!

  4. Data from: Current and projected research data storage needs of Agricultural...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.

    The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

    From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey.

    Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

    Resources in this dataset:

    • Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
    • Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv. Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).
    • Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
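
    The per-person storage rule stated in the description (high end of the reported range, divided by 1 for an individual or by G for a group) can be written as a one-line calculation. The function and parameter names are hypothetical, and the sketch does not cover the separate handling of Big Data users, for whom reported or estimated values were used instead.

```python
def per_person_storage_tb(range_high_tb, group_size=1):
    """Per-person storage need: high end of the reported range divided by G.

    group_size is G, the number of individuals covered by the response
    (1 for an individual response).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size
```

    Under this rule, a group response covering 5 people that selected a range topping out at 100 TB would be counted as 20 TB per person.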

  5. LSS-DAUR-1.0: Digital Array Ubiquitous Radar Low, Slow, and Small Target...

    • scidb.cn
    Updated Nov 5, 2025
    Cite
    陈小龙; 刘佳; 汪兴海; 关键 (2025). LSS-DAUR-1.0: Digital Array Ubiquitous Radar Low, Slow, and Small Target Detection Dataset [Dataset]. http://doi.org/10.57760/sciencedb.radars.00076
    Dataset provided by
    Science Data Bank
    Authors
    陈小龙; 刘佳; 汪兴海; 关键
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    The Low, Slow, and Small Target Detection Dataset for Digital Array Surveillance Radar (LSS-DAUR-1.0) includes a total of 154 items of Range-Doppler (RD) complex data and Track (TR) point data collected from 6 types of targets (passenger ships, speedboats, helicopters, rotary-wing UAVs, birds, fixed-wing UAVs). It can support research on detection, classification and recognition of typical maritime targets by digital array radar.

    1. Data Collection Process

    The data collection process mainly includes: Set radar parameters → Detect targets → Collect echo signal data → Record target information → Determine the range bin where the target is located → Extract target Doppler data → Extract target track data.

    2. Target Situation

    The collected typical sea-air targets include 6 categories: passenger ships, speedboats, helicopters, rotary-wing UAVs, birds and fixed-wing UAVs.

    3. Range-Doppler (RD) Complex Data

    By calculating the target range, the echo data of the range bin where the target is located is intercepted. Based on the collected measured data, the Low, Slow, and Small Target RD Dataset for Digital Array Surveillance Radar is constructed, which includes 10 groups of passenger ship data, 11 groups of speedboat data, 10 groups of helicopter data, 18 groups of rotary-wing UAV (rotary drone) data, 17 groups of bird data, and 11 groups of fixed-wing UAV (fixed-wing drone) data, totaling 77 groups. Each group of data includes the target's Doppler, GPS time, frame count, etc. The naming method of target RD data is: Start Collection Time_DAUR_RD_Target Type_Serial Number_Target Batch Number.mat. For example, in the file name "20231207093748_DAUR_RD_Passenger Ship_01_2619.mat", "20231207" is the date of data collection, "093748" is the start time of collection (09:37:48), "DAUR" stands for Digital Array Surveillance Radar, "RD" stands for Range-Doppler spectrum complex data, "Passenger Ship_01" indicates that the target type is passenger ship with serial number 01, and "2619" is the target track batch number.

    4. Track (TR) Data

    The track data within the time period of the echo data is extracted to construct the Low, Slow, and Small Target TR Dataset for Digital Array Surveillance Radar, which includes 10 groups of passenger ship data, 11 groups of speedboat data, 10 groups of helicopter data, 18 groups of rotary-wing UAV (rotary drone) data, 17 groups of bird data, and 11 groups of fixed-wing UAV (fixed-wing drone) data, totaling 77 groups. Each group of data includes target range, target azimuth, elevation angle, target speed, GPS time, signal-to-noise ratio (SNR), etc. The TR data and RD data have the same time and batch number; they are data of different dimensions for the same target in the same time period. The naming method of target TR data is: Start Collection Time_DAUR_TR_Target Type_Serial Number_Target Batch Number.mat. For example, in the file name "20231207093748_DAUR_TR_Passenger Ship_01_2619.mat", "20231207" is the date of data collection, "093748" is the start time of collection (09:37:48), "DAUR" stands for Digital Array Surveillance Radar, "TR" stands for Track point data, "Passenger Ship_01" indicates that the target type is passenger ship with serial number 01, and "2619" is the target track batch number.
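
    The file naming convention described above lends itself to a small parser. The sketch below is based only on the pattern and the single example given in the description; the function name and the returned field names are hypothetical, and real file names may contain variations this pattern does not anticipate.

```python
import re
from datetime import datetime

# Start Collection Time _ DAUR _ RD|TR _ Target Type _ Serial Number _ Batch Number .mat
FILENAME_PATTERN = re.compile(
    r"^(?P<start>\d{14})_DAUR_(?P<kind>RD|TR)_(?P<target>.+)"
    r"_(?P<serial>\d+)_(?P<batch>\d+)\.mat$",
    re.IGNORECASE,
)


def parse_daur_filename(name):
    """Split an LSS-DAUR file name into its documented components."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    return {
        "start_time": datetime.strptime(match["start"], "%Y%m%d%H%M%S"),
        "data_kind": match["kind"].upper(),  # "RD" or "TR"
        "target_type": match["target"],
        "serial_number": match["serial"],
        "batch_number": match["batch"],
    }
```

    Because RD and TR files for the same target share the start time and batch number, the pair `(start_time, batch_number)` is a natural key for matching the two views of one recording.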

  6. Data from: Land Cover Map 1990 (vector, GB)

    • data.europa.eu
    • ckan.publishing.service.gov.uk
    • +3more
    unknown, zip
    + more versions
    Cite
    Environmental Information Data Centre, Land Cover Map 1990 (vector, GB) [Dataset]. https://data.europa.eu/data/datasets/land-cover-map-1990-vector-gb?locale=en
    Dataset authored and provided by
    Environmental Information Data Centre
    Description

    This dataset consists of the vector version of the Land Cover Map 1990 (LCM1990) for Great Britain. The vector data set is the core LCM data set from which the full range of other LCM1990 products are derived. It provides a number of attributes including land cover at the target class level (given as an integer value and also as text), the number of pixels within the polygon classified as each land cover type and a probability value provided by the classification algorithm (for full details see the LCM1990 Dataset Documentation). The 21 target classes are based on the Joint Nature Conservation Committee (JNCC) Broad Habitats, which encompass the entire range of UK habitats. LCM1990 is a land cover map of the UK which was produced at the UK Centre for Ecology & Hydrology by classifying satellite images (mainly from 1989 and 1990) into 21 Broad Habitat-based classes. It is the first in a series of land cover maps for the UK, which also includes maps for 2000, 2007, 2015, 2017, 2018 and 2019. LCM1990 consists of a range of raster and vector products and users should familiarise themselves with the full range (see related records, the UKCEH web site and the LCM1990 Dataset documentation) to select the product most suited to their needs. This work was supported by the Natural Environment Research Council award number NE/R016429/1 as part of the UK-SCAPE programme delivering National Capability. Full details about this dataset can be found at https://doi.org/10.5285/304a7a40-1388-49f5-b3ac-709129406399

  7. Coffee Shop Daily Revenue Prediction Dataset

    • kaggle.com
    zip
    Updated Feb 7, 2025
    Cite
    Himel Sarder (2025). Coffee Shop Daily Revenue Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/himelsarder/coffee-shop-daily-revenue-prediction-dataset
    Available download formats: zip (30259 bytes)
    Authors
    Himel Sarder
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Overview

    This dataset contains 2,000 rows of data from coffee shops, offering detailed insights into factors that influence daily revenue. It includes key operational and environmental variables that provide a comprehensive view of how business activities and external conditions affect sales performance. Designed for use in predictive analytics and business optimization, this dataset is a valuable resource for anyone looking to understand the relationship between customer behavior, operational decisions, and revenue generation in the food and beverage industry.

    Columns & Variables

    The dataset features a variety of columns that capture the operational details of coffee shops, including customer activity, store operations, and external factors such as marketing spend and location foot traffic.

    1. Number of Customers Per Day

      • The total number of customers visiting the coffee shop on any given day.
      • Range: 50 - 500 customers.
    2. Average Order Value ($)

      • The average dollar amount spent by each customer during their visit.
      • Range: $2.50 - $10.00.
    3. Operating Hours Per Day

      • The total number of hours the coffee shop is open for business each day.
      • Range: 6 - 18 hours.
    4. Number of Employees

      • The number of employees working on a given day. This can influence service speed, customer satisfaction, and ultimately, sales.
      • Range: 2 - 15 employees.
    5. Marketing Spend Per Day ($)

      • The amount of money spent on marketing campaigns or promotions on any given day.
      • Range: $10 - $500 per day.
    6. Location Foot Traffic (people/hour)

      • The number of people passing by the coffee shop per hour, a variable indicative of the shop's location and its potential to attract customers.
      • Range: 50 - 1000 people per hour.

    Target Variable

    • Daily Revenue ($)
      • This is the dependent variable representing the total revenue generated by the coffee shop each day.
      • It is calculated as a combination of customer visits, average spending, and other operational factors like marketing spend and staff availability.
      • Range: $200 - $10,000 per day.

    Data Distribution & Insights

    The dataset spans a wide variety of operational scenarios, from small neighborhood coffee shops with limited traffic to larger, high-traffic locations with extensive marketing budgets. This variety allows for exploring different predictive modeling strategies. Key insights that can be derived from the data include:

    • The effect of marketing spend on daily revenue.
    • The correlation between customer count and daily sales.
    • The relationship between staffing levels and revenue generation.
    • The influence of foot traffic and operating hours on customer behavior.

    Use Cases & Applications

    The dataset offers a wide range of applications, especially in predictive analytics, business optimization, and forecasting:

    • Predictive Modeling: Use machine learning models such as regression, decision trees, or neural networks to predict daily revenue based on operational data.
    • Business Strategy Development: Analyze how changes in marketing spend, staff numbers, or operating hours can optimize revenue and improve efficiency.
    • Customer Insights: Identify patterns in customer behavior related to shop operations and external factors like foot traffic and marketing campaigns.
    • Resource Allocation: Determine optimal staffing levels and marketing budgets based on predicted sales, improving overall profitability.

    Real-World Applications in the Food & Beverage Industry

    For coffee shop owners, managers, and analysts in the food and beverage industry, this dataset provides an essential tool for refining daily operations and boosting profitability. Insights gained from this data can help:

    • Optimize Marketing Campaigns: Evaluate the effectiveness of daily or seasonal marketing campaigns on revenue.
    • Staff Scheduling: Predict busy days and ensure that the right number of employees are scheduled to maximize efficiency.
    • Revenue Forecasting: Provide accurate revenue projections that can assist with financial planning and decision-making.
    • Operational Efficiency: Discover the most profitable operating hours and adjust business hours accordingly.

    This dataset is also ideal for aspiring data scientists and machine learning practitioners looking to apply their skills to real-world business problems in the food and beverage sector.

    Conclusion

    The Coffee Shop Revenue Prediction Dataset is a versatile and comprehensive resource for understanding the dynamics of daily sales performance in coffee shops. With a focus on key operational factors, it is perfect for building predictive models, ...

  8. Data from: Contrasting effects of host or local specialization: widespread...

    • data-staging.niaid.nih.gov
    • ourarchive.otago.ac.nz
    • +3more
    zip
    Updated Mar 13, 2024
    Cite
    Daniela de Angeli Dutra; Gabriel Moreira Félix; Robert Poulin (2024). Contrasting effects of host or local specialization: widespread haemosporidians are host generalist whereas local specialists are locally abundant [Dataset]. http://doi.org/10.5061/dryad.j3tx95xfb
    Dataset provided by
    University of Otago
    Universidade Estadual de Campinas (UNICAMP)
    Authors
    Daniela de Angeli Dutra; Gabriel Moreira Félix; Robert Poulin
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Aim: Despite the wide distribution of many parasites around the globe, the range of individual species varies significantly even among phylogenetically related taxa. Since parasites need suitable hosts to complete their development, parasite geographical and environmental ranges should be limited to communities where their hosts are found. Parasites may also suffer from a trade-off between being locally abundant or widely dispersed. We hypothesize that the geographical and environmental ranges of parasites are negatively associated with their host specificity and their local abundance.

    Location: Worldwide

    Time period: 2009 to 2021

    Major taxa studied: Avian haemosporidian parasites

    Methods: We tested these hypotheses using a global database which comprises data on avian haemosporidian parasites from across the world. For each parasite lineage, we computed five metrics: phylogenetic host-range, environmental range, geographical range, and their mean local and total number of observations in the database. Phylogenetic generalized least squares models were run to evaluate the influence of phylogenetic host-range and total and local abundances on geographical and environmental range. In addition, we analysed separately the two regions with the largest amount of available data: Europe and South America.

    Results: We evaluated 401 lineages from 757 localities and observed that generalism (i.e. phylogenetic host range) is positively associated with both the parasites’ geographical and environmental ranges at global and Europe scales. For South America, generalism is only associated with geographical range. Finally, mean local abundance (mean local number of parasite occurrences) was negatively related to geographical and environmental range. This pattern was detected worldwide and in South America, but not in Europe.

    Main Conclusions: We demonstrate that parasite specificity is linked to both their geographical and environmental ranges.
    The fact that locally abundant parasites present restricted ranges indicates a trade-off between these two traits. This trade-off, however, only becomes evident when sufficiently heterogeneous host communities are considered.

    Methods

    We compiled data on haemosporidian lineages from the MalAvi database (http://130.235.244.92/Malavi/, Bensch et al. 2009) including all the data available from the “Grand Lineage Summary” representing Plasmodium and Haemoproteus genera from wild birds and that contained information regarding location. After checking for duplicated sequences, this dataset comprised a total of ~6200 sequenced parasites representing 1602 distinct lineages (775 Plasmodium and 827 Haemoproteus) collected from 1139 different host species and 757 localities from all continents except Antarctica (Supplementary figure 1, Supplementary Table 1). The parasite lineages deposited in MalAvi are based on a cyt b fragment of 478 bp. This dataset was used to calculate the parasites’ geographical, environmental and phylogenetic ranges.

    Geographical range

    All analyses in this study were performed using R version 4.02. In order to estimate the geographical range of each parasite lineage, we applied the R package “GeoRange” (Boyle, 2017) and chose the variable minimum spanning tree distance (i.e., shortest total distance of all lines connecting each locality where a particular lineage has been found). Using the function “create.matrix” from the “fossil” package, we created a matrix of lineages and coordinates and employed the function “GeoRange_MultiTaxa” to calculate the minimum spanning tree distance for each parasite lineage (i.e. the shortest total distance in kilometers of all lines connecting each locality). Therefore, as at least two distinct sites are necessary to calculate this distance, parasites observed in a single locality could not have their geographical range estimated.
    For this reason, only parasites observed in two or more localities were considered in our phylogenetic generalized least squares (PGLS) models.

    Host and Environmental diversity

    Traditionally, ecologists use Shannon entropy to measure diversity in ecological assemblages (Pielou, 1966). The Shannon entropy of a set of elements is related to the degree of uncertainty someone would have about the identity of a randomly selected element of that set (Jost, 2006). Thus, Shannon entropy matches our intuitive notion of biodiversity: the more diverse an assemblage is, the greater the uncertainty about which species a randomly selected individual belongs to. Shannon diversity increases with both the assemblage richness (e.g., the number of species) and evenness (e.g., uniformity in abundance among species). To compare the diversity of assemblages that vary in richness and evenness in a more intuitive manner, we can normalize diversities by Hill numbers (Chao et al., 2014b). The Hill number of an assemblage represents the effective number of species in the assemblage, i.e., the number of equally abundant species that are needed to give the same value of the diversity metric in that assemblage. Hill numbers can be extended to incorporate phylogenetic information. In such a case, instead of species, we are measuring the effective number of phylogenetic entities in the assemblage. Here, we computed phylogenetic host-range as the phylogenetic Hill number associated with the assemblage of hosts found infected by a given parasite. Analyses were performed using the function “hill_phylo” from the “hillr” package (Chao et al., 2014a). Hill numbers are parameterized by a parameter “q” that determines the sensitivity of the metric to relative species abundance. Different “q” values produce Hill numbers associated with different diversity metrics. We set q = 1 to compute the Hill number associated with Shannon diversity.
    Here, low Hill numbers indicate specialization on a narrow phylogenetic range of hosts, whereas a higher Hill number indicates generalism across a broader phylogenetic spectrum of hosts. We also used Hill numbers to compute the environmental range of sites occupied by each parasite lineage. Firstly, we collected the 19 bioclimatic variables from WorldClim version 2 (http://www.worldclim.com/version2) for all sites used in this study (N = 713). Then, we standardized the 19 variables by centering and scaling them by their respective mean and standard deviation. Thereafter, we computed the pairwise Euclidean environmental distance among all sites and used this distance to compute a dissimilarity cluster. Finally, as for the phylogenetic Hill number, we used this dissimilarity cluster to compute the environmental Hill number of the assemblage of sites occupied by each parasite lineage. The environmental Hill number for each parasite can be interpreted as the effective number of environmental conditions in which a parasite lineage occurs. Thus, the higher the environmental Hill number, the more generalist the parasite is regarding the environmental conditions in which it can occur.

    Parasite phylogenetic tree

    A Bayesian phylogenetic reconstruction was performed. We built a tree for all parasite sequences for which we were able to estimate the parasite’s geographical, environmental and phylogenetic ranges (see above); this represented 401 distinct parasite lineages. This inference was produced using MrBayes 3.2.2 (Ronquist & Huelsenbeck, 2003) with the GTR + I + G model of nucleotide evolution, as recommended by ModelTest (Posada & Crandall, 1998), which selects the best-fit nucleotide substitution model for a set of genetic sequences. We ran four Markov chains simultaneously for a total of 7.5 million generations that were sampled every 1000 generations.
The first 1250 million trees (25%) were discarded as a burn-in step and the remaining trees were used to calculate the posterior probabilities of each estimated node in the final consensus tree. Our final tree obtained a cumulative posterior probability of 0.999. Leucocytozoon caulleryi was used as the outgroup to root the phylogenetic tree as Leucocytozoon spp. represents a basal group within avian haemosporidians (Pacheco et al., 2020).
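
The relationship between Shannon entropy and Hill numbers described in the diversity section above can be sketched in a few lines. This is a minimal illustration of the plain (non-phylogenetic) Hill number only; it is not the "hill_phylo" routine from the "hillr" package used by the authors, and the function name and example abundances are assumptions made for illustration.

```python
import math


def hill_number(abundances, q=1.0):
    """Effective number of species (Hill number of order q).

    Illustrative helper, not part of any package named in the text.
    """
    counts = [a for a in abundances if a > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one positive abundance is required")
    p = [a / total for a in counts]
    if math.isclose(q, 1.0):
        # q = 1 is defined as a limit: the exponential of Shannon entropy.
        shannon = -sum(pi * math.log(pi) for pi in p)
        return math.exp(shannon)
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Hypothetical assemblages: four equally abundant species vs. one dominant species.
even = hill_number([10, 10, 10, 10])
uneven = hill_number([97, 1, 1, 1])
```

With equal abundances the effective number of species equals the richness (four here), while the uneven assemblage yields a value between one and four, which is the sense in which low Hill numbers indicate specialization.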

  9. Data from: SGS-LTER Long-term Monitoring Project: Spotlight Rabbit Count on...

    • catalog.data.gov
    • portal.edirepository.org
    • +2more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). SGS-LTER Long-term Monitoring Project: Spotlight Rabbit Count on the Central Plains Experimental Range, Nunn, Colorado, USA 1994-2006, ARS Study Number 98 [Dataset]. https://catalog.data.gov/dataset/sgs-lter-long-term-monitoring-project-spotlight-rabbit-count-on-the-central-plains-experim-097dc
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Area covered
    Colorado, Nunn, United States
    Description

    This data package was produced by researchers working on the Shortgrass Steppe Long Term Ecological Research (SGS-LTER) Project, administered at Colorado State University. Long-term datasets and background information (proposals, reports, photographs, etc.) on the SGS-LTER project are contained in a comprehensive project collection within the Digital Collections of Colorado (http://digitool.library.colostate.edu/R/?func=collections&collection_id=3429). The data table and associated metadata document, which is generated in Ecological Metadata Language, may be available through other repositories serving the ecological research community and represent components of the larger SGS-LTER project collection. Additional information and referenced materials can be found at http://hdl.handle.net/10217/83448.

    Rabbits are the most important small-mammal herbivores in shortgrass steppe, and may significantly influence the physiognomy and population dynamics of herbaceous plants and woody shrubs. Rabbits are also the most important prey of mammalian carnivores such as coyotes and of large raptors such as golden eagles and great horned owls. Two hares (Lepus californicus, L. townsendii) and one cottontail rabbit (Sylvilagus audubonii) occur in shortgrass steppe. In 1994, we initiated long-term studies to track changes in relative abundance of rabbits on the Central Plains Experimental Range (CPER). On four nights each year (one night each season, usually on new moon nights in January, April, July, October), we drove a 32-km route consisting of pasture two-track and gravel roads on the CPER. This was the same route as that driven for carnivore scat counts. Surveys began at twilight. Observers with two spotlights sat in the back of a 4WD pick-up driven at

    Resources in this dataset:

    Resource Title: Website Pointer to html file.
    File Name: Web Page, url: https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-sgs&identifier=136 Webpage with information and links to data files for download

  10. Major Basin Lines

    • catalog.data.gov
    • data.ct.gov
    • +3more
    Updated Feb 12, 2025
    Cite
    Department of Energy & Environmental Protection (2025). Major Basin Lines [Dataset]. https://catalog.data.gov/dataset/major-basin-lines-f1f41
    Explore at:
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    Department of Energy & Environmental Protection
    Description

    See full Data Guide here.Major Drainage Basin Set: Connecticut Major Drainage Basins is 1:24,000-scale, polygon and line feature data that define Major drainage basin areas in Connecticut. These large basins mostly range from 70 to 2,000 square miles in size. Connecticut Major Drainage Basins includes drainage areas for all Connecticut rivers, streams, brooks, lakes, reservoirs and ponds published on 1:24,000-scale 7.5 minute topographic quadrangle maps prepared by the USGS between 1969 and 1984. Data is compiled at 1:24,000 scale (1 inch = 2,000 feet). This information is not updated. Polygon and line features represent drainage basin areas and boundaries, respectively. Each basin area (polygon) feature is outlined by one or more major basin boundary (line) feature. These data include 10 major basin area (polygon) features and 284 major basin boundary (line) features. Major Basin area (polygon) attributes include major basin number and feature size in acres and square miles. The major basin number (MBAS_NO) uniquely identifies individual basins and is 1 character in length. There are 8 unique major basin numbers. Examples include 1, 4, and 6. Note there are more major basin polygon features (10) than unique major basin numbers (8) because two polygon features are necessary to represent both the entire South East Coast and Hudson Major basins in Connecticut. Major basin boundary (line) attributes include a drainage divide type attribute (DIVIDE) used to cartographically represent the hierarchical drainage basin system. This divide type attribute is used to assign different line symbology to different levels of drainage divides. For example, major basin drainage divides are more pronounced and shown with a wider line symbol than regional basin drainage divides. Connecticut Major Drainage Basin polygon and line feature data are derived from the geometry and attributes of the Connecticut Drainage Basins data. 

  11. Fire statistics data tables

    • gov.uk
    • s3.amazonaws.com
    Updated Oct 23, 2025
    Cite
    Ministry of Housing, Communities and Local Government (2025). Fire statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/fire-statistics-data-tables
    Explore at:
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    GOV.UK
    Authors
    Ministry of Housing, Communities and Local Government
    Description

    On 1 April 2025 responsibility for fire and rescue transferred from the Home Office to the Ministry of Housing, Communities and Local Government.

    This information covers fires, false alarms and other incidents attended by fire crews, and the statistics include the numbers of incidents, fires, fatalities and casualties as well as information on response times to fires. The Ministry of Housing, Communities and Local Government (MHCLG) also collect information on the workforce, fire prevention work, health and safety and firefighter pensions. All data tables on fire statistics are below.

    MHCLG has responsibility for fire services in England. The vast majority of data tables produced by the Ministry of Housing, Communities and Local Government are for England, but some tables (0101, 0103, 0201, 0501, 1401) are for Great Britain split by nation. In the past the Department for Communities and Local Government (which previously had responsibility for fire services in England) produced data tables for Great Britain and at times the UK. Similar information for the devolved administrations is available at Scotland: Fire and Rescue Statistics (https://www.firescotland.gov.uk/about/statistics/), Wales: Community safety (https://statswales.gov.wales/Catalogue/Community-Safety-and-Social-Inclusion/Community-Safety) and Northern Ireland: Fire and Rescue Statistics (https://www.nifrs.org/home/about-us/publications/).

    If you use assistive technology (for example, a screen reader) and need a version of any of these documents in a more accessible format, please email alternativeformats@communities.gov.uk. Please tell us what format you need. It will help us if you say what assistive technology you use.

    Related content

    Fire statistics guidance
    Fire statistics incident level datasets

    Incidents attended

    FIRE0101: Incidents attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 143 KB)
    https://assets.publishing.service.gov.uk/media/68f0f810e8e4040c38a3cf96/FIRE0101.xlsx
    Previous FIRE0101 tables

    FIRE0102: Incidents attended by fire and rescue services in England, by incident type and fire and rescue authority (MS Excel Spreadsheet, 2.12 MB)
    https://assets.publishing.service.gov.uk/media/68f0ffd528f6872f1663ef77/FIRE0102.xlsx
    Previous FIRE0102 tables

    FIRE0103: Fires attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 197 KB)
    https://assets.publishing.service.gov.uk/media/68f20a3e06e6515f7914c71c/FIRE0103.xlsx
    Previous FIRE0103 tables

    FIRE0104: Fire false alarms by reason for false alarm, England (MS Excel Spreadsheet, 443 KB)
    https://assets.publishing.service.gov.uk/media/68f20a552f0fc56403a3cfef/FIRE0104.xlsx
    Previous FIRE0104 tables

    Dwelling fires attended

    FIRE0201: Dwelling fires attended by fire and rescue services by motive, population and nation (MS Excel Spreadsheet, 192 KB)
    https://assets.publishing.service.gov.uk/media/68f100492f0fc56403a3cf94/FIRE0201.xlsx
    Previous FIRE0201 tables


  12. Epidote - Sample 1 NanED Round Robin, Data: ESR10

    • data.europa.eu
    • zenodo.org
    unknown
    Updated Sep 21, 2022
    Cite
    Zenodo (2022). Epidote - Sample 1 NanED Round Robin, Data: ESR10 [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-7848921?locale=ga
    Explore at:
    Available download formats: unknown (636156556)
    Dataset updated
    Sep 21, 2022
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Epidote

    The following submission contains the data collection of datasets for the sample epidote under the NanEd round-robin project. 5 single crystals were identified, and the continuous rotation data acquisition technique was used to collect datasets. All the datasets were processed with REDp, XDS and shelx software. The table below summarizes the data collection parameters for the datasets. The following data is also in the zip folder as a word file.

    Continuous Rotation

    General information:
    Project: NanED (www.naned.eu)
    ESR Project: ESR10 - Round Robin
    Project Label: RR-1
    Sample Label: RR-1_SU
    Data set Label: RR1-1 to 5

    Instrumental:
    Instrument: Transmission electron microscope JEOL 2100 LaB6
    Radiation source: LaB6
    Accelerating voltage: 200 kV
    Wavelength: 0.0251 Å
    Probe Type: Parallel beam
    Beam Diameter: 6 μm
    Beam Convergence: Parallel beam, convergence <0.1 mrad
    Detector: Hybrid pixel detector ASI Timepix (bottom mounted)
    Number of pixels in the image: 512 x 512
    Pixel size: 55 µm x 55 µm
    Effective camera length: 250 mm
    Calibration constant: 0.004990 Å-1/pixel

    Sample description:
    Name: Epidote
    Chemical composition: Ca2FexAl3-xSi3O13H
    Sample source: Natural source from Val d'Ossola, Italy
    Sample preparation: Powder crushed in an agate mortar and deposited on a Cu grid with lacey C film

    Experimental:
    Data Type: Electron diffraction data - 3D ED
    Data collection method: Continuous Rotation
    Temperature (K) used during data collection: 293 K
    Number of crystals contributing to the data set: 5
    Number of experimental frames: See cRED_log.txt in each dataset
    Tilt range, tilt step, tilt per frame: See cRED_log.txt in each dataset
    Exposure time per frame: See cRED_log.txt in each dataset

    Software:
    Software used for the data collection: Instamatic
    Software used for processing: XDS

    Authorship and bibliography:
    Author(s) of the data: Lei Wang (ESR 10)
    Related data:
    Publication(s):

    Files and data formats:
    Image folder: REDp: Folder containing images of the diffraction pattern from each frame; XDS: Folder containing images of the diffraction pattern from each frame
    Image format: mrc and img
    Additional folders/files: Crystal image: image of the crystal; cRED_log: log file of data collection; 1.ed3d: input file for REDp; XDS.INP: input file for the program XDS

    Notes:
    *Project Label "RR-1" stands for Round Robin 1
    *RR-1 contains 5 individual datasets.
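
The wavelength listed above (0.0251 Å at an accelerating voltage of 200 kV) can be cross-checked with the standard relativistic de Broglie relation for electrons. The sketch below is only a sanity check using CODATA constants; the function name is an assumption and nothing here comes from the dataset's own processing software.

```python
import math

# CODATA 2018 constants in SI units.
PLANCK = 6.62607015e-34           # J s
ELECTRON_MASS = 9.1093837015e-31  # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
SPEED_OF_LIGHT = 299792458.0      # m / s


def electron_wavelength_angstrom(voltage_volts):
    """Relativistic electron wavelength for a given accelerating voltage."""
    energy = ELEMENTARY_CHARGE * voltage_volts  # kinetic energy in joules
    rest_energy = ELECTRON_MASS * SPEED_OF_LIGHT ** 2
    # p = sqrt(2 m E (1 + E / (2 m c^2))) is the relativistic momentum.
    momentum = math.sqrt(
        2.0 * ELECTRON_MASS * energy * (1.0 + energy / (2.0 * rest_energy))
    )
    return PLANCK / momentum * 1e10  # metres to angstroms


wavelength_200kv = electron_wavelength_angstrom(200e3)
```

Rounded to four decimal places, the result agrees with the 0.0251 Å stated in the instrument parameters.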

  13. Replication Data for: kluster: An Efficient Scalable Procedure for...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 15, 2018
    Cite
    Hossein Estiri (2018). Replication Data for: kluster: An Efficient Scalable Procedure for Approximating the Number of Clusters in Unsupervised Learning [Dataset]. http://doi.org/10.7910/DVN/LLIOHM
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 15, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Hossein Estiri
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    182 simulated datasets (the first set contains small datasets and the second set contains large datasets) with different cluster compositions, i.e., different numbers of clusters and separation values, generated using the clusterGeneration package in R. Each set of simulation datasets consists of 91 datasets in comma-separated values (csv) format (182 csv files in total) with 3-15 clusters and separation values of 0.1 to 0.7. Separation values can range between (−0.999, 0.999), where a higher separation value indicates a cluster structure with more separable clusters. The size of the dataset, the number of clusters, and the separation value of the clusters are given in the file name, size_X_n_Y_sepval_Z.csv, where X is the size of the dataset, Y is the number of clusters in the dataset, and Z is the separation value of the clusters in the dataset.
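
The file-naming convention described above lends itself to a small parser. The following is a sketch under assumptions: the example file name is hypothetical, the exact number formatting in the real file names may differ, and the 0.1 step between separation values is inferred from the count of 91 files per set rather than stated in the description.

```python
import re
from typing import NamedTuple, Optional


class SimulationInfo(NamedTuple):
    size: int          # X: size of the dataset
    n_clusters: int    # Y: number of clusters
    sepval: float      # Z: separation value


# Pattern follows the documented template size_X_n_Y_sepval_Z.csv.
_NAME_PATTERN = re.compile(r"^size_(\d+)_n_(\d+)_sepval_(-?\d+(?:\.\d+)?)\.csv$")


def parse_simulation_name(file_name: str) -> Optional[SimulationInfo]:
    """Return the parameters encoded in a file name, or None if it does not match."""
    match = _NAME_PATTERN.match(file_name)
    if match is None:
        return None
    size, n_clusters, sepval = match.groups()
    return SimulationInfo(int(size), int(n_clusters), float(sepval))


# Hypothetical file name used only to exercise the parser.
info = parse_simulation_name("size_1000_n_5_sepval_0.3.csv")

# 13 cluster counts (3..15) times 7 separation values (0.1..0.7, assumed step 0.1).
grid_size = len(range(3, 16)) * 7
```

The grid size works out to 91, which is consistent with the number of datasets reported for each set.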

  14. Number of Freshwater Turtle and Crocodilian Species by Freshwater Ecoregion...

    • opendata.rcmrd.org
    Updated Jan 10, 2012
    Cite
    The Nature Conservancy (2012). Number of Freshwater Turtle and Crocodilian Species by Freshwater Ecoregion (Data Basin Dataset) [Dataset]. https://opendata.rcmrd.org/content/6680aa0a0fe64386abde7a12e5b31862
    Explore at:
    Dataset updated
    Jan 10, 2012
    Dataset authored and provided by
    The Nature Conservancy (http://www.nature.org/)
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Number of turtle and crocodilian species, by freshwater ecoregion.

    We generated the map of freshwater turtle and crocodilian species richness—the number of species present in each ecoregion—from species distribution maps, primarily drawing on the sources listed below.

    Distribution maps for 260 freshwater turtle species were provided by Buhlmann et al. (2007). The original distribution maps represented coarse ranges of where species were thought to be present in the wild; however, they were not exact ranges. Buhlmann et al. compiled data from museum and literature records. They correlated verified locality points with GIS-defined hydrologic unit codes (HUCs) and subsequently created “projected” distribution maps for each species by selecting additional HUCs that were representative of similar habitats, elevations, and physiographic regions as the HUCs with the verified point localities. The amount of information available varied by species, as some species and regions are better studied than others. In addition, many species names, especially in the tropics, actually represent complexes of several turtle species that have not yet been disaggregated.

    In developing our map, when a range overlapped several ecoregions, we counted species as present in all those ecoregions that had part of the range. Some ecoregions with a long and narrow shape may have an overestimation of species in our map given the way the range polygons were drawn. This is particularly true in the Amazonas High Andes ecoregion (312), where the mountain range has been used as a range boundary for hundreds of species.

    For crocodilians, species range maps are from the IUCN-SSC Crocodile Specialist Group and Britton (2007). Species range maps were assessed visually, and species presence was assigned to ecoregions. When a range overlapped several ecoregions, we counted the species as present in all ecoregions of range overlap.
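
The counting rule described above (a species is counted as present in every ecoregion its range overlaps) can be sketched as follows. This is only an illustration of the rule, not the authors' GIS workflow; the species names and the ecoregion codes other than 312 are hypothetical.

```python
from collections import Counter


def richness_by_ecoregion(species_ranges):
    """Count species per ecoregion.

    species_ranges maps a species name to the ecoregion IDs its range overlaps;
    a species contributes one count to every ecoregion it touches.
    """
    counts = Counter()
    for ecoregions in species_ranges.values():
        for ecoregion in set(ecoregions):  # set() guards against repeated IDs
            counts[ecoregion] += 1
    return dict(counts)


# Hypothetical ranges; 312 is the Amazonas High Andes code mentioned in the text.
example_ranges = {
    "species_a": [312, 313],
    "species_b": [312],
    "species_c": [313, 314, 312],
}
richness = richness_by_ecoregion(example_ranges)
```

Because every overlapping ecoregion receives a full count, a long and narrow ecoregion that borders many ranges accumulates species quickly, which is the overestimation the description warns about.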

    These data were derived by The Nature Conservancy and World Wildlife Fund, and were displayed in a map published in The Atlas of Global Conservation (Hoekstra et al., University of California Press, 2010). More information at http://nature.org/atlas.

    The following were our primary data sources:

    Britton, A. 2007. Information on crocodilian species distributions. Available at www.flmnh.ufl.edu/cnhc/csl.html.

    Buhlmann, K. A., T. B. Akre, J. B. Iverson, D. Karapatakis, R. A. Mittermeier, A. Georges, G. J. Rhodin, P. P. van Dijk, and J. W. Gibbons. 2007. A global analysis of tortoise and freshwater turtle distributions. Data from the preliminary results of the Global Reptile Assessment. International Union for Conservation of Nature–Species Survival Commission (IUCN-SSC), Conservation International/Center for Applied Biological Science (CI/CABS), and Savannah River Ecology Laboratory, University of Georgia, Aiken, South Carolina, USA.

    International Union for Conservation of Nature (IUCN)–SSC Crocodile Specialist Group Web site. 2008. Available at http://iucncsg.org/ph1/modules/Home/.

    Citation: Hoekstra, J. M., J. L. Molnar, M. Jennings, C. Revenga, M. D. Spalding, T. M. Boucher, J. C. Robertson, T. J. Heibel, with K. Ellison. 2010. The Atlas of Global Conservation: Changes, Challenges, and Opportunities to Make a Difference. Ed. J. L. Molnar. Berkeley: University of California Press. This dataset was used in a scientifically peer-reviewed publication.

    This layer package was loaded using Data Basin. Click here to go to the detail page for this layer package in Data Basin, where you can find out more information, such as full metadata, or use it to create a live web map.

  15. TIGER/Line Shapefile, 2023, County, Livingston County, KY, Address...

    • catalog.data.gov
    Updated Aug 10, 2025
    Cite
    U.S. Department of Commerce, U.S. Census Bureau, Geography Division, Geospatial Products Branch (Point of Contact) (2025). TIGER/Line Shapefile, 2023, County, Livingston County, KY, Address Range-Feature [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-2023-county-livingston-county-ky-address-range-feature
    Explore at:
    Dataset updated
    Aug 10, 2025
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Area covered
    Livingston County, Kentucky
    Description

    The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Ranges Feature Shapefile (ADDRFEAT.dbf) contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe. Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address range is associated with the edge. The ADDRFEAT shapefile contains a record for each address range to street name combination. Address ranges associated with more than one street name are also represented by multiple coincident address range feature edge records. Note that the ADDRFEAT shapefile includes all unsuppressed address ranges, whereas the All Lines Shapefile (EDGES.shp) includes only the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefiles contain potential address ranges, not individual addresses. The address ranges in the TIGER/Line Files are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.
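
The definition of an address range given above (first number, last number, and every number of a given parity in between) can be illustrated with a short helper. This is a sketch only: the function name, the parity labels, and the example numbers are assumptions and do not reflect the shapefile's actual attribute names or encoding.

```python
def potential_structure_numbers(first, last, parity):
    """List the potential structure numbers of one address range.

    parity is "odd", "even", or "both" (labels chosen for this sketch).
    The range may run in either direction, following how the edge is coded.
    """
    if parity not in ("odd", "even", "both"):
        raise ValueError("parity must be 'odd', 'even', or 'both'")
    step = 1 if last >= first else -1
    numbers = range(first, last + step, step)
    if parity == "both":
        return list(numbers)
    wanted = 1 if parity == "odd" else 0
    return [n for n in numbers if n % 2 == wanted]


# Hypothetical left and right sides of a single street edge.
left_side = potential_structure_numbers(101, 109, "odd")
right_side = potential_structure_numbers(110, 100, "even")
```

The lists are potential numbers only; as the description notes, a structure need not exist at each of them.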

  16. Overview of data sets and performed phylogenetic analyses

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 10, 2014
    Cite
    Rosenkranz, David; Welch, David B. Mark; Herlyn, Holger; Wey-Fabrizius, Alexandra R.; Ebersberger, Ingo; Rieger, Benjamin; Witek, Alexander; Hankeln, Thomas (2014). Overview of data sets and performed phylogenetic analyses [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001231645
    Explore at:
    Dataset updated
    Feb 10, 2014
    Authors
    Rosenkranz, David; Welch, David B. Mark; Herlyn, Holger; Wey-Fabrizius, Alexandra R.; Ebersberger, Ingo; Rieger, Benjamin; Witek, Alexander; Hankeln, Thomas
    Description

    The phylogenomic data sets mintax4 and mintax8 comprise a broad range of ortholog protein sequences detected using HaMStR and selected using MARE (keeping only genes that are available for at least four or eight taxa, respectively). The most purposive subset (MPS) data set comprises a fraction of the ortholog proteins in the mintax4 data set, that were at least partially covered by at least one representative per syndermatan subgroup plus one non-syndermatan species. Three additional data sets were compiled by excluding ribosomal proteins from the former mentioned protein selections (mintax4_noRPs, mintax8_noRPs, and MPS_noRPs). To account for potential long-branch attraction (LBA) errors, a data set comprising rather slowly evolving genes out of the mintax4 data set was compiled (mintax4_slow). Data set mintax4 was also modified by keeping only one species per syndermatan subgroup (_4Synd) and subsequently diminished by deleting “singletons” and “dingletons” (-DS). For each data set, the numbers of protein sequences (“# proteins”) and amino acid positions (“# aa”) are indicated. Performed analyses are encoded by numbers for the used programs (1 = RAxML, 2 = TreeFinder, 3 = MrBayes, 4 = PhyloBayes) and letters for the used substitution models (A = 8 partitions, B = 7 partitions, C = CAT, D = LG+I+G+F, E = rtREV+I+G+F). For details see text and Supporting Information.

  17. Census Block Group Map

    • schoolsdata2-db440-tea-texas.opendata.arcgis.com
    • tea-texas.hub.arcgis.com
    • +3more
    Updated Jul 15, 2019
    Cite
    Texas Education Agency (2019). Census Block Group Map [Dataset]. https://schoolsdata2-db440-tea-texas.opendata.arcgis.com/datasets/census-block-group-map
    Explore at:
    Dataset updated
    Jul 15, 2019
    Dataset authored and provided by
    Texas Education Agency
    Description

    The map provides functions for individuals to look up the locations and boundaries of Census Block Groups by address or by Census Block Group number. The data resources are based on Esri ArcGIS (www.arcgis.com) and Census Block 2010 Data (www.census.gov/). It covers Census Block demographic information: population, race, gender, age, and household. The geocoder used through Esri ArcGIS may not provide rooftop accuracy, because addresses are located from the address-range dataset rather than from exact points. The spatial data may not have been updated, which can cause errors. You can find additional information on https://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?ref=addr&refresh=t#.

  18. Think Data Group powered by Marketing Data Interactive - Healthcare/Medical...

    • datarade.ai
    .csv, .xls
    Updated Aug 28, 2025
    Cite
    Think Data Group (2025). Think Data Group powered by Marketing Data Interactive - Healthcare/Medical Professionals with NPI numbers - US postal data, emails and phones [Dataset]. https://datarade.ai/data-products/think-data-group-powered-by-mdi-healthcare-medical-professi-think-data-group-df1b
    Explore at:
    Available download formats: .csv, .xls
    Dataset updated
    Aug 28, 2025
    Dataset authored and provided by
    Think Data Group
    Area covered
    United States of America
    Description

    Access a comprehensive list of healthcare professionals with our Healthcare/Medical Professionals Masterfile with NPI numbers, featuring medical staff, nurses, specialty practitioners, and administrative leaders, who serve in settings such as hospitals, medical offices, clinics, psychiatric facilities, nursing homes, and home health agencies.

    The NPI is a 10-position, intelligence-free numeric identifier: a unique identification number assigned to healthcare providers in the United States. All healthcare providers, including doctors, dentists, pharmacists, nurses, and other professionals, as well as healthcare organizations like hospitals, labs, and nursing homes, must have an NPI number to perform transactions and communicate with health plans. At MDI, we excel in connecting businesses with healthcare professionals through precise NPI ID targeting. Our expertise, paired with our robust data solutions, ensures that your marketing campaigns are not only highly effective but also fully compliant with industry regulations. Moreover, we offer an extensive range of targeted options to suit your specific needs.
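
Since the NPI is a 10-digit identifier, a basic format check can be sketched. The listing itself does not describe any validation; the check below relies on the publicly documented NPI check-digit rule (the Luhn algorithm applied after prefixing the number with 80840), and the function names are assumptions made for this example.

```python
def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:   # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9   # same as summing the two digits of the product
        total += value
    return total % 10 == 0


def is_valid_npi(npi: str) -> bool:
    """Check length, digits, and the check digit of a 10-position NPI."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    # 80840 is the prefix applied to NPIs before the Luhn calculation.
    return luhn_valid("80840" + npi)


# 1234567893 is a commonly used documentation example, not a real provider.
example_ok = is_valid_npi("1234567893")
example_bad = is_valid_npi("1234567890")
```

A check like this only catches malformed or mistyped numbers; it says nothing about whether an NPI is actually assigned to a provider.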

    Available at the office address and at home address. At home, selections include typical consumer demographics, including age, income, presence and age of children, and over 250 lifestyle segments.

    Available for licensing, please inquire.

    Specialties with counts for home address include:

    Acupuncturists: 23,130
    Audiologists: 16,415
    Chiropractors: 45,684
    Dental Assistants: 148,309
    Dental Hygienists: 172,581
    Dentists: 163,374
    Dieticians & Nutritionists: 64,403
    Emergency Medical Technicians: 66,044
    Genetic Counselors: 520
    Marriage and Family Therapists: 77,913
    Medical Laboratory Personnel: 89,395
    Medical Technicians: 7,505
    Mental Health Counselors: 119,317
    Nurses (RN, LPN, NP, APN, etc...): 8,008,887
    Occupational Therapist Assistant: 65,192
    Occupational Therapists: 171,811
    Optometrists: 35,738
    Pharmacists: 391,309
    Pharmacy Technician: 400,256
    Physical Therapist Assistant: 105,493
    Physical Therapists: 259,677
    Physician and Surgeon: 527,078
    Physician Assistants: 94,697
    Podiatrists: 9,198
    Psychologists: 138,737
    Respiratory Therapists: 201,413
    Social Workers: 382,799
    Speech & Hearing Therapists: 54,503
    Speech Pathologists: 190,900
    Veterinarians: 77,948

  19. Credit Card Eligibility Data: Determining Factors

    • kaggle.com
    zip
    Updated May 18, 2024
    Cite
    Rohit Sharma (2024). Credit Card Eligibility Data: Determining Factors [Dataset]. https://www.kaggle.com/datasets/rohit265/credit-card-eligibility-data-determining-factors
    Explore at:
    Available download formats: zip (303227 bytes)
    Dataset updated
    May 18, 2024
    Authors
    Rohit Sharma
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description of the Credit Card Eligibility Data: Determining Factors

    The Credit Card Eligibility Dataset: Determining Factors is a comprehensive collection of variables aimed at understanding the factors that influence an individual's eligibility for a credit card. This dataset encompasses a wide range of demographic, financial, and personal attributes that are commonly considered by financial institutions when assessing an individual's suitability for credit.

    Each row in the dataset represents a unique individual, identified by a unique ID, with associated attributes ranging from basic demographic information such as gender and age, to financial indicators like total income and employment status. Additionally, the dataset includes variables related to familial status, housing, education, and occupation, providing a holistic view of the individual's background and circumstances.

    Variable          Description
    ID                An identifier for each individual (customer).
    Gender            The gender of the individual.
    Own_car           A binary feature indicating whether the individual owns a car.
    Own_property      A binary feature indicating whether the individual owns a property.
    Work_phone        A binary feature indicating whether the individual has a work phone.
    Phone             A binary feature indicating whether the individual has a phone.
    Email             A binary feature indicating whether the individual has provided an email address.
    Unemployed        A binary feature indicating whether the individual is unemployed.
    Num_children      The number of children the individual has.
    Num_family        The total number of family members.
    Account_length    The length of the individual's account with a bank or financial institution.
    Total_income      The total income of the individual.
    Age               The age of the individual.
    Years_employed    The number of years the individual has been employed.
    Income_type       The type of income (e.g., employed, self-employed, etc.).
    Education_type    The education level of the individual.
    Family_status     The family status of the individual.
    Housing_type      The type of housing the individual lives in.
    Occupation_type   The type of occupation the individual is engaged in.
    Target            The target variable for the classification task, indicating whether the individual is eligible for a credit card or not (e.g., Yes/No, 1/0).
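
    As a rough illustration of how the binary variables listed above might be sanity-checked before modelling, the sketch below reads rows with Python's standard csv module and reports any binary column holding something other than "0" or "1". The 0/1 encoding, the helper name check_binary_columns, and the two sample rows are assumptions made for illustration only; they are not taken from the dataset itself.

```python
import csv
import io

# Columns the description calls "binary features".
# Assumption: they are encoded as the strings "0" and "1" in the CSV.
BINARY_COLUMNS = ["Own_car", "Own_property", "Work_phone",
                  "Phone", "Email", "Unemployed"]


def check_binary_columns(rows):
    """Return the sorted names of binary columns with values other than "0"/"1"."""
    bad = set()
    for row in rows:
        for col in BINARY_COLUMNS:
            if row.get(col) not in {"0", "1"}:
                bad.add(col)
    return sorted(bad)


# Made-up rows for illustration only; the second row has a bad Phone value.
sample = io.StringIO(
    "ID,Own_car,Own_property,Work_phone,Phone,Email,Unemployed,Target\n"
    "1,1,0,0,1,0,0,1\n"
    "2,0,1,0,yes,0,1,0\n"
)
rows = list(csv.DictReader(sample))
problem_columns = check_binary_columns(rows)
print(problem_columns)
```

    With the real file, the io.StringIO sample would be replaced by an open file handle for the downloaded CSV.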

    Researchers, analysts, and financial institutions can leverage this dataset to gain insights into the key factors influencing credit card eligibility and to develop predictive models that assist in automating the credit assessment process. By understanding the relationship between various attributes and credit card eligibility, stakeholders can make more informed decisions, improve risk assessment strategies, and enhance customer targeting and segmentation efforts.

    This dataset is valuable for a wide range of applications within the financial industry, including credit risk management, customer relationship management, and marketing analytics. Furthermore, it provides a valuable resource for academic research and educational purposes, enabling students and researchers to explore the intricate dynamics of credit card eligibility determination.

  20. Risky Business: Factor Analysis of Survey Data – Assessing the Probability...

    • plos.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Cees van der Eijk; Jonathan Rose (2023). Risky Business: Factor Analysis of Survey Data – Assessing the Probability of Incorrect Dimensionalisation [Dataset]. http://doi.org/10.1371/journal.pone.0118900
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Cees van der Eijk; Jonathan Rose
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper undertakes a systematic assessment of the extent to which factor analysis identifies the correct number of latent dimensions (factors) when applied to ordered-categorical survey items (so-called Likert items). We simulate 2400 data sets of uni-dimensional Likert items that vary systematically over a range of conditions such as the underlying population distribution, the number of items, the level of random error, and characteristics of items and item-sets. Each of these datasets is factor analysed in a variety of ways that are frequently used in the extant literature, or that are recommended in current methodological texts. These include exploratory factor retention heuristics such as Kaiser’s criterion, Parallel Analysis and a non-graphical scree test, and (for exploratory and confirmatory analyses) evaluations of model fit. These analyses are conducted on the basis of Pearson and polychoric correlations. We find that, irrespective of the particular mode of analysis, factor analysis applied to ordered-categorical survey data very often leads to over-dimensionalisation. The magnitude of this risk depends on the specific way in which factor analysis is conducted, the number of items, the properties of the set of items, and the underlying population distribution. The paper concludes with a discussion of the consequences of over-dimensionalisation, and a brief mention of alternative modes of analysis that are much less prone to such problems.
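
    To make the procedure described above more concrete, here is a minimal sketch of one such check, assuming NumPy is available: it simulates uni-dimensional Likert items from a single latent factor, then applies Kaiser's criterion (retain factors whose eigenvalue of the Pearson correlation matrix exceeds 1). The function names, the loading of 0.7, the cut points, and the sample sizes are illustrative assumptions, not the settings used by the authors, and polychoric correlations and Parallel Analysis are not covered.

```python
import numpy as np


def simulate_likert(n_respondents=1000, n_items=8, loading=0.7,
                    n_categories=5, seed=0):
    """Simulate ordered-categorical items driven by one latent factor."""
    rng = np.random.default_rng(seed)
    factor = rng.standard_normal((n_respondents, 1))
    noise = rng.standard_normal((n_respondents, n_items))
    # Continuous item scores with unit variance and a single common factor.
    latent = loading * factor + np.sqrt(1.0 - loading ** 2) * noise
    # Assumed fixed cut points that split the scores into ordered categories.
    cuts = np.linspace(-1.5, 1.5, n_categories - 1)
    return np.digitize(latent, cuts) + 1  # categories 1..n_categories


def kaiser_count(items):
    """Count eigenvalues of the Pearson correlation matrix greater than 1."""
    corr = np.corrcoef(items, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    return int(np.sum(eigenvalues > 1.0))


items = simulate_likert()
n_retained = kaiser_count(items)
print("factors retained by Kaiser's criterion:", n_retained)
```

    Because the data are generated from a single factor, any count above 1 would be an instance of the over-dimensionalisation the abstract describes; varying the loading, the cut points, or the number of items is one way to explore when that happens.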

Large Dataset of Generalization Patterns in the Number Game
5 scholarly articles cite this dataset (View in Google Scholar)
