100+ datasets found
  1. Mathematics Dataset

    • github.com
    • opendatalab.com
    • +1 more
    Updated Apr 3, 2019
    Cite
    DeepMind (2019). Mathematics Dataset [Dataset]. https://github.com/Wikidepia/mathematics_dataset_id
    Explore at:
    Dataset updated
    Apr 3, 2019
    Dataset provided by
    DeepMind (http://deepmind.com/)
    Description

    This dataset consists of mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.

    ## Example questions

     Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.
     Answer: 4
     
     Question: Calculate -841880142.544 + 411127.
     Answer: -841469015.544
     
     Question: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)).
     Answer: 54*a - 30
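    The example answers can be verified mechanically; a quick sanity check in Python (not part of the dataset tooling):

    ```python
    from decimal import Decimal
    from fractions import Fraction

    # Check: solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r (Cramer's rule).
    det = Fraction(-42 * 4 - 27 * 130)
    r = Fraction(-1167 * 4 - 27 * 372) / det
    print(r)  # -> 4

    # Check: -841880142.544 + 411127 (Decimal avoids float rounding noise).
    print(Decimal("-841880142.544") + 411127)  # -> -841469015.544
    ```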
    

    It contains 2 million (question, answer) pairs per module, with questions limited to 160 characters in length, and answers to 30 characters in length. Note the training data for each question type is split into "train-easy", "train-medium", and "train-hard". This allows training models via a curriculum. The data can also be mixed together uniformly from these training datasets to obtain the results reported in the paper.

    Categories:

    • algebra (linear equations, polynomial roots, sequences)
    • arithmetic (pairwise operations and mixed expressions, surds)
    • calculus (differentiation)
    • comparison (closest numbers, pairwise comparisons, sorting)
    • measurement (conversion, working with time)
    • numbers (base conversion, remainders, common divisors and multiples, primality, place value, rounding numbers)
    • polynomials (addition, simplification, composition, evaluating, expansion)
    • probability (sampling without replacement)
  2. Number of primes in every 100 numbers up to 10000

    • kaggle.com
    zip
    Updated May 15, 2021
    Cite
    In06 Days (2021). Number of primes in every 100 numbers up to 10000 [Dataset]. https://www.kaggle.com/datasets/mathnights/number-of-primes-in-every-100-numbers-up-to-10000
    Explore at:
    Available download formats: zip (668 bytes)
    Dataset updated
    May 15, 2021
    Authors
    In06 Days
    Description

    Context

    This list shows the count of prime numbers in each block of 100 integers up to 10000. Source: easycalculation
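    A list like this is easy to regenerate; a sketch using a standard Sieve of Eratosthenes (the actual CSV's column layout may differ):

    ```python
    def primes_up_to(limit):
        """Sieve of Eratosthenes: return all primes <= limit."""
        sieve = bytearray([1]) * (limit + 1)
        sieve[0:2] = b"\x00\x00"
        for p in range(2, int(limit ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
        return [i for i, is_p in enumerate(sieve) if is_p]

    primes = primes_up_to(10000)
    # Count primes in each block (1-100, 101-200, ..., 9901-10000).
    counts = [sum(1 for p in primes if k * 100 < p <= (k + 1) * 100) for k in range(100)]
    print(counts[0], sum(counts))  # 25 primes below 100; 1229 primes below 10000
    ```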


  3. US B2B Phone Number Data | 148MM Phone Numbers, Verified Data

    • datarade.ai
    Updated Feb 20, 2024
    Cite
    Salutary Data (2024). US B2B Phone Number Data | 148MM Phone Numbers, Verified Data [Dataset]. https://datarade.ai/data-products/salutary-data-b2b-data-phone-number-data-mobile-phone-72-salutary-data
    Explore at:
    Available download formats: .json, .csv, .xls, .txt
    Dataset updated
    Feb 20, 2024
    Dataset authored and provided by
    Salutary Data
    Area covered
    United States of America
    Description

    Discover the ultimate resource for your B2B needs with our meticulously curated dataset, featuring 148MM+ highly relevant US B2B Contact Data records and associated company information.

    Very high fill rates for Phone Number, including for Mobile Phone!

    This encompasses a diverse range of fields, including Contact Name (First & Last), Work Address, Work Email, Personal Email, Mobile Phone, Direct-Dial Work Phone, Job Title, Job Function, Job Level, LinkedIn URL, Company Name, Domain, Email Domain, HQ Address, Employee Size, Revenue Size, Industry, NAICS and SIC Codes + Descriptions, ensuring you have the most detailed insights for your business endeavors.

    Key Features:

    Extensive Data Coverage: Access a vast pool of B2B Contact Data records, providing valuable information on where the contacts work now, empowering your sales, marketing, recruiting, and research efforts.

    Versatile Applications: Leverage this robust dataset for Sales Prospecting, Lead Generation, Marketing Campaigns, Recruiting initiatives, Identity Resolution, Analytics, Research, and more.

    Phone Number Data Inclusion: Benefit from our comprehensive Phone Number Data, ensuring you have direct and effective communication channels. Explore our Phone Number Datasets and Phone Number Databases for an even more enriched experience.

    Flexible Pricing Models: Tailor your investment to match your unique business needs, data use-cases, and specific requirements. Choose from targeted lists, CSV enrichment, or licensing our entire database or subsets to seamlessly integrate this data into your products, platform, or service offerings.

    Strategic Utilization of B2B Intelligence:

    Sales Prospecting: Identify and engage with the right decision-makers to drive your sales initiatives.

    Lead Generation: Generate high-quality leads with precise targeting based on specific criteria.

    Marketing Campaigns: Amplify your marketing strategies by reaching the right audience with targeted campaigns.

    Recruiting: Streamline your recruitment efforts by connecting with qualified candidates.

    Identity Resolution: Enhance your data quality and accuracy by resolving identities with our reliable dataset.

    Analytics and Research: Fuel your analytics and research endeavors with comprehensive and up-to-date B2B insights.

    Access Your Tailored B2B Data Solution:

    Reach out to us today to explore flexible pricing options and discover how Salutary Data Company Data, B2B Contact Data, B2B Marketing Data, B2B Email Data, Phone Number Data, Phone Number Datasets, and Phone Number Databases can transform your business strategies. Elevate your decision-making with top-notch B2B intelligence.

  4. Linear Regression on Logarithm Data

    • kaggle.com
    zip
    Updated Sep 1, 2021
    Cite
    Yogesh Singh (2021). Linear Regression on Logarithm Data [Dataset]. https://www.kaggle.com/datasets/noobyogi0100/linear-regression-on-logarithm-data
    Explore at:
    Available download formats: zip (2668 bytes)
    Dataset updated
    Sep 1, 2021
    Authors
    Yogesh Singh
    Description

    Context

    There are several datasets for the simple linear regression algorithm, but most of them are random. There is nothing wrong with that, but I believe the data you work with should be meaningful, no matter how small or simple the problem or algorithm. This dataset therefore pairs random numbers with their corresponding log base 10 values. You can use it to practice and experiment with the linear regression algorithm.

    Content

    The dataset consists of two CSVs corresponding to the training and testing datasets. The training dataset was created in Google Spreadsheet using the RANDBETWEEN(1,1000) function to generate pseudo-random values. The LOG10() function was then used to calculate the log base 10 value of each of these numbers, and the log values were truncated to 6 decimal places using the TRUNC() formula. The testing dataset was created in the same way, but with numbers in the range 1001 to 2000.
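    The spreadsheet recipe above translates directly to Python; a sketch (the row count and column names aren't stated, so `n=100` and the headers are assumptions):

    ```python
    import csv
    import math
    import random

    random.seed(42)  # the sheet's RANDBETWEEN is volatile; seed for reproducibility

    def make_split(lo, hi, n=100):
        """Recreate one CSV split; n is an assumption (actual row count unstated)."""
        rows = []
        for _ in range(n):
            x = random.randint(lo, hi)                         # RANDBETWEEN(lo, hi)
            y = math.trunc(math.log10(x) * 10 ** 6) / 10 ** 6  # TRUNC(LOG10(x), 6)
            rows.append((x, y))
        return rows

    train = make_split(1, 1000)
    test = make_split(1001, 2000)

    with open("train.csv", "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["number", "log10"])  # hypothetical column names
        w.writerows(train)
    ```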

    Acknowledgements

    I would like to thank Dr. Andrew Ng for creating an amazing beginner-friendly ML course.

    Inspiration

    I hope this dataset helps Machine Learning beginners and newbies to practice and learn about Linear Regression.

  5. Prime Number Source Code with Dataset

    • figshare.com
    zip
    Updated Oct 12, 2024
    + more versions
    Cite
    Ayman Mostafa (2024). Prime Number Source Code with Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27215508.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 12, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ayman Mostafa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper addresses the computational methods and challenges associated with prime number generation, a critical component in encryption algorithms for ensuring data security. The generation of prime numbers efficiently is a critical challenge in various domains, including cryptography, number theory, and computer science. The quest to find more effective algorithms for prime number generation is driven by the increasing demand for secure communication and data storage and the need for efficient algorithms to solve complex mathematical problems. Our goal is to address this challenge by presenting two novel algorithms for generating prime numbers: one that generates primes up to a given limit and another that generates primes within a specified range. These innovative algorithms are founded on the formulas of odd-composed numbers, allowing them to achieve remarkable performance improvements compared to existing prime number generation algorithms. Our comprehensive experimental results reveal that our proposed algorithms outperform well-established prime number generation algorithms such as Miller-Rabin, Sieve of Atkin, Sieve of Eratosthenes, and Sieve of Sundaram regarding mean execution time. More notably, our algorithms exhibit the unique ability to provide prime numbers from range to range with a commendable performance. This substantial enhancement in performance and adaptability can significantly impact the effectiveness of various applications that depend on prime numbers, from cryptographic systems to distributed computing. By providing an efficient and flexible method for generating prime numbers, our proposed algorithms can develop more secure and reliable communication systems, enable faster computations in number theory, and support advanced computer science and mathematics research.
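    The paper's own algorithms are not reproduced here, but the range-to-range generation it benchmarks can be illustrated with one of the baselines named in the abstract, the Sieve of Eratosthenes, in its classical segmented form:

    ```python
    import math

    def primes_in_range(lo, hi):
        """Segmented Sieve of Eratosthenes: all primes p with lo <= p <= hi."""
        limit = math.isqrt(hi)
        base = bytearray([1]) * (limit + 1)  # sieve the base primes up to sqrt(hi)
        base[0:2] = b"\x00\x00"
        for p in range(2, math.isqrt(limit) + 1):
            if base[p]:
                base[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
        seg = bytearray([1]) * (hi - lo + 1)  # flags for the window [lo, hi]
        for p in (i for i, b in enumerate(base) if b):
            start = max(p * p, ((lo + p - 1) // p) * p)  # first multiple of p in window
            for m in range(start, hi + 1, p):
                seg[m - lo] = 0
        return [lo + i for i, b in enumerate(seg) if b and lo + i > 1]

    print(primes_in_range(100, 130))  # -> [101, 103, 107, 109, 113, 127]
    ```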

  6. Prime gap frequency distribution (powers of 2)

    • kaggle.com
    zip
    Updated Mar 26, 2025
    Cite
    Erick Magyar (2025). Prime gap frequency distribution (powers of 2) [Dataset]. https://www.kaggle.com/datasets/erickmagyar/prime-gap-frequency-distribution-powers-of-2
    Explore at:
    Available download formats: zip (5860739 bytes)
    Dataset updated
    Mar 26, 2025
    Authors
    Erick Magyar
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Description: A Deep Dive into Prime Gap Distribution and Primorial Harmonics

    Overview: This dataset offers a comprehensive exploration of prime gap distribution, focusing on the intriguing patterns associated with primorials and their harmonics. Primorials, the product of the first n prime numbers, play a significant role in shaping the landscape of prime gaps. By analyzing the distribution of prime gaps and their relation to primorials, we can gain deeper insights into the fundamental structure of prime numbers.

    Data Structure:

    • Power of 2: The base-2 exponent.
    • Gap Size N: The size of the Nth prime gap following the given power of 2.

    Key Features:

    • Primorial Harmonics: The dataset highlights the appearance of prime gaps that are multiples of primorials, suggesting a deeper connection between these numbers and the distribution of primes.
    • Large Prime Gaps: The dataset includes information on exceptionally large prime gaps, which can provide valuable clues about the underlying structure of the number line.
    • Prime Number Distribution: The distribution of prime numbers within the specified range is analyzed, revealing patterns and anomalies.

    Potential Applications:

    • Number Theory Research: investigating the role of primorials in shaping prime gap distribution; testing conjectures related to the Riemann Hypothesis and the Twin Prime Conjecture; exploring the connection between prime gaps and other mathematical concepts, such as modular arithmetic and number theory functions.
    • Machine Learning and Data Science: training machine learning models to predict prime gap sizes, incorporating primorials as features; developing algorithms to identify and analyze primorial-related patterns.
    • Computational Mathematics: benchmarking computational resources and algorithms for prime number generation and factorization; developing new algorithms for efficient computation of primorials and their harmonics.

    How to Use This Dataset:

    • Data Exploration: visualize the distribution of prime gaps, highlighting the occurrence of primorial harmonics; analyze the frequency of different gap sizes, focusing on multiples of primorials; study the relationship between prime gap size and the corresponding power of 2, considering the influence of primorials.
    • Machine Learning: incorporate features related to primorials and their harmonics into machine learning models; experiment with different feature engineering techniques and hyperparameter tuning to improve model performance; use the dataset to train models that can predict the occurrence of large prime gaps and other significant patterns.
    • Number Theory Research: use the dataset to formulate and test new conjectures about the distribution of prime gaps and the role of primorials; explore the connection between prime gap distribution and other mathematical fields, such as cryptography and coding theory.

    By leveraging this dataset, researchers can gain a deeper understanding of the intricate patterns and underlying structures that govern the distribution of prime numbers.
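    A small-scale version of the gap-frequency tally this dataset tabulates can be sketched directly (trial division here stands in for the dataset's much larger power-of-2 ranges):

    ```python
    from collections import Counter

    # Primes below 2**12 by trial division (a small stand-in for the dataset's ranges).
    primes = [n for n in range(2, 2 ** 12)
              if all(n % d for d in range(2, int(n ** 0.5) + 1))]

    # Gaps between consecutive primes, then their frequency distribution.
    gaps = [q - p for p, q in zip(primes, primes[1:])]
    freq = Counter(gaps)
    print(freq.most_common(3))
    ```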

    Supplement to the Prime Gap Dataset Description: Unveiling the Mysteries of Prime Gaps

    The Prime Gap Dataset offers a unique opportunity to delve into the fascinating world of prime numbers. By analyzing the distribution of gaps between consecutive primes, we can uncover hidden patterns and structures that might hold the key to unlocking the secrets of the universe.

    Key Features and Potential Insights:

    • Visual Exploration: Immerse yourself in stunning visualizations of prime gap distributions, revealing hidden patterns and anomalies.
    • Statistical Analysis: Conduct in-depth statistical analysis to identify trends, correlations, and outliers.
    • Machine Learning Applications: Employ machine learning techniques to predict prime gap distributions and discover novel insights.
    • Fractal Analysis: Investigate the potential fractal nature of prime number distributions, revealing self-similarity at different scales.

    Potential Research Directions:

    • Uncovering Hidden Patterns: Explore the distribution of prime gaps at various scales to identify emerging patterns and structures.
    • Predicting Prime Gap Behavior: Develop machine learning models to predict the size and distribution of future prime gaps.
    • Testing Mathematical Conjectures: Use the dataset to test conjectures related to prime number distribution, such as the Riemann Hypothesis.
    • Exploring Connections to Other Fields: Investigate the relationship between prime numbers and other mathematical fields, such as chaos theory and information theory.

    By delving into this rich dataset, you can contribute to the ongoing exploration of one of the most fundamental and enduring mysteries of mathematics.

  7. Large Dataset of Generalization Patterns in the Number Game

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 10, 2018
    Cite
    Eric J. Bigelow; Steven T. Piantadosi (2018). Large Dataset of Generalization Patterns in the Number Game [Dataset]. http://doi.org/10.7910/DVN/A8ZWLF
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 10, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Eric J. Bigelow; Steven T. Piantadosi
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    272,700 two-alternative forced choice responses in a simple numerical task modeled after Tenenbaum (1999, 2000), collected from 606 Amazon Mechanical Turk workers. Subjects were shown sets of 1 to 4 numbers from the range 1 to 100 (e.g. {12, 16}) and asked what other numbers were likely to belong to that set (e.g. 1, 5, 2, 98). Their generalization patterns reflect both rule-like (e.g. "even numbers," "powers of two") and distance-based (e.g. numbers near 50) generalization. This data set is available for further analysis of these simple and intuitive inferences, for developing hands-on modeling instruction, and for attempts to understand how probability and rules interact in human cognition.
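    As a rough illustration of the rule-like generalization described above, here is a minimal Bayesian "size principle" sketch in the spirit of Tenenbaum's number game model (the hypothesis space and data below are invented for illustration, not taken from the dataset):

    ```python
    # Hypothesis space and observed set are invented for illustration.
    hypotheses = {
        "even": {n for n in range(1, 101) if n % 2 == 0},
        "powers_of_two": {2 ** k for k in range(1, 7)},            # 2, 4, ..., 64
        "multiples_of_4": {n for n in range(1, 101) if n % 4 == 0},
    }
    data = {16, 8, 2, 64}

    # Size principle: P(D|h) = (1/|h|)^n if D is consistent with h, else 0.
    posterior = {name: ((1 / len(h)) ** len(data) if data <= h else 0.0)
                 for name, h in hypotheses.items()}
    Z = sum(posterior.values())
    posterior = {name: p / Z for name, p in posterior.items()}

    # The smaller consistent hypothesis ("powers of two") dominates.
    print(max(posterior, key=posterior.get))  # -> powers_of_two
    ```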

  8. original : CIFAR 100

    • kaggle.com
    zip
    Updated Dec 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shashwat Pandey (2024). original : CIFAR 100 [Dataset]. https://www.kaggle.com/datasets/shashwat90/original-cifar-100
    Explore at:
    Available download formats: zip (168517945 bytes)
    Dataset updated
    Dec 28, 2024
    Authors
    Shashwat Pandey
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. CIFAR-10 and CIFAR-100 were created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. (Sadly, the 80 million tiny images dataset has been thrown into the memory hole by its authors. Spotting the doublethink which was used to justify its erasure is left as an exercise for the reader.)

    The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

    The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

    The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.

    Baseline results You can find some baseline replicable results on this dataset on the project page for cuda-convnet. These results were obtained with a convolutional neural network. Briefly, they are 18% test error without data augmentation and 11% with. Additionally, Jasper Snoek has a new paper in which he used Bayesian hyperparameter optimization to find nice settings of the weight decay and other hyperparameters, which allowed him to obtain a test error rate of 15% (without data augmentation) using the architecture of the net that got 18%.

    Other results Rodrigo Benenson has collected results on CIFAR-10/100 and other datasets on his website; click here to view.

    Dataset layout Python / Matlab versions I will describe the layout of the Python version of the dataset. The layout of the Matlab version is identical.

    The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python "pickled" object produced with cPickle. Here is a python2 routine which will open such a file and return a dictionary:

        def unpickle(file):
            import cPickle
            with open(file, 'rb') as fo:
                dict = cPickle.load(fo)
            return dict

    And a python3 version:

        def unpickle(file):
            import pickle
            with open(file, 'rb') as fo:
                dict = pickle.load(fo, encoding='bytes')
            return dict

    Loaded in this way, each of the batch files contains a dictionary with the following elements:

    • data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.
    • labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.

    The dataset contains another file, called batches.meta. It too contains a Python dictionary object. It has the following entries:

    • label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", etc.

    Binary version

    The binary version contains the files data_batch_1.bin, data_batch_2.bin, ..., data_batch_5.bin, as well as test_batch.bin. Each of these files is formatted as follows:

        <1 x label><3072 x pixel>
        ...
        <1 x label><3072 x pixel>

    In other words, the first byte is the label of the first image, which is a number in the range 0-9. The next 3072 bytes are the values of the pixels of the image. The first 1024 bytes are the red channel values, the next 1024 the green, and the final 1024 the blue. The values are stored in row-major order, so the first 32 bytes are the red channel values of the first row of the image.

    Each file contains 10000 such 3073-byte "rows" of images, although there is nothing delimiting the rows. Therefore each file should be exactly 30730000 bytes long.

    There is another file, called batches.meta.txt. This is an ASCII file that maps numeric labels in the range 0-9 to meaningful class names. It is merely a list of the 10 class names, one per row. The class name on row i corresponds to numeric label i.
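    A minimal NumPy reader for the binary layout described above (the function name and path are illustrative; for the CIFAR-100 .bin files, each row carries an extra coarse-label byte, so row_bytes would be 3074 there):

    ```python
    import numpy as np

    def read_cifar_bin(path, row_bytes=3073):
        """Parse a CIFAR-10 binary batch: each row is <1 x label><3072 x pixel>."""
        raw = np.fromfile(path, dtype=np.uint8).reshape(-1, row_bytes)
        labels = raw[:, 0]                          # class labels, 0-9
        images = raw[:, 1:].reshape(-1, 3, 32, 32)  # R, G, B planes, row-major
        return labels, images
    ```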

    The CIFAR-100 dataset This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). Her...

  9. Success.ai | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat...

    • datarade.ai
    Updated Oct 25, 2024
    + more versions
    Cite
    Success.ai (2024). Success.ai | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat files available – 170M+, Verified Profiles - Best Price Guarantee [Dataset]. https://datarade.ai/data-products/success-ai-us-premium-b2b-emails-phone-numbers-dataset-success-ai
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Oct 25, 2024
    Dataset provided by
    Area covered
    United States
    Description

    Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.

    Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.

    API Features:

    • Real-Time Updates: Our APIs deliver real-time updates, ensuring that the contact data your business relies on is always current and accurate.
    • High Volume Handling: Designed to support up to 860k API calls per day, our system is built for scalability and responsiveness, catering to enterprises of all sizes.
    • Flexible Integration: Easily integrate with CRM systems, marketing automation tools, and other enterprise applications to streamline your workflows and enhance productivity.

    Key Categories Served: B2B sales leads – Identify decision-makers in key industries, B2B marketing data – Target professionals for your marketing campaigns, Recruitment data – Source top talent efficiently and reduce hiring times, CRM enrichment – Update and enhance your CRM with verified, updated data, Global reach – Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more.

    Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available: 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry. 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data. GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.

    Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.

    Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including: B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare. Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions. Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location. Market Research: Gain insights into employment trends and company profiles to enrich market research. CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data. Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.

    Use Cases for Success.ai's Contact Data - Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance. - Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles. - Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles. - CRM Enrichment: Keep your CRM current with verified, accurate employee data. - Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors. - Market Research: Gain insights into employment trends and company profiles for better business decisions. - Executive Search: Source senior executives and leaders for headhunting and recruitment. - Partnership Building: Find the right companies and key people to develop strategic partnerships.

    Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why: Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces. Global Scale: With 150M+ employee profiles and 170M veri...

  10. South Range, MI Population Pyramid Dataset: Age Groups, Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 22, 2025
    + more versions
    Cite
    Neilsberg Research (2025). South Range, MI Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/52700142-f122-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 22, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Range, Michigan
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Total Population for Age Groups, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) male population, (b) female population, and (c) total population, we initially analyzed and categorized the data for each of the age groups. Ages between 0 and 85 were divided into roughly 5-year buckets, and ages over 85 were aggregated into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the data for the South Range, MI population pyramid, which represents the South Range population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.

    Key observations

    • Youth dependency ratio, which is the number of children aged 0-14 per 100 persons aged 15-64, for South Range, MI, is 21.5.
    • Old-age dependency ratio, which is the number of persons aged 65 or over per 100 persons aged 15-64, for South Range, MI, is 28.6.
    • Total dependency ratio for South Range, MI is 50.1.
    • Potential support ratio, which is the number of persons of working age (15-64) per person aged 65 or over, for South Range, MI is 3.5.
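All four ratios above follow directly from three population counts: children aged 0-14, working-age persons aged 15-64, and persons aged 65 and over. A minimal sketch, using hypothetical counts chosen only to reproduce the figures listed above (not South Range's actual population):

```python
def dependency_ratios(pop_0_14: float, pop_15_64: float, pop_65_plus: float):
    """Standard dependency ratios, expressed per 100 working-age persons,
    plus the potential support ratio (working-age persons per elderly person)."""
    youth = 100 * pop_0_14 / pop_15_64
    old_age = 100 * pop_65_plus / pop_15_64
    total = youth + old_age
    support = pop_15_64 / pop_65_plus
    return youth, old_age, total, support

# Illustrative counts: 215 children, 1000 working-age, 286 elderly
y, o, t, s = dependency_ratios(215, 1000, 286)
```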
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: This column displays the age group for the South Range population analysis. There are 18 expected values, as defined in the age groups section above.
    • Population (Male): The male population in South Range for the selected age group.
    • Population (Female): The female population in South Range for the selected age group.
    • Total Population: The total population of South Range for the selected age group.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for South Range Population by Age. You can refer to the same here

  11. Kaggle Top Datasets🚀📊

    • kaggle.com
    zip
    Updated Apr 10, 2024
    Cite
    Aaron Frias (2024). Kaggle Top Datasets🚀📊 [Dataset]. https://www.kaggle.com/datasets/aaronfriasr/kaggle-top-datasets
    Explore at:
    zip (1572305 bytes)
    Dataset updated
    Apr 10, 2024
    Authors
    Aaron Frias
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    Kaggle is one of the largest communities of data scientists and machine learning practitioners in the world, and its platform hosts thousands of datasets covering a wide range of topics and industries. With so many options to choose from, it can be difficult to know where to start or which datasets are worth exploring. That's where this dataset comes in. By scraping information about the top 10,000 datasets on Kaggle, we have created a single source of truth for the most popular and useful datasets on the platform. This dataset is not just a list of names and numbers, but a valuable tool for data enthusiasts and professionals alike, providing insights into the latest trends and techniques in data science and machine learning.

    Column description

    • Dataset_name - Name of the dataset
    • Author_name - Name of the author
    • Author_id - Kaggle id of the author
    • No_of_files - Number of files the author has uploaded
    • size - Size of all the files
    • Type_of_file - Type of the files, such as csv, json, etc.
    • Upvotes - Total upvotes of the dataset
    • Medals - Medal of the dataset
    • Usability - Usability of the dataset
    • Date - Date on which the dataset was uploaded
    • Day - Day on which the dataset was uploaded
    • Time - Time at which the dataset was uploaded
    • Dataset_link - Kaggle link of the dataset
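With the columns above, ranking the scrape by upvotes takes only a few lines; a sketch (the CSV filename and exact header spellings should be checked against the actual download):

```python
import csv

def rank_by_upvotes(rows, n=5):
    """Return the n most-upvoted (Dataset_name, Upvotes) pairs
    from an iterable of dict rows keyed by the columns above."""
    ranked = sorted(rows, key=lambda r: int(r["Upvotes"]), reverse=True)
    return [(r["Dataset_name"], int(r["Upvotes"])) for r in ranked[:n]]

def top_from_csv(path, n=5):
    # 'path' is whatever the downloaded CSV is called; hypothetical here.
    with open(path, newline="", encoding="utf-8") as f:
        return rank_by_upvotes(csv.DictReader(f), n)
```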

    Acknowledgements The data has been scraped from the official Kaggle website and is available under a Creative Commons license.

    Enjoy & Keep Learning !!!

  12. US Phone Number Data | 80 Million Mobile Numbers | Contact Data | B2C...

    • datarade.ai
    Updated Aug 6, 2025
    Cite
    Bytemine (2025). US Phone Number Data | 80 Million Mobile Numbers | Contact Data | B2C Contacts | B2B Contacts | Work Emails | Personal Emails | 57 Contact Data Points [Dataset]. https://datarade.ai/data-products/us-phone-number-data-80-million-mobile-numbers-contact-da-bytemine
    Explore at:
    .json, .xml, .csv, .xls, .txt, .jsonl, .parquet
    Dataset updated
    Aug 6, 2025
    Dataset authored and provided by
    Bytemine
    Area covered
    United States
    Description

    Bytemine provides access to one of the largest and most accurate US phone number databases available, featuring over 80 million verified mobile numbers. Our data includes both B2C and B2B contacts, enriched with comprehensive personal and professional details that support a wide range of use cases — from sales and marketing outreach to lead enrichment, identity resolution, and platform integration.

    Our US Phone Number Data includes:

    • 80 million+ verified US mobile numbers
    • B2C and B2B contacts with name, email, location, and more
    • Work emails and personal emails
    • 57 contact-level data points, including job title, company name, seniority, industry, geography, and more

    This dataset gives you unmatched access to individuals across the United States, allowing you to connect with professionals and consumers directly through mobile-first campaigns. Whether you're targeting executives, small business owners, or general consumers, Bytemine provides the precision and scale to reach the right audience.

    All phone numbers in our database are:

    • Verified and regularly updated
    • Matched with accurate metadata (name, email, job, etc.)
    • Compliant with data usage policies
    • Sourced through direct licensing from trusted partners, including B2B platforms, employment systems, and verified consumer data sources

    This data is ideal for:

    • Cold calling and phone-based outreach
    • SMS marketing and mobile-based campaigns
    • CRM and marketing automation enrichment
    • Identity resolution and contact matching
    • Prospect list building and segmentation
    • B2B and B2C marketing and retargeting
    • App-based user targeting and onboarding
    • Customer data platforms that need verified mobile identifiers

    With access to both business and consumer profiles, Bytemine’s US Phone Number Data allows companies to execute highly targeted and personalized campaigns. Each contact is enriched with up to 57 attributes, giving your team deep insight into who the contact is, where they work, and how best to reach them.

    Data can be accessed in two flexible ways:

    1. Via our web platform — search, filter, and download targeted lists
    2. Via API — automate contact discovery, data enrichment, or validation at scale

    Our API makes it easy to integrate contact data into your existing tools, workflows, or SaaS platform. Whether you're building a lead generation engine, contact enrichment feature, or an internal prospecting tool, Bytemine delivers the clean, structured data needed to power it.

    Bytemine’s phone number dataset is trusted by sales teams, marketing agencies, growth hackers, product teams, and data-driven platforms that rely on accurate contact information to engage the right audience.

    If you need verified, mobile-first contact data for B2B or B2C outreach, Bytemine delivers the scale, accuracy, and flexibility required to grow your pipeline, enrich your database, and reach your customers directly.

  13. Dataset for "LOROS: Laboratory Simulations of the Optical RadiOmeter...

    • zenodo.org
    zip
    Updated Nov 6, 2024
    Cite
    Roger Stabbins; Roger Stabbins; Shingo Kameda; Shingo Kameda (2024). Dataset for "LOROS: Laboratory Simulations of the Optical RadiOmeter composed of CHromatic Imagers (OROCHI) Experiment of the Martian Moons eXploration (MMX) Mission" [Dataset]. http://doi.org/10.5281/zenodo.14028648
    Explore at:
    zip
    Dataset updated
    Nov 6, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Roger Stabbins; Roger Stabbins; Shingo Kameda; Shingo Kameda
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset hosts the image and numerical data analysed and derived in the accompanying Stabbins & Kameda article for the special issue of Progress in Earth and Planetary Science on instrumentation and preparations for the JAXA Martian Moons eXploration (MMX) mission. The paper describes and validates the performance of the Laboratory OROCHI Simulator (LOROS).

    OROCHI (Optical RadiOmeter composed of CHromatic Imagers) is a multispectral, multi-view imaging system for the JAXA MMX spacecraft that will image Phobos and Deimos across 8 visible and near-infrared spectral channels with unprecedented spatial resolution, recording data that, in synergy with the other instruments of the MMX spacecraft and rover, will constrain hypotheses on the origin of the Martian moons.

    LOROS is a laboratory simulator of OROCHI, constructed from commercial off-the-shelf parts.

    The dataset for the characterisation and validation of LOROS is composed of the following sub-sets:

    A. Modulation Transfer Function
    B. Expected Reflectance of Carbonaceous Chondrite & Dark Spectralon
    C. Radiometric Calibration
    D. Dark Spectralon Validation

    Dataset A: Modulation Transfer Function

    This dataset includes the table of results of MTF measurements of the slant-edge target at 5 different random orientations in the range of ~7–10°:
    - mtf_results_07122023.csv


    and the region-of-interest images, for each orientation and each LOROS channel, used to perform the analysis via the MTF Mapper software:
    - mtf_measurements_07122023

    The directory tree of measurements, for the nth orientation, is illustrated below. Region-of-interest images are stored under img; each is averaged over 25 repeat images to minimise random noise, has had dark frames subtracted, and has been converted from 12-bit to 8-bit grayscale for compatibility with the MTF Mapper software. Modulation Transfer Function (MTF) and Spatial Frequency Response (SFR) diagnostics generated by MTF Mapper are stored in the results directory.
    mtf_measurements_07122023
    ├── mtf_knifeedge_low_07122023_*n*
    │ ├── img
    │ │ ├── 0_850_img_ave.tif
    │ │ ├── 1_475_img_ave.tif
    │ │ ├── ...
    │ ├── results
    │ │ ├── 0_850_img_ave_annotated.jpg
    │ │ ├── 0_850_img_ave_edge_mtf_values.txt
    │ │ ├── 0_850_img_ave_edge_sfr_values.txt
    │ │ ├── 1_475_img_ave_annotated.jpg
    │ │ ├── ...
    ├── mtf_knifeedge_low_07122023_*n+1*
    │ ├── img
    │ │ ├── ...
    This data constitutes part of Table 1 and Figure 2 of the manuscript.
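The image preparation described above (averaging 25 repeat frames, dark-frame subtraction, and 12-bit to 8-bit conversion) can be sketched as follows; this is an illustration of the stated steps, not the authors' actual pipeline:

```python
import numpy as np

def preprocess_roi(frames, dark):
    """Average repeat ROI frames, subtract a dark frame, and rescale
    12-bit data to 8-bit grayscale for MTF Mapper compatibility.
    A sketch of the steps described for this dataset."""
    avg = np.mean(np.asarray(frames, dtype=np.float64), axis=0)
    corrected = np.clip(avg - dark, 0, 4095)            # stay in 12-bit range
    return np.round(corrected * (255.0 / 4095.0)).astype(np.uint8)
```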

    Dataset B: Expected Reflectance of Carbonaceous Chondrite & Dark Spectralon

    This dataset includes the high-resolution (Δλ = 1 nm) reference reflectance spectra of the representative Carbonaceous Chondrite meteorite (Nogoya) and the 5% reflectance Spectralon calibration target (SCT5):
    - highres_input.csv

    and the resampled spectra of these materials expected for OROCHI and LOROS filter wavelengths:

    - loros_observation.csv
    - orochi_observation.csv

    B_expected_reflectance
    ├── README.md
    ├── highres_input.csv
    ├── loros_observation.csv
    └── orochi_observation.csv

    This data constitutes Table 1 and Figure 10 of the manuscript.
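Resampling the 1 nm reference spectra at the OROCHI/LOROS filter wavelengths can be sketched with simple interpolation; a minimal illustration only, since the actual resampling may weight by each filter's transmission profile rather than point-sample:

```python
import numpy as np

def resample_to_filters(wl_hires, refl_hires, filter_centres):
    """Point-sample a high-resolution reflectance spectrum at the
    filter centre wavelengths (linear interpolation between 1 nm samples)."""
    return np.interp(filter_centres, wl_hires, refl_hires)
```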

    Dataset C: Radiometric Calibration

    This dataset contains the image and derived data for 4 experiments with different illumination conditions for characterising the radiometric response of each of the 8 channels of LOROS.

    This dataset contributes to Tables 2 - 4 and Figures 3 - 9 of the manuscript.

    The final derived metrics are hosted in the spreadsheet:

    - measured_sensor_properties.csv

    and image data and intermediary derived properties for each experiment are stored in the

    - experiments

    directory.

    C_radiometric_calibration
    ├── README.md
    ├── experiments
    │ ├── F*S5L10
    │ ├── F*S99L10
    │ ├── FGS99L2
    │ └── FGS99L10
    └── measured_sensor_properties.csv

    experiments Directories

    In the directory of each experiment are sub-directories hosting Photon Transfer and Dark Transfer datasets, and a spreadsheet of derived metrics of these.
    C_radiometric_calibration
    ├── README.md
    ├── experiments
    │ ├── F*S5L10
    │ │ ├── dark_transfer_curve
    │ │ ├── photo_transfer_curve
    │ │ └── F*S5L10_derived_properties.csv
    │ └── ...
    └── measured_sensor_properties.csv
    Derived Properties
    The spreadsheet ([experiment]_derived_properties.csv), collecting the properties derived from each experiment, holds the following information, which has been extracted from the Photon Transfer and Dark Transfer curves as described in §4.2 of the manuscript:

    camera # The camera number and wavelength
    k_adc # Sensitivity (e-/DN)
    full_well_e # Saturation Capacity (electrons)
    full_well_dn # Saturation Capacity (Digital Numbers)
    read_noise_e # Read Noise (electrons)
    read_noise_dn # Read Noise (Digital Numbers)
    bias_e # Offset (electrons)
    bias_dn # Offset (Digital Numbers)
    dark_current_e # Dark Current (electrons/second)
    dark_current_dn # Dark Current (Digital Numbers/second)
    DR # Dynamic Range
    lin_min # Minimum Linearity Error
    lin_max # Maximum Linearity Error
    linearity # Average Linearity Error
    snr_max # Maximum Signal-to-Noise Ratio
    t_exp_min # Minimum Exposure used in experiment (seconds)
    t_exp_max # Maximum Exposure used in experiment (seconds)
    expected_response # Expected Response (or 'Digital Flux') for OROCHI^12 at Phobos (Digital Numbers/second)
    response # Fitted Response (or 'Digital Flux') (Digital Numbers/second)

    These values are given for each channel of LOROS, as well as the expected values for LOROS in off-the-shelf configuration (with no gain adjustment), LOROS with the gain adjustment, and OROCHI if downsampled to 12-bit resolution digital numbers.

    This data constitutes Table 2 of the manuscript.
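Some of the listed metrics are related by standard photon-transfer identities; a sketch under the usual shot-noise-limited assumptions (the manuscript's exact definitions in §4.2 may differ):

```python
import math

def sensor_metrics(full_well_e: float, read_noise_e: float):
    """Textbook photon-transfer relations, given here as an assumption:
    dynamic range = full-well capacity / read noise, and the maximum SNR
    of a shot-noise-limited pixel is sqrt(full-well capacity)."""
    dynamic_range = full_well_e / read_noise_e
    snr_max = math.sqrt(full_well_e)
    return dynamic_range, snr_max
```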

    Dark Transfer Curve

    The dark_transfer_curve directory hosts the derived Dark Transfer Curve data (derived_data) and the source region-of-interest dark image pair data (raw_data) for each LOROS channel.

    dark_transfer_curve
    ├── derived_data
    │ ├── F*S5L10_0_850_dtc.csv
    │ ├── F*S5L10_1_475_dtc.csv
    │ ├── F*S5L10_2_400_dtc.csv
    │ ├── F*S5L10_3_550_dtc.csv
    │ ├── F*S5L10_4_725_dtc.csv
    │ ├── F*S5L10_5_950_dtc.csv
    │ ├── F*S5L10_6_650_dtc.csv
    │ └── F*S5L10_7_550_dtc.csv
    └── raw_data
    ├── 0_850
    │ ├── 850_10095570us_1_calibration.tif
    │ ├── 850_10095570us_2_calibration.tif
    │ ├── 850_104us_1_calibration.tif
    │ ├── 850_104us_2_calibration.tif
    │ ├── ...
    ├── 1_475
    ├── 2_400
    ├── 3_550
    ├──

  14. Major Basin Lines

    • catalog.data.gov
    • data.ct.gov
    • +3more
    Updated Feb 12, 2025
    Cite
    Department of Energy & Environmental Protection (2025). Major Basin Lines [Dataset]. https://catalog.data.gov/dataset/major-basin-lines-f1f41
    Explore at:
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    Department of Energy & Environmental Protection
    Description

    See full Data Guide here.Major Drainage Basin Set: Connecticut Major Drainage Basins is 1:24,000-scale, polygon and line feature data that define Major drainage basin areas in Connecticut. These large basins mostly range from 70 to 2,000 square miles in size. Connecticut Major Drainage Basins includes drainage areas for all Connecticut rivers, streams, brooks, lakes, reservoirs and ponds published on 1:24,000-scale 7.5 minute topographic quadrangle maps prepared by the USGS between 1969 and 1984. Data is compiled at 1:24,000 scale (1 inch = 2,000 feet). This information is not updated. Polygon and line features represent drainage basin areas and boundaries, respectively. Each basin area (polygon) feature is outlined by one or more major basin boundary (line) feature. These data include 10 major basin area (polygon) features and 284 major basin boundary (line) features. Major Basin area (polygon) attributes include major basin number and feature size in acres and square miles. The major basin number (MBAS_NO) uniquely identifies individual basins and is 1 character in length. There are 8 unique major basin numbers. Examples include 1, 4, and 6. Note there are more major basin polygon features (10) than unique major basin numbers (8) because two polygon features are necessary to represent both the entire South East Coast and Hudson Major basins in Connecticut. Major basin boundary (line) attributes include a drainage divide type attribute (DIVIDE) used to cartographically represent the hierarchical drainage basin system. This divide type attribute is used to assign different line symbology to different levels of drainage divides. For example, major basin drainage divides are more pronounced and shown with a wider line symbol than regional basin drainage divides. Connecticut Major Drainage Basin polygon and line feature data are derived from the geometry and attributes of the Connecticut Drainage Basins data. 

  15. Data from: Contrasting effects of host or local specialization: widespread...

    • data-staging.niaid.nih.gov
    • ourarchive.otago.ac.nz
    • +3more
    zip
    Updated Mar 13, 2024
    Cite
    Daniela de Angeli Dutra; Gabriel Moreira Félix; Robert Poulin (2024). Contrasting effects of host or local specialization: widespread haemosporidians are host generalist whereas local specialists are locally abundant [Dataset]. http://doi.org/10.5061/dryad.j3tx95xfb
    Explore at:
    zip
    Dataset updated
    Mar 13, 2024
    Dataset provided by
    University of Otago
    Universidade Estadual de Campinas (UNICAMP)
    Authors
    Daniela de Angeli Dutra; Gabriel Moreira Félix; Robert Poulin
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    Aim: Despite the wide distribution of many parasites around the globe, the range of individual species varies significantly, even among phylogenetically related taxa. Since parasites need suitable hosts to complete their development, parasite geographical and environmental ranges should be limited to communities where their hosts are found. Parasites may also suffer from a trade-off between being locally abundant and widely dispersed. We hypothesize that the geographical and environmental ranges of parasites are negatively associated with their host specificity and their local abundance.

    Location: Worldwide

    Time period: 2009 to 2021

    Major taxa studied: Avian haemosporidian parasites

    Methods: We tested these hypotheses using a global database which comprises data on avian haemosporidian parasites from across the world. For each parasite lineage, we computed five metrics: phylogenetic host-range, environmental range, geographical range, and the mean local and total number of observations in the database. Phylogenetic generalized least squares models were run to evaluate the influence of phylogenetic host-range and total and local abundances on geographical and environmental range. In addition, we analysed separately the two regions with the largest amount of available data: Europe and South America.

    Results: We evaluated 401 lineages from 757 localities and observed that generalism (i.e. phylogenetic host range) is positively associated with both the parasites' geographical and environmental ranges at the global and European scales. For South America, generalism is associated only with geographical range. Finally, mean local abundance (mean local number of parasite occurrences) was negatively related to geographical and environmental range. This pattern was detected worldwide and in South America, but not in Europe.

    Main conclusions: We demonstrate that parasite specificity is linked to both their geographical and environmental ranges. The fact that locally abundant parasites present restricted ranges indicates a trade-off between these two traits. This trade-off, however, only becomes evident when sufficiently heterogeneous host communities are considered.

    Methods

    We compiled data on haemosporidian lineages from the MalAvi database (http://130.235.244.92/Malavi/ , Bensch et al. 2009), including all the data available from the “Grand Lineage Summary” representing the Plasmodium and Haemoproteus genera from wild birds and containing information regarding location. After checking for duplicated sequences, this dataset comprised a total of ~6200 sequenced parasites representing 1602 distinct lineages (775 Plasmodium and 827 Haemoproteus) collected from 1139 different host species and 757 localities on all continents except Antarctica (Supplementary Figure 1, Supplementary Table 1). The parasite lineages deposited in MalAvi are based on a cyt b fragment of 478 bp. This dataset was used to calculate the parasites' geographical, environmental and phylogenetic ranges.

    Geographical range

    All analyses in this study were performed using R version 4.02. In order to estimate the geographical range of each parasite lineage, we applied the R package “GeoRange” (Boyle, 2017) and chose the minimum spanning tree distance (i.e., the shortest total distance of all lines connecting each locality where a particular lineage has been found). Using the function “create.matrix” from the “fossil” package, we created a matrix of lineages and coordinates and employed the function “GeoRange_MultiTaxa” to calculate the minimum spanning tree distance (in kilometers) for each parasite lineage. As at least two distinct sites are necessary to calculate this distance, parasites observed in a single locality could not have their geographical range estimated. For this reason, only parasites observed in two or more localities were considered in our phylogenetic generalized least squares (PGLS) models.

    Host and environmental diversity

    Traditionally, ecologists use Shannon entropy to measure diversity in ecological assemblages (Pielou, 1966). The Shannon entropy of a set of elements is related to the degree of uncertainty someone would have about the identity of a randomly selected element of that set (Jost, 2006). Thus, Shannon entropy matches our intuitive notion of biodiversity: the more diverse an assemblage is, the more uncertainty there is regarding the species to which a randomly selected individual belongs. Shannon diversity increases with both the assemblage's richness (e.g., the number of species) and its evenness (e.g., uniformity in abundance among species). To compare the diversity of assemblages that vary in richness and evenness in a more intuitive manner, we can normalize diversities using Hill numbers (Chao et al., 2014b). The Hill number of an assemblage represents the effective number of species in the assemblage, i.e., the number of equally abundant species that would be needed to give the same value of the diversity metric in that assemblage. Hill numbers can be extended to incorporate phylogenetic information; in that case, instead of species, we measure the effective number of phylogenetic entities in the assemblage. Here, we computed phylogenetic host-range as the phylogenetic Hill number associated with the assemblage of hosts found infected by a given parasite. Analyses were performed using the function “hill_phylo” from the “hillr” package (Chao et al., 2014a). Hill numbers are parameterized by a parameter “q” that determines the sensitivity of the metric to relative species abundance; different “q” values produce Hill numbers associated with different diversity metrics. We set q = 1 to compute the Hill number associated with Shannon diversity. Here, low Hill numbers indicate specialization on a narrow phylogenetic range of hosts, whereas higher Hill numbers indicate generalism across a broader phylogenetic spectrum of hosts.

    We also used Hill numbers to compute the environmental range of sites occupied by each parasite lineage. Firstly, we collected the 19 bioclimatic variables from WorldClim version 2 (http://www.worldclim.com/version2) for all sites used in this study (N = 713). Then, we standardized the 19 variables by centering and scaling them by their respective means and standard deviations. Thereafter, we computed the pairwise Euclidean environmental distance among all sites and used this distance to compute a dissimilarity cluster. Finally, as for the phylogenetic Hill number, we used this dissimilarity cluster to compute the environmental Hill number of the assemblage of sites occupied by each parasite lineage. The environmental Hill number for each parasite can be interpreted as the effective number of environmental conditions in which a parasite lineage occurs. Thus, the higher the environmental Hill number, the more generalist the parasite is regarding the environmental conditions in which it can occur.

    Parasite phylogenetic tree

    A Bayesian phylogenetic reconstruction was performed. We built a tree for all parasite sequences for which we were able to estimate the parasite's geographical, environmental and phylogenetic ranges (see above); this represented 401 distinct parasite lineages. This inference was produced using MrBayes 3.2.2 (Ronquist & Huelsenbeck, 2003) with the GTR + I + G model of nucleotide evolution, as recommended by ModelTest (Posada & Crandall, 1998), which selects the best-fit nucleotide substitution model for a set of genetic sequences. We ran four Markov chains simultaneously for a total of 7.5 million generations, sampled every 1000 generations. The first 25% of trees were discarded as a burn-in step, and the remaining trees were used to calculate the posterior probabilities of each estimated node in the final consensus tree. Our final tree obtained a cumulative posterior probability of 0.999. Leucocytozoon caulleryi was used as the outgroup to root the phylogenetic tree, as Leucocytozoon spp. represent a basal group within avian haemosporidians (Pacheco et al., 2020).
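The q = 1 Hill number used in this study is the exponential of Shannon entropy. For plain (non-phylogenetic) species abundances it can be sketched as follows; the study itself uses the phylogenetic variant via hill_phylo from the R package hillr, so this Python version is illustrative only:

```python
import math

def hill_shannon(abundances):
    """Hill number of order q = 1: exp(Shannon entropy), i.e. the
    effective number of equally abundant species in the assemblage."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    return math.exp(-sum(pi * math.log(pi) for pi in p))
```

For four equally abundant species the result is exactly 4; skewing the abundances toward one dominant species drives the effective number down toward 1.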

  16. OpenCitations Meta RDF dataset of page numbers metadata and its provenance...

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    Updated Apr 6, 2024
    Cite
    OpenCitations (2024). OpenCitations Meta RDF dataset of page numbers metadata and its provenance information [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_10936231
    Explore at:
    Dataset updated
    Apr 6, 2024
    Dataset authored and provided by
    OpenCitationshttps://opencitations.net/
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset is a specialized subset of the OpenCitations Meta RDF data, focusing exclusively on data related to page numbers of bibliographic resources, known as manifestations (http://purl.org/spar/fabio/Manifestation). It contains all the bibliographic metadata and its provenance information, structured specifically around manifestations (page numbers), in JSON-LD format.

    The inner folders are named after the supplier prefix of the contained entities. This prefix identifies the index to which the entities belong (e.g., OpenCitations Meta corresponds to 06*0).

    After that, the folders have numeric names, which refer to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

    At the same level, additional folders containing the provenance are named using the same criteria. For example, the 1000 folder includes the provenance of entities 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

    For example, data related to the entity is located in the folder /br/06250/10000/1000/1000.zip, while information about provenance in /br/06250/10000/1000/prov/se.zip
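The folder convention above can be turned into a small path helper; a sketch, with the name of the data zip inferred from the example path given above:

```python
import math

def meta_zip_path(entity_id: int, supplier_prefix: str = "06250"):
    """Locate the data and provenance zips for a bibliographic-resource
    entity: the outer folder covers blocks of 10000 entities, the inner
    folder blocks of 1000, and provenance sits under prov/se.zip."""
    outer = math.ceil(entity_id / 10000) * 10000
    inner = math.ceil(entity_id / 1000) * 1000
    data = f"/br/{supplier_prefix}/{outer}/{inner}/{inner}.zip"
    prov = f"/br/{supplier_prefix}/{outer}/{inner}/prov/se.zip"
    return data, prov
```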

    Additional information about OpenCitations Meta is available at the official webpage.

  17. TIGER/Line Shapefile, 2023, County, Livingston County, KY, Address...

    • catalog.data.gov
    Updated Aug 10, 2025
    Cite
    U.S. Department of Commerce, U.S. Census Bureau, Geography Division, Geospatial Products Branch (Point of Contact) (2025). TIGER/Line Shapefile, 2023, County, Livingston County, KY, Address Range-Feature [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-2023-county-livingston-county-ky-address-range-feature
    Explore at:
    Dataset updated
    Aug 10, 2025
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Area covered
    Livingston County, Kentucky
    Description

    The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or the shapefiles can be combined to cover the entire nation.

    The Address Ranges Feature Shapefile (ADDRFEAT.dbf) contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county-equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number, and all numbers of a specified parity in between, along an edge side relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe.

    Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address range is associated with the edge. The ADDRFEAT shapefile contains a record for each address range to street name combination; address ranges associated with more than one street name are likewise represented by multiple coincident address range feature edge records. Note that the ADDRFEAT shapefile includes all unsuppressed address ranges, whereas the All Lines Shapefile (EDGES.shp) includes only the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefiles contain potential address ranges, not individual addresses: the ranges include the full span of possible structure numbers even though the actual structures may not exist.
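
To make the address-range definition concrete, here is a small Python sketch that enumerates the potential structure numbers implied by a range of a given parity. The function name and parity labels are illustrative only and are not part of the shapefile schema.

```python
def potential_structure_numbers(first, last, parity):
    """Enumerate the potential structure numbers of an address range.

    As described above, a range covers every number of the specified
    parity ('even' or 'odd') from the first to the last structure number.
    These are potential addresses only; real structures may not exist.
    """
    lo, hi = min(first, last), max(first, last)
    remainder = 1 if parity == "odd" else 0
    return [n for n in range(lo, hi + 1) if n % 2 == remainder]
```

For example, an even-parity range from 100 to 110 yields the six potential numbers 100, 102, 104, 106, 108, and 110.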

  18. Data from: Microsatellite genotype scores for a contemporary, range-wide...

    • catalog.data.gov
    • data.cnra.ca.gov
    • +2more
    Updated Nov 20, 2025
    + more versions
    U.S. Geological Survey (2025). Microsatellite genotype scores for a contemporary, range-wide sample of Santa Ana sucker in southern California [Dataset]. https://catalog.data.gov/dataset/microsatellite-genotype-scores-for-a-contemporary-range-wide-sample-of-santa-ana-sucker-in
    Explore at:
    Dataset updated
    Nov 20, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Southern California, California
    Description

    These data consist of microsatellite genotype scores for all samples of Santa Ana sucker (Catostomus santaanae) used in the study. Scores represent the allele calls for each microsatellite locus (i.e., the DNA fragment length containing the microsatellite repeats), with each locus containing two scores representing the two allele copies detected. Five tables are included: Full dataset (genotypes from all samples), Santa Clara River samples only (genotypes only from samples collected in the Santa Clara River), Convert file format key (explains the data file format), Population identifiers (translates the numerical population identifiers to actual collecting sites), and CASA sampling points (one coordinate given for each general collection site). Whole specimens are accessioned at the Los Angeles County Museum of Natural History (catalog numbers 58475-58481). GenBank accession numbers for the DNA sequences are MF918422 - MF918481. These data support the following publication: Richmond, J.Q., Backlin, A.R., Galst-Cavalcante, C., O'Brien, J.W. and Fisher, R.N., Loss of dendritic connectivity in southern California's urban riverscape facilitates decline of an endemic freshwater fish. Molecular Ecology.

  19. Datasets 4-1 4-2 4-5 to 4-12,

    • data.europa.eu
    unknown
    Updated Oct 7, 2020
    Zenodo (2020). Datasets 4-1 4-2 4-5 to 4-12, [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4068487?locale=lv
    Explore at:
    unknown (9499). Available download formats
    Dataset updated
    Oct 7, 2020
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These results are from the rail demo of 5G-PICTURE (www.5g-picture-project.eu). For more details, see Deliverable D6.3, which also contains plotted figures.

    Dataset 4-1. This dataset is generated by a computer model. The modulation and coding scheme (MCS) of a mmWave link between an access point (AP) and a station (STA) mounted on the roof of a train is plotted as a function of the distance between AP and STA. The IEEE 802.11ad single-carrier technology is assumed, under typical conditions where the range is approximately 350 m; in other words, the lowest MCS (MCS1) can be supported up to this distance. The MCS takes integer values in the range 1 to 12.

    Dataset 4-2. This dataset is generated by the same computer model as Dataset 4-1. In this case we plot the predicted data rate (at the application layer, in Gbps) and SNR (in dB). The simulation assumes the SNR requirements of an ideal AWGN channel, with the link budget adjusted to align with the typical range observed in the field. The SNR is also capped at a maximum of 25 dB, commensurate with a real device.

    Datasets 4-5 to 4-12. These are measured datasets from field testing of the rail demo. In the field test, the train drives from one end of the test network to the other (approximately 2 km). Traffic (TCP iperf3) is generated within each trackside mmWave AP and sent to the train STAs once an association has been established. The datasets include measurements performed by the two STAs of a single train node (TN), labelled 'Train-1'. One STA has a radio facing forwards and one faces backwards (see Deliverable D6.3); these form the two datasets for each parameter. When a STA is not associated (i.e., has no mmWave link), the parameter is not recorded, since no data packets are received. The following parameters are captured:

    Datasets 4-5 and 4-6. The modulation and coding scheme (MCS) of the mmWave link between an AP and each STA is logged.

    Datasets 4-7 and 4-8. The SNR is logged, measured in dB.

    Datasets 4-9 and 4-10. The sector ID indicates which beam has been chosen by the TN radios when receiving packets. A STA maintains a beambook of 13 directional beams, and a beamforming protocol identifies the best beam to use. The sector ID is an integer from 1 to 13. Low beam numbers are close to boresight, while the highest numbers (up to 13) imply beam steering up to 45 degrees away from boresight. Odd numbers point to the left and even numbers point to the right.

    Datasets 4-11 and 4-12. These record the data rate received by each STA at the application layer (TCP iperf3), in Mbps.
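
The sector-ID convention of Datasets 4-9 and 4-10 can be sketched as a small helper. This is purely illustrative of the numbering scheme described in the text: the function is hypothetical, and the evenly spaced steering angles are an assumption, since the actual beambook angles are not given.

```python
def sector_description(sector_id):
    """Illustrative decoding of a 13-beam sector ID as described above.

    Odd IDs point left, even IDs point right; low IDs sit near boresight
    and the highest IDs steer up to ~45 degrees away from it.
    """
    if not 1 <= sector_id <= 13:
        raise ValueError("sector ID must be in 1..13")
    side = "left" if sector_id % 2 == 1 else "right"
    # Assumption: steering grows roughly linearly with the ID, up to 45 deg.
    approx_angle = round(45 * (sector_id - 1) / 12, 1)
    return side, approx_angle
```

Under this (assumed) linear spacing, sector 1 is at boresight and sector 13 is steered the full 45 degrees.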

  20. NFL Elo Game Predictions Dataset

    • kaggle.com
    zip
    Updated Dec 20, 2023
    The Devastator (2023). NFL Elo Game Predictions Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/nfl-elo-game-predictions-dataset
    Explore at:
    zip (423448 bytes). Available download formats
    Dataset updated
    Dec 20, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    NFL Elo Game Predictions Dataset

    FiveThirtyEight's NFL Game Predictions and Historical Data

    By FiveThirtyEight [source]

    About this dataset

    This dataset contains a comprehensive collection of data utilized by FiveThirtyEight in their articles, graphics, and interactive content pertaining to predictions for NFL games. FiveThirtyEight uses an Elo algorithm to predict the potential winner of each game. Elo ratings are a measure of team strength based on head-to-head results, margin of victory, and quality of opponent. The ratings are always relative: they only have meaning in comparison to other teams' ratings.
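
As a rough sketch of how Elo-based win probabilities such as Elo_prob1 are typically computed, the textbook Elo expected-score and update formulas are shown below. Note this is the standard formulation, not necessarily FiveThirtyEight's exact model, which layers further adjustments (home field, margin of victory) on top; the function names and the K-factor are assumptions.

```python
def elo_win_probability(elo1, elo2):
    """Standard Elo expected score for team 1 against team 2."""
    return 1.0 / (1.0 + 10 ** ((elo2 - elo1) / 400.0))

def elo_update(elo1, elo2, result1, k=20.0):
    """Update team 1's rating after a game (result1: 1 win, 0.5 tie, 0 loss).

    A K-factor of 20 is a common default, not FiveThirtyEight's tuned value.
    """
    return elo1 + k * (result1 - elo_win_probability(elo1, elo2))
```

Two equally rated teams each get a 50% win probability, while a 100-point rating edge translates to roughly a 64% chance of winning.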

    The dataset is compiled with a wide array of key metrics that add depth to any analysis or evaluation of NFL game predictions. The available fields range from team names and season numbers to probability forecasts, and they record which team won or lost each match on any given date.

    The columns within this dataset include:

    • Date: The day on which the game was played.
    • Season: Specifies during which season the game took place.
    • Neutral: Indicates whether the game was played at a neutral venue.
    • Playoff: Details whether it is a playoff game.
    • Team1 & Team2: Names of both participating teams.
    • Elo1_pre & Elo2_pre: Each team's Elo rating before the game.
    • Elo_prob1 & Elo_prob2: Winning probabilities for either team.
    • Result1: Reveals which team won.

    This data can be used by sports analysts and enthusiasts alike to make predictions about future matches and to uncover trends hidden in past NFL games. The data amassed here serves both individuals who wish to make predictions grounded in solid statistics and researchers performing substantial studies spanning decades' worth of American football.

    Do keep in mind as you navigate this extensive repository that all its contents are released under the Creative Commons Attribution 4.0 International License, while the source code adheres to the MIT License; rights are retained while productive reuse toward new, compelling outputs is encouraged.

    If you find this data useful in your work or personal projects, we would love to hear about your experiences and how our data repository has contributed to them.

    How to use the dataset

    This dataset is beneficial for data analysts, data scientists, sports enthusiasts, or anyone who is interested in historical and predictive analysis of NFL games.

    Here are some instructions on how to use this dataset:

    • Understanding the dataset: Before using this dataset, you must understand what each column represents. The information includes game details such as the team names (Team1 and Team2), each team's Elo rating before the game, the pre-game win probabilities, and the result of each individual game.

    • Predictive Analysis: Develop a machine-learning model that uses features such as the pre-match Elo ratings to predict match outcomes. It will be interesting to see how accurate a predictive model can be! For instance, logistic regression is a natural fit for this kind of binary win/loss prediction problem.

    • Historical Analysis: Analyze patterns in past results by producing descriptive statistics or creating visualizations with libraries such as Matplotlib or Seaborn in Python. Examples include analyzing trends over time, such as changes in ratings after matches for teams that have faced each other multiple times.

    • Testing Hypotheses: If you have any hypotheses about NFL games, perhaps that home-field advantage increasingly matters, or that certain teams outperform their predicted winning probabilities, you can test them with statistical methods such as A/B testing or regression analysis using the pandas library.

    • Exploratory analyses are also possible across the rich set of columns documented by FiveThirtyEight (e.g., result, touchdowns).
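
The hypothesis-testing idea above can be sketched without any external libraries: compare observed wins against the Elo-implied expectation with a simple normal approximation. The probabilities and outcomes below are made-up toy data, and the helper name is hypothetical.

```python
import math

def elo_calibration_z(probs, outcomes):
    """Z-statistic for observed wins vs. Elo-predicted wins.

    probs    -- pre-game win probabilities (e.g., an Elo_prob1 column)
    outcomes -- 1 if that team won, 0 otherwise
    Under the null hypothesis that the probabilities are well calibrated,
    the statistic is approximately standard normal for large samples.
    """
    expected = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)
    return (sum(outcomes) - expected) / math.sqrt(variance)

# Toy example: a team that wins more often than its Elo probabilities imply.
probs = [0.5, 0.6, 0.4, 0.55, 0.45] * 20
outcomes = [1, 1, 0, 1, 1] * 20
z = elo_calibration_z(probs, outcomes)
```

A large positive z (say, above 2) suggests the team systematically outperforms its predicted winning probabilities; values near zero are consistent with good calibration.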

    Remember to always check your data, clean it up if necessary, and approach it from different angles! An initial hypothesis may not hold up under scrutiny, but don't be discouraged: all findings are valuable when conducting rigorous research.

    In conclusion, you can carry out many types of quantitative analysis based on just the Elo ratings and game results, so this dataset holds a wealth of opportunities for predictive modelling, statistical testing, and storytelling through data. Happy exploring!

    Research Ideas

    • Predicting Future Games: This dataset c...