This dataset consists of mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models.
## Example questions
Question: Solve -42*r + 27*c = -1167 and 130*r + 4*c = 372 for r.
Answer: 4
Question: Calculate -841880142.544 + 411127.
Answer: -841469015.544
Question: Let x(g) = 9*g + 1. Let q(c) = 2*c + 1. Let f(i) = 3*i - 39. Let w(j) = q(x(j)). Calculate f(w(a)).
Answer: 54*a - 30
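The three example answers above can be checked mechanically. The following is a minimal sketch using only the Python standard library; the helper name `solve_2x2` is hypothetical and is not part of the dataset or its tooling.

```python
from decimal import Decimal
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 exactly (Cramer's rule)."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


# Example 1: -42*r + 27*c = -1167 and 130*r + 4*c = 372, solved for r.
r, c = solve_2x2(-42, 27, -1167, 130, 4, 372)

# Example 2: exact decimal addition (avoids binary floating-point rounding).
total = Decimal("-841880142.544") + Decimal("411127")


# Example 3: the function composition f(w(a)) with w(j) = q(x(j)).
def x(g):
    return 9 * g + 1


def q(c_):
    return 2 * c_ + 1


def f(i):
    return 3 * i - 39


def w(j):
    return q(x(j))


# Compare the composition against the stated answer 54*a - 30 on sample inputs.
composition_matches = all(f(w(a)) == 54 * a - 30 for a in range(-5, 6))
```

Exact rational and decimal arithmetic is used here so that the comparison with the stated answers does not depend on floating-point rounding.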
It contains 2 million (question, answer) pairs per module, with questions limited to 160 characters in length, and answers to 30 characters in length. Note the training data for each question type is split into "train-easy", "train-medium", and "train-hard". This allows training models via a curriculum. The data can also be mixed together uniformly from these training datasets to obtain the results reported in the paper. Categories:
Here is a list of the prime numbers up to 10000. Source: easycalculation
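A list like this can be regenerated independently. The sketch below uses the Sieve of Eratosthenes; it is an illustration, not the method used by the source of the list, and the function name `primes_up_to` is hypothetical.

```python
def primes_up_to(limit):
    """Return all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p*p.
            is_prime[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


primes = primes_up_to(10000)
```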
Discover the ultimate resource for your B2B needs with our meticulously curated dataset, featuring 148MM+ highly relevant US B2B Contact Data records and associated company information.
Very high fill rates for Phone Number, including for Mobile Phone!
This encompasses a diverse range of fields, including Contact Name (First & Last), Work Address, Work Email, Personal Email, Mobile Phone, Direct-Dial Work Phone, Job Title, Job Function, Job Level, LinkedIn URL, Company Name, Domain, Email Domain, HQ Address, Employee Size, Revenue Size, Industry, NAICS and SIC Codes + Descriptions, ensuring you have the most detailed insights for your business endeavors.
Key Features:
Extensive Data Coverage: Access a vast pool of B2B Contact Data records, providing valuable information on where the contacts work now, empowering your sales, marketing, recruiting, and research efforts.
Versatile Applications: Leverage this robust dataset for Sales Prospecting, Lead Generation, Marketing Campaigns, Recruiting initiatives, Identity Resolution, Analytics, Research, and more.
Phone Number Data Inclusion: Benefit from our comprehensive Phone Number Data, ensuring you have direct and effective communication channels. Explore our Phone Number Datasets and Phone Number Databases for an even more enriched experience.
Flexible Pricing Models: Tailor your investment to match your unique business needs, data use-cases, and specific requirements. Choose from targeted lists, CSV enrichment, or licensing our entire database or subsets to seamlessly integrate this data into your products, platform, or service offerings.
Strategic Utilization of B2B Intelligence:
Sales Prospecting: Identify and engage with the right decision-makers to drive your sales initiatives.
Lead Generation: Generate high-quality leads with precise targeting based on specific criteria.
Marketing Campaigns: Amplify your marketing strategies by reaching the right audience with targeted campaigns.
Recruiting: Streamline your recruitment efforts by connecting with qualified candidates.
Identity Resolution: Enhance your data quality and accuracy by resolving identities with our reliable dataset.
Analytics and Research: Fuel your analytics and research endeavors with comprehensive and up-to-date B2B insights.
Access Your Tailored B2B Data Solution:
Reach out to us today to explore flexible pricing options and discover how Salutary Data Company Data, B2B Contact Data, B2B Marketing Data, B2B Email Data, Phone Number Data, Phone Number Datasets, and Phone Number Databases can transform your business strategies. Elevate your decision-making with top-notch B2B intelligence.
There are several datasets for practicing simple linear regression, but most of them consist of random values. There is nothing wrong with that, but I think the data you work on should be meaningful, no matter how small or simple the problem or algorithm is. Hence, here is a somewhat more sensible dataset: random numbers and their corresponding log base 10 values. You can use this dataset to practice and play around with the linear regression algorithm.
The dataset consists of two CSVs, one for training and one for testing. The training dataset was created in Google Spreadsheet using the RANDBETWEEN(1,1000) function to generate pseudo-random values. Then the LOG10() function was used to calculate the log base 10 value of each of these numbers. Afterward, these log values were truncated to 6 decimal places using the TRUNC() formula. The testing dataset was created in the same way as the training dataset, but with numbers in the range 1001 to 2000.
I would like to thank Dr. Andrew Ng for creating an amazing beginner-friendly ML course.
I hope this dataset helps machine learning beginners practice and learn about linear regression.
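To make the construction concrete, here is a minimal sketch that regenerates comparable data in Python and fits a least-squares line to it. The row counts, seeds, and function names are assumptions made for illustration; the actual CSVs were built in a spreadsheet and their sizes are not stated here.

```python
import math
import random


def make_rows(lo, hi, n, seed):
    """Mimic RANDBETWEEN(lo, hi), LOG10() and TRUNC(value, 6) for n rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(lo, hi)
        y = math.trunc(math.log10(x) * 1e6) / 1e6
        rows.append((x, y))
    return rows


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(rows, slope, intercept):
    """Root-mean-square error of the fitted line on the given rows."""
    return math.sqrt(
        sum((slope * x + intercept - y) ** 2 for x, y in rows) / len(rows)
    )


train = make_rows(1, 1000, 500, seed=0)     # hypothetical size and seed
test = make_rows(1001, 2000, 500, seed=1)   # hypothetical size and seed
slope, intercept = fit_line(train)
train_error = rmse(train, slope, intercept)
test_error = rmse(test, slope, intercept)
```

Because the test inputs lie outside the training range and the logarithm is not linear, the fitted line extrapolates poorly, so the test error comes out larger than the training error. That makes the pair of files a useful exercise in the limits of a linear model.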
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper addresses the computational methods and challenges associated with prime number generation, a critical component in encryption algorithms for ensuring data security. Generating prime numbers efficiently is a key challenge in various domains, including cryptography, number theory, and computer science. The quest to find more effective algorithms for prime number generation is driven by the increasing demand for secure communication and data storage and the need for efficient algorithms to solve complex mathematical problems. Our goal is to address this challenge by presenting two novel algorithms for generating prime numbers: one that generates primes up to a given limit and another that generates primes within a specified range. These algorithms are founded on the formulas of odd-composed numbers, allowing them to achieve remarkable performance improvements compared to existing prime number generation algorithms. Our comprehensive experimental results reveal that our proposed algorithms outperform well-established prime number generation algorithms such as Miller-Rabin, Sieve of Atkin, Sieve of Eratosthenes, and Sieve of Sundaram regarding mean execution time. More notably, our algorithms exhibit the unique ability to generate prime numbers range by range with commendable performance. This substantial enhancement in performance and adaptability can significantly impact the effectiveness of various applications that depend on prime numbers, from cryptographic systems to distributed computing. By providing an efficient and flexible method for generating prime numbers, our proposed algorithms can help develop more secure and reliable communication systems, enable faster computations in number theory, and support advanced computer science and mathematics research.
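For readers who want a reference point for the "primes within a specified range" task, the sketch below shows a conventional segmented Sieve of Eratosthenes. This is explicitly not the paper's algorithm, which is based on formulas of odd-composed numbers and is not reproduced here; it is only a baseline of the kind the abstract compares against, and the function name `primes_in_range` is hypothetical.

```python
import math


def primes_in_range(lo, hi):
    """Return all primes p with lo <= p <= hi using a segmented sieve."""
    if hi < 2 or hi < lo:
        return []
    lo = max(lo, 2)
    root = math.isqrt(hi)

    # Step 1: ordinary sieve for the base primes up to sqrt(hi).
    base = bytearray([1]) * (root + 1)
    base[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(root) + 1):
        if base[p]:
            base[p * p::p] = bytearray(len(range(p * p, root + 1, p)))

    # Step 2: cross out multiples of each base prime inside [lo, hi].
    segment = bytearray([1]) * (hi - lo + 1)
    for p in range(2, root + 1):
        if base[p]:
            start = max(p * p, ((lo + p - 1) // p) * p)
            segment[start - lo::p] = bytearray(len(range(start, hi + 1, p)))

    return [lo + i for i, flag in enumerate(segment) if flag]


window = primes_in_range(1001, 1100)
```

Only the segment and the base primes up to the square root of the upper bound are held in memory, which is why this approach suits range-by-range generation.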
https://creativecommons.org/publicdomain/zero/1.0/
Dataset Description: A Deep Dive into Prime Gap Distribution and Primorial Harmonics
Overview: This dataset offers a comprehensive exploration of prime gap distribution, focusing on the intriguing patterns associated with primorials and their harmonics. Primorials, the product of the first n prime numbers, play a significant role in shaping the landscape of prime gaps. By analyzing the distribution of prime gaps and their relation to primorials, we can gain deeper insights into the fundamental structure of prime numbers.
Data Structure:
* Power of 2: The base-2 exponent.
* Gap Size N: The size of the Nth prime gap following the given power of 2.
Key Features:
* Primorial Harmonics: The dataset highlights the appearance of prime gaps that are multiples of primorials, suggesting a deeper connection between these numbers and the distribution of primes.
* Large Prime Gaps: The dataset includes information on exceptionally large prime gaps, which can provide valuable clues about the underlying structure of the number line.
* Prime Number Distribution: The distribution of prime numbers within the specified range is analyzed, revealing patterns and anomalies.
Potential Applications:
* Number Theory Research:
  * Investigating the role of primorials in shaping prime gap distribution.
  * Testing conjectures related to the Riemann Hypothesis and the Twin Prime Conjecture.
  * Exploring the connection between prime gaps and other mathematical concepts, such as modular arithmetic and number theory functions.
* Machine Learning and Data Science:
  * Training machine learning models to predict prime gap sizes, incorporating primorials as features.
  * Developing algorithms to identify and analyze primorial-related patterns.
* Computational Mathematics:
  * Benchmarking computational resources and algorithms for prime number generation and factorization.
  * Developing new algorithms for efficient computation of primorials and their harmonics.
How to Use This Dataset:
* Data Exploration:
  * Visualize the distribution of prime gaps, highlighting the occurrence of primorial harmonics.
  * Analyze the frequency of different gap sizes, focusing on multiples of primorials.
  * Study the relationship between prime gap size and the corresponding power of 2, considering the influence of primorials.
* Machine Learning:
  * Incorporate features related to primorials and their harmonics into machine learning models.
  * Experiment with different feature engineering techniques and hyperparameter tuning to improve model performance.
  * Use the dataset to train models that can predict the occurrence of large prime gaps and other significant patterns.
* Number Theory Research:
  * Use the dataset to formulate and test new conjectures about the distribution of prime gaps and the role of primorials.
  * Explore the connection between prime gap distribution and other mathematical fields, such as cryptography and coding theory.
By leveraging this dataset, researchers can gain a deeper understanding of the intricate patterns and underlying structures that govern the distribution of prime numbers.
Supplement to the Prime Gap Dataset Description
Unveiling the Mysteries of Prime Gaps
The Prime Gap Dataset offers a unique opportunity to delve into the fascinating world of prime numbers. By analyzing the distribution of gaps between consecutive primes, we can uncover hidden patterns and structures that might hold the key to unlocking the secrets of the universe.
Key Features and Potential Insights:
* Visual Exploration: Immerse yourself in stunning visualizations of prime gap distributions, revealing hidden patterns and anomalies.
* Statistical Analysis: Conduct in-depth statistical analysis to identify trends, correlations, and outliers.
* Machine Learning Applications: Employ machine learning techniques to predict prime gap distributions and discover novel insights.
* Fractal Analysis: Investigate the potential fractal nature of prime number distributions, revealing self-similarity at different scales.
Potential Research Directions:
* Uncovering Hidden Patterns: Explore the distribution of prime gaps at various scales to identify emerging patterns and structures.
* Predicting Prime Gap Behavior: Develop machine learning models to predict the size and distribution of future prime gaps.
* Testing Mathematical Conjectures: Use the dataset to test conjectures related to prime number distribution, such as the Riemann Hypothesis.
* Exploring Connections to Other Fields: Investigate the relationship between prime numbers and other mathematical fields, such as chaos theory and information theory.
By delving into this rich dataset, you can contribute to the ongoing exploration of one of the most fundamental and enduring mysteries of mathematics.
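The "Power of 2" and "Gap Size N" columns can be illustrated with a short sketch. It assumes that "the Nth prime gap following the given power of 2" means the Nth difference between consecutive primes strictly greater than 2^k; that reading, along with the function names, is an assumption and may not match how the dataset was actually generated.

```python
def is_prime(n):
    """Trial-division primality test; adequate for small illustrative inputs."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def gaps_after_power_of_two(exponent, count):
    """Return the first `count` gaps between consecutive primes above 2**exponent."""
    primes = []
    candidate = 2 ** exponent + 1
    while len(primes) < count + 1:
        if is_prime(candidate):
            primes.append(candidate)
        candidate += 1
    return [b - a for a, b in zip(primes, primes[1:])]


def primorial(n):
    """Product of the first n primes, e.g. primorial(2) == 2 * 3."""
    product, found, candidate = 1, 0, 2
    while found < n:
        if is_prime(candidate):
            product *= candidate
            found += 1
        candidate += 1
    return product


gaps = gaps_after_power_of_two(10, 3)   # primes after 1024: 1031, 1033, 1039, 1049
# Flag gaps that are multiples of the primorial 2 * 3 = 6.
multiples_of_six = [g for g in gaps if g % primorial(2) == 0]
```

Trial division keeps the sketch self-contained; for the large exponents a real study would use, a probabilistic or sieve-based primality test would be needed instead.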
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
272,700 two-alternative forced choice responses in a simple numerical task modeled after Tenenbaum (1999, 2000), collected from 606 Amazon Mechanical Turk workers. Subjects were shown sets of numbers of length 1 to 4 from the range 1 to 100 (e.g. {12, 16}), and asked what other numbers were likely to belong to that set (e.g. 1, 5, 2, 98). Their generalization patterns reflect both rule-like (e.g. “even numbers,” “powers of two”) and distance-based (e.g. numbers near 50) generalization. This data set is available for further analysis of these simple and intuitive inferences, development of hands-on modeling instruction, and attempts to understand how probability and rules interact in human cognition.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. CIFAR-10 and CIFAR-100 were created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. (Sadly, the 80 million tiny images dataset has been thrown into the memory hole by its authors. Spotting the doublethink which was used to justify its erasure is left as an exercise for the reader.)
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.
Baseline results
You can find some baseline replicable results on this dataset on the project page for cuda-convnet. These results were obtained with a convolutional neural network. Briefly, they are 18% test error without data augmentation and 11% with. Additionally, Jasper Snoek has a new paper in which he used Bayesian hyperparameter optimization to find nice settings of the weight decay and other hyperparameters, which allowed him to obtain a test error rate of 15% (without data augmentation) using the architecture of the net that got 18%.
Other results
Rodrigo Benenson has collected results on CIFAR-10/100 and other datasets on his website.
Dataset layout
Python / Matlab versions
I will describe the layout of the Python version of the dataset. The layout of the Matlab version is identical.
The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python "pickled" object produced with cPickle. Here is a python2 routine which will open such a file and return a dictionary:
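```python
def unpickle(file):
    import cPickle
    # Batch files are binary pickles, so open them in 'rb' mode.
    with open(file, 'rb') as fo:
        batch = cPickle.load(fo)
    return batch
```
And a python3 version:
```python
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        # encoding='bytes' leaves the Python 2 strings as bytes,
        # so dictionary keys come back as e.g. b'data' and b'labels'.
        batch = pickle.load(fo, encoding='bytes')
    return batch
```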
Loaded in this way, each of the batch files contains a dictionary with the following elements:
data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.
labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.
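The channel-planar, row-major layout described above maps directly onto a reshape followed by an axis swap. The sketch below assumes NumPy is installed and uses a synthetic row rather than a real batch file; the function name `row_to_image` is hypothetical.

```python
import numpy as np


def row_to_image(row):
    """Convert one 3072-value row into a (32, 32, 3) height x width x RGB array."""
    planes = np.asarray(row, dtype=np.uint8).reshape(3, 32, 32)  # (channel, y, x)
    return planes.transpose(1, 2, 0)                             # (y, x, channel)


# Synthetic row standing in for one row of the `data` array.
row = np.zeros(3072, dtype=np.uint8)
row[0] = 255             # red plane, image row 0, column 0
row[1024 + 33] = 128     # green plane, image row 1, column 1
row[2048 + 1023] = 64    # blue plane, image row 31, column 31

image = row_to_image(row)
```

With a real batch loaded by the python3 routine, the same function would be applied to a row of the array stored under the `b'data'` key.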
The dataset contains another file, called batches.meta. It too contains a Python dictionary object. It has the following entries:
label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", etc.
Binary version
The binary version contains the files data_batch_1.bin, data_batch_2.bin, ..., data_batch_5.bin, as well as test_batch.bin. Each of these files is formatted as follows:
<1 x label><3072 x pixel>
...
<1 x label><3072 x pixel>
In other words, the first byte is the label of the first image, which is a number in the range 0-9. The next 3072 bytes are the values of the pixels of the image. The first 1024 bytes are the red channel values, the next 1024 the green, and the final 1024 the blue. The values are stored in row-major order, so the first 32 bytes are the red channel values of the first row of the image.
Each file contains 10000 such 3073-byte "rows" of images, although there is nothing delimiting the rows. Therefore each file should be exactly 30730000 bytes long.
There is another file, called batches.meta.txt. This is an ASCII file that maps numeric labels in the range 0-9 to meaningful class names. It is merely a list of the 10 class names, one per row. The class name on row i corresponds to numeric label i.
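The fixed-width record format of the binary version can be read with plain byte slicing. This is a minimal sketch over a synthetic in-memory stream; the names `RECORD_SIZE` and `read_records` are hypothetical, and a real file would be opened with `open(path, 'rb')` instead of `io.BytesIO`.

```python
import io

RECORD_SIZE = 1 + 3072  # one label byte followed by 3072 pixel bytes


def read_records(stream):
    """Yield (label, red, green, blue) for each 3073-byte record in the stream."""
    while True:
        record = stream.read(RECORD_SIZE)
        if not record:
            return
        if len(record) != RECORD_SIZE:
            raise ValueError("truncated record")
        label = record[0]
        pixels = record[1:]
        yield label, pixels[:1024], pixels[1024:2048], pixels[2048:]


# Two synthetic records: label 3 with all-zero pixels, label 7 with all-255 pixels.
fake_file = bytes([3]) + bytes(3072) + bytes([7]) + bytes([255]) * 3072
records = list(read_records(io.BytesIO(fake_file)))

# 10000 records per file gives the file size quoted above.
expected_file_size = 10000 * RECORD_SIZE
```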
The CIFAR-100 dataset
This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). Her...
Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.
Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.
API Features:
Key Categories Served: B2B sales leads – Identify decision-makers in key industries, B2B marketing data – Target professionals for your marketing campaigns, Recruitment data – Source top talent efficiently and reduce hiring times, CRM enrichment – Update and enhance your CRM with verified, updated data, Global reach – Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more.
Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available: 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry. 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data. GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.
Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.
Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including: B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare. Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions. Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location. Market Research: Gain insights into employment trends and company profiles to enrich market research. CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data. Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.
Use Cases for Success.ai's Contact Data
- Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance.
- Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles.
- Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles.
- CRM Enrichment: Keep your CRM current with verified, accurate employee data.
- Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors.
- Market Research: Gain insights into employment trends and company profiles for better business decisions.
- Executive Search: Source senior executives and leaders for headhunting and recruitment.
- Partnership Building: Find the right companies and key people to develop strategic partnerships.
Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why: Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces. Global Scale: With 150M+ employee profiles and 170M veri...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the data for the South Range, MI population pyramid, which represents the South Range population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.
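The ratios named above follow from three age-band totals. The sketch below uses the common demographic convention of ages 0-14, 15-64, and 65+, with dependency ratios expressed per 100 working-age people; those cut-offs, the function name, and the population counts are illustrative assumptions, not values taken from the South Range table.

```python
def dependency_ratios(youth, working_age, older):
    """Compute dependency ratios from counts for ages 0-14, 15-64 and 65+."""
    if working_age <= 0 or older <= 0:
        raise ValueError("working_age and older must be positive")
    return {
        "youth_dependency": 100 * youth / working_age,
        "old_age_dependency": 100 * older / working_age,
        "total_dependency": 100 * (youth + older) / working_age,
        "potential_support": working_age / older,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
ratios = dependency_ratios(youth=150, working_age=500, older=100)
```

Because ACS figures are estimates with a margin of error, ratios computed for a small place inherit that uncertainty.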
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for South Range Population by Age.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Context
Kaggle is one of the largest communities of data scientists and machine learning practitioners in the world, and its platform hosts thousands of datasets covering a wide range of topics and industries. With so many options to choose from, it can be difficult to know where to start or what datasets are worth exploring. That's where this dataset comes in. By scraping information about the top 10,000 datasets on Kaggle, we have created a single source of truth for the most popular and useful datasets on the platform. This dataset is not just a list of names and numbers, but a valuable tool for data enthusiasts and professionals alike, providing insights into the latest trends and techniques in data science and machine learning.
Column description
- Dataset_name - Name of the dataset
- Author_name - Name of the author
- Author_id - Kaggle id of the author
- No_of_files - Number of files the author has uploaded
- size - Size of all the files
- Type_of_file - Type of the files, such as csv, json, etc.
- Upvotes - Total upvotes of the dataset
- Medals - Medal of the dataset
- Usability - Usability of the dataset
- Date - Date on which the dataset was uploaded
- Day - Day on which the dataset was uploaded
- Time - Time at which the dataset was uploaded
- Dataset_link - Kaggle link of the dataset
Acknowledgements
The data has been scraped from the official Kaggle website and is available under the Creative Commons License.
Enjoy & Keep Learning !!!
Bytemine provides access to one of the largest and most accurate US phone number databases available, featuring over 80 million verified mobile numbers. Our data includes both B2C and B2B contacts, enriched with comprehensive personal and professional details that support a wide range of use cases — from sales and marketing outreach to lead enrichment, identity resolution, and platform integration.
Our US Phone Number Data includes:
- 80 million+ verified US mobile numbers
- B2C and B2B contacts with name, email, location, and more
- Work emails and personal emails
- 57 contact-level data points including job title, company name, seniority, industry, geography, and more
This dataset gives you unmatched access to individuals across the United States, allowing you to connect with professionals and consumers directly through mobile-first campaigns. Whether you're targeting executives, small business owners, or general consumers, Bytemine provides the precision and scale to reach the right audience.
All phone numbers in our database are:
- Verified and regularly updated
- Matched with accurate metadata (name, email, job, etc.)
- Compliant with data usage policies
- Sourced through direct licensing from trusted partners including B2B platforms, employment systems, and verified consumer data sources
This data is ideal for:
- Cold calling and phone-based outreach
- SMS marketing and mobile-based campaigns
- CRM and marketing automation enrichment
- Identity resolution and contact matching
- Prospect list building and segmentation
- B2B and B2C marketing and retargeting
- App-based user targeting and onboarding
- Customer data platforms that need verified mobile identifiers
With access to both business and consumer profiles, Bytemine’s US Phone Number Data allows companies to execute highly targeted and personalized campaigns. Each contact is enriched with up to 57 attributes, giving your team deep insight into who the contact is, where they work, and how best to reach them.
Data can be accessed in two flexible ways:
Our API makes it easy to integrate contact data into your existing tools, workflows, or SaaS platform. Whether you're building a lead generation engine, contact enrichment feature, or an internal prospecting tool, Bytemine delivers the clean, structured data needed to power it.
Bytemine’s phone number dataset is trusted by sales teams, marketing agencies, growth hackers, product teams, and data-driven platforms that rely on accurate contact information to engage the right audience.
If you need verified, mobile-first contact data for B2B or B2C outreach, Bytemine delivers the scale, accuracy, and flexibility required to grow your pipeline, enrich your database, and reach your customers directly.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset hosts the image and numerical data analysed and derived in the accompanying Stabbins & Kameda article for the special issue of Progress in Earth and Planetary Science on instrumentation and preparations for the JAXA Martian Moons eXploration (MMX) mission. The paper describes and validates the performance of the Laboratory OROCHI Simulator (LOROS).
OROCHI (Optical RadiOmeter composed of CHromatic Imagers) is a multispectral multi-view imaging system for the JAXA MMX spacecraft, that will image Phobos and Deimos across 8 visible and near-infrared spectral channels with unprecedented spatial resolution, recording data that in synergy with the other instruments of the MMX spacecraft and rover will constrain hypotheses on the origin of the Martian moons.
LOROS is a laboratory simulator of OROCHI, constructed from commercial off-the-shelf parts.
The dataset for the characterisation and validation of LOROS is composed of the following sub-sets:
A. Modulation Transfer Function
B. Expected Reflectance of Carbonaceous Chondrite & Dark Spectralon
C. Radiometric Calibration
D. Dark Spectralon Validation
mtf_results_07122023.csv
mtf_measurements_07122023
img, and are averaged over 25 repeat images to minimise random noise, have had dark frames subtracted, and have been converted from 12-bit to 8-bit grayscale images for compatibility with the MTF Mapper software. Modulation Transfer Function (MTF) and Spatial Frequency Response (SFR) diagnostics generated by MTF Mapper are stored in the results directory.
mtf_measurements_07122023
├── mtf_knifeedge_low_07122023_*n*
│ ├── img
│ │ ├── 0_850_img_ave.tif
│ │ ├── 1_475_img_ave.tif
│ │ ├── ...
│ ├── results
│ │ ├── 0_850_img_ave_annotated.jpg
│ │ ├── 0_850_img_ave_edge_mtf_values.txt
│ │ ├── 0_850_img_ave_edge_sfr_values.txt
│ │ ├── 1_475_img_ave_annotated.jpg
│ │ ├── ...
├── mtf_knifeedge_low_07122023_*n+1*
│ ├── img
│ │ ├── ...
highres_input.csv
loros_observation.csv
orochi_observation.csv
B_expected_reflectance
├── README.md
├── highres_input.csv
├── loros_observation.csv
└── orochi_observation.csv
measured_sensor_properties.csv
experiments
C_radiometric_calibration
├── README.md
├── experiments
│ ├── F*S5L10
│ ├── F*S99L10
│ ├── FGS99L2
│ └── FGS99L10
└── measured_sensor_properties.csv
experiments Directories
C_radiometric_calibration
├── README.md
├── experiments
│ ├── F*S5L10
│ │ ├── dark_transfer_curve
│ │ ├── photo_transfer_curve
│ │ └── F*S5L10_derived_properties.csv
│ └── ...
└── measured_sensor_properties.csv
[experiment]_derived_properties.csv) collecting the properties derived from each experiment holds the following information, that has been extracted from the Photon Transfer and Dark Transfer curves as described in §4.2 of the manuscript:
camera # The camera number and wavelength
k_adc # Sensitivity (e-/DN)
full_well_e # Saturation Capacity (electrons)
full_well_dn # Saturation Capacity (Digital Numbers)
read_noise_e # Read Noise (electrons)
read_noise_dn # Read Noise (Digital Numbers)
bias_e # Offset (electrons)
bias_dn # Offset (Digital Numbers)
dark_current_e # Dark Current (electrons/second)
dark_current_dn # Dark Current (Digital Numbers/second)
DR # Dynamic Range
lin_min # Minimum Linearity Error
lin_max # Maximum Linearity Error
linearity # Average Linearity Error
snr_max # Maximum Signal-to-Noise Ratio
t_exp_min # Minimum Exposure used in experiment (seconds)
t_exp_max # Maximum Exposure used in experiment (seconds)
expected_response # Expected Response (or 'Digital Flux') for OROCHI^12 at Phobos (Digital Numbers/second)
response # Fitted Response (or 'Digital Flux') (Digital Numbers/second)
dark_transfer_curve directory hosts the derived Dark Transfer Curve data (derived_data) and the source region-of-interest dark image pair data (raw_data) for each LOROS channel.
dark_transfer_curve
├── derived_data
│ ├── F*S5L10_0_850_dtc.csv
│ ├── F*S5L10_1_475_dtc.csv
│ ├── F*S5L10_2_400_dtc.csv
│ ├── F*S5L10_3_550_dtc.csv
│ ├── F*S5L10_4_725_dtc.csv
│ ├── F*S5L10_5_950_dtc.csv
│ ├── F*S5L10_6_650_dtc.csv
│ └── F*S5L10_7_550_dtc.csv
└── raw_data
    ├── 0_850
    │ ├── 850_10095570us_1_calibration.tif
    │ ├── 850_10095570us_2_calibration.tif
    │ ├── 850_104us_1_calibration.tif
    │ ├── 850_104us_2_calibration.tif
    │ ├── ...
    ├── 1_475
    ├── 2_400
    ├── 3_550
    ├──
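The derived-properties columns listed above come in electron/DN pairs linked by the sensitivity k_adc (e-/DN). The sketch below checks that each pair in one row is consistent with that conversion. It assumes each `*_e` value is simply `k_adc` times the matching `*_dn` value, which follows from the stated units but is not confirmed by the dataset documentation; the row values are invented for illustration.

```python
import math

# Column stems that appear in both electron (_e) and digital-number (_dn) form.
PAIRED_PROPERTIES = ["full_well", "read_noise", "bias", "dark_current"]


def dn_to_electrons(value_dn, k_adc):
    """Convert a quantity in Digital Numbers to electrons using k_adc (e-/DN)."""
    return value_dn * k_adc


def check_row(row, rel_tol=0.01):
    """Return {property: True/False} for whether the _e and _dn columns agree."""
    k_adc = float(row["k_adc"])
    return {
        name: math.isclose(
            float(row[f"{name}_e"]),
            dn_to_electrons(float(row[f"{name}_dn"]), k_adc),
            rel_tol=rel_tol,
        )
        for name in PAIRED_PROPERTIES
    }


# Hypothetical row: the numbers are illustrative, not taken from the dataset.
example_row = {
    "k_adc": "2.5",
    "full_well_dn": "4000", "full_well_e": "10000",
    "read_noise_dn": "2.0", "read_noise_e": "5.0",
    "bias_dn": "64", "bias_e": "160",
    "dark_current_dn": "0.4", "dark_current_e": "1.0",
}
consistency = check_row(example_row)
```

A `False` entry would flag a row in which the two unit systems disagree by more than the chosen tolerance.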
See full Data Guide here. Major Drainage Basin Set: Connecticut Major Drainage Basins is 1:24,000-scale, polygon and line feature data that define Major drainage basin areas in Connecticut. These large basins mostly range from 70 to 2,000 square miles in size. Connecticut Major Drainage Basins includes drainage areas for all Connecticut rivers, streams, brooks, lakes, reservoirs and ponds published on 1:24,000-scale 7.5 minute topographic quadrangle maps prepared by the USGS between 1969 and 1984. Data is compiled at 1:24,000 scale (1 inch = 2,000 feet). This information is not updated. Polygon and line features represent drainage basin areas and boundaries, respectively. Each basin area (polygon) feature is outlined by one or more major basin boundary (line) features. These data include 10 major basin area (polygon) features and 284 major basin boundary (line) features. Major Basin area (polygon) attributes include major basin number and feature size in acres and square miles. The major basin number (MBAS_NO) uniquely identifies individual basins and is 1 character in length. There are 8 unique major basin numbers. Examples include 1, 4, and 6. Note there are more major basin polygon features (10) than unique major basin numbers (8) because two polygon features are necessary to represent both the entire South East Coast and Hudson Major basins in Connecticut. Major basin boundary (line) attributes include a drainage divide type attribute (DIVIDE) used to cartographically represent the hierarchical drainage basin system. This divide type attribute is used to assign different line symbology to different levels of drainage divides. For example, major basin drainage divides are more pronounced and shown with a wider line symbol than regional basin drainage divides. Connecticut Major Drainage Basin polygon and line feature data are derived from the geometry and attributes of the Connecticut Drainage Basins data.
https://spdx.org/licenses/CC0-1.0.html
Aim: Despite the wide distribution of many parasites around the globe, the range of individual species varies significantly even among phylogenetically related taxa. Since parasites need suitable hosts to complete their development, parasite geographical and environmental ranges should be limited to communities where their hosts are found. Parasites may also suffer from a trade-off between being locally abundant or widely dispersed. We hypothesize that the geographical and environmental ranges of parasites are negatively associated with their host specificity and their local abundance.
Location: Worldwide
Time period: 2009 to 2021
Major taxa studied: Avian haemosporidian parasites
Methods: We tested these hypotheses using a global database which comprises data on avian haemosporidian parasites from across the world. For each parasite lineage, we computed five metrics: phylogenetic host-range, environmental range, geographical range, and their mean local and total number of observations in the database. Phylogenetic generalized least squares models were run to evaluate the influence of phylogenetic host-range and total and local abundances on geographical and environmental range. In addition, we analysed separately the two regions with the largest amount of available data: Europe and South America.
Results: We evaluated 401 lineages from 757 localities and observed that generalism (i.e. phylogenetic host range) is positively associated with both the parasites’ geographical and environmental ranges at the global and European scales. For South America, generalism is associated only with geographical range. Finally, mean local abundance (mean local number of parasite occurrences) was negatively related to geographical and environmental range. This pattern was detected worldwide and in South America, but not in Europe.
Main Conclusions: We demonstrate that parasite specificity is linked to both their geographical and environmental ranges.
The fact that locally abundant parasites present restricted ranges indicates a trade-off between these two traits. This trade-off, however, only becomes evident when sufficiently heterogeneous host communities are considered.
Methods
We compiled data on haemosporidian lineages from the MalAvi database (http://130.235.244.92/Malavi/, Bensch et al. 2009) including all the data available from the “Grand Lineage Summary” representing Plasmodium and Haemoproteus genera from wild birds and that contained information regarding location. After checking for duplicated sequences, this dataset comprised a total of ~6200 sequenced parasites representing 1602 distinct lineages (775 Plasmodium and 827 Haemoproteus) collected from 1139 different host species and 757 localities from all continents except Antarctica (Supplementary figure 1, Supplementary Table 1). The parasite lineages deposited in MalAvi are based on a cyt b fragment of 478 bp. This dataset was used to calculate the parasites’ geographical, environmental and phylogenetic ranges.
Geographical range
All analyses in this study were performed using R version 4.02. In order to estimate the geographical range of each parasite lineage, we applied the R package “GeoRange” (Boyle, 2017) and chose the variable minimum spanning tree distance (i.e., the shortest total distance in kilometers of all lines connecting each locality where a particular lineage has been found). Using the function “create.matrix” from the “fossil” package, we created a matrix of lineages and coordinates and employed the function “GeoRange_MultiTaxa” to calculate the minimum spanning tree distance for each parasite lineage. As at least two distinct sites are necessary to calculate this distance, parasites observed in a single locality could not have their geographical range estimated.
For this reason, only parasites observed in two or more localities were considered in our phylogenetically controlled least squares (PGLS) models.
Host and Environmental diversity
Traditionally, ecologists use Shannon entropy to measure diversity in ecological assemblages (Pielou, 1966). The Shannon entropy of a set of elements is related to the degree of uncertainty someone would have about the identity of a randomly selected element of that set (Jost, 2006). Thus, Shannon entropy matches our intuitive notion of biodiversity: the more diverse an assemblage is, the greater the uncertainty about which species a randomly selected individual belongs to. Shannon diversity increases with both the assemblage richness (e.g., the number of species) and evenness (e.g., uniformity in abundance among species). To compare the diversity of assemblages that vary in richness and evenness in a more intuitive manner, we can normalize diversities by Hill numbers (Chao et al., 2014b). The Hill number of an assemblage represents the effective number of species in the assemblage, i.e., the number of equally abundant species that are needed to give the same value of the diversity metric in that assemblage. Hill numbers can be extended to incorporate phylogenetic information. In that case, instead of species, we are measuring the effective number of phylogenetic entities in the assemblage. Here, we computed phylogenetic host-range as the phylogenetic Hill number associated with the assemblage of hosts found infected by a given parasite. Analyses were performed using the function “hill_phylo” from the “hillr” package (Chao et al., 2014a). Hill numbers are parameterized by a parameter “q” that determines the sensitivity of the metric to relative species abundance. Different “q” values produce Hill numbers associated with different diversity metrics. We set q = 1 to compute the Hill number associated with Shannon diversity.
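As a rough illustration of the relationship between Shannon entropy and Hill numbers, the sketch below computes the ordinary (non-phylogenetic) Hill number of order q from a list of abundances. It is a simplification for intuition only: the study used the phylogenetic version via “hill_phylo”, which additionally weights by branch lengths, and the function name and example abundances here are invented.

```python
import math


def hill_number(abundances, q=1.0):
    """Effective number of species of order q (taxonomic, not phylogenetic)."""
    total = sum(abundances)
    # Relative abundances; absent species (zero counts) contribute nothing.
    p = [a / total for a in abundances if a > 0]
    if abs(q - 1.0) < 1e-12:
        # The q -> 1 limit is the exponential of Shannon entropy.
        shannon = -sum(pi * math.log(pi) for pi in p)
        return math.exp(shannon)
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Four equally common hosts behave like four "effective" hosts.
even = hill_number([5, 5, 5, 5], q=1)
# One dominant host plus three rare ones gives a much lower effective number.
uneven = hill_number([97, 1, 1, 1], q=1)
```

With q = 0 the same function returns plain richness (the count of hosts present), which shows how q controls sensitivity to relative abundance.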
Here, low Hill numbers indicate specialization on a narrow phylogenetic range of hosts, whereas a higher Hill number indicates generalism across a broader phylogenetic spectrum of hosts. We also used Hill numbers to compute the environmental range of sites occupied by each parasite lineage. Firstly, we collected the 19 bioclimatic variables from WorldClim version 2 (http://www.worldclim.com/version2) for all sites used in this study (N = 713). Then, we standardized the 19 variables by centering and scaling them by their respective mean and standard deviation. Thereafter, we computed the pairwise Euclidean environmental distance among all sites and used this distance to compute a dissimilarity cluster. Finally, as for the phylogenetic Hill number, we used this dissimilarity cluster to compute the environmental Hill number of the assemblage of sites occupied by each parasite lineage. The environmental Hill number for each parasite can be interpreted as the effective number of environmental conditions in which a parasite lineage occurs. Thus, the higher the environmental Hill number, the more generalist the parasite is regarding the environmental conditions in which it can occur.
Parasite phylogenetic tree
A Bayesian phylogenetic reconstruction was performed. We built a tree for all parasite sequences for which we were able to estimate the parasite’s geographical, environmental and phylogenetic ranges (see above); this represented 401 distinct parasite lineages. This inference was produced using MrBayes 3.2.2 (Ronquist & Huelsenbeck, 2003) with the GTR + I + G model of nucleotide evolution, as recommended by ModelTest (Posada & Crandall, 1998), which selects the best-fit nucleotide substitution model for a set of genetic sequences. We ran four Markov chains simultaneously for a total of 7.5 million generations that were sampled every 1000 generations.
The first 1250 million trees (25%) were discarded as a burn-in step and the remaining trees were used to calculate the posterior probabilities of each estimated node in the final consensus tree. Our final tree obtained a cumulative posterior probability of 0.999. Leucocytozoon caulleryi was used as the outgroup to root the phylogenetic tree as Leucocytozoon spp. represents a basal group within avian haemosporidians (Pacheco et al., 2020).
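The geographical range metric described in the methods above (the minimum spanning tree distance over a lineage's localities) can be sketched as follows. This is not the GeoRange implementation: it assumes great-circle (haversine) distances on a sphere of radius 6371 km and uses Prim's algorithm, and the function names and coordinates are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mst_length_km(sites):
    """Total edge length of the minimum spanning tree (Prim's algorithm).

    Returns None for fewer than two sites, mirroring the point that a
    lineage seen at a single locality has no estimable range.
    """
    if len(sites) < 2:
        return None
    # best[i] = shortest known distance from site i to the growing tree.
    best = {i: haversine_km(sites[0], sites[i]) for i in range(1, len(sites))}
    total = 0.0
    while best:
        nearest = min(best, key=best.get)
        total += best.pop(nearest)
        for i in best:
            best[i] = min(best[i], haversine_km(sites[nearest], sites[i]))
    return total


# Three hypothetical localities along the equator, one degree apart.
range_km = mst_length_km([(0.0, 0.0), (0.0, 2.0), (0.0, 1.0)])
```

For collinear sites the tree simply chains neighbours, so the result equals the distance between the two end points.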
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset is a specialized subset of the OpenCitations Meta RDF data, focusing exclusively on data related to page numbers of bibliographic resources, known as manifestations (http://purl.org/spar/fabio/Manifestation). It contains all the bibliographic metadata and its provenance information, structured specifically around manifestations (page numbers), in JSON-LD format.
The inner folders are named through the supplier prefix of the contained entities. It is a prefix that allows you to recognize the entity membership index (e.g., OpenCitations Meta corresponds to 06*0).
After that, the folders have numeric names, which refer to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.
At the same level, additional folders containing the provenance are named with the same criteria already seen. Then, the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.
For example, data related to the entity is located in the folder /br/06250/10000/1000/1000.zip, while its provenance is located in /br/06250/10000/1000/prov/se.zip.
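The folder naming scheme above can be sketched as a small path builder. This is an assumption-laden illustration, not OpenCitations code: the function names are invented, the range sizes (10000 per folder, 1000 per archive) are read off the description, and the path layout simply mirrors the example paths quoted above.

```python
def range_upper_bound(number, size):
    """Upper bound of the size-wide range containing number (1..size -> size)."""
    return ((number - 1) // size + 1) * size


def entity_paths(kind, supplier_prefix, number, folder_size=10000, file_size=1000):
    """Return (data_path, provenance_path) for an entity's sequential number.

    kind is the short entity-type folder (e.g. "br"); supplier_prefix is the
    prefix folder (e.g. "06250"). The layout follows the quoted example.
    """
    folder = range_upper_bound(number, folder_size)
    chunk = range_upper_bound(number, file_size)
    base = f"/{kind}/{supplier_prefix}/{folder}/{chunk}"
    return f"{base}/{chunk}.zip", f"{base}/prov/se.zip"


# Hypothetical entity numbers chosen for illustration.
first_paths = entity_paths("br", "06250", 1)
later_paths = entity_paths("br", "06250", 12345)
```

Entity 12345 falls in the 20000 folder and the 13000 archive under this scheme.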
Additional information about OpenCitations Meta at the official webpage.
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Ranges Feature Shapefile (ADDRFEAT.dbf) contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe. Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address range is associated with the edge. The ADDRFEAT shapefile contains a record for each address range to street name combination. Address ranges associated with more than one street name are also represented by multiple coincident address range feature edge records. Note that the ADDRFEAT shapefile includes all unsuppressed address ranges, whereas the All Lines Shapefile (EDGES.shp) only includes the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefiles contain potential address ranges, not individual addresses. The address ranges in the TIGER/Line Files are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.
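To make the "address range" definition concrete, the sketch below expands a range into its potential structure numbers. It is a simplified illustration under stated assumptions: structure numbers are plain integers, the parity codes "O" (odd), "E" (even) and "B" (both) are hypothetical labels chosen for this example, and the function name is invented.

```python
def potential_structure_numbers(first, last, parity):
    """List the potential structure numbers of an address range.

    Includes both end points plus every number of the requested parity in
    between, ordered from first to last (the direction the edge is coded).
    """
    if parity not in ("O", "E", "B"):
        raise ValueError("parity must be 'O', 'E' or 'B'")
    low, high = sorted((first, last))
    remainder = {"O": 1, "E": 0}
    numbers = [
        n for n in range(low, high + 1)
        if parity == "B" or n % 2 == remainder[parity] or n in (first, last)
    ]
    # Preserve the coded direction when the range runs high-to-low.
    return numbers if first <= last else numbers[::-1]


odd_side = potential_structure_numbers(101, 109, "O")
descending_even_side = potential_structure_numbers(208, 200, "E")
```

These are potential numbers only; as the description notes, a structure need not exist at each of them.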
These data consist of microsatellite genotype scores for all samples of Santa Ana sucker (Catostomus santaanae) used in the study. Scores represent the allele calls for each microsatellite locus (i.e. DNA fragment length containing the microsatellite repeats), with each locus containing two scores representing the two allele copies detected. Included are five tables: Full dataset (includes genotypes from all samples), Santa Clara River samples only (includes genotypes only from samples collected in the Santa Clara River), Convert File format key (explains the data file format), Population identifiers (translates the numerical population identifiers to actual collecting sites), CASA sampling points (one coordinate given for each general collection site). Whole specimens are accessioned at the Los Angeles County Museum of Natural History (catalog numbers 58475-58481). GenBank accession numbers for the DNA sequences are: MF918422 - MF918481. These data support the following publication: Richmond, J.Q., Backlin, A.R., Galst‐Cavalcante, C., O'Brien, J.W. and Fisher, R.N., Loss of dendritic connectivity in southern California's urban riverscape facilitates decline of an endemic freshwater fish. Molecular Ecology.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These results are from the rail demo of 5G-PICTURE (www.5g-picture-project.eu). For more details see Deliverable D6.3, which also contains plotted figures.
Dataset 4-1
This dataset is generated by a computer model. The modulation and coding scheme (MCS) of a mmWave link between an access point (AP) and a station (STA) mounted on the roof of a train is plotted as a function of the distance between AP and STA. The IEEE 802.11ad single-carrier technology is assumed, together with typical conditions in which the range is approximately 350 m; in other words, the lowest MCS, MCS1, can be supported up to this distance. The MCS takes integer values in the range 1 to 12.
Dataset 4-2
This dataset is generated by the same computer model as dataset 4-1. In this case we plot the predicted data rate (at the application layer, in Gbps) and SNR (in dB). In the simulation we assume the SNR requirements of an ideal AWGN channel and adjust the link budget to align with the typical range observed in the field. The SNR is also capped at a maximum value of 25 dB, commensurate with a real device.
Datasets 4-5 to 4-12
This is a measured dataset from field testing of the Rail Demo. In the field test the train drives from one end of the test network to the other (over approximately 2 km). Traffic (TCP iperf3) is generated within each trackside mmWave AP and sent to the train STAs when an association has been established. The datasets include measurements performed by the two STAs of a single train node (TN), labelled ‘Train-1’. One STA has a radio facing forwards and one has a radio facing backwards (see deliverable D6.3). These form the two datasets for each parameter. When a STA is not associated (i.e. has no mmWave link) the parameter is not recorded, since no data packets are received. The following parameters are captured:
Datasets 4-5 and 4-6
The modulation and coding scheme (MCS) of a mmWave link between an AP and each STA is logged.
Datasets 4-7 and 4-8
The SNR is logged. SNR is measured in dB.
Datasets 4-9 and 4-10
The sector ID here indicates which beam has been chosen by the TN radios when receiving packets. A STA maintains a beambook of 13 directional beams, and a beamforming protocol identifies the best beam to use. The Sector ID is an integer from 1 to 13. Low beam numbers are close to boresight, whilst the highest numbers (up to 13) imply beam steering up to 45 degrees away from boresight. Odd numbers represent pointing to the left and even numbers represent pointing to the right.
Datasets 4-11 and 4-12
This plots the data rate received by each STA at the application layer (TCP iperf3), in Mbps.
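When working with the sector ID logs, the left/right rule described above can be applied with a small helper. This is a minimal sketch: the function name is invented, and it deliberately does not convert a sector ID to a steering angle, because the exact mapping between beam number and angle is not given here.

```python
def beam_side(sector_id):
    """Classify a logged sector ID (1..13) as pointing 'left' or 'right'.

    Follows the stated convention: odd IDs point left, even IDs point right.
    """
    if not isinstance(sector_id, int) or not 1 <= sector_id <= 13:
        raise ValueError("sector ID must be an integer from 1 to 13")
    return "left" if sector_id % 2 == 1 else "right"


# Count how often each side is selected in a hypothetical log excerpt.
logged_ids = [1, 2, 2, 5, 13, 4]
side_counts = {"left": 0, "right": 0}
for sid in logged_ids:
    side_counts[beam_side(sid)] += 1
```

Rejecting out-of-range values early helps catch rows where the field was parsed from the wrong column.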
https://creativecommons.org/publicdomain/zero/1.0/
By FiveThirtyEight [source]
This dataset contains the data used by FiveThirtyEight in its articles, graphics, and interactive content on NFL game predictions. FiveThirtyEight uses an Elo algorithm to predict the winner of each game. Elo ratings are a measure of team strength based on head-to-head results, margin of victory, and quality of opponent. The ratings are always relative: they only have meaning in comparison to other teams’ ratings.
The database compiles a range of key metrics useful for analysing or evaluating NFL game predictions, from team names and season numbers to win-probability forecasts and the result of each game.
The columns within this dataset include:
- Date: The day on which the game was played.
- Season: Specifies during which season the game took place.
- Neutral: Indicates whether the game was played at a neutral venue or not.
- Playoff: Details if it's a playoff game.
- Team1 & Team2: Names both participating teams.
- Elo1_pre & Elo2_pre: Indicates each team’s Elo rating before the game.
- Elo_prob1 & Elo_prob2: Gives the winning probabilities for either team.
- Result 1: Reveals who won.
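For readers unfamiliar with how a pair of Elo ratings becomes a win probability, the sketch below uses the standard Elo expectation formula with a 400-point scale. It is a generic illustration, not FiveThirtyEight's exact model: any adjustments they apply before computing their published probabilities (for example for home field) are not included, so the output should not be expected to match the Elo_prob1 column exactly. The function name is invented.

```python
def elo_win_probability(elo1, elo2):
    """Standard Elo expected score for team 1 against team 2."""
    return 1.0 / (1.0 + 10 ** ((elo2 - elo1) / 400.0))


# Hypothetical ratings: a 100-point edge is worth roughly a 64% chance.
p_team1 = elo_win_probability(1600, 1500)
p_team2 = elo_win_probability(1500, 1600)
```

Because the formula depends only on the rating difference, the two probabilities always sum to one, which reflects the point that ratings only have meaning relative to each other.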
Sports analysts and enthusiasts can use this data to make predictions about future matches and to uncover trends in past NFL games. It serves those who want to try statistics-based forecasting for fun as well as researchers carrying out substantial studies covering decades of American football history.
Keep in mind as you navigate this repository that its contents are released under the Creative Commons Attribution 4.0 International License, while the source code is released under the MIT License. Both retain the authors' rights while encouraging reuse in new work.
If you find this data useful in your work or personal projects, we would love to hear about your experiences and how our data repository has contributed to them
This dataset is beneficial for data analysts, data scientists, sports enthusiasts, or anyone who is interested in historical and predictive analysis of NFL games.
Here are some instructions on how to use this dataset:
Understanding the dataset: Before using this dataset, you must understand what each column represents. The information includes game details such as team names (team1 and team2), their corresponding Elo ratings before (elo1_before and elo2_before) and after (elo1_after and elo2_after) the game, the result of individual games (team1_win_prob), etc.
Predictive Analysis: Develop a machine learning model that uses features such as the Elo scores before the match to predict match outcomes. It will be interesting to see how accurate a predictive model can be. For instance, linear regression can be applied to this kind of problem.
Historical Analysis: Analyze patterns in past results by producing descriptive statistics or creating visualizations with libraries such as Matplotlib or Seaborn in Python. Examples include analyzing trends over time, such as changes in ratings after matches for teams that have faced each other multiple times.
Testing Hypotheses: If you have any hypotheses about NFL games (perhaps that home field advantage increasingly matters, or that certain teams outperform their predicted winning probabilities), you can test them using statistical methods such as A/B testing or regression analysis, using the statistical methods of the pandas library.
Free text analyses are also possible by exploring the rich set of columns described in FiveThirtyEight's documentation (i.e., result, touchdowns).
Remember to always check your data, clean it up if necessary, and approach it from different angles. An initial hypothesis may not hold true under scrutiny, but don't be discouraged, since all findings are valuable when conducting rigorous research.
In conclusion, you can carry out various types of quantitative analysis based on just the Elo ratings and game results, so this dataset holds a wealth of opportunities for predictive modelling, statistical testing and storytelling through data. Happy Exploring!
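One simple way to evaluate forecast probabilities against results, in the spirit of the suggestions above, is the Brier score (mean squared difference between forecast and outcome; lower is better). The sketch below uses only the standard library and a tiny inline CSV. The column names `elo_prob1` and `result1`, the coding of a tie as 0.5, and the sample rows are assumptions made for illustration and should be checked against the actual file.

```python
import csv
import io

# Hypothetical excerpt: forecast probability for team 1 and the outcome
# (1 = team 1 won, 0 = team 1 lost, 0.5 = tie).
SAMPLE_CSV = """elo_prob1,result1
0.70,1
0.55,0
0.50,0.5
"""


def brier_score(rows, prob_col="elo_prob1", result_col="result1"):
    """Mean squared error between forecast probabilities and outcomes."""
    errors = [(float(r[prob_col]) - float(r[result_col])) ** 2 for r in rows]
    return sum(errors) / len(errors)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
score = brier_score(rows)
```

A forecast that always says 0.5 scores 0.25 on games without ties, which gives a useful baseline when judging any model built on this data.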
- Predicting Future Games: This dataset c...