MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides the classic MNIST handwritten digits dataset, a foundational resource for image classification in machine learning. It contains a training set of 60,000 examples and a test set of 10,000 examples of grayscale images of handwritten digits (0 through 9).
Dataset Structure:
The uploaded data is organized within a main folder named mnist_png, which contains the following subfolders:
train: This folder contains the training set images. Upon navigating into the train folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders corresponds to a digit class (e.g., the folder named 0 contains images of the digit zero, the folder named 1 contains images of the digit one, and so on). The images within these subfolders are grayscale handwritten digit images in a common image format (e.g., PNG).
test: This folder contains the test set images. Similar to the train folder, upon navigating into the test folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders contains the corresponding test images for that digit class.
Content of the Data:
Each image in the MNIST dataset is a 28x28 pixel grayscale image of a handwritten digit (0-9). The pixel values typically range from 0 (black) to 255 (white).
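Because the images are stored one folder per digit class, labels can be recovered directly from folder names. A minimal standard-library sketch (`load_paths_and_labels` is a hypothetical helper, not part of the dataset; frameworks like torchvision's `ImageFolder` follow the same convention):

```python
from pathlib import Path

def load_paths_and_labels(split_dir):
    """Collect (image_path, label) pairs from a class-per-folder layout."""
    samples = []
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir() and class_dir.name.isdigit():
            label = int(class_dir.name)  # folder name "0".."9" is the label
            samples.extend((p, label) for p in class_dir.glob("*.png"))
    return samples

# e.g. samples = load_paths_and_labels("mnist_png/train")
```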
How to Use This Dataset:
1. Download the MNIST folder (or the archive containing it) and extract its contents to obtain the mnist_png folder.
2. The train and test subfolders contain the image data, organized by digit class. You can directly use this folder structure with image data loaders that support directory-based organization; the name of each subfolder corresponds to the digit label.
3. The train folder provides the images you can use to train your machine learning models.
4. The test folder provides a separate set of images that you can use to evaluate the performance of your trained models on unseen data.

Citation:
The MNIST dataset is a well-established resource. It was constructed by Yann LeCun, Corinna Cortes, and Christopher J.C. Burges from NIST's handwriting databases, and it is commonly cited via LeCun, Bottou, Bengio, and Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, 1998. You can also cite it in the context of the specific papers or implementations you are referencing that utilize it.
Data Contribution:
Thank you for downloading this image-based organization of the MNIST dataset. By structuring the images into class-specific folders within the train and test directories, I aim to provide a user-friendly format for those working on handwritten digit recognition tasks. This structure aligns well with many image data loading utilities and workflows.
If you find this folder structure clear, well-organized, and useful for your projects, please consider giving it an upvote after downloading. Your feedback and appreciation are valuable and encourage further contributions to the Kaggle community. Thank you!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
1_a.json: 1000 random numbers over the range 0-100
1_b.json: new numbers computed from the original 1000 numbers in 1_a.json using the equation y = 3x + 6
Also included: the result generated from these two datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Estimates of estuarine Richardson numbers at different freshwater discharge (q) and tidal range (tr) at Hobsonville (RiEH) and Stanley Bay (RiEN)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
dataset_a.json: 1000 random numbers over the range 0-100
dataset_b.json: new numbers computed from the original 1000 numbers in dataset_a.json using the equation y = 3x + 6
results.png: a figure generated from these two datasets
MIT License: https://opensource.org/licenses/MIT
The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. CIFAR-10 and CIFAR-100 were created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. (Sadly, the 80 million tiny images dataset has been thrown into the memory hole by its authors. Spotting the doublethink which was used to justify its erasure is left as an exercise for the reader.)
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.
Baseline results

You can find some baseline replicable results on this dataset on the project page for cuda-convnet. These results were obtained with a convolutional neural network. Briefly, they are 18% test error without data augmentation and 11% with. Additionally, Jasper Snoek has a new paper in which he used Bayesian hyperparameter optimization to find nice settings of the weight decay and other hyperparameters, which allowed him to obtain a test error rate of 15% (without data augmentation) using the architecture of the net that got 18%.

Other results

Rodrigo Benenson has collected results on CIFAR-10/100 and other datasets on his website.
Dataset layout

Python / Matlab versions

I will describe the layout of the Python version of the dataset. The layout of the Matlab version is identical.
The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python "pickled" object produced with cPickle. Here is a python2 routine which will open such a file and return a dictionary:
def unpickle(file):
    import cPickle
    with open(file, 'rb') as fo:
        dict = cPickle.load(fo)
    return dict
And a python3 version:
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
Loaded in this way, each of the batch files contains a dictionary with the following elements:
data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.
labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.
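Given the row layout above, one row can be turned into a displayable image by a channel-first reshape followed by a transpose. A minimal sketch (`row_to_image` is an illustrative helper, not part of the dataset):

```python
import numpy as np

def row_to_image(row):
    """Convert one 3072-entry CIFAR row (R, G, B planes, row-major) to a 32x32x3 image."""
    assert row.size == 3072
    return row.reshape(3, 32, 32).transpose(1, 2, 0)  # CHW -> HWC

# Example with a dummy row:
row = np.arange(3072, dtype=np.uint8)
img = row_to_image(row)
print(img.shape)  # (32, 32, 3)
```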
The dataset contains another file, called batches.meta. It too contains a Python dictionary object. It has the following entries:

label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", etc.

Binary version

The binary version contains the files data_batch_1.bin, data_batch_2.bin, ..., data_batch_5.bin, as well as test_batch.bin. Each of these files is formatted as follows:

<1 x label><3072 x pixel>
...
<1 x label><3072 x pixel>

In other words, the first byte is the label of the first image, which is a number in the range 0-9. The next 3072 bytes are the values of the pixels of the image. The first 1024 bytes are the red channel values, the next 1024 the green, and the final 1024 the blue. The values are stored in row-major order, so the first 32 bytes are the red channel values of the first row of the image.
Each file contains 10000 such 3073-byte "rows" of images, although there is nothing delimiting the rows. Therefore each file should be exactly 30730000 bytes long.
There is another file, called batches.meta.txt. This is an ASCII file that maps numeric labels in the range 0-9 to meaningful class names. It is merely a list of the 10 class names, one per row. The class name on row i corresponds to numeric label i.
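Under the binary layout above, a whole .bin file can be parsed by reshaping the flat byte buffer into 3073-byte records. A sketch, assuming a local file in the described format (`read_cifar_binary` is a hypothetical helper name):

```python
import numpy as np

def read_cifar_binary(path):
    """Parse a CIFAR-10 binary batch file into (labels, images)."""
    buf = np.fromfile(path, dtype=np.uint8)
    records = buf.reshape(-1, 3073)                 # 1 label byte + 3072 pixel bytes
    labels = records[:, 0]
    images = records[:, 1:].reshape(-1, 3, 32, 32)  # CHW planes: R, G, B
    return labels, images

# e.g. labels, images = read_cifar_binary("data_batch_1.bin")
```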
The CIFAR-100 dataset This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). Her...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Context
The dataset tabulates the population of South Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for South Range. The dataset can be utilized to understand the population distribution of South Range by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in South Range. Additionally, it can be used to see how the gender ratio changes from birth to the seniormost age group, and how the male to female ratio varies across each age group, for South Range.
Key observations
Largest age group (population): Male # 20-24 years (49) | Female # 20-24 years (50). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for South Range Population by Gender. You can refer to the same here.
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation.

The Address Ranges Relationship File (ADDR.dbf) contains the attributes of each address range. Each address range applies to a single edge and has a unique address range identifier (ARID) value. The edge to which an address range applies can be determined by linking the address range to the All Lines Shapefile (EDGES.shp) using the permanent topological edge identifier (TLID) attribute. Multiple address ranges can apply to the same edge since an edge can have multiple address ranges. Note that the most inclusive address range associated with each side of a street edge already appears in the All Lines Shapefile (EDGES.shp).

The TIGER/Line Files contain potential address ranges, not individual addresses. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side relative to the direction in which the edge is coded. The address ranges in the TIGER/Line Files are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.
File name definitions:
- '...v_50_175_250_300...': dataset for velocity ranges [50, 175] + [250, 300] m/s
- '...v_175_250...': dataset for velocity range [175, 250] m/s
- 'ANNdevelop...': used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected
- 'ANNtest...': used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart

Where to find the input (independent) and target (dependent) variable values for each dataset/excel? Input values are in the 'IN' sheet; target values are in the 'TARGET' sheet.

Where to find the results from the best ANN model (for each target/output variable and each velocity range)? Open the corresponding excel file; the expected (target) vs ANN (output) results are written in the 'TARGET vs OUTPUT' sheet.

Reference (to be added when the paper is published): https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams
The dataset comprises developer test results of Maven projects with flaky tests across a range of consecutive commits from the projects' git commit histories. The Maven projects are a subset of those investigated in an OOPSLA 2020 paper. The commit range for this dataset has been chosen as the flakiness-introducing commit (FIC) and iDFlakies-commit (see the OOPSLA paper for details). The commit hashes have been obtained from the IDoFT dataset.
The dataset will be presented at the 1st International Flaky Tests Workshop 2024 (FTW 2024). Please refer to our extended abstract for more details about the motivation for and context of this dataset.
The following table provides a summary of the data.
| Slug (Module) | FIC Hash | Tests | Commits | Av. Commits/Test | Flaky Tests | Tests w/ Consistent Failures | Total Distinct Histories |
|---|---|---|---|---|---|---|---|
| TooTallNate/Java-WebSocket | 822d40 | 146 | 75 | 75 | 24 | 1 | 2.6x10^9 |
| apereo/java-cas-client (cas-client-core) | 5e3655 | 157 | 65 | 61.7 | 3 | 2 | 1.0x10^7 |
| eclipse-ee4j/tyrus (tests/e2e/standard-config) | ce3b8c | 185 | 16 | 16 | 12 | 0 | 261 |
| feroult/yawp (yawp-testing/yawp-testing-appengine) | abae17 | 1 | 191 | 191 | 1 | 1 | 8 |
| fluent/fluent-logger-java | 5fd463 | 19 | 131 | 105.6 | 11 | 2 | 8.0x10^32 |
| fluent/fluent-logger-java | 87e957 | 19 | 160 | 122.4 | 11 | 3 | 2.1x10^31 |
| javadelight/delight-nashorn-sandbox | d0d651 | 81 | 113 | 100.6 | 2 | 5 | 4.2x10^10 |
| javadelight/delight-nashorn-sandbox | d19eee | 81 | 93 | 83.5 | 1 | 5 | 2.6x10^9 |
| sonatype-nexus-community/nexus-repository-helm | 5517c8 | 18 | 32 | 32 | 0 | 0 | 18 |
| spotify/helios (helios-services) | 23260 | 190 | 448 | 448 | 0 | 37 | 190 |
| spotify/helios (helios-testing) | 78a864 | 43 | 474 | 474 | 0 | 7 | 43 |
The columns are composed of the following variables:
Slug (Module): The project's GitHub slug (i.e., the project's URL is https://github.com/{Slug}) and, if specified, the module for which tests have been executed.
FIC Hash: The flakiness-introducing commit hash for a known flaky test as described in this OOPSLA 2020 paper. As different flaky tests have different FIC hashes, there may be multiple rows for the same slug/module with different FIC hashes.
Tests: The number of distinct test class and method combinations over the entire considered commit range.
Commits: The number of commits in the considered commit range.
Av. Commits/Test: The average number of commits per test class and method combination in the considered commit range. The number of commits may vary for each test class, as some tests may be added or removed within the considered commit range.
Flaky Tests: The number of distinct test class and method combinations that have more than one test result (passed/skipped/error/failure + exception type, if any + assertion message, if any) across 30 repeated test suite executions on at least one commit in the considered commit range.
Tests w/ Consistent Failures: The number of distinct test class and method combinations that have the same error or failure result (error/failure + exception type, if any + assertion message, if any) across all 30 repeated test suite executions on at least one commit in the considered commit range.
Total Distinct Histories: The number of distinct test results (passed/skipped/error/failure + exception type, if any + assertion message, if any) for all test class and method combinations along all commits for that test in the considered commit range.
MIT License: https://opensource.org/licenses/MIT
Synthetic Sorting Dataset
This dataset is generated by randomly shuffling numbers according to a predefined probability distribution over the numbers. The task is to sort the numbers in ascending order. I used the script from deep-spin/infinite-former to generate the dataset with longer sequence lengths and larger examples. A total of 21 tokens are used in the dataset. The symbols are in the range 0 to 19. The last token, "20", is special and marks the end of the sequence. Use… See the full description on the dataset page: https://huggingface.co/datasets/cosmoquester/synthetic-sorting.
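As an illustration of the task, the following sketch generates one shuffled/sorted pair with the 21-token vocabulary described above. It assumes a uniform symbol distribution and a simple source/target pairing; the actual dataset uses a predefined distribution and the serialization documented on the dataset page.

```python
import random

VOCAB = 20  # symbols 0..19
EOS = 20    # special end-of-sequence token

def make_example(length, rng=random):
    """One synthetic sorting example: shuffled symbols as input, sorted symbols as target."""
    src = [rng.randrange(VOCAB) for _ in range(length)]
    tgt = sorted(src)
    return src + [EOS], tgt + [EOS]

src, tgt = make_example(8, random.Random(0))
```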
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
Pencil and pen on linen backed paper, medium in detail, fair condition. - Observation measure: mainly observation. - Map size: 680 mm x 1250 mm. Notes: Chandler's topographic map used in compilation of B.L. Wood's Geological Map Of New Zealand 1:250 000 - Sheet 22 - Wakatipu - 1962. Southern part of map is heavily annotated with field numbers. Northern part of map annotated with structural measurements and geological notes. No date for annotation, but before publication of Sheet 22 in 1962 - possibly by B.L. Wood. Keywords: OTAGO; REES VALLEY; RICHARDSON MOUNTAINS; SHOTOVER RIVER; ARROW RIVER; TOPOGRAPHIC MAPS; GEOLOGIC MAPS; STRUCTURE; MEASUREMENT; FAULTING
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This repository contains the Wallhack1.8k dataset for WiFi-based long-range activity recognition in Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS)/Through-Wall scenarios, as proposed in [1,2], as well as the CAD models (of 3D-printable parts) of the WiFi systems proposed in [2].
PyTorch Dataloader
A minimal PyTorch dataloader for the Wallhack1.8k dataset is provided at: https://github.com/StrohmayerJ/wallhack1.8k
Dataset Description
The Wallhack1.8k dataset comprises 1,806 CSI amplitude spectrograms (and raw WiFi packet time series) corresponding to three activity classes: "no presence," "walking," and "walking + arm-waving." WiFi packets were transmitted at a frequency of 100 Hz, and each spectrogram captures a temporal context of approximately 4 seconds (400 WiFi packets).
To assess cross-scenario and cross-system generalization, WiFi packet sequences were collected in LoS and through-wall (NLoS) scenarios, utilizing two different WiFi systems (BQ: biquad antenna and PIFA: printed inverted-F antenna). The dataset is structured accordingly:
LOS/BQ/ <- WiFi packets collected in the LoS scenario using the BQ system
LOS/PIFA/ <- WiFi packets collected in the LoS scenario using the PIFA system
NLOS/BQ/ <- WiFi packets collected in the NLoS scenario using the BQ system
NLOS/PIFA/ <- WiFi packets collected in the NLoS scenario using the PIFA system
These directories contain the raw WiFi packet time series (see Table 1). Each row represents a single WiFi packet with the complex CSI vector H being stored in the "data" field and the class label being stored in the "class" field. H is of the form [I, R, I, R, ..., I, R], where two consecutive entries represent imaginary and real parts of complex numbers (the Channel Frequency Responses of subcarriers). Taking the absolute value of H (e.g., via numpy.abs(H)) yields the subcarrier amplitudes A.
To extract the 52 L-LTF subcarriers used in [1], the following indices of A are to be selected:
csi_valid_subcarrier_index = []
csi_valid_subcarrier_index += [i for i in range(6, 32)]
csi_valid_subcarrier_index += [i for i in range(33, 59)]
Additional 56 HT-LTF subcarriers can be selected via:
csi_valid_subcarrier_index += [i for i in range(66, 94)]
csi_valid_subcarrier_index += [i for i in range(95, 123)]
For more details on subcarrier selection, see ESP-IDF (Section Wi-Fi Channel State Information) and esp-csi.
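Putting the pieces together, here is a sketch of amplitude extraction from an interleaved CSI vector H as described above (`csi_amplitudes` is an illustrative helper; the 128-entry dummy vector stands in for a real packet's "data" field):

```python
import numpy as np

def csi_amplitudes(H):
    """Subcarrier amplitudes from an interleaved [I, R, I, R, ...] CSI vector."""
    H = np.asarray(H, dtype=np.float64)
    imag, real = H[0::2], H[1::2]   # de-interleave imaginary and real parts
    return np.abs(real + 1j * imag)

# L-LTF subcarrier selection as described above:
csi_valid_subcarrier_index = list(range(6, 32)) + list(range(33, 59))

A = csi_amplitudes(np.ones(128))        # dummy 64-subcarrier CSI vector
A_lltf = A[csi_valid_subcarrier_index]  # 52 L-LTF amplitudes
```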
Extracted amplitude spectrograms with the corresponding label files of the train/validation/test split: "trainLabels.csv," "validationLabels.csv," and "testLabels.csv," can be found in the spectrograms/ directory.
The columns in the label files correspond to the following: [Spectrogram index, Class label, Room label]
Spectrogram index: [0, ..., n]
Class label: [0,1,2], where 0 = "no presence", 1 = "walking", and 2 = "walking + arm-waving."
Room label: [0,1,2,3,4,5], where labels 1-5 correspond to the room number in the NLoS scenario (see Fig. 3 in [1]). The label 0 corresponds to no room and is used for the "no presence" class.
Dataset Overview:
Table 1: Raw WiFi packet sequences.
| Scenario | System | "no presence" / label 0 | "walking" / label 1 | "walking + arm-waving" / label 2 | Total |
|---|---|---|---|---|---|
| LoS | BQ | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv, w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv, ww5.csv | 11 |
| LoS | PIFA | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv, w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv, ww5.csv | 11 |
| NLoS | BQ | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv, w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv, ww5.csv | 11 |
| NLoS | PIFA | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv, w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv, ww5.csv | 11 |
| Total | | 4 | 20 | 20 | 44 |
Table 2: Sample/Spectrogram distribution across activity classes in Wallhack1.8k.
| Scenario | System | "no presence" / label 0 | "walking" / label 1 | "walking + arm-waving" / label 2 | Total |
|---|---|---|---|---|---|
| LoS | BQ | 149 | 154 | 155 | 458 |
| LoS | PIFA | 149 | 160 | 152 | 461 |
| NLoS | BQ | 148 | 150 | 152 | 450 |
| NLoS | PIFA | 143 | 147 | 147 | 437 |
| Total | | 589 | 611 | 606 | 1,806 |
Download and Use

This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to one of our papers [1,2].
[1] Strohmayer, Julian, and Martin Kampel. (2024). “Data Augmentation Techniques for Cross-Domain WiFi CSI-Based Human Activity Recognition”, In IFIP International Conference on Artificial Intelligence Applications and Innovations (pp. 42-56). Cham: Springer Nature Switzerland, doi: https://doi.org/10.1007/978-3-031-63211-2_4.
[2] Strohmayer, Julian, and Martin Kampel., “Directional Antenna Systems for Long-Range Through-Wall Human Activity Recognition,” 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2024, pp. 3594-3599, doi: https://doi.org/10.1109/ICIP51287.2024.10647666.
BibTeX citations:
@inproceedings{strohmayer2024data,
  title={Data Augmentation Techniques for Cross-Domain WiFi CSI-Based Human Activity Recognition},
  author={Strohmayer, Julian and Kampel, Martin},
  booktitle={IFIP International Conference on Artificial Intelligence Applications and Innovations},
  pages={42--56},
  year={2024},
  organization={Springer}
}

@INPROCEEDINGS{10647666,
  author={Strohmayer, Julian and Kampel, Martin},
  booktitle={2024 IEEE International Conference on Image Processing (ICIP)},
  title={Directional Antenna Systems for Long-Range Through-Wall Human Activity Recognition},
  year={2024},
  pages={3594-3599},
  keywords={Visualization;Accuracy;System performance;Directional antennas;Directive antennas;Reflector antennas;Sensors;Human Activity Recognition;WiFi;Channel State Information;Through-Wall Sensing;ESP32},
  doi={10.1109/ICIP51287.2024.10647666}
}
MIT License: https://opensource.org/licenses/MIT
Our datasets, both released and evaluation set, are derived from the YFCC100M Dataset. Each dataset comprises vectors encoded from images using the CLIP model, which are then reduced to 100 dimensions using Principal Component Analysis (PCA). Additionally, categorical and timestamp attributes are selected from the metadata of the images. The categorical attribute is discretized into integers starting from 0, and the timestamp attribute is normalized into floats between 0 and 1.
For each query, a query type is randomly selected from four possible types, denoted by the numbers 0 to 3. Then, we randomly choose two data points from dataset D, utilizing their categorical attribute (C), timestamp attribute (T), and vectors, to determine the values of the query. Specifically:

We ensure that at least 100 data points in D meet the query limit.
Dataset D is in a binary format, beginning with a 4-byte integer num_vectors (uint32_t) indicating the number of vectors. This is followed by data for each vector, stored consecutively, with each vector occupying 102 (2 + vector_num_dimension) x sizeof(float32) bytes, summing up to num_vectors x 102 (2 + vector_num_dimension) x sizeof(float32) bytes in total. Specifically, for the 102 dimensions of each vector: the first dimension denotes the discretized categorical attribute C and the second dimension denotes the normalized timestamp attribute T. The rest 100 dimensions are the vector.
Query set Q is in a binary format, beginning with a 4-byte integer num_queries (uint32_t) indicating the number of queries. This is followed by data for each query, stored consecutively, with each query occupying 104 (4 + vector_num_dimension) x sizeof(float32) bytes, summing up to num_queries x 104 (4 + vector_num_dimension) x sizeof(float32) bytes in total.
The 104-dimensional representation for a query is organized as follows:
There are four types of queries, i.e., the query_type takes values from 0, 1, 2 and 3. The 4 types of queries correspond to:
The predicate for the categorical attribute is an equality predicate, i.e., C=v. And the predicate for the timestamp attribute is a range predicate, i.e., l≤T≤r.
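A sketch of reading dataset D under the layout above (`read_dataset` is a hypothetical helper name; the 102-float record layout is as described: C, then T, then the 100-d vector):

```python
import numpy as np

def read_dataset(path, dims=102):
    """Read the binary dataset: a uint32 count, then count x dims float32 records."""
    with open(path, "rb") as f:
        num = np.frombuffer(f.read(4), dtype=np.uint32)[0]
        data = np.frombuffer(f.read(), dtype=np.float32).reshape(num, dims)
    C = data[:, 0]      # discretized categorical attribute
    T = data[:, 1]      # normalized timestamp attribute
    vecs = data[:, 2:]  # 100-d PCA-reduced CLIP vectors
    return C, T, vecs
```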
Originally provided at https://dbgroup.cs.tsinghua.edu.cn/sigmod2024/task.shtml?content=datasets.
Accurately predicting species' range shifts in response to environmental change is paramount for understanding ecological processes and global change. In synthetic analyses, traits emerge as significant but weak predictors of species' range shifts across recent climate change. These studies assume linear responses to traits, while detailed empirical work often reveals trait responses that are unimodal and contain thresholds or other nonlinearities. We hypothesize that the use of linear modeling approaches fails to capture these nonlinearities and therefore may be under-powering traits to predict range shifts. We evaluate the predictive performance of approaches that can capture nonlinear relationships (ridge-regularized linear regression, support vector regression with linear and nonlinear kernels, and random forests). We apply our models using six multi-decadal range shift datasets for plants, moths, marine fish, birds, and small mammals. We show that nonlinear approaches can perform b...

Accounting for nonlinear responses to traits improves range shift predictions
https://doi.org/10.5061/dryad.wstqjq2v8
We assess the performance of nonlinear models to predict climate-induced range shifts using six datasets encompassing a broad taxonomic range. The number of species per dataset ranges from 28 to 239 (mean=118, median=94), and range shifts were observed over periods ranging from 20 to 100+ years. Each dataset was derived from previous evaluations of traits as range shift predictors and consists of a list of focal species, associated species-level traits, and a range shift metric.
See the DataDescriptions_CannistraBuckley.pdf file for information on the data and structure. Refer to the references below for additional information on the datasets and please cite those papers if you use this data.
Data was derived from the following sources:
Facebook
Our dataset provides detailed and precise insights into the business, commercial, and industrial aspects of any given area in the USA, including Point of Interest (POI) data and foot traffic. The dataset is divided into 150m x 150m areas (geohash 7) and has over 50 variables.
- Use it for different applications: Our combined dataset, which includes POI and foot traffic data, can be employed for various purposes. Different data teams use it to guide retailers and FMCG brands in site selection, fuel marketing intelligence, analyze trade areas, and assess company risk. Our dataset has also proven useful for real estate investment.
- Get reliable data: Our datasets have been processed, enriched, and tested so your data team can use them more quickly and accurately.
- Ideal for training ML models: The high quality of our geographic information layers results from more than seven years of work dedicated to the deep understanding and modeling of geospatial Big Data. Among the features that distinguish this dataset is the use of anonymized, user-compliant mobile device GPS locations, enriched with other alternative and public data.
- Easy to use: Our dataset is user-friendly and can be easily integrated into your current models. We can also deliver your data in different formats, such as .csv, according to your analysis requirements.
- Get personalized guidance: In addition to providing reliable datasets, we advise your analysts on their correct implementation. Our data scientists can guide your internal team on the optimal algorithms and models to get the most out of the information we provide (without compromising the security of your internal data).
Answer questions like:
- What places does my target user visit in a particular area?
- Which are the best areas to place a new POS?
- What is the average yearly income of users in a particular area?
- What is the influx of visits that my competition receives?
- What is the volume of traffic surrounding my current POS?
This dataset is useful for getting insights from industries like:
- Retail & FMCG
- Banking, Finance, and Investment
- Car Dealerships
- Real Estate
- Convenience Stores
- Pharma and medical laboratories
- Restaurant chains and franchises
- Clothing chains and franchises
Our dataset includes more than 50 variables, such as:
- Number of pedestrians seen in the area.
- Number of vehicles seen in the area.
- Average speed of movement of the vehicles seen in the area.
- Points of Interest (POIs), in number and type, seen in the area (supermarkets, pharmacies, recreational locations, restaurants, offices, hotels, parking lots, wholesalers, financial services, pet services, shopping malls, among others).
- Average yearly income range (anonymized and aggregated) of the devices seen in the area.
Notes to better understand this dataset:
- POI confidence means the average confidence of POIs in the area. In this case, POIs are any kind of location, such as a restaurant, a hotel, or a library.
- Category confidences, for example "food_drinks_tobacco_retail_confidence", indicate how confident we are in the existence of food/drink/tobacco retail locations in the area.
- We added predictions for The Home Depot and Lowe's Home Improvement stores in the dataset sample. These predictions are the result of a machine-learning model trained on the data: knowing where the current stores are, we can find the most similar areas for new stores to open.
How efficient is a geohash? Geohash is a faster, cost-effective geofencing option that reduces input data load and provides actionable information. Its benefits include faster querying, reduced cost, minimal configuration, and ease of use. A geohash ranges from 1 to 12 characters, and the dataset can be split into variable-size geohashes, with the default being geohash 7 (150m x 150m).
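To illustrate the geohash cells described above, here is a minimal sketch of the standard, public geohash encoding algorithm in Python (a generic illustration, not this vendor's implementation): latitude and longitude are repeatedly bisected, the resulting bits are interleaved, and every 5 bits become one base32 character.

```python
def geohash_encode(lat, lon, precision=7):
    """Encode a (lat, lon) pair as a geohash string of the given length."""
    base32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash base32 alphabet
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bit, ch = [], 0, 0
    even = True  # geohash interleaving starts with a longitude bit
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon > mid:
                ch, lon_lo = (ch << 1) | 1, mid
            else:
                ch, lon_hi = ch << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat > mid:
                ch, lat_lo = (ch << 1) | 1, mid
            else:
                ch, lat_hi = ch << 1, mid
        even = not even
        bit += 1
        if bit == 5:  # 5 bits per base32 character
            chars.append(base32[ch])
            bit, ch = 0, 0
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744, 7))  # u4pruyd
```

At precision 7 each cell is roughly 153 m x 153 m near the equator (cells narrow east-west at higher latitudes), which is consistent with the ~150 m squares this dataset uses.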
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the data for the South Range, MI population pyramid, which represents the South Range population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.
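The dependency ratios mentioned above have standard demographic definitions: youth and old-age dependents per 100 working-age (15-64) persons, their sum, and working-age persons per older person. A minimal sketch with made-up illustrative counts (not actual South Range, MI figures):

```python
def dependency_ratios(pop_0_14, pop_15_64, pop_65_plus):
    """Standard demographic dependency ratios from age-group totals."""
    youth = 100 * pop_0_14 / pop_15_64        # youth dependency ratio
    old_age = 100 * pop_65_plus / pop_15_64   # old-age dependency ratio
    total = youth + old_age                   # total dependency ratio
    support = pop_15_64 / pop_65_plus         # potential support ratio
    return youth, old_age, total, support

# Hypothetical counts for illustration only:
youth, old_age, total, support = dependency_ratios(120, 400, 80)
print(youth, old_age, total, support)  # 30.0 20.0 50.0 5.0
```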
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are thus subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for South Range Population by Age. You can refer to it here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the data for the South Range, MI population pyramid, which represents the South Range population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey 5-Year estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are thus subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for South Range Population by Age. You can refer to it here.
Here is a list of the prime numbers up to 10000. Source: easycalculation
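A list like this is easy to regenerate and sanity-check; a minimal sieve of Eratosthenes sketch:

```python
def primes_up_to(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Mark every multiple of p starting at p*p as composite.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, prime in enumerate(is_prime) if prime]

primes = primes_up_to(10000)
print(len(primes), primes[-1])  # 1229 9973
```

There are 1,229 primes up to 10,000, the largest being 9,973, which gives a quick consistency check against the downloaded list.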
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description:
Data set supporting the content of the article "Generation of scalable many-body Bell correlations in spin chains with short-range two-body interactions" by Marcin Płodzień, Tomasz Wasak, Emilia Witkowska, Maciej Lewenstein and Jan Chwedeńczuk. The research data consists of the dynamics of the correlator as a function of time, interaction range, and number of spins in the chain. The datasets also contain the research data for the dynamics of the estimated many-body Bell correlator reconstructed from classical shadows tomography.
Authors:
1) Marcin Płodzień, ICFO - Institut de Ciencies Fotoniques, The Barcelona Institute of Science and Technology, 08860 Castelldefels, Barcelona, Spain
2) Tomasz Wasak, Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University in Toruń, Grudziądzka 5, 87-100 Toruń, Poland
3) Emilia Witkowska, Institute of Physics PAS, Aleja Lotnikow 32/46, 02-668 Warszawa, Poland
4) Maciej Lewenstein, ICFO - Institut de Ciencies Fotoniques, The Barcelona Institute of Science and Technology, 08860 Castelldefels, Barcelona, Spain; ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Spain
5) Jan Chwedeńczuk, Faculty of Physics, University of Warsaw, ul. Pasteura 5, PL-02-093 Warsaw, Poland
Keywords: nonlocality, Bell correlations, quantum spins, spin chains, Ising model, finite-range interactions, dynamical protocol
Directories:
- data_correlator/epsilon_th_N_r/ - the value (right column) of the correlator E_N, see Eq. (4) of the manuscript, as a function of time tau (left column)
- data_correlator/Q_th_N_r/ - the value (right column) of the correlator Q_N, see Eq. (5) of the manuscript, as a function of time tau (left column)
- data_classical_shadows/ - datasets needed for Fig. 4 of the manuscript
- data_classical_shadows/exact_QN_N_*_r_*.txt - the value (second column) of the exact correlator Q_N calculated for given N (first *) and given r (second *) as a function of time tau (first column); e.g., exact_QN_N_4_r_3.txt is for N=4 and r=3
- data_classical_shadows/est_QN_N_*_r_*.txt - the value of the estimated correlator Q_N (second column) calculated for given N (first *) and given r (second *) as a function of time tau (first column); e.g., est_QN_N_4_r_3.txt is for N=4 and r=3
Format of the data:
1) In the directory data_correlator/epsilon_th_N_r/: files are of the form epsilon_th_N.xxx_r.yyy.dat, where N=xxx is the number of spins in the spin chain and r=yyy is the range of the interaction between the spins. For example, the file epsilon_th_N.014_r.010.dat describes N=14 spins with interaction range r=10. See the manuscript for the definitions of r and N.
2) In the directory data_correlator/Q_th_N_r/: files are of the form _Q_th_N.xxx_r.yyy.dat or Q_th_N.xxx_r.yyy.dat, where N=xxx is the number of spins in the spin chain and r=yyy is the range of the interaction between the spins. For example, the file _Q_th_N.032_r.006.dat describes N=32 spins with interaction range r=6.
3) In the directory data_classical_shadows/: files in the format est_QN_N_*_r_*.txt, where the first * stands for N and the second for r (e.g., est_QN_N_10_r_4.txt for N=10 and r=4), contain:
column 1 -> time tau
column 2 -> value of estimated Q_N
column 3 -> standard deviation of estimated Q_N (shaded region in Fig. 4)
Files in the format exact_QN_N_*_r_*.txt, where the first * stands for N and the second for r (e.g., exact_QN_N_8_r_4.txt for N=8 and r=4), contain:
column 1 -> time tau
column 2 -> value of Q_N
License: CC0
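A small sketch of how the file naming scheme described above can be handled programmatically; the file names come from the dataset description, while the helper functions themselves are our illustration:

```python
import re

def parse_correlator_filename(name):
    """Extract (N, r) from names like 'epsilon_th_N.014_r.010.dat',
    '_Q_th_N.032_r.006.dat', or 'exact_QN_N_4_r_3.txt'."""
    m = re.search(r"N[._](\d+)_r[._](\d+)\.(?:dat|txt)$", name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name}")
    return int(m.group(1)), int(m.group(2))

def load_two_columns(path):
    """Read whitespace-separated (tau, value, ...) rows from a data file."""
    with open(path) as fh:
        return [tuple(map(float, line.split())) for line in fh if line.strip()]

print(parse_correlator_filename("epsilon_th_N.014_r.010.dat"))  # (14, 10)
print(parse_correlator_filename("exact_QN_N_4_r_3.txt"))        # (4, 3)
```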
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Sudoku dataset consists of algorithmically generated Sudoku grids with different levels of masking (number of missing numbers). Sudoku grids range from easy to difficult for humans, and we wanted to see how machine learning handles the problem.
This dataset is one of the three hidden datasets used by the 2024 NAS Unseen-Data Challenge.
The data comprise 70,000 generated sudoku grids. Instead of regular sudoku, where the goal is to fill in the grid, we developed a classification task: identifying the number in a single target square. The grids are generated at different levels of masking (number of missing values). Each grid is stored as a NumPy array, where the normal sudoku numbers (1-9) are stored as 0.1-0.9, respectively. Missing cells are stored as 0, and the target cell is labelled as 1.
The data is stored in a channels-first format with a shape of (n, 1, 9, 9) where n is the number of samples in the corresponding set (50,000 for training, 10,000 for validation, and 10,000 for testing).
For each class (Sudoku cell value), we generated 7,777 samples evenly distributed among the three sets. To round out the number of samples in the sets, we randomly selected class labels and generated a single sudoku grid of that label so that no label had more than 7,778 samples among the three sets.
The labels for this dataset are the possible sudoku cell values (1, 2, 3, 4, 5, 6, 7, 8, and 9). However, due to zero-indexing, we subtract one from the cell value to get the label. (I.e., a label of 1 means the target cell should be a 2.)
NumPy (.npy) files can be opened with the NumPy Python library, using the numpy.load() function and passing the path to the file as a parameter. The metadata file contains some basic information about the datasets and can be opened in many text editors, such as vim, nano, Notepad++, or Notepad.
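The cell encoding described above can be inverted with a few lines of NumPy. A minimal sketch using a synthetic (1, 9, 9) grid rather than an actual sample (the real arrays are loaded with numpy.load() as noted above):

```python
import numpy as np

def decode_grid(grid):
    """Map one (1, 9, 9) encoded grid back to integer sudoku values:
    0.1-0.9 encode digits 1-9, 0.0 marks a missing cell, and 1.0 marks
    the target cell (returned here as -1)."""
    digits = np.rint(grid[0] * 10).astype(int)  # 0.1 -> 1, ..., 1.0 -> 10
    digits[digits == 10] = -1                   # the masked target cell
    return digits                               # 0 = missing, 1-9 = given

# Synthetic example, not an actual sample from the dataset:
grid = np.zeros((1, 9, 9), dtype=np.float32)
grid[0, 0, 0] = 0.5   # a given digit 5
grid[0, 4, 4] = 1.0   # the masked target cell
decoded = decode_grid(grid)
print(decoded[0, 0], decoded[4, 4])  # 5 -1
```

Recall that the labels are zero-indexed (label = true digit - 1), so a target cell whose true value is 5 carries label 4.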
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides the classic MNIST handwritten digits dataset, a foundational resource for image classification in machine learning. It contains a training set of 60,000 examples and a test set of 10,000 examples of grayscale images of handwritten digits (0 through 9).
Dataset Structure:
The uploaded data is organized within a main folder named mnist_png, which contains the following subfolders:
train: This folder contains the training set images. Upon navigating into the train folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders corresponds to a digit class (e.g., the folder named 0 contains images of the digit zero, the folder named 1 contains images of the digit one, and so on). The images within these subfolders are grayscale handwritten digit images in a common image format (e.g., PNG).
test: This folder contains the test set images. Similar to the train folder, upon navigating into the test folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders contains the corresponding test images for that digit class.
Content of the Data:
Each image in the MNIST dataset is a 28x28 pixel grayscale image of a handwritten digit (0-9). The pixel values typically range from 0 (black) to 255 (white).
How to Use This Dataset:
- Download the MNIST folder (or the archive containing it) and extract its contents.
- Navigate into the mnist_png folder.
- The train and test subfolders contain the image data, organized by digit class. You can directly use this folder structure with image data loaders that support directory-based organization; the name of each subfolder corresponds to the digit label.
- The train folder provides the images you can use to train your machine learning models.
- The test folder provides a separate set of images that you can use to evaluate the performance of your trained models on unseen data.
Citation:
The MNIST dataset is a well-established resource. While there isn't a single definitive paper for this particular image-format packaging, the underlying dataset was created by Yann LeCun, Corinna Cortes, and Christopher J.C. Burges and is a standard benchmark in the field. You can cite it in the context of the specific papers or implementations you are referencing that utilize it.
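The class-per-subfolder layout described above works directly with directory-based loaders such as torchvision's ImageFolder. As a dependency-free illustration (our sketch, not part of the dataset), here is how to collect (path, label) pairs from the mnist_png tree, using each subfolder name as the digit label:

```python
from pathlib import Path

def collect_samples(split_dir):
    """Walk mnist_png/<split>/<digit>/*.png and return (path, label) pairs,
    taking the integer label from each class subfolder's name."""
    samples = []
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir() and class_dir.name.isdigit():
            label = int(class_dir.name)
            for image_path in sorted(class_dir.glob("*.png")):
                samples.append((image_path, label))
    return samples

# Usage (paths assume the layout described above):
# train_samples = collect_samples("mnist_png/train")
# test_samples = collect_samples("mnist_png/test")
```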
Data Contribution:
Thank you for downloading this image-based organization of the MNIST dataset. By structuring the images into class-specific folders within the train and test directories, I aim to provide a user-friendly format for those working on handwritten digit recognition tasks. This structure aligns well with many image data loading utilities and workflows.
If you find this folder structure clear, well-organized, and useful for your projects, please consider giving it an upvote after downloading. Your feedback and appreciation are valuable and encourage further contributions to the Kaggle community. Thank you!