100+ datasets found
  1. Data from: MNIST Handwritten Digits Dataset

    • kaggle.com
    zip
    Updated May 15, 2025
    Cite
    Ghanshyam Saini (2025). MNIST Handwritten Digits Dataset [Dataset]. https://www.kaggle.com/datasets/ghnshymsaini/mnist-handwritten-digits-dataset
    Explore at:
    zip (29,605,861 bytes)
    Dataset updated
    May 15, 2025
    Authors
    Ghanshyam Saini
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    MNIST Handwritten Digits Dataset (Organized by Folder)

    This dataset provides the classic MNIST handwritten digits dataset, a foundational resource for image classification in machine learning. It contains a training set of 60,000 examples and a test set of 10,000 examples of grayscale images of handwritten digits (0 through 9).

    Dataset Structure:

    The uploaded data is organized within a main folder named mnist_png, which contains the following subfolders:

    • train: This folder contains the training set images. Upon navigating into the train folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders corresponds to a digit class (e.g., the folder named 0 contains images of the digit zero, the folder named 1 contains images of the digit one, and so on). The images within these subfolders are grayscale handwritten digit images in a common image format (e.g., PNG).

    • test: This folder contains the test set images. Similar to the train folder, upon navigating into the test folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders contains the corresponding test images for that digit class.

    Content of the Data:

    Each image in the MNIST dataset is a 28x28 pixel grayscale image of a handwritten digit (0-9). The pixel values typically range from 0 (black) to 255 (white).

    How to Use This Dataset:

    1. Download the main MNIST folder (or the archive containing it) and extract its contents.
    2. Navigate into the mnist_png folder.
    3. The train and test subfolders contain the image data, organized by digit class. You can directly use this folder structure with image data loaders that support directory-based organization. The name of the subfolder will correspond to the digit label.
    4. The train folder provides the images you can use to train your machine learning models.
    5. The test folder provides a separate set of images that you can use to evaluate the performance of your trained models on unseen data.
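    This folder-per-class layout is exactly what directory-based image loaders expect (torchvision's ImageFolder, for example, follows the same convention). The steps above can be sketched with the standard library alone; the tiny sample tree built here is a hypothetical stand-in for the real mnist_png folders:

    ```python
    import os
    import tempfile

    def index_image_folder(root):
        """Collect (path, label) pairs from a train/ or test/ directory
        whose subfolder names (0-9) are the digit labels."""
        samples = []
        for class_name in sorted(os.listdir(root)):
            class_dir = os.path.join(root, class_name)
            if not os.path.isdir(class_dir):
                continue
            label = int(class_name)  # the folder name doubles as the label
            for fname in sorted(os.listdir(class_dir)):
                samples.append((os.path.join(class_dir, fname), label))
        return samples

    # Build a tiny stand-in for mnist_png/train with two classes.
    root = tempfile.mkdtemp()
    for digit in ("0", "1"):
        os.makedirs(os.path.join(root, digit))
        open(os.path.join(root, digit, "img0.png"), "wb").close()

    samples = index_image_folder(root)
    print(samples)  # two (path, label) pairs, labels 0 and 1
    ```

    With the real dataset, pointing such a loader at mnist_png/train or mnist_png/test yields all 60,000 or 10,000 labeled image paths.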

    Citation:

    The MNIST dataset is a well-established resource. While there isn't a single definitive paper for this particular image-format packaging, the original dataset is credited to Yann LeCun, Corinna Cortes, and Christopher J.C. Burges and is a standard benchmark in the field. You can cite it in the context of the specific papers or implementations you are referencing that utilize it.

    Data Contribution:

    Thank you for downloading this image-based organization of the MNIST dataset. By structuring the images into class-specific folders within the train and test directories, I aim to provide a user-friendly format for those working on handwritten digit recognition tasks. This structure aligns well with many image data loading utilities and workflows.

    If you find this folder structure clear, well-organized, and useful for your projects, please consider giving it an upvote after downloading. Your feedback and appreciation are valuable and encourage further contributions to the Kaggle community. Thank you!

  2. Two datasets and the generated result

    • figshare.com
    txt
    Updated Sep 24, 2020
    + more versions
    Cite
    Dongsheng Yang (2020). Two datasets and the generated result [Dataset]. http://doi.org/10.6084/m9.figshare.13003076.v1
    Explore at:
    txt
    Dataset updated
    Sep 24, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Dongsheng Yang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    • 1_a.json: 1000 random numbers over the range 0-100
    • 1_b.json: new numbers derived from the original 1000 numbers in 1_a.json using the equation y = 3x + 6
    • the result generated by these two datasets
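    A minimal sketch of how such a pair of files could be produced; the generation code itself is an assumption (only the file contents and the equation y = 3x + 6 come from the description):

    ```python
    import json
    import random

    random.seed(0)  # reproducible sketch

    # 1000 random numbers over the range 0-100 (contents of 1_a.json)
    a = [random.uniform(0, 100) for _ in range(1000)]

    # each number mapped through y = 3x + 6 (contents of 1_b.json)
    b = [3 * x + 6 for x in a]

    # serialize exactly as the two JSON files would store them
    a_json, b_json = json.dumps(a), json.dumps(b)
    print(len(a), len(b))  # → 1000 1000
    ```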

  3. Estimates of estuarine Richardson numbers at different freshwater discharge...

    • catalogue.data.govt.nz
    Updated Feb 1, 2001
    Cite
    (2001). Estimates of estuarine Richardson numbers at different freshwater discharge and tidal range - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/oai-figshare-com-article-23528424
    Explore at:
    Dataset updated
    Feb 1, 2001
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Estimates of estuarine Richardson numbers at different freshwater discharge (q) and tidal range (tr) at Hobsonville (RiEH) and Stanley Bay (RiEN)

  4. two datasets and the visualization

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Dongsheng Yang (2023). two datasets and the visualization [Dataset]. http://doi.org/10.6084/m9.figshare.13007747.v1
    Explore at:
    txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Dongsheng Yang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    • dataset_a.json: 1000 random numbers over the range 0-100
    • dataset_b.json: new numbers derived from the original 1000 numbers in dataset_a.json using the equation y = 3x + 6
    • results.png: a visualization generated from these two datasets

  5. original : CIFAR 100

    • kaggle.com
    zip
    Updated Dec 28, 2024
    Cite
    Shashwat Pandey (2024). original : CIFAR 100 [Dataset]. https://www.kaggle.com/datasets/shashwat90/original-cifar-100
    Explore at:
    zip (168,517,945 bytes)
    Dataset updated
    Dec 28, 2024
    Authors
    Shashwat Pandey
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. CIFAR-10 and CIFAR-100 were created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. (Sadly, the 80 million tiny images dataset has been thrown into the memory hole by its authors. Spotting the doublethink which was used to justify its erasure is left as an exercise for the reader.)

    The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

    The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

    The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.

    Baseline results You can find some baseline replicable results on this dataset on the project page for cuda-convnet. These results were obtained with a convolutional neural network. Briefly, they are 18% test error without data augmentation and 11% with. Additionally, Jasper Snoek has a new paper in which he used Bayesian hyperparameter optimization to find nice settings of the weight decay and other hyperparameters, which allowed him to obtain a test error rate of 15% (without data augmentation) using the architecture of the net that got 18%.

    Other results Rodrigo Benenson has collected results on CIFAR-10/100 and other datasets on his website; click here to view.

    Dataset layout Python / Matlab versions I will describe the layout of the Python version of the dataset. The layout of the Matlab version is identical.

    The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python "pickled" object produced with cPickle. Here is a Python 2 routine which will open such a file and return a dictionary:

        def unpickle(file):
            import cPickle
            with open(file, 'rb') as fo:
                dict = cPickle.load(fo)
            return dict

    And a Python 3 version:

        def unpickle(file):
            import pickle
            with open(file, 'rb') as fo:
                dict = pickle.load(fo, encoding='bytes')
            return dict

    Loaded in this way, each of the batch files contains a dictionary with the following elements:

    • data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.

    • labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.
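    The channel-planar, row-major layout of a single 3072-entry data row can be checked with a few lines of index arithmetic; `pixel_offsets` is a hypothetical helper, not part of the dataset code:

    ```python
    # One CIFAR row: 3072 values = 1024 red + 1024 green + 1024 blue,
    # each channel plane stored row-major as a 32x32 grid.
    def pixel_offsets(r, c):
        """Return the offsets of the (r, c) pixel's R, G, B values
        within a single 3072-entry data row."""
        base = r * 32 + c
        return base, 1024 + base, 2048 + base

    # A synthetic row where value == offset makes the mapping easy to verify.
    row = list(range(3072))
    red, green, blue = pixel_offsets(0, 31)  # last pixel of the first image row
    print(red, green, blue)  # → 31 1055 2079
    ```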

    The dataset contains another file, called batches.meta. It too contains a Python dictionary object. It has the following entries:

    • label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", etc.

    Binary version

    The binary version contains the files data_batch_1.bin, data_batch_2.bin, ..., data_batch_5.bin, as well as test_batch.bin. Each of these files is formatted as follows:

        <1 x label><3072 x pixel>
        ...
        <1 x label><3072 x pixel>

    In other words, the first byte is the label of the first image, which is a number in the range 0-9. The next 3072 bytes are the values of the pixels of the image. The first 1024 bytes are the red channel values, the next 1024 the green, and the final 1024 the blue. The values are stored in row-major order, so the first 32 bytes are the red channel values of the first row of the image.

    Each file contains 10000 such 3073-byte "rows" of images, although there is nothing delimiting the rows. Therefore each file should be exactly 30730000 bytes long.
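    Because the records are fixed-width with no delimiters, splitting a batch file by hand is straightforward. A minimal sketch, using a synthetic two-record buffer in place of a real data_batch_*.bin:

    ```python
    RECORD = 1 + 3072  # one label byte + 3072 pixel bytes

    def parse_records(buf):
        """Split a CIFAR-10 binary batch buffer into (label, pixels) records."""
        assert len(buf) % RECORD == 0, "file must be a whole number of 3073-byte rows"
        records = []
        for off in range(0, len(buf), RECORD):
            label = buf[off]                     # first byte: class label 0-9
            pixels = buf[off + 1:off + RECORD]   # next 3072 bytes: R, G, B planes
            records.append((label, pixels))
        return records

    # Two synthetic records standing in for real batch content.
    buf = bytes([3]) + bytes(3072) + bytes([7]) + bytes(3072)
    records = parse_records(buf)
    print([label for label, _ in records])  # → [3, 7]
    ```

    At 10000 records per file this reproduces the stated size of 10000 × 3073 = 30,730,000 bytes.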

    There is another file, called batches.meta.txt. This is an ASCII file that maps numeric labels in the range 0-9 to meaningful class names. It is merely a list of the 10 class names, one per row. The class name on row i corresponds to numeric label i.

    The CIFAR-100 dataset This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). Her...

  6. South Range, MI Population Breakdown by Gender and Age Dataset: Male and...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). South Range, MI Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e200fba9-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    South Range, Michigan
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of South Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for South Range. The dataset can be used to understand the population distribution of South Range by gender and age. For example, it can identify the largest age group for both men and women in South Range, and show how the gender ratio changes from the youngest to the oldest age group.

    Key observations

    Largest age group (population): Male # 20-24 years (49) | Female # 20-24 years (50). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Scope of gender :

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.

    Variables / Data Columns

    • Age Group: This column displays the age group for the South Range population analysis. There are 18 expected values, defined above in the age groups section.
    • Population (Male): The male population of South Range in each age group.
    • Population (Female): The female population of South Range in each age group.
    • Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in South Range for each age group.
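    The Gender Ratio column is a simple derived quantity; a one-function sketch, checked against the key observation above (20-24 years: 49 males, 50 females):

    ```python
    def gender_ratio(males, females):
        """Males per 100 females, as in the Gender Ratio column."""
        return 100 * males / females

    # 20-24 years in South Range: 49 males, 50 females.
    print(round(gender_ratio(49, 50), 1))  # → 98.0
    ```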

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for South Range Population by Gender. You can refer to it here.

  7. TIGER/Line Shapefile, 2023, County, Champaign County, IL, Address Ranges...

    • catalog.data.gov
    Updated Aug 10, 2025
    Cite
    U.S. Department of Commerce, U.S. Census Bureau, Geography Division, Geospatial Products Branch (Point of Contact) (2025). TIGER/Line Shapefile, 2023, County, Champaign County, IL, Address Ranges Relationship File [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-2023-county-champaign-county-il-address-ranges-relationship-file
    Explore at:
    Dataset updated
    Aug 10, 2025
    Dataset provided by
    United States Department of Commerce (http://commerce.gov/)
    United States Census Bureau (http://census.gov/)
    Area covered
    Champaign County, Illinois
    Description

    The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Ranges Relationship File (ADDR.dbf) contains the attributes of each address range. Each address range applies to a single edge and has a unique address range identifier (ARID) value. The edge to which an address range applies can be determined by linking the address range to the All Lines Shapefile (EDGES.shp) using the permanent topological edge identifier (TLID) attribute. Multiple address ranges can apply to the same edge since an edge can have multiple address ranges. Note that the most inclusive address range associated with each side of a street edge already appears in the All Lines Shapefile (EDGES.shp). The TIGER/Line Files contain potential address ranges, not individual addresses. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side relative to the direction in which the edge is coded. The address ranges in the TIGER/Line Files are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.
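    The ADDR-to-EDGES linkage described above is a plain key join on TLID, with multiple address ranges allowed per edge. A minimal sketch; the field names ARID, TLID, FROMHN, and TOHN come from the TIGER/Line documentation, but the sample records are made up for illustration:

    ```python
    # Address range records as they might be read from ADDR.dbf (illustrative values).
    addr_rows = [
        {"ARID": "A1", "TLID": 1001, "FROMHN": "100", "TOHN": "198"},
        {"ARID": "A2", "TLID": 1001, "FROMHN": "101", "TOHN": "199"},
        {"ARID": "A3", "TLID": 1002, "FROMHN": "200", "TOHN": "248"},
    ]
    # Edge attributes keyed by the permanent topological edge identifier (TLID).
    edges = {1001: "Main St segment", 1002: "Oak Ave segment"}

    # Group address ranges by their edge: several ranges may share one TLID.
    ranges_by_edge = {}
    for row in addr_rows:
        ranges_by_edge.setdefault(row["TLID"], []).append(row["ARID"])

    print(ranges_by_edge)  # → {1001: ['A1', 'A2'], 1002: ['A3']}
    ```

    Here the two ranges on TLID 1001 model the even and odd parities of one street edge.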

  8. ANN development + final testing datasets

    • resodate.org
    • data-staging.niaid.nih.gov
    • +2more
    Updated Jan 1, 2018
    Cite
    Authors (2018). ANN development + final testing datasets [Dataset]. http://doi.org/10.5281/ZENODO.1445865
    Explore at:
    Dataset updated
    Jan 1, 2018
    Dataset provided by
    Zenodo
    Authors
    Authors
    Description

    File name definitions:

    • '...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s
    • '...v_175_250...' - dataset for velocity range [175, 250] m/s
    • 'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected
    • 'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart

    Where to find the input (independent) and target (dependent) variable values for each dataset/excel file?

    • input values in the 'IN' sheet
    • target values in the 'TARGET' sheet

    Where to find the results from the best ANN model (for each target/output variable and each velocity range)?

    • open the corresponding excel file; the expected (target) vs ANN (output) results are written in the 'TARGET vs OUTPUT' sheet

    Reference (to be added when the paper is published): https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams

  9. Data from: Regression-Test History Data for Flaky Test-Research, Dataset

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Aug 12, 2024
    Cite
    Wendler, Philipp; Winter, Stefan (2024). Regression-Test History Data for Flaky Test-Research, Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10639029
    Explore at:
    Dataset updated
    Aug 12, 2024
    Dataset provided by
    LMU Munich
    Ludwig-Maximilians-Universität München (LMU)
    Authors
    Wendler, Philipp; Winter, Stefan
    Description

    The dataset comprises developer test results of Maven projects with flaky tests across a range of consecutive commits from the projects' git commit histories. The Maven projects are a subset of those investigated in an OOPSLA 2020 paper. The commit range for this dataset has been chosen as the flakiness-introducing commit (FIC) and iDFlakies-commit (see the OOPSLA paper for details). The commit hashes have been obtained from the IDoFT dataset.

    The dataset will be presented at the 1st International Flaky Tests Workshop 2024 (FTW 2024). Please refer to our extended abstract for more details about the motivation for and context of this dataset.

    The following table provides a summary of the data.

    | Slug (Module) | FIC Hash | Tests | Commits | Av. Commits/Test | Flaky Tests | Tests w/ Consistent Failures | Total Distinct Histories |
    |---|---|---|---|---|---|---|---|
    | TooTallNate/Java-WebSocket | 822d40 | 146 | 75 | 75 | 24 | 1 | 2.6x10^9 |
    | apereo/java-cas-client (cas-client-core) | 5e3655 | 157 | 65 | 61.7 | 3 | 2 | 1.0x10^7 |
    | eclipse-ee4j/tyrus (tests/e2e/standard-config) | ce3b8c | 185 | 16 | 16 | 12 | 0 | 261 |
    | feroult/yawp (yawp-testing/yawp-testing-appengine) | abae17 | 1 | 191 | 191 | 1 | 1 | 8 |
    | fluent/fluent-logger-java | 5fd463 | 19 | 131 | 105.6 | 11 | 2 | 8.0x10^32 |
    | fluent/fluent-logger-java | 87e957 | 19 | 160 | 122.4 | 11 | 3 | 2.1x10^31 |
    | javadelight/delight-nashorn-sandbox | d0d651 | 81 | 113 | 100.6 | 2 | 5 | 4.2x10^10 |
    | javadelight/delight-nashorn-sandbox | d19eee | 81 | 93 | 83.5 | 1 | 5 | 2.6x10^9 |
    | sonatype-nexus-community/nexus-repository-helm | 5517c8 | 18 | 32 | 32 | 0 | 0 | 18 |
    | spotify/helios (helios-services) | 23260 | 190 | 448 | 448 | 0 | 37 | 190 |
    | spotify/helios (helios-testing) | 78a864 | 43 | 474 | 474 | 0 | 7 | 43 |

    The columns are composed of the following variables:

    Slug (Module): The project's GitHub slug (i.e., the project's URL is https://github.com/{Slug}) and, if specified, the module for which tests have been executed.

    FIC Hash: The flakiness-introducing commit hash for a known flaky test as described in this OOPSLA 2020 paper. As different flaky tests have different FIC hashes, there may be multiple rows for the same slug/module with different FIC hashes.

    Tests: The number of distinct test class and method combinations over the entire considered commit range.

    Commits: The number of commits in the considered commit range

    Av. Commits/Test: The average number of commits per test class and method combination in the considered commit range. The number of commits may vary for each test class, as some tests may be added or removed within the considered commit range.

    Flaky Tests: The number of distinct test class and method combinations that have more than one test result (passed/skipped/error/failure + exception type, if any + assertion message, if any) across 30 repeated test suite executions on at least one commit in the considered commit range.

    Tests w/ Consistent Failures: The number of distinct test class and method combinations that have the same error or failure result (error/failure + exception type, if any + assertion message, if any) across all 30 repeated test suite executions on at least one commit in the considered commit range.

    Total Distinct Histories: The number of distinct test results (passed/skipped/error/failure + exception type, if any + assertion message, if any) for all test class and method combinations along all commits for that test in the considered commit range.

  10. synthetic-sorting

    • huggingface.co
    Updated Oct 4, 2023
    Cite
    Sangjun Park (2023). synthetic-sorting [Dataset]. https://huggingface.co/datasets/cosmoquester/synthetic-sorting
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 4, 2023
    Authors
    Sangjun Park
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Synthetic Sorting Dataset

    This dataset is generated by randomly shuffling numbers drawn from a predefined probability distribution. The task is to sort the numbers in ascending order. I used the script from deep-spin/infinite-former to generate the dataset with longer sequence lengths and larger examples. A total of 21 tokens are used in the dataset: the symbols are in the range 0 to 19, and the last token, "20", is special and marks the end of a sequence. Use… See the full description on the dataset page: https://huggingface.co/datasets/cosmoquester/synthetic-sorting.
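    The task setup can be sketched in a few lines; the exact serialization used by the real dataset may differ, so this is only an illustration of the symbol/EOS convention described above:

    ```python
    import random

    random.seed(42)
    VOCAB = list(range(20))  # symbols 0-19
    EOS = 20                 # special end-of-sequence token

    def make_example(length):
        """One sorting example: a shuffled source sequence and its sorted
        target, each terminated by the EOS token."""
        src = [random.choice(VOCAB) for _ in range(length)]
        tgt = sorted(src)
        return src + [EOS], tgt + [EOS]

    src, tgt = make_example(8)
    print(src, tgt)
    ```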

  11. Topographic map of Rees Valley, Richardson Range, Shotover and Arrow Rivers...

    • catalogue.data.govt.nz
    + more versions
    Cite
    Topographic map of Rees Valley, Richardson Range, Shotover and Arrow Rivers - - annotated with field numbers and structural measurements - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/topographic-map-of-rees-valley-richardson-range-shotover-and-arrow-rivers-annotated-with-field-
    Explore at:
    License

    Attribution 3.0 (CC BY 3.0) (https://creativecommons.org/licenses/by/3.0/)
    License information was derived automatically

    Area covered
    Shotover River, Rees Valley Road
    Description

    Pencil and pen on linen backed paper, medium in detail, fair condition. - Observation measure: mainly observation. - Map size: 680 mm x 1250 mm. Notes: Chandler's topographic map used in compilation of B.L. Wood's Geological Map Of New Zealand 1:250 000 - Sheet 22 - Wakatipu - 1962. Southern part of map is heavily annotated with field numbers. Northern part of map annotated with structural measurements and geological notes. No date for annotation, but before publication of Sheet 22 in 1962 - possibly by B.L. Wood. Keywords: OTAGO; REES VALLEY; RICHARDSON MOUNTAINS; SHOTOVER RIVER; ARROW RIVER; TOPOGRAPHIC MAPS; GEOLOGIC MAPS; STRUCTURE; MEASUREMENT; FAULTING

  12. Wallhack1.8k Dataset | Data Augmentation Techniques for Cross-Domain WiFi...

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    • +1more
    Updated Apr 4, 2025
    Cite
    Kampel, Martin (2025). Wallhack1.8k Dataset | Data Augmentation Techniques for Cross-Domain WiFi CSI-Based Human Activity Recognition [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_8188998
    Explore at:
    Dataset updated
    Apr 4, 2025
    Dataset provided by
    Kampel, Martin
    Strohmayer, Julian
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This repository contains the Wallhack1.8k dataset for WiFi-based long-range activity recognition in Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS)/Through-Wall scenarios, as proposed in [1,2], as well as the CAD models (of 3D-printable parts) of the WiFi systems proposed in [2].

    PyTorch Dataloader

    A minimal PyTorch dataloader for the Wallhack1.8k dataset is provided at: https://github.com/StrohmayerJ/wallhack1.8k

    Dataset Description

    The Wallhack1.8k dataset comprises 1,806 CSI amplitude spectrograms (and raw WiFi packet time series) corresponding to three activity classes: "no presence," "walking," and "walking + arm-waving." WiFi packets were transmitted at a frequency of 100 Hz, and each spectrogram captures a temporal context of approximately 4 seconds (400 WiFi packets).

    To assess cross-scenario and cross-system generalization, WiFi packet sequences were collected in LoS and through-wall (NLoS) scenarios, utilizing two different WiFi systems (BQ: biquad antenna and PIFA: printed inverted-F antenna). The dataset is structured accordingly:

    LOS/BQ/ <- WiFi packets collected in the LoS scenario using the BQ system

    LOS/PIFA/ <- WiFi packets collected in the LoS scenario using the PIFA system

    NLOS/BQ/ <- WiFi packets collected in the NLoS scenario using the BQ system

    NLOS/PIFA/ <- WiFi packets collected in the NLoS scenario using the PIFA system

    These directories contain the raw WiFi packet time series (see Table 1). Each row represents a single WiFi packet with the complex CSI vector H being stored in the "data" field and the class label being stored in the "class" field. H is of the form [I, R, I, R, ..., I, R], where two consecutive entries represent imaginary and real parts of complex numbers (the Channel Frequency Responses of subcarriers). Taking the absolute value of H (e.g., via numpy.abs(H)) yields the subcarrier amplitudes A.
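    The interleaved [I, R, I, R, ...] layout just described can also be unpacked without numpy; a minimal sketch with a made-up six-entry H standing in for one packet's "data" field (three subcarriers):

    ```python
    import math

    # H stores alternating imaginary and real parts: [I, R, I, R, ...].
    H = [0.0, 3.0, 4.0, 0.0, 1.0, 1.0]  # toy stand-in for a real CSI vector

    # Pair up (imag, real) entries and take each subcarrier's magnitude,
    # equivalent to the absolute value of the complex CFR entries.
    A = [math.hypot(H[i], H[i + 1]) for i in range(0, len(H), 2)]
    print(A)  # amplitudes of the three subcarriers
    ```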

    To extract the 52 L-LTF subcarriers used in [1], the following indices of A are to be selected:

    52 L-LTF subcarriers

    csi_valid_subcarrier_index = []
    csi_valid_subcarrier_index += [i for i in range(6, 32)]
    csi_valid_subcarrier_index += [i for i in range(33, 59)]

    Additional 56 HT-LTF subcarriers can be selected via:

    56 HT-LTF subcarriers

    csi_valid_subcarrier_index += [i for i in range(66, 94)]
    csi_valid_subcarrier_index += [i for i in range(95, 123)]

    For more details on subcarrier selection, see ESP-IDF (Section Wi-Fi Channel State Information) and esp-csi.

    Extracted amplitude spectrograms with the corresponding label files of the train/validation/test split: "trainLabels.csv," "validationLabels.csv," and "testLabels.csv," can be found in the spectrograms/ directory.

    The columns in the label files correspond to the following: [Spectrogram index, Class label, Room label]

    Spectrogram index: [0, ..., n]

    Class label: [0,1,2], where 0 = "no presence", 1 = "walking", and 2 = "walking + arm-waving."

    Room label: [0,1,2,3,4,5], where labels 1-5 correspond to the room number in the NLoS scenario (see Fig. 3 in [1]). The label 0 corresponds to no room and is used for the "no presence" class.

    Dataset Overview:

Table 1: Raw WiFi packet sequences.

Scenario | System | "no presence" / label 0 | "walking" / label 1 | "walking + arm-waving" / label 2
LoS  | BQ   | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv, w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv, ww5.csv
LoS  | PIFA | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv, w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv, ww5.csv
NLoS | BQ   | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv, w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv, ww5.csv
NLoS | PIFA | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv, w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv, ww5.csv

Total files: 4 ("no presence") + 20 ("walking") + 20 ("walking + arm-waving") = 44

Table 2: Sample/Spectrogram distribution across activity classes in Wallhack1.8k.

Scenario | System | "no presence" / label 0 | "walking" / label 1 | "walking + arm-waving" / label 2 | Total
LoS   | BQ   | 149 | 154 | 155 | 458
LoS   | PIFA | 149 | 160 | 152 | 461
NLoS  | BQ   | 148 | 150 | 152 | 450
NLoS  | PIFA | 143 | 147 | 147 | 437
Total |      | 589 | 611 | 606 | 1,806

Download and Use

This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to one of our papers [1,2].

    [1] Strohmayer, Julian, and Martin Kampel. (2024). “Data Augmentation Techniques for Cross-Domain WiFi CSI-Based Human Activity Recognition”, In IFIP International Conference on Artificial Intelligence Applications and Innovations (pp. 42-56). Cham: Springer Nature Switzerland, doi: https://doi.org/10.1007/978-3-031-63211-2_4.

    [2] Strohmayer, Julian, and Martin Kampel., “Directional Antenna Systems for Long-Range Through-Wall Human Activity Recognition,” 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2024, pp. 3594-3599, doi: https://doi.org/10.1109/ICIP51287.2024.10647666.

    BibTeX citations:

@inproceedings{strohmayer2024data,
  title={Data Augmentation Techniques for Cross-Domain WiFi CSI-Based Human Activity Recognition},
  author={Strohmayer, Julian and Kampel, Martin},
  booktitle={IFIP International Conference on Artificial Intelligence Applications and Innovations},
  pages={42--56},
  year={2024},
  organization={Springer}
}

@INPROCEEDINGS{10647666,
  author={Strohmayer, Julian and Kampel, Martin},
  booktitle={2024 IEEE International Conference on Image Processing (ICIP)},
  title={Directional Antenna Systems for Long-Range Through-Wall Human Activity Recognition},
  year={2024},
  pages={3594-3599},
  keywords={Visualization;Accuracy;System performance;Directional antennas;Directive antennas;Reflector antennas;Sensors;Human Activity Recognition;WiFi;Channel State Information;Through-Wall Sensing;ESP32},
  doi={10.1109/ICIP51287.2024.10647666}
}

  13. SIGMOD 2024 Programming Contest Datasets

    • zenodo.org
    bin
    Updated Oct 27, 2024
    Guoliang Li; Dong Deng; Guoliang Li; Dong Deng (2024). SIGMOD 2024 Programming Contest Datasets [Dataset]. http://doi.org/10.5281/zenodo.13998879
    Explore at:
    binAvailable download formats
    Dataset updated
    Oct 27, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Guoliang Li; Dong Deng; Guoliang Li; Dong Deng
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

Our datasets, both the released and the evaluation sets, are derived from the YFCC100M Dataset. Each dataset comprises vectors encoded from images using the CLIP model, which are then reduced to 100 dimensions using Principal Component Analysis (PCA). Additionally, categorical and timestamp attributes are selected from the metadata of the images. The categorical attribute is discretized into integers starting from 0, and the timestamp attribute is normalized into floats between 0 and 1.

For each query, a query type is randomly selected from four possible types, denoted by the numbers 0 to 3. Then, we randomly choose two data points from dataset D, utilizing their categorical attribute (C), timestamp attribute (T), and vectors to determine the values of the query. Specifically:

    • Randomly sample two data points from D.
    • Use the categorical value of the first data point as v for the equality predicate over the categorical attribute C.
    • Use the timestamp attribute values of the two sampled data points for the range predicate. Designate l as the smaller timestamp value and r as the larger. The range predicate is thus defined as l≤T≤r.
    • Use the vector of the first data point as the query vector.
    • If the query type does not involve v, l, or r, their values are set to -1.

We ensure that at least 100 data points in D satisfy each query's predicates.

    Dataset Structure

Dataset D is in a binary format, beginning with a 4-byte integer num_vectors (uint32_t) indicating the number of vectors. This is followed by the data for each vector, stored consecutively, with each vector occupying 102 x sizeof(float32) bytes (102 = 2 + vector_num_dimension), summing to num_vectors x 102 x sizeof(float32) bytes in total. Of the 102 dimensions of each vector, the first denotes the discretized categorical attribute C, the second denotes the normalized timestamp attribute T, and the remaining 100 dimensions are the vector itself.
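The layout above parses directly with NumPy. A minimal sketch that writes a tiny file in the described format and reads it back (the file name demo_vectors.bin is illustrative, not part of the contest):

```python
import numpy as np

num_vectors, dims = 3, 102                        # 102 = 2 + 100
vecs = np.arange(num_vectors * dims, dtype=np.float32).reshape(num_vectors, dims)

# Write: 4-byte uint32 count, then row-major float32 payload.
with open("demo_vectors.bin", "wb") as f:
    f.write(np.uint32(num_vectors).tobytes())
    f.write(vecs.tobytes())

# Read it back following the same layout.
with open("demo_vectors.bin", "rb") as f:
    n = int(np.frombuffer(f.read(4), dtype=np.uint32)[0])
    data = np.frombuffer(f.read(), dtype=np.float32).reshape(n, 102)

C, T, X = data[:, 0], data[:, 1], data[:, 2:]     # categorical, timestamp, vector
```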

    Query Set Structure

Query set Q is in a binary format, beginning with a 4-byte integer num_queries (uint32_t) indicating the number of queries. This is followed by the data for each query, stored consecutively, with each query occupying 104 x sizeof(float32) bytes (104 = 4 + vector_num_dimension), summing to num_queries x 104 x sizeof(float32) bytes in total.

    The 104-dimensional representation for a query is organized as follows:

    • The first dimension denotes query_type (takes values from 0, 1, 2, 3).
    • The second dimension denotes the specific query value v for the categorical attribute (if not queried, takes -1).
    • The third dimension denotes the specific query value l for the timestamp attribute (if not queried, takes -1).
    • The fourth dimension denotes the specific query value r for the timestamp attribute (if not queried, takes -1).
• The remaining 100 dimensions are the query vector.

There are four types of queries; query_type takes values 0, 1, 2, and 3, corresponding to:

    • If query_type=0: Vector-only query, i.e., the conventional approximate nearest neighbor (ANN) search query.
    • If query_type=1: Vector query with categorical attribute constraint, i.e., ANN search for data points satisfying C=v.
    • If query_type=2: Vector query with timestamp attribute constraint, i.e., ANN search for data points satisfying l≤T≤r.
    • If query_type=3: Vector query with both categorical and timestamp attribute constraints, i.e. ANN search for data points satisfying C=v and l≤T≤r.

    The predicate for the categorical attribute is an equality predicate, i.e., C=v. And the predicate for the timestamp attribute is a range predicate, i.e., l≤T≤r.
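A hedged sketch (my own illustration, not the contest's reference implementation) of how the four query types translate into predicates followed by a brute-force nearest-neighbour search over the surviving rows:

```python
import numpy as np

def answer_query(q, C, T, X, k=2):
    """q is a 104-dim row [query_type, v, l, r, 100-dim vector]."""
    qtype, v, l, r = int(q[0]), q[1], q[2], q[3]
    qvec = q[4:]
    mask = np.ones(len(C), dtype=bool)
    if qtype in (1, 3):                    # categorical equality: C == v
        mask &= (C == v)
    if qtype in (2, 3):                    # timestamp range: l <= T <= r
        mask &= (l <= T) & (T <= r)
    idx = np.flatnonzero(mask)
    d = np.linalg.norm(X[idx] - qvec, axis=1)
    return idx[np.argsort(d)[:k]]          # ids of the k nearest matches

# Toy data: 5 points with categorical, timestamp, and 100-dim vectors.
rng = np.random.default_rng(1)
C = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
T = np.array([0.1, 0.4, 0.5, 0.8, 0.9])
X = rng.random((5, 100)).astype(np.float32)

q = np.concatenate(([3.0, 0.0, 0.0, 0.6], X[2]))  # type 3: C=0 and 0.0<=T<=0.6
result = answer_query(q, C, T, X)
```

An exact scan like this is only a correctness baseline; the contest itself targets approximate indexes.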

    Originally provided on https://dbgroup.cs.tsinghua.edu.cn/sigmod2024/task.shtml?content=datasets .

14. Data from: Accounting for nonlinear responses to traits improves range shift...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jul 29, 2025
    Anthony Cannistra; Lauren Buckley (2025). Accounting for nonlinear responses to traits improves range shift predictions [Dataset]. http://doi.org/10.5061/dryad.wstqjq2v8
    Explore at:
    Dataset updated
    Jul 29, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Anthony Cannistra; Lauren Buckley
    Description

Accurately predicting species’ range shifts in response to environmental change is paramount for understanding ecological processes and global change. In synthetic analyses, traits emerge as significant but weak predictors of species’ range shifts across recent climate change. These studies assume linear responses to traits, while detailed empirical work often reveals trait responses that are unimodal and contain thresholds or other nonlinearities. We hypothesize that the use of linear modeling approaches fails to capture these nonlinearities and therefore may be under-powering traits to predict range shifts. We evaluate the predictive performance of approaches that can capture nonlinear relationships (ridge-regularized linear regression, support vector regression with linear and nonlinear kernels, and random forests). We apply our models using six multi-decadal range shift datasets for plants, moths, marine fish, birds, and small mammals. We show that nonlinear approaches can perform b...

# Accounting for nonlinear responses to traits improves range shift predictions

    https://doi.org/10.5061/dryad.wstqjq2v8

    We assess the performance of nonlinear models to predict climate-induced range shifts using six datasets encompassing a broad taxonomic range. The number of species per dataset ranges from 28 to 239 (mean=118, median=94), and range shifts were observed over periods ranging from 20 to 100+ years. Each dataset was derived from previous evaluations of traits as range shift predictors and consists of a list of focal species, associated species-level traits, and a range shift metric.

    Description of the data and file structure

    See the DataDescriptions_CannistraBuckley.pdf file for information on the data and structure. Refer to the references below for additional information on the datasets and please cite those papers if you use this data.

    Sharing/Access information

    Data was derived from the following sources:

    • Ange...
15. USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven

    • app.mobito.io
    USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven [Dataset]. https://app.mobito.io/data-product/usa-enriched-geospatial-framework-dataset
    Explore at:
    Area covered
    United States
    Description

Our dataset provides detailed and precise insights into the business, commercial, and industrial aspects of any given area in the USA, including Point of Interest (POI) and foot traffic data. The dataset is divided into 150x150 sqm areas (geohash 7) and has over 50 variables.

• Use it for different applications: Our combined dataset, which includes POI and foot traffic data, can be employed for various purposes. Different data teams use it to guide retailers and FMCG brands in site selection, fuel marketing intelligence, analyze trade areas, and assess company risk. Our dataset has also proven to be useful for real estate investment.
• Get reliable data: Our datasets have been processed, enriched, and tested so your data team can use them more quickly and accurately.
• Ideal for training ML models: The high quality of our geographic information layers results from more than seven years of work dedicated to the deep understanding and modeling of geospatial Big Data. Among the features that distinguish this dataset is the use of anonymized and user-compliant mobile device GPS locations, enriched with other alternative and public data.
• Easy to use: Our dataset is user-friendly and can be easily integrated into your current models. We can also deliver your data in different formats, like .csv, according to your analysis requirements.
• Get personalized guidance: In addition to providing reliable datasets, we advise your analysts on their correct implementation. Our data scientists can guide your internal team on the optimal algorithms and models to get the most out of the information we provide (without compromising the security of your internal data).

Answer questions like:

• What places does my target user visit in a particular area?
• Which are the best areas to place a new POS?
• What is the average yearly income of users in a particular area?
• What is the influx of visits that my competition receives?
• What is the volume of traffic surrounding my current POS?

This dataset is useful for getting insights from industries like:

• Retail & FMCG
• Banking, Finance, and Investment
• Car Dealerships
• Real Estate
• Convenience Stores
• Pharma and medical laboratories
• Restaurant chains and franchises
• Clothing chains and franchises

Our dataset includes more than 50 variables, such as:

• Number of pedestrians seen in the area.
• Number of vehicles seen in the area.
• Average speed of movement of the vehicles seen in the area.
• Points of Interest (POIs), in number and type, seen in the area (supermarkets, pharmacies, recreational locations, restaurants, offices, hotels, parking lots, wholesalers, financial services, pet services, shopping malls, among others).
• Average yearly income range (anonymized and aggregated) of the devices seen in the area.

Notes to better understand this dataset:

• POI confidence means the average confidence of POIs in the area. In this case, POIs are any kind of location, such as a restaurant, a hotel, or a library.
• Category confidences, for example "food_drinks_tobacco_retail_confidence", indicate how confident we are in the existence of food/drink/tobacco retail locations in the area.
• We added predictions for The Home Depot and Lowe's Home Improvement stores in the dataset sample. These predictions were the result of a machine-learning model trained with the data. Knowing where the current stores are, we can find the most similar areas for new stores to open.

How efficient is a Geohash? Geohash is a faster, cost-effective geofencing option that reduces input data load and provides actionable information. Its benefits include faster querying, reduced cost, minimal configuration, and ease of use. Geohash length ranges from 1 to 12 characters. The dataset can be split into variable-size geohashes, with the default being geohash7 (150m x 150m).
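Since the dataset is keyed by geohash-7 cells, a small sketch of the standard geohash encoding (my own illustration of the public algorithm, not Predik's code) shows how a latitude/longitude pair maps to such a cell id:

```python
# Standard geohash: interleave longitude/latitude bisection bits,
# emitting one base32 character per 5 bits.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, length=7):
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    code, bits, ch, even = [], 0, 0, True
    while len(code) < length:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        ch <<= 1
        if val >= mid:
            ch |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:
            code.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(code)

cell = geohash_encode(57.64911, 10.40744)   # a 7-character cell id
```

At length 7 each cell covers roughly 150m x 150m, which is why geohash-7 is the dataset's default tiling.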

16. South Range, MI Population Pyramid Dataset: Age Groups, Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 22, 2025
    Neilsberg Research (2025). South Range, MI Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/52700142-f122-11ef-8c1b-3860777c1fe6/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Feb 22, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Range, Michigan
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Total Population for Age Groups, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, and 9 more
    Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) male population, (b) female population, and (c) total population, we initially analyzed and categorized the data for each of the age groups. For age groups, we divided the data into roughly 5-year buckets for ages between 0 and 85; for over 85, we aggregated the data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the data for the South Range, MI population pyramid, which represents the South Range population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.

    Key observations

    • Youth dependency ratio, which is the number of children aged 0-14 per 100 persons aged 15-64, for South Range, MI, is 21.5.
    • Old-age dependency ratio, which is the number of persons aged 65 or over per 100 persons aged 15-64, for South Range, MI, is 28.6.
    • Total dependency ratio for South Range, MI is 50.1.
• Potential support ratio, which is the number of working-age persons (aged 15-64) per person aged 65 or over, for South Range, MI is 3.5.
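The four ratios above follow directly from three age-group totals; a quick sketch with toy numbers (not the actual South Range counts):

```python
# Dependency ratios as defined in the key observations above,
# computed from three age-group population totals (toy values).
def dependency_ratios(pop_0_14, pop_15_64, pop_65_plus):
    youth = 100 * pop_0_14 / pop_15_64        # youth dependency ratio
    old_age = 100 * pop_65_plus / pop_15_64   # old-age dependency ratio
    total = youth + old_age                   # total dependency ratio
    support = pop_15_64 / pop_65_plus         # potential support ratio
    return youth, old_age, total, support

youth, old_age, total, support = dependency_ratios(200, 1000, 300)
```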
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

• Age Group: This column displays the age group for the South Range population analysis. There are 18 expected values, defined above in the age groups section.
    • Population (Male): The male population in the South Range for the selected age group is shown in the following column.
    • Population (Female): The female population in the South Range for the selected age group is shown in the following column.
    • Total Population: The total population of the South Range for the selected age group is shown in the following column.

    Good to know

    Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

This dataset is a part of the main dataset for South Range Population by Age. You can refer to it here.

17. South Range, MI Population Pyramid Dataset: Age Groups, Male and Female...

    • neilsberg.com
    csv, json
    Updated Sep 16, 2023
    Neilsberg Research (2023). South Range, MI Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis [Dataset]. https://www.neilsberg.com/research/datasets/63632866-3d85-11ee-9abe-0aa64bf2eeb2/
    Explore at:
    json, csvAvailable download formats
    Dataset updated
    Sep 16, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Range, Michigan
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Total Population for Age Groups, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, and 9 more
    Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the three variables, namely (a) male population, (b) female population, and (c) total population, we initially analyzed and categorized the data for each of the age groups. For age groups, we divided the data into roughly 5-year buckets for ages between 0 and 85; for over 85, we aggregated the data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the data for the South Range, MI population pyramid, which represents the South Range population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey 5-Year estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.

    Key observations

    • Youth dependency ratio, which is the number of children aged 0-14 per 100 persons aged 15-64, for South Range, MI, is 16.9.
    • Old-age dependency ratio, which is the number of persons aged 65 or over per 100 persons aged 15-64, for South Range, MI, is 24.6.
    • Total dependency ratio for South Range, MI is 41.5.
• Potential support ratio, which is the number of working-age persons (aged 15-64) per person aged 65 or over, for South Range, MI is 4.1.
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

• Age Group: This column displays the age group for the South Range population analysis. There are 18 expected values, defined above in the age groups section.
    • Population (Male): The male population in the South Range for the selected age group is shown in the following column.
    • Population (Female): The female population in the South Range for the selected age group is shown in the following column.
    • Total Population: The total population of the South Range for the selected age group is shown in the following column.

    Good to know

    Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

This dataset is a part of the main dataset for South Range Population by Age. You can refer to it here.

  18. Number of primes in every 100 numbers up to 10000

    • kaggle.com
    zip
    Updated May 15, 2021
    In06 Days (2021). Number of primes in every 100 numbers up to 10000 [Dataset]. https://www.kaggle.com/datasets/mathnights/number-of-primes-in-every-100-numbers-up-to-10000
    Explore at:
    zip(668 bytes)Available download formats
    Dataset updated
    May 15, 2021
    Authors
    In06 Days
    Description

    Context

This dataset lists the number of primes in each block of 100 integers up to 10,000. Source: easycalculation
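The per-block counts are easy to reproduce with a sieve; a short sketch (my own illustration, not the dataset's code):

```python
# Count primes in each block of 100 integers up to 10,000
# using a simple Sieve of Eratosthenes.
def primes_per_block(limit=10_000, block=100):
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [sum(sieve[s:s + block]) for s in range(0, limit, block)]

counts = primes_per_block()   # 100 block counts; counts[0] covers 0-99
```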


19. Datasets for "Generation of scalable many-body Bell correlations in spin...

    • repod.icm.edu.pl
    txt, zip
    Updated Nov 27, 2024
    Płodzień, Marcin; Wasak, Tomasz; Witkowska, Emilia; Lewenstein, Maciej; Chwedeńczuk, Jan (2024). Datasets for "Generation of scalable many-body Bell correlations in spin chains with short-range two-body interactions" [Dataset]. http://doi.org/10.18150/XLEKW5
    Explore at:
    txt(3744), zip(427157439)Available download formats
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    RepOD
    Authors
    Płodzień, Marcin; Wasak, Tomasz; Witkowska, Emilia; Lewenstein, Maciej; Chwedeńczuk, Jan
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    MICIIN with funding from European Union NextGenerationEU (PRTR-C17.I1) and by Generalitat de Catalunya; FundaciĂł Cellex; FundaciĂł Mir-Puig
    Polish National Agency for Academic Exchange
    European Commission
    ERC
    Barcelona Supercomputing Center MareNostrum
    EU
    Generalitat de Catalunya
    National Science Centre (Poland)
    Ministerio de Ciencia y Innovation Agencia Estatal de Investigaciones
    ICFO
    Narodowe Centrum Nauki
    Description

Description:
============
Data set supporting the content of the article "Generation of scalable many-body Bell correlations in spin chains with short-range two-body interactions" by Marcin Płodzień, Tomasz Wasak, Emilia Witkowska, Maciej Lewenstein, and Jan Chwedeńczuk. The research data consists of the dynamics of the correlator as a function of time, interaction range, and number of spins in the chain. The datasets also contain the research data for the dynamics of the estimated many-body Bell correlator reconstructed from classical shadows tomography.

Authors:
========
1) Marcin Płodzień, ICFO - Institut de Ciencies Fotoniques, The Barcelona Institute of Science and Technology, 08860 Castelldefels, Barcelona, Spain
2) Tomasz Wasak, Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University in Toruń, Grudziądzka 5, 87-100 Toruń, Poland
3) Emilia Witkowska, Institute of Physics PAS, Aleja Lotnikow 32/46, 02-668 Warszawa, Poland
4) Maciej Lewenstein, ICFO - Institut de Ciencies Fotoniques, The Barcelona Institute of Science and Technology, 08860 Castelldefels, Barcelona, Spain; ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Spain
5) Jan Chwedeńczuk, Faculty of Physics, University of Warsaw, ul. Pasteura 5, PL-02-093 Warsaw, Poland

Keywords:
=========
nonlocality, Bell correlations, quantum spins, spin chains, Ising model, finite-range interactions, dynamical protocol

Directories:
============
• data_correlator/epsilon_th_N_r/ - the value (right column) of the correlator E_N, see Eq. (4) of the manuscript, as a function of time tau (left column)
• data_correlator/Q_th_N_r/ - the value (right column) of the correlator Q_N, see Eq. (5) of the manuscript, as a function of time tau (left column)
• data_classical_shadows/ - datasets needed for Fig. 4 of the manuscript
• data_classical_shadows/exact_QN_N_*_r_*.txt - the value (second column) of the exact correlator Q_N for given N (first *) and r (second *) as a function of time tau (first column); e.g., exact_QN_N_4_r_3.txt is for N=4 and r=3
• data_classical_shadows/est_QN_N_*_r_*.txt - the value (second column) of the estimated correlator Q_N for given N (first *) and r (second *) as a function of time tau (first column); e.g., est_QN_N_4_r_3.txt is for N=4 and r=3

Format of the data:
===================
1) In the directory data_correlator/epsilon_th_N_r/: files are of the form epsilon_th_N.xxx_r.yyy.dat, where N=xxx is the number of spins in the spin chain and r=yyy is the range of the interaction between the spins. For example, epsilon_th_N.014_r.010.dat describes N=14 spins with interaction range r=10. See the manuscript for the definitions of r and N.

2) In the directory data_correlator/Q_th_N_r/: files are of the form _Q_th_N.xxx_r.yyy.dat or Q_th_N.xxx_r.yyy.dat, with N and r encoded as above. For example, _Q_th_N.032_r.006.dat describes N=32 spins with interaction range r=6.

3) In the directory data_classical_shadows/: for files est_QN_N_*_r_*.txt (e.g., est_QN_N_10_r_4.txt is N=10 and r=4):
column 1 -> time tau
column 2 -> value of estimated Q_N
column 3 -> standard deviation of estimated Q_N (shaded region in Fig. 4)
For files exact_QN_N_*_r_*.txt (e.g., exact_QN_N_8_r_4.txt is N=8 and r=4):
column 1 -> time tau
column 2 -> value of Q_N

License:
========
CC0
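Since the data files are plain whitespace-separated columns, numpy.loadtxt is enough to read them. A small sketch with a stand-in file (est_QN_demo.txt is illustrative; real files follow the naming patterns above):

```python
import numpy as np

# Write a tiny stand-in file with the three-column est_QN_* layout,
# then read it back column by column.
rows = np.array([[0.0, 0.10, 0.01],
                 [0.1, 0.35, 0.02],
                 [0.2, 0.80, 0.03]])
np.savetxt("est_QN_demo.txt", rows)

# Column 1: time tau; column 2: estimated Q_N; column 3: its std. dev.
tau, Q_est, Q_std = np.loadtxt("est_QN_demo.txt", unpack=True)
```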

20. Sudoku Dataset

    • data.ncl.ac.uk
    txt
    Updated Sep 12, 2024
    David Towers; Linus Ericsson; Amir Atapour-Abarghouei; Andrew Stephen McGough; Elliot J Crowley (2024). Sudoku Dataset [Dataset]. http://doi.org/10.25405/data.ncl.26976121.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Sep 12, 2024
    Dataset provided by
    Newcastle University
    Authors
    David Towers; Linus Ericsson; Amir Atapour-Abarghouei; Andrew Stephen McGough; Elliot J Crowley
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Sudoku dataset consists of algorithmically generated Sudoku grids with different levels of masking (numbers of missing values). Sudoku grids range from simple to difficult for humans, and we wanted to see how machine learning handles the problem. This dataset is one of the three hidden datasets used by the 2024 NAS Unseen-Data Challenge.

    The dataset comprises 70,000 generated Sudoku grids. Instead of regular Sudoku, where the goal is to fill in the grid, we pose a classification task: identify the value of a single target square. Each grid is stored as a NumPy array in which the normal Sudoku values (1-9) are stored as 0.1-0.9, respectively; missing cells are stored as 0, and the target cell is labelled as 1. The data is stored in channels-first format with shape (n, 1, 9, 9), where n is the number of samples in the corresponding set (50,000 for training, 10,000 for validation, and 10,000 for testing).

    For each class (Sudoku cell value), we generated 7,777 samples evenly distributed among the three sets. To round out the sets, we randomly selected class labels and generated a single Sudoku grid for each, so that no label had more than 7,778 samples across the three sets. The labels are the possible Sudoku cell values (1, 2, 3, 4, 5, 6, 7, 8, and 9); however, due to zero-indexing, one is subtracted from the cell value to obtain the label (i.e., a label of 1 means the target cell should be a 2).

    NumPy (.npy) files can be opened with the NumPy Python library via the numpy.load() function, passing the path to the file as a parameter. The metadata file contains basic information about the datasets and can be opened in any text editor, such as vim, nano, Notepad++, or Notepad.
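The encoding above can be sketched as follows; the grid below is synthetic (constructed for illustration, not drawn from the dataset), and the helper name decode_grid is ours:

```python
import numpy as np

def decode_grid(grid):
    """Map a (1, 9, 9) float grid back to integer cell values 0-9 (0 = missing)
    and return the (row, col) of the target cell (stored as 1.0)."""
    g = grid[0]
    target = tuple(int(i) for i in np.argwhere(g == 1.0)[0])
    values = np.rint(g * 10).astype(int)  # 0.1 -> 1, ..., 0.9 -> 9
    values[target] = 0  # the target cell's true value is what the model predicts
    return values, target

# One synthetic sample in the documented channels-first shape (1, 9, 9).
grid = np.zeros((1, 9, 9))
grid[0, 0, 0] = 0.5   # cell (0, 0) holds the digit 5
grid[0, 4, 4] = 1.0   # cell (4, 4) is the target to classify
values, target = decode_grid(grid)
label_for_digit_2 = 2 - 1  # zero-indexed: a label of 1 means the cell should be a 2
```

A real sample would be one slice of the (n, 1, 9, 9) array returned by numpy.load() on the training file.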

Cite
Ghanshyam Saini (2025). MNIST Handwritten Digits Dataset [Dataset]. https://www.kaggle.com/datasets/ghnshymsaini/mnist-handwritten-digits-dataset

Data from: MNIST Handwritten Digits Dataset

Explore at:
zip(29605861 bytes)Available download formats
Dataset updated
May 15, 2025
Authors
Ghanshyam Saini
License

MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically

Description

MNIST Handwritten Digits Dataset (Organized by Folder)

This dataset provides the classic MNIST handwritten digits dataset, a foundational resource for image classification in machine learning. It contains a training set of 60,000 examples and a test set of 10,000 examples of grayscale images of handwritten digits (0 through 9).

Dataset Structure:

The uploaded data is organized within a main folder named mnist_png, which contains the following subfolders:

  • train: This folder contains the training set images. Upon navigating into the train folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders corresponds to a digit class (e.g., the folder named 0 contains images of the digit zero, the folder named 1 contains images of the digit one, and so on). The images within these subfolders are grayscale handwritten digit images in a common image format (e.g., PNG).

  • test: This folder contains the test set images. Similar to the train folder, upon navigating into the test folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders contains the corresponding test images for that digit class.

Content of the Data:

Each image in the MNIST dataset is a 28x28 pixel grayscale image of a handwritten digit (0-9). The pixel values typically range from 0 (black) to 255 (white).

How to Use This Dataset:

  1. Download the main MNIST folder (or the archive containing it) and extract its contents.
  2. Navigate into the mnist_png folder.
  3. The train and test subfolders contain the image data, organized by digit class. You can directly use this folder structure with image data loaders that support directory-based organization. The name of the subfolder will correspond to the digit label.
  4. The train folder provides the images you can use to train your machine learning models.
  5. The test folder provides a separate set of images that you can use to evaluate the performance of your trained models on unseen data.
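The steps above can be sketched without any ML framework: the folder names are the labels, so an index of (path, label) pairs is enough to feed most training pipelines. The helper name index_split is ours:

```python
from pathlib import Path

def index_split(root, split):
    """List (image_path, digit_label) pairs for the 'train' or 'test' split
    of the mnist_png/{train,test}/{0..9}/ layout described above."""
    pairs = []
    for class_dir in sorted((Path(root) / split).iterdir()):
        if class_dir.is_dir() and class_dir.name.isdigit():
            label = int(class_dir.name)  # folder name is the digit label
            for img in sorted(class_dir.glob("*.png")):
                pairs.append((img, label))
    return pairs
```

Directory-based loaders such as torchvision's ImageFolder accept this layout directly, inferring the label from the subfolder name in the same way.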

Citation:

The MNIST dataset is a well-established resource. It was created by Yann LeCun, Corinna Cortes, and Christopher J. C. Burges, and it is most commonly cited via LeCun, Bottou, Bengio, and Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, 86(11), 1998. Cite it alongside the specific papers or implementations you reference that make use of it.

Data Contribution:

Thank you for downloading this image-based organization of the MNIST dataset. By structuring the images into class-specific folders within the train and test directories, I aim to provide a user-friendly format for those working on handwritten digit recognition tasks. This structure aligns well with many image data loading utilities and workflows.

If you find this folder structure clear, well-organized, and useful for your projects, please consider giving it an upvote after downloading. Your feedback and appreciation are valuable and encourage further contributions to the Kaggle community. Thank you!
