100+ datasets found
  1. Large Dataset of Generalization Patterns in the Number Game

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 10, 2018
    Cite
    Eric J. Bigelow; Steven T. Piantadosi (2018). Large Dataset of Generalization Patterns in the Number Game [Dataset]. http://doi.org/10.7910/DVN/A8ZWLF
    Dataset updated
    Aug 10, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Eric J. Bigelow; Steven T. Piantadosi
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    272,700 two-alternative forced-choice responses in a simple numerical task modeled after Tenenbaum (1999, 2000), collected from 606 Amazon Mechanical Turk workers. Subjects were shown sets of 1 to 4 numbers from the range 1 to 100 (e.g. {12, 16}) and asked which other numbers were likely to belong to that set (e.g. 1, 5, 2, 98). Their generalization patterns reflect both rule-like (e.g. “even numbers,” “powers of two”) and distance-based (e.g. numbers near 50) generalization. This data set is available for further analysis of these simple and intuitive inferences, for developing hands-on modeling instruction, and for attempts to understand how probability and rules interact in human cognition.
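
    For hands-on modeling, a common starting point is Tenenbaum's Bayesian account of the number game: hypotheses are sets of numbers, smaller sets that contain all the examples are favoured by the size principle, P(X | h) = (1/|h|)^n, and the probability that a new number y belongs to the set is the posterior mass of the hypotheses containing y. The sketch below assumes a tiny, hypothetical hypothesis space and a uniform prior; it is an illustration, not a model fitted to this dataset.

```python
# Minimal Bayesian "number game" sketch in the spirit of Tenenbaum (1999, 2000).
# The hypothesis space and the uniform prior are illustrative assumptions.
DOMAIN = range(1, 101)

HYPOTHESES = {
    "even numbers": {n for n in DOMAIN if n % 2 == 0},
    "odd numbers": {n for n in DOMAIN if n % 2 == 1},
    "powers of two": {2 ** k for k in range(7)},  # 1, 2, 4, ..., 64
    "multiples of 4": {n for n in DOMAIN if n % 4 == 0},
    "numbers 10-20": set(range(10, 21)),
    "numbers 40-60": set(range(40, 61)),
}


def posterior(examples):
    """P(h | X) under a uniform prior and the size principle P(X | h) = (1/|h|)^n."""
    scores = {}
    for name, members in HYPOTHESES.items():
        consistent = all(x in members for x in examples)
        scores[name] = (1.0 / len(members)) ** len(examples) if consistent else 0.0
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the examples")
    return {name: s / total for name, s in scores.items()}


def p_in_set(y, examples):
    """Posterior predictive: total posterior of the hypotheses that contain y."""
    post = posterior(examples)
    return sum(p for name, p in post.items() if y in HYPOTHESES[name])
```

    With the examples {12, 16}, `p_in_set(14, [12, 16])` is larger than `p_in_set(98, [12, 16])`: 14 is covered by the small "numbers 10-20" hypothesis as well as by "even numbers", while 98 is covered only by the larger "even numbers" set. Swapping in a richer hypothesis space is where rule-like and distance-based generalization can be compared.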

  2. MNIST-100

    • kaggle.com
    zip
    Updated Jul 25, 2023
    Cite
    Marcin Wierzbiński (2023). MNIST-100 [Dataset]. https://www.kaggle.com/datasets/martininf1n1ty/mnist100
    Available download formats: zip (23,452,456 bytes)
    Dataset updated
    Jul 25, 2023
    Authors
    Marcin Wierzbiński
    License

    GNU Lesser General Public License v3.0: http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    The MNIST-100 dataset is a variation of the original MNIST dataset whose images show handwritten numbers from 00 to 99, derived from MNIST digits. Unlike the original MNIST dataset, which contains 60,000 training images of single digits from 0 to 9, MNIST-100 has 100 classes, one for each number.

    Dataset Overview:
    - Dataset Name: MNIST-100
    - Total Number of Images: train: 60,000; test: 1,000
    - Classes: 100 (numbers from 00 to 99)
    - Image Size: 28x56 pixels (grayscale)
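
    The description does not say exactly how the 28x56 images are assembled. One plausible construction, assumed here for illustration only, places a 28x28 tens digit and a 28x28 ones digit side by side and labels the result 10 × tens + ones. The sketch uses plain nested lists so it runs without extra packages; `compose_two_digit` is a hypothetical helper, not part of the dataset.

```python
def compose_two_digit(tens_img, ones_img, tens, ones):
    """Join two 28x28 digit images (lists of rows) into one 28x56 image and its 00-99 label."""
    for img in (tens_img, ones_img):
        if len(img) != 28 or any(len(row) != 28 for row in img):
            raise ValueError("expected 28x28 digit images")
    if not (0 <= tens <= 9 and 0 <= ones <= 9):
        raise ValueError("digits must be 0-9")
    # Concatenate each row of the tens digit with the matching row of the ones digit.
    image = [list(t_row) + list(o_row) for t_row, o_row in zip(tens_img, ones_img)]
    label = 10 * tens + ones  # e.g. tens=4, ones=2 -> 42
    return image, label
```

    With NumPy arrays, `np.hstack([tens_img, ones_img])` produces the same 28x56 layout, and `f"{label:02d}"` renders labels in the 00 to 99 form used in the tables.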

    Data Collection: The MNIST-100 dataset was created by randomly selecting 10 unique digits from the original MNIST dataset. For each selected digit, 10 representative images were extracted, resulting in a total of 100 images. These images were carefully chosen to represent a diverse range of handwriting styles for each digit.

    Each image in the dataset is labeled with its corresponding number, from 00 to 99, making it suitable for classification tasks. Researchers and practitioners can use this dataset to train and evaluate machine learning algorithms and neural networks for digit recognition and classification.

    Please note that the Modified MNIST-100 dataset is not intended to replace the original MNIST dataset but serves as a complementary resource for specific applications requiring a smaller and more focused subset of the MNIST data.

    Overall, the MNIST-100 dataset offers a compact and representative collection of 100 handwritten numbers, providing a convenient tool for experimentation and learning in computer vision and pattern recognition.

    Label Distribution for training set:

    Label  Occurrences   Label  Occurrences   Label  Occurrences
    0      561           34     629           68     606
    1      687           35     540           69     582
    2      582           36     588           70     566
    3      633           37     619           71     659
    4      588           38     584           72     572
    5      544           39     609           73     682
    6      582           40     570           74     627
    7      615           41     679           75     598
    8      584           42     544           76     605
    9      567           43     567           77     602
    10     641           44     574           78     595
    11     780           45     555           79     586
    12     720           46     550           80     569
    13     699           47     614           81     628
    14     630           48     614           82     578
    15     627           49     595           83     622
    16     684           50     505           84     569
    17     713           51     583           85     540
    18     743           52     512           86     557
    19     706           53     555           87     628
    20     527           54     504           88     562
    21     710           55     488           89     625
    22     586           56     531           90     600
    23     584           57     556           91     700
    24     568           58     497           92     622
    25     530           59     520           93     622
    26     612           60     556           94     591
    27     627           61     682           95     557
    28     618           62     594           96     580
    29     619           63     539           97     640
    30     622           64     610           98     577
    31     684           65     514           99     563
    32     606           66     587
    33     592           67     655
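
    A table like the one above can be reproduced from a loaded label array by counting occurrences per class. In the training table, counts range from 488 (label 55) to 780 (label 11), so the classes are mildly imbalanced. The helper name below is hypothetical, and loading the labels from the zip is left open because the archive layout is not described here.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: count} sorted by label, plus the smallest and largest class counts."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    ordered = dict(sorted(counts.items()))
    return ordered, min(counts.values()), max(counts.values())
```

    The ratio of the largest to the smallest count (about 1.6 for the training table) is a quick check on whether class weighting is worth considering.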

    Test data:

    Label  Occurrences   Label  Occurrences   Label  Occurrences
    00     96            34     100           68     90
    01     108           35     91            69     92
    02     91            36     107           70     102
    03     96            37     112           71     116
    04     75            38     97            72     101
    05     85            39     96            73     106
    06     88            40     103           74     98
    07     96            41     123           75     ...
  3. Landmarks Dataset for sign recognition numbers

    • kaggle.com
    zip
    Updated Nov 4, 2022
    Cite
    Akshat Mittu (2022). Landmarks Dataset for sign recognition numbers [Dataset]. https://www.kaggle.com/datasets/akshatmittu/landmarks-dataset-for-sign-recognition-numbers
    Available download formats: zip (50,385 bytes)
    Dataset updated
    Nov 4, 2022
    Authors
    Akshat Mittu
    Description

    This dataset was created from images of hand signs: the hand landmarks detected in each image were made into the attributes of the dataset. It contains all 21 landmarks, each with its (x, y, z) coordinates, and 5 classes (1, 2, 3, 4, 5).

    You can add more classes to the dataset by running the following code. Make sure to start from an empty dataset (or load this one to append to) and set the file path correctly.

```python
import os

import cv2
import mediapipe as mp
import pandas as pd

DATA_DIR = 'data'  # one sub-folder per class: data/1/, data/2/, ..., data/5/

mp_hands = mp.solutions.hands
rows = []

# static_image_mode=True treats every image independently (no video tracking).
with mp_hands.Hands(static_image_mode=True, max_num_hands=1,
                    min_detection_confidence=0.8) as hands:
    for label in range(1, 6):
        class_dir = os.path.join(DATA_DIR, str(label))
        for filename in os.listdir(class_dir):
            image = cv2.imread(os.path.join(class_dir, filename))
            if image is None:  # not a file OpenCV can read
                continue
            # OpenCV loads BGR; MediaPipe expects RGB.
            results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
            if not results.multi_hand_landmarks:
                continue  # no hand detected
            row = {'label': label}
            # 21 landmarks with x, y, z -> columns such as WRIST_x, THUMB_TIP_z
            for idx, lm in enumerate(results.multi_hand_landmarks[0].landmark):
                name = mp_hands.HandLandmark(idx).name
                row[name + '_x'], row[name + '_y'], row[name + '_z'] = lm.x, lm.y, lm.z
            rows.append(row)

# DataFrame.append was removed in pandas 2.0, so build the frame once.
# To extend an existing dataset: df = pd.concat([existing_df, df], ignore_index=True)
df = pd.DataFrame(rows)
```
    
  4. Dataset for The effects of a number line intervention on calculation skills

    • researchdata.edu.au
    • figshare.mq.edu.au
    Updated May 18, 2023
    Cite
    Saskia Kohnen; Rebecca Bull; Carola Ruiz Hornblas (2023). Dataset for The effects of a number line intervention on calculation skills [Dataset]. http://doi.org/10.25949/22799717.V1
    Dataset updated
    May 18, 2023
    Dataset provided by
    Macquarie University
    Authors
    Saskia Kohnen; Rebecca Bull; Carola Ruiz Hornblas
    Description

    Study information

    The sample included in this dataset represents five children who participated in a number line intervention study. Originally six children were included in the study, but one of them fulfilled the criterion for exclusion after missing several consecutive sessions. Thus, their data is not included in the dataset.

    All participants were attending Year 1 of primary school at an independent school in New South Wales, Australia. To be eligible to participate, children had to show low mathematics achievement, performing at or below the 25th percentile on the Maths Problem Solving and/or Numerical Operations subtests of the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Children were excluded from participating if, as reported by their parents, they had any other diagnosed disorder, such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy or an uncorrected sensory disorder.

    The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase varied between two and three measurement points, the treatment phase varied between four and seven measurement points, and all participants had 1 post-treatment measurement point.

    The number of measurement points was distributed across participants as follows:

    Participant 1 – 3 baseline, 6 treatment, 1 post-treatment

    Participant 3 – 2 baseline, 7 treatment, 1 post-treatment

    Participant 5 – 2 baseline, 5 treatment, 1 post-treatment

    Participant 6 – 3 baseline, 4 treatment, 1 post-treatment

    Participant 7 – 2 baseline, 5 treatment, 1 post-treatment

    In each session, across all three phases, children were assessed on a number line estimation task, a single-digit computation task, a multi-digit computational estimation task, a dot comparison task and a number comparison task. During the treatment phase, all children also completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.


    Measures

    Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line was presented in the middle of the screen, and the target number was presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers included two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed in all phases of the study. Trained items were assessed independently of the intervention during the baseline and post-treatment phases; during the treatment phase, performance on the intervention is used to index performance on the trained set. Within each set, numbers were equally distributed across the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error: PAE = (|number estimated - target number| / scale of number line) × 100.
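
    The PAE formula translates directly into code. This sketch assumes the 0-100 line used here (scale = 100); averaging PAE over a session's items is an illustrative choice, not a scoring rule stated in the description.

```python
def percent_absolute_error(estimate, target, scale=100):
    """PAE = |estimate - target| / scale * 100 (multiplying first limits rounding error)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return abs(estimate - target) * 100 / scale


def mean_pae(pairs, scale=100):
    """Average PAE over (estimate, target) pairs, e.g. the 30 items of one set."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no items")
    return sum(percent_absolute_error(e, t, scale) for e, t in pairs) / len(pairs)


print(percent_absolute_error(62, 50))  # 12.0
```

    Because the error is absolute, over- and under-estimates of the same size (62 or 38 for a target of 50) give the same PAE.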


    Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions present the smaller addend first (e.g., 3 + 5) and half present the larger addend first (e.g., 6 + 3). This task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen, accompanied by a sound, and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.
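
    The constraints on the single-digit items can be made concrete by enumerating every problem that satisfies them. This is a hypothetical generator for illustration, not the study's actual item list; pairs with equal addends (e.g. 3 + 3) are left out of the counterbalanced sample because they have no smaller or larger addend.

```python
import random


def addition_pool():
    """a + b with addends 1-9 and a single-digit sum 2-9."""
    return [(a, b) for a in range(1, 10) for b in range(1, 10) if 2 <= a + b <= 9]


def subtraction_pool():
    """m - s with minuend 3-9, subtrahend 1-6 and difference 1-6."""
    return [(m, s) for m in range(3, 10) for s in range(1, 7) if 1 <= m - s <= 6]


def sample_additions(n=10, seed=None):
    """Half smaller-addend-first, half larger-addend-first, in shuffled order."""
    rng = random.Random(seed)
    pool = addition_pool()
    smaller_first = [p for p in pool if p[0] < p[1]]
    larger_first = [p for p in pool if p[0] > p[1]]
    items = rng.sample(smaller_first, n // 2) + rng.sample(larger_first, n // 2)
    rng.shuffle(items)
    return items
```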


    Multi-digit computational estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options was the exact result; participants were asked to select the option closest to it. In half of the items the calculation involved two double-digit numbers, and in the other half one double-digit and one single-digit number. The distance between the correct (closest) response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below. The calculations remained on the screen until participants responded by clicking on one of the options. Participants did not receive performance-based feedback. Performance on this task is measured by item-based accuracy.


    Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. Both quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the larger one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison), the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots were kept constant following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.
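
    The ratio conditions can be checked or generated programmatically. This sketch assumes the ratio is the smaller magnitude divided by the larger (the usual convention; the description does not define it) and that a pair matches a target when the two differ by less than 0.005; `pairs_for_ratio` is a hypothetical helper.

```python
TARGET_RATIOS = (0.91, 0.83, 0.77, 0.71, 0.67)


def ratio(a, b):
    """Smaller magnitude over larger magnitude, so the order of presentation does not matter."""
    return min(a, b) / max(a, b)


def pairs_for_ratio(target, lo=5, hi=99, tol=0.005):
    """All (smaller, larger) pairs within [lo, hi] whose ratio is within tol of target."""
    return [(s, l) for s in range(lo, hi + 1) for l in range(s + 1, hi + 1)
            if abs(s / l - target) < tol]
```

    For example, 10 vs 11 (about .909) falls in the .91 condition and 6 vs 9 (about .667) in the .67 condition.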


    The Number Line Intervention

    During the intervention sessions, participants estimated the position of 30 Arabic numbers on a 0-100 bounded number line. As feedback on each item, the participant’s estimate remained visible and the correct position of the target number appeared on the number line. When the estimate’s PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”; when PAE was between 2.5 and 5, the message read “Well done, so close!”; and when PAE was higher than 5, the message read “Good try!” Numbers were presented in random order.
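
    The feedback rule can be written as a small function of an item's PAE. The description says "lower than 2.5", "between 2.5 and 5" and "higher than 5" without saying where exactly 2.5 and 5 fall; this sketch assigns both boundary values to the middle message, which is an assumption.

```python
def feedback_message(pae):
    """Message shown after an intervention item, given its percent absolute error."""
    if pae < 2.5:
        return "Excellent job"
    if pae <= 5:  # assumption: 2.5 and 5 themselves get the middle message
        return "Well done, so close!"
    return "Good try!"
```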


    Variables in the dataset

    Age = age in ‘years, months’ at the start of the study

    Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)

    Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

    Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

    Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

    Math_Problem_Solving_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).


    The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed of three parts. The first refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point of the treatment phase, and post1 to the first measurement point of the post-treatment phase.


    The second part of the variable name refers to the task, as follows:

    DC = dot comparison

    SDC = single-digit computation

    NLE_UT = number line estimation (untrained set)

    NLE_T= number line estimation (trained set)

    CE = multidigit computational estimation

    NC = number comparison

    The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).


    Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task during the second measurement point of the baseline phase and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the Treatment phase.
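
    Because some task codes contain an underscore themselves (NLE_UT, NLE_T), splitting names on "_" alone is unreliable; anchoring on the phase prefix and the measure suffix works instead. A minimal sketch, assuming every task variable follows the three-part scheme above (phase prefixes matched case-insensitively, since both "Base1" and "post1" appear); `parse_variable` is a hypothetical helper.

```python
import re

# Task codes as listed above.
TASKS = {
    "DC": "dot comparison",
    "SDC": "single-digit computation",
    "NLE_UT": "number line estimation (untrained set)",
    "NLE_T": "number line estimation (trained set)",
    "CE": "multidigit computational estimation",
    "NC": "number comparison",
}

# phase + measurement point, then the task code, then the measure (acc or pae)
PATTERN = re.compile(r"^(base|treat|post)(\d+)_(.+)_(acc|pae)$", re.IGNORECASE)


def parse_variable(name):
    """Split e.g. 'Treat3_NLE_UT_pae' into ('Treat', 3, 'NLE_UT', 'pae')."""
    match = PATTERN.match(name)
    if match is None or match.group(3) not in TASKS:
        raise ValueError(f"not a task variable: {name!r}")
    phase, point, task, measure = match.groups()
    return phase.capitalize(), int(point), task, measure.lower()
```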





  5. TIGER/Line Shapefile, Current, County, Hamilton County, NE, Address...

    • catalog.data.gov
    • gimi9.com
    Updated Aug 7, 2025
    Cite
    U.S. Department of Commerce, U.S. Census Bureau, Geography Division (Point of Contact) (2025). TIGER/Line Shapefile, Current, County, Hamilton County, NE, Address Range-Feature [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-current-county-hamilton-county-ne-address-range-feature
    Dataset updated
    Aug 7, 2025
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Area covered
    Hamilton County
    Description

    The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or the shapefiles can be combined to cover the entire nation. The Address Range Features shapefile contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county-equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number, and all numbers of a specified parity in between, along an edge side relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe. Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address range is associated with the edge. This shapefile contains a record for each address range to street name combination. Address ranges associated with more than one street name are also represented by multiple coincident address range feature edge records. Note that this shapefile includes all unsuppressed address ranges, whereas the All Lines shapefile (edges.shp) includes only the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefiles contain potential address ranges, not individual addresses. The address ranges in the TIGER/Line shapefiles are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.
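
    The parity rule in this definition can be illustrated by expanding one range into its potential structure numbers. This sketch assumes integer house numbers whose two endpoints share a parity, and allows a range to run downward relative to the coded edge direction; real address fields can hold non-numeric values, which it does not handle. The helper name is hypothetical.

```python
def potential_structure_numbers(from_num, to_num):
    """All numbers of the endpoints' parity from from_num to to_num, inclusive."""
    if from_num % 2 != to_num % 2:
        raise ValueError("range endpoints must share a parity")
    step = 2 if to_num >= from_num else -2  # a range may decrease along the edge
    return list(range(from_num, to_num + step, step))
```

    For example, a range of 101 to 109 stands for 101, 103, 105, 107 and 109, whether or not structures exist at all of those numbers.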

  6. Global Country Information Dataset 2023

    • kaggle.com
    zip
    Updated Jul 8, 2023
    Cite
    Nidula Elgiriyewithana ⚡ (2023). Global Country Information Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023
    Available download formats: zip (24,063 bytes)
    Dataset updated
    Jul 8, 2023
    Authors
    Nidula Elgiriyewithana ⚡
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

    Key Features

    • Country: Name of the country.
    • Density (P/Km2): Population density measured in persons per square kilometer.
    • Abbreviation: Abbreviation or code representing the country.
    • Agricultural Land (%): Percentage of land area used for agricultural purposes.
    • Land Area (Km2): Total land area of the country in square kilometers.
    • Armed Forces Size: Size of the armed forces in the country.
    • Birth Rate: Number of births per 1,000 population per year.
    • Calling Code: International calling code for the country.
    • Capital/Major City: Name of the capital or major city.
    • CO2 Emissions: Carbon dioxide emissions in tons.
    • CPI: Consumer Price Index, a measure of inflation and purchasing power.
    • CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
    • Currency_Code: Currency code used in the country.
    • Fertility Rate: Average number of children born to a woman during her lifetime.
    • Forested Area (%): Percentage of land area covered by forests.
    • Gasoline_Price: Price of gasoline per liter in local currency.
    • GDP: Gross Domestic Product, the total value of goods and services produced in the country.
    • Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
    • Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
    • Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
    • Largest City: Name of the country's largest city.
    • Life Expectancy: Average number of years a newborn is expected to live.
    • Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
    • Minimum Wage: Minimum wage level in local currency.
    • Official Language: Official language(s) spoken in the country.
    • Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
    • Physicians per Thousand: Number of physicians per thousand people.
    • Population: Total population of the country.
    • Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
    • Tax Revenue (%): Tax revenue as a percentage of GDP.
    • Total Tax Rate: Overall tax burden as a percentage of commercial profits.
    • Unemployment Rate: Percentage of the labor force that is unemployed.
    • Urban Population: Percentage of the population living in urban areas.
    • Latitude: Latitude coordinate of the country's location.
    • Longitude: Longitude coordinate of the country's location.
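
    Derived indicators can be recomputed from the raw columns, for example density as Population divided by Land Area (Km2), which can then be compared with the Density (P/Km2) column. This sketch assumes numeric fields may be stored as text with thousands separators (e.g. "1,234"); the helper names and the values in the check below are hypothetical.

```python
def parse_number(text):
    """Convert a field such as '1,234' or 1234 to float (thousands separators assumed)."""
    return float(str(text).replace(",", "").strip())


def population_density(population, land_area_km2):
    """Persons per square kilometre: Population / Land Area (Km2)."""
    area = parse_number(land_area_km2)
    if area <= 0:
        raise ValueError("land area must be positive")
    return parse_number(population) / area
```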

    Potential Use Cases

    • Analyze population density and land area to study spatial distribution patterns.
    • Investigate the relationship between agricultural land and food security.
    • Examine carbon dioxide emissions and their impact on climate change.
    • Explore correlations between economic indicators such as GDP and various socio-economic factors.
    • Investigate educational enrollment rates and their implications for human capital development.
    • Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
    • Study labor market dynamics through indicators such as labor force participation and unemployment rates.
    • Investigate the role of taxation and its impact on economic development.
    • Explore urbanization trends and their social and environmental consequences.

    Data Source: This dataset was compiled from multiple data sources

  7. Prime Number Source Code with Dataset

    • figshare.com
    zip
    Updated Oct 12, 2024
    Cite
    Ayman Mostafa (2024). Prime Number Source Code with Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27215508.v1
    Available download formats: zip
    Dataset updated
    Oct 12, 2024
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Ayman Mostafa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper addresses the computational methods and challenges associated with prime number generation, a critical component of encryption algorithms for ensuring data security. Efficient prime number generation is a critical challenge in various domains, including cryptography, number theory, and computer science. The search for more effective algorithms is driven by the increasing demand for secure communication and data storage and by the need for efficient algorithms to solve complex mathematical problems. Our goal is to address this challenge by presenting two novel algorithms for generating prime numbers: one that generates primes up to a given limit and another that generates primes within a specified range. These algorithms are founded on the formulas of odd-composed numbers, allowing them to achieve substantial performance improvements over existing prime number generation algorithms. Our experimental results show that the proposed algorithms outperform well-established prime number generation algorithms such as Miller-Rabin, the Sieve of Atkin, the Sieve of Eratosthenes, and the Sieve of Sundaram in mean execution time. More notably, our algorithms can efficiently generate the primes within an arbitrary range. This improvement in performance and adaptability can benefit applications that depend on prime numbers, from cryptographic systems to distributed computing. By providing an efficient and flexible method for generating prime numbers, the proposed algorithms can help develop more secure and reliable communication systems, enable faster computations in number theory, and support advanced research in computer science and mathematics.
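
    The authors' algorithms are not reproduced here. For comparison, the sketch below shows one of the baselines they name, the Sieve of Eratosthenes, together with a standard segmented variant that lists the primes in a range [lo, hi]; it is a textbook technique, not the method described in the paper.

```python
import math


def sieve(limit):
    """Sieve of Eratosthenes: every prime <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p.
            is_prime[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n, flag in enumerate(is_prime) if flag]


def primes_in_range(lo, hi):
    """Segmented sieve: primes in [lo, hi], using base primes up to sqrt(hi)."""
    lo = max(lo, 2)
    if hi < lo:
        return []
    segment = bytearray([1]) * (hi - lo + 1)  # segment[i] stands for lo + i
    for p in sieve(math.isqrt(hi)):
        # First multiple of p to cross out: at least p*p and at least lo.
        start = max(p * p, (lo + p - 1) // p * p)
        segment[start - lo::p] = bytearray(len(range(start, hi + 1, p)))
    return [lo + i for i, flag in enumerate(segment) if flag]
```

    The segmented version only needs memory proportional to the width of the range plus the base primes up to sqrt(hi), which is what makes range-to-range generation practical.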

  8. TIGER/Line Shapefile, Current, County, DuPage County, IL, Address...

    • catalog.data.gov
    Updated Aug 8, 2025
    Cite
    U.S. Department of Commerce, U.S. Census Bureau, Geography Division (Point of Contact) (2025). TIGER/Line Shapefile, Current, County, DuPage County, IL, Address Range-Feature [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-current-county-dupage-county-il-address-range-feature
    Dataset updated
    Aug 8, 2025
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Area covered
    DuPage County, Illinois
    Description

    The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or the shapefiles can be combined to cover the entire nation. The Address Range Features shapefile contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county-equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number, and all numbers of a specified parity in between, along an edge side relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe. Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address range is associated with the edge. This shapefile contains a record for each address range to street name combination. Address ranges associated with more than one street name are also represented by multiple coincident address range feature edge records. Note that this shapefile includes all unsuppressed address ranges, whereas the All Lines shapefile (edges.shp) includes only the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefiles contain potential address ranges, not individual addresses. The address ranges in the TIGER/Line shapefiles are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.

  9. HaDR: Dataset for hands instance segmentation

    • kaggle.com
    zip
    Updated Mar 7, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ales Vysocky (2023). HaDR: Dataset for hands instance segmentation [Dataset]. https://www.kaggle.com/datasets/alevysock/hadr-dataset-for-hands-instance-segmentation
    Available download formats: zip (10,662,295,286 bytes)
    Dataset updated
    Mar 7, 2023
    Authors
    Ales Vysocky
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    If you use this dataset for your work, please cite the related papers: A. Vysocky, S. Grushko, T. Spurny, R. Pastor and T. Kot, Generating Synthetic Depth Image Dataset for Industrial Applications of Hand Localisation, in IEEE Access, 2022, doi: 10.1109/ACCESS.2022.3206948.

    S. Grushko, A. Vysocký, J. Chlebek, P. Prokop, HaDR: Applying Domain Randomization for Generating Synthetic Multimodal Dataset for Hand Instance Segmentation in Cluttered Industrial Environments. preprint in arXiv, 2023, https://doi.org/10.48550/arXiv.2304.05826

    The HaDR dataset is a multimodal dataset designed for human-robot gesture-based interaction research, consisting of RGB and depth frames with binary masks for each hand instance (i1, i2; single-class data). The dataset is entirely synthetic, generated using domain randomization in CoppeliaSim 3D. It can be used to train deep learning models to recognize hands using either a single modality (RGB or depth) or both simultaneously. The training and validation splits comprise 95K and 22K samples, respectively, with annotations provided in COCO format. The instances are uniformly distributed across the image boundaries. The vision sensor captures depth and color images of the scene, with the depth pixel values scaled into a single-channel 8-bit grayscale image covering the range [0.2, 1.0] m. The following aspects of the scene were randomly varied during generation of the dataset:

    • Number, colors, textures, scales and types of distractor objects selected from a set of 3D models of general tools and geometric primitives, including a special type of distractor: an articulated dummy without hands (for instance-free samples).
    • Hand gestures (9 options).
    • Hand models’ positions and orientations.
    • Texture and surface properties (diffuse, specular and emissive properties) and number (from none to 2) of the object of interest, as well as its background.
    • Number and locations of directional light sources (from 1 to 4), in addition to a planar light for ambient illumination.

    The sample resolution is set to 320×256, samples are encoded in lossless PNG format, and only right-hand meshes are used (we suggest using flip augmentations during training), with a maximum of two instances per sample.
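
    Assuming the 8-bit depth values map linearly onto [0.2, 1.0] m, with 0 at the near limit and 255 at the far limit (the description gives the range but not the direction or the exact mapping), a pixel can be converted back to metres as follows. Swap the two constants if the encoding turns out to be inverted; the helper name is hypothetical.

```python
DEPTH_NEAR_M = 0.2  # assumed to be encoded as 0
DEPTH_FAR_M = 1.0   # assumed to be encoded as 255


def depth_pixel_to_metres(value):
    """Map an 8-bit depth value back to metres, assuming a linear encoding."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit value")
    return DEPTH_NEAR_M + (value / 255) * (DEPTH_FAR_M - DEPTH_NEAR_M)
```

    Under this assumption each grey level covers 0.8 m / 255, roughly 3.1 mm of depth.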

    Test dataset (real camera images): The test dataset, containing 706 images, was captured using a real RGB-D camera (RealSense L515) in a cluttered and unstructured industrial environment. It comprises various scenarios with diverse lighting conditions, backgrounds, obstacles, numbers of hands, and types of work gloves (red, green, white, yellow, no gloves) with varying sleeve lengths. The dataset is assumed to have only one user, and the maximum number of hand instances per sample was limited to two. The dataset was manually labelled; we provide hand instance segmentation COCO annotations in instances_hands_full.json (separately for train and val) and full-arm instance annotations in instances_arms_full.json. The sample resolution was set to 640×480, and depth images were encoded in the same way as those of the synthetic dataset.

    Channel-wise normalization and standardization parameters for the datasets:

    Dataset       Mean (R, G, B, D)                         STD (R, G, B, D)
    Train         98.173, 95.456, 93.858, 55.872            67.539, 67.194, 67.796, 47.284
    Validation    99.321, 97.284, 96.318, 58.189            67.814, 67.518, 67.576, 47.186
    Test          123.675, 116.28, 103.53, 35.3792          58.395, 57.12, 57.375, 45.978
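    These parameters are applied per channel as (value - mean) / std. A minimal sketch for a single RGB-D pixel, with the table restated as a dictionary; the names CHANNEL_STATS and standardize_rgbd are illustrative, and the sketch assumes 8-bit channel values as stored in the PNG files. The RGB statistics listed for the test split match the widely used ImageNet values.

```python
from typing import Dict, List, Sequence, Tuple

# Per-channel (R, G, B, D) statistics, restated from the table above.
CHANNEL_STATS: Dict[str, Tuple[List[float], List[float]]] = {
    "train": ([98.173, 95.456, 93.858, 55.872], [67.539, 67.194, 67.796, 47.284]),
    "validation": ([99.321, 97.284, 96.318, 58.189], [67.814, 67.518, 67.576, 47.186]),
    "test": ([123.675, 116.28, 103.53, 35.3792], [58.395, 57.12, 57.375, 45.978]),
}


def standardize_rgbd(pixel: Sequence[float], split: str = "train") -> List[float]:
    """Standardize one (R, G, B, D) pixel with the statistics of the given split."""
    if len(pixel) != 4:
        raise ValueError("expected four channels: R, G, B, D")
    means, stds = CHANNEL_STATS[split]
    return [(v - m) / s for v, m, s in zip(pixel, means, stds)]
```

    For full images, the same arithmetic is normally vectorized over arrays by the data-loading pipeline; which split's statistics to apply at inference time is a modelling choice the dataset card leaves open.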

    If you use this dataset for your work, please cite the related papers: A. Vysocky, S. Grushko, T. Spurny, R. Pastor and T. Kot, Generating Synthetic Depth Image Dataset for Industrial Applications of Hand Localisation, in IEEE Access, 2022, doi: 10.1109/ACCESS.2022.3206948.

    S. Grushko, A. Vysocký, J. Chlebek, P. Prokop, HaDR: Applying Domain Randomization for Generating Synthetic Multimodal Dataset for Hand Instance Segmentation in Cluttered Industrial Environments, arXiv preprint, 2023, https://doi.org/10.48550/arXiv.2304.05826

  10. 🏭 Predicting Manufacturing Defects Dataset

    • kaggle.com
    zip
    Updated Jun 17, 2024
    Cite
    Rabie El Kharoua (2024). 🏭 Predicting Manufacturing Defects Dataset [Dataset]. https://www.kaggle.com/datasets/rabieelkharoua/predicting-manufacturing-defects-dataset
    Available download formats: zip (371525 bytes)
    Dataset updated
    Jun 17, 2024
    Authors
    Rabie El Kharoua
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    This dataset provides insights into factors influencing defect rates in a manufacturing environment. Each record represents various metrics crucial for predicting high or low defect occurrences in production processes.

    Variables Description

    Production Metrics

    ProductionVolume: Number of units produced per day. - Data Type: Integer. - Range: 100 to 1000 units/day.

    ProductionCost: Cost incurred for production per day. - Data Type: Float. - Range: $5000 to $20000.

    Supply Chain and Logistics

    SupplierQuality: Quality ratings of suppliers. - Data Type: Float (%). - Range: 80% to 100%.

    DeliveryDelay: Average delay in delivery. - Data Type: Integer (days). - Range: 0 to 5 days.

    Quality Control and Defect Rates

    DefectRate: Defects per thousand units produced. - Data Type: Float. - Range: 0.5 to 5.0 defects.

    QualityScore: Overall quality assessment. - Data Type: Float (%). - Range: 60% to 100%.

    Maintenance and Downtime

    MaintenanceHours: Hours spent on maintenance per week. - Data Type: Integer. - Range: 0 to 24 hours.

    DowntimePercentage: Percentage of production downtime. - Data Type: Float (%). - Range: 0% to 5%.

    Inventory Management

    InventoryTurnover: Ratio of inventory turnover. - Data Type: Float. - Range: 2 to 10.

    StockoutRate: Rate of inventory stockouts. - Data Type: Float (%). - Range: 0% to 10%.

    Workforce Productivity and Safety

    WorkerProductivity: Productivity level of the workforce. - Data Type: Float (%). - Range: 80% to 100%.

    SafetyIncidents: Number of safety incidents per month. - Data Type: Integer. - Range: 0 to 10 incidents.

    Energy Consumption and Efficiency

    EnergyConsumption: Energy consumed in kWh. - Data Type: Float. - Range: 1000 to 5000 kWh.

    EnergyEfficiency: Efficiency factor of energy usage. - Data Type: Float. - Range: 0.1 to 0.5.

    Additive Manufacturing

    AdditiveProcessTime: Time taken for additive manufacturing. - Data Type: Float (hours). - Range: 1 to 10 hours.

    AdditiveMaterialCost: Cost of additive materials per unit. - Data Type: Float ($). - Range: $100 to $500.

    Target Variable

    DefectStatus: Predicted defect status. - Data Type: Binary (0 for Low Defects, 1 for High Defects).
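    The ranges above can double as a sanity check when loading the data. A minimal sketch, assuming the CSV column names match the variable names listed and that percentage columns are stored on a 0-100 scale (if they are stored as fractions, the bounds would need dividing by 100); the function name is illustrative:

```python
from typing import Dict, List, Mapping, Tuple

# Documented (min, max) ranges from the variable list above.
# Assumption: percentage columns use a 0-100 scale (e.g. 95.0, not 0.95).
DOCUMENTED_RANGES: Dict[str, Tuple[float, float]] = {
    "ProductionVolume": (100, 1000),
    "ProductionCost": (5000, 20000),
    "SupplierQuality": (80, 100),
    "DeliveryDelay": (0, 5),
    "DefectRate": (0.5, 5.0),
    "QualityScore": (60, 100),
    "MaintenanceHours": (0, 24),
    "DowntimePercentage": (0, 5),
    "InventoryTurnover": (2, 10),
    "StockoutRate": (0, 10),
    "WorkerProductivity": (80, 100),
    "SafetyIncidents": (0, 10),
    "EnergyConsumption": (1000, 5000),
    "EnergyEfficiency": (0.1, 0.5),
    "AdditiveProcessTime": (1, 10),
    "AdditiveMaterialCost": (100, 500),
}


def out_of_range(record: Mapping[str, float]) -> List[str]:
    """Return documented columns that are missing or outside their range."""
    problems = []
    for column, (low, high) in DOCUMENTED_RANGES.items():
        value = record.get(column)
        if value is None or not low <= value <= high:
            problems.append(column)
    return problems
```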

    Defect Instances

    The dataset focuses on defect instances because defects occur rarely in practice. Non-defect instances were also included; as a result, the dataset is imbalanced, so consider balancing it before applying machine learning techniques.
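    The card recommends balancing but does not prescribe a method. One simple option is random oversampling of the smaller class, sketched below using the DefectStatus target column; class weighting or synthetic-sample methods such as SMOTE are common alternatives. Rows are assumed to be dictionaries (for example from csv.DictReader, in which case label values are strings), and the function name is illustrative:

```python
import random
from typing import Dict, List, Sequence


def random_oversample(
    rows: Sequence[Dict], label_key: str = "DefectStatus", seed: int = 0
) -> List[Dict]:
    """Balance classes by resampling smaller classes with replacement.

    Every class is brought up to the size of the largest class. Apply this to
    the training split only, after splitting, so duplicated rows cannot leak
    into the evaluation data.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    by_label: Dict[object, List[Dict]] = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced: List[Dict] = []
    for group in by_label.values():
        balanced.extend(group)  # keep every original row
        balanced.extend(rng.choices(group, k=target - len(group)))  # add duplicates
    rng.shuffle(balanced)
    return balanced
```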

    Data Conclusion

    This dataset encompasses a comprehensive collection of metrics vital for predicting defect rates in manufacturing operations. It includes production volumes, supply chain quality, quality control assessments, maintenance schedules, inventory management details, workforce productivity metrics, energy consumption patterns, additive manufacturing specifics, and more.

    Dataset Usage and Attribution Notice

    This dataset, shared by Rabie El Kharoua, is original and has never been shared before. It is made available under the CC BY 4.0 license, allowing anyone to use the dataset in any form as long as proper citation is given to the author. A DOI is provided for proper referencing. Please note that duplication of this work within Kaggle is not permitted.

    Exclusive Synthetic Dataset

    This dataset is synthetic and was generated for educational purposes, making it ideal for data science and machine learning projects. It is an original dataset, owned by Mr. Rabie El Kharoua, and has not been previously shared. You are free to use it under the license outlined on the data card. The dataset is offered without any guarantees. Details about the data provider will be shared soon.

  11. Marathi and Maharashtrian Ornaments Dataset

    • kaggle.com
    zip
    Updated Jul 29, 2025
    Cite
    Tushar Kute (2025). Marathi and Maharashtrian Ornaments Dataset [Dataset]. https://www.kaggle.com/datasets/tusharkute/marathi-and-maharashtrian-ornamants-dataset/code
    Available download formats: zip (8971 bytes)
    Dataset updated
    Jul 29, 2025
    Authors
    Tushar Kute
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset comprises 953 synthetically generated entries detailing various traditional Marathi ornaments. It is designed to provide a structured collection of common features associated with these unique pieces of jewelry, often worn in Maharashtra, India.

    Purpose: The primary purpose of this dataset is to serve as a foundational resource for:

    Educational Projects: Students and enthusiasts can use it to learn about data handling, analysis, and visualization.
    
    Machine Learning Exploration: Researchers can explore classification or regression tasks, for instance, predicting the type of ornament based on its physical properties or vice-versa.
    
    Jewelry Domain Studies: Individuals interested in traditional Indian jewelry can gain insights into the typical characteristics of these ornaments.
    
    Data Generation Practice: It can serve as an example for understanding how synthetic datasets can be created for specific domains.
    

    Content & Generation: The dataset was created programmatically by defining plausible ranges and distributions for each feature based on general knowledge of these ornaments. While synthetic, the values aim to reflect realistic characteristics for each ornament type, acknowledging that actual jewelry pieces will have unique variations. For example:

    Weight, Length/Height, Width: Ranges were set to represent typical sizes and weights.
    Number of Components/Units & Stones/Pearls: These features vary significantly based on the ornament's intricate design, from single-unit pieces like 'Nath' to multi-component necklaces like 'Thushi' or 'Mohan Mala'.
    Carat Weight of Stones: Applied only to ornaments that typically feature stones or pearls.
    Gold Purity: Reflects common gold purities used in Indian jewelry (e.g., 20K, 21K, 22K, 23K, 24K). Silver purity (e.g., 80-95%) is assigned for 'Jodvi'.
    Primary Material: Predominantly 'Gold' for most ornaments, with 'Silver' for 'Jodvi'.
    

    This dataset offers a starting point for analyses where real-world data might be scarce or difficult to collect.

    File Information

    File Name: marathi_ornaments_dataset.csv
    Number of Rows: 953
    Number of Columns: 8
    Approximate File Size: ~60 KB (will vary slightly based on exact content and line endings)
    

    Column Descriptor

    Here's a detailed description for each column in the marathi_ornaments_dataset.csv file:

    Ornament Class
    
      Description: The traditional Marathi name of the jewelry item. This is the categorical target variable representing different types of ornaments.
    
      Data Type: String (Categorical)
    
      Possible Values: Nath, Thushi, Kolhapuri Saaj, Mohan Mala, Laxmi Haar, Tanmani, Chinchpeti, Bakuli Haar, Surya Haar, Bugadi, Kudya, Bajuband, Tode, Patlya, Mangalsutra, Jodvi, Kambarpatta
    
    Weight (grams)
    
      Description: The approximate weight of the ornament in grams.
      Data Type: Float
      Units: grams (g)
      Range: Varies significantly by ornament type (e.g., Nath would be lighter, Laxmi Haar or Kambarpatta would be heavier).
    
    Length/Height (cm)
    
      Description: The approximate length (for necklaces, bracelets) or height (for earrings, nose rings) of the ornament in centimeters.
      Data Type: Float
      Units: centimeters (cm)
      Range: Varies by ornament type.
    
    Width (cm)
    
      Description: The approximate width of the ornament in centimeters.
      Data Type: Float
      Units: centimeters (cm)
      Range: Varies by ornament type and design.
    
    Number of Components/Units
    
      Description: The total count of distinct, often repeated, design elements or units that make up the ornament. For intricate necklaces, this can be high.
      Data Type: Integer
      Range: 1 to ~1000 (especially for fine 'Thushi' beads).
    
    Number of Stones/Pearls
    
      Description: The count of stones (e.g., diamonds, rubies, emeralds) or pearls embedded in or attached to the ornament.
      Data Type: Integer
      Range: 0 to ~50 (many traditional designs have no stones, some have many).
    
    Carat Weight of Stones
    
      Description: The total approximate carat weight of all stones present in the ornament. This value is 0.0 if Number of Stones/Pearls is 0.
      Data Type: Float
      Units: Carats (ct)
      Range: 0.0 to ~1.0 (or higher for very elaborate pieces).
    
    Gold Purity (Karat)
    
      Description: The purity of the primary gold material used, expressed in karats. For 'Jodvi', which are traditionally silver, this column holds silver purity as a percentage (even though it is labeled 'Gold Purity (Karat)' for consistency in column headers).
    
      Data Type: Integer
      Units: Karat (K) for gold, Percentage (%) for silver (for Jodvi).
      Possible Values: 20, 21, 22, 23, 24 for Gold. 80 to 95 for Silver (specifically for Jodvi).
    
    Primary Material
    
      Des...
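    Because the 'Gold Purity (Karat)' column mixes karats for gold with percentages for 'Jodvi', its raw values are not directly comparable. A minimal sketch that converts both to a fraction of pure metal (karats / 24, percent / 100), assuming the CSV header names match the column names described above; the 'Purity Fraction' column and the function names are hypothetical:

```python
import csv
from typing import Dict, Iterable, List


def purity_fraction(ornament_class: str, purity_value: float) -> float:
    """Convert the 'Gold Purity (Karat)' value to a fraction of pure metal.

    Per the column description, 'Jodvi' rows hold silver purity in percent
    (80-95); all other rows hold gold purity in karats (20-24), where 24K = pure.
    """
    if ornament_class == "Jodvi":
        return purity_value / 100.0
    return purity_value / 24.0


def add_purity_fraction(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read CSV lines and attach a hypothetical 'Purity Fraction' column to each row."""
    rows = []
    for row in csv.DictReader(lines):
        fraction = purity_fraction(row["Ornament Class"], float(row["Gold Purity (Karat)"]))
        row["Purity Fraction"] = f"{fraction:.4f}"
        rows.append(row)
    return rows
```

    For the actual file, add_purity_fraction can be given a file object opened with open("marathi_ornaments_dataset.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption.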
    
  12. Coffee Shop Daily Revenue Prediction Dataset

    • kaggle.com
    zip
    Updated Feb 7, 2025
    Cite
    Himel Sarder (2025). Coffee Shop Daily Revenue Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/himelsarder/coffee-shop-daily-revenue-prediction-dataset
    Available download formats: zip (30259 bytes)
    Dataset updated
    Feb 7, 2025
    Authors
    Himel Sarder
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Overview

    This dataset contains 2,000 rows of data from coffee shops, offering detailed insights into factors that influence daily revenue. It includes key operational and environmental variables that provide a comprehensive view of how business activities and external conditions affect sales performance. Designed for use in predictive analytics and business optimization, this dataset is a valuable resource for anyone looking to understand the relationship between customer behavior, operational decisions, and revenue generation in the food and beverage industry.

    Columns & Variables

    The dataset features a variety of columns that capture the operational details of coffee shops, including customer activity, store operations, marketing spend, and external factors such as location foot traffic.

    1. Number of Customers Per Day

      • The total number of customers visiting the coffee shop on any given day.
      • Range: 50 - 500 customers.
    2. Average Order Value ($)

      • The average dollar amount spent by each customer during their visit.
      • Range: $2.50 - $10.00.
    3. Operating Hours Per Day

      • The total number of hours the coffee shop is open for business each day.
      • Range: 6 - 18 hours.
    4. Number of Employees

      • The number of employees working on a given day. This can influence service speed, customer satisfaction, and ultimately, sales.
      • Range: 2 - 15 employees.
    5. Marketing Spend Per Day ($)

      • The amount of money spent on marketing campaigns or promotions on any given day.
      • Range: $10 - $500 per day.
    6. Location Foot Traffic (people/hour)

      • The number of people passing by the coffee shop per hour, a variable indicative of the shop's location and its potential to attract customers.
      • Range: 50 - 1000 people per hour.

    Target Variable

    • Daily Revenue ($)
      • This is the dependent variable representing the total revenue generated by the coffee shop each day.
      • It is calculated as a combination of customer visits, average spending, and other operational factors like marketing spend and staff availability.
      • Range: $200 - $10,000 per day.
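    The exact formula used to generate Daily Revenue is not given. A natural reference point before fitting any model is the product of customers and average order value; a minimal sketch of that baseline and of the mean absolute error used to score it (function names are illustrative):

```python
from typing import Iterable, Tuple


def naive_revenue(customers: float, avg_order_value: float) -> float:
    """Baseline estimate: every customer places one order of average value."""
    return customers * avg_order_value


def mean_absolute_error(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean absolute difference over (actual, predicted) pairs."""
    total, n = 0.0, 0
    for actual, predicted in pairs:
        total += abs(actual - predicted)
        n += 1
    if n == 0:
        raise ValueError("no data")
    return total / n
```

    A model that also uses marketing spend, staffing, and foot traffic is only worth its extra complexity if it improves on this baseline's error.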

    Data Distribution & Insights

    The dataset spans a wide variety of operational scenarios, from small neighborhood coffee shops with limited traffic to larger, high-traffic locations with extensive marketing budgets. This variety allows for exploring different predictive modeling strategies. Key insights that can be derived from the data include:

    • The effect of marketing spend on daily revenue.
    • The correlation between customer count and daily sales.
    • The relationship between staffing levels and revenue generation.
    • The influence of foot traffic and operating hours on customer behavior.
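    Relationships such as customer count versus daily sales are commonly quantified with the Pearson correlation coefficient. A dependency-free sketch (Python 3.10+ also provides statistics.correlation); the function name is illustrative:

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

    For example, pearson(customers, revenue) with both lists read from the corresponding columns. A strong correlation alone does not show that one variable drives the other.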

    Use Cases & Applications

    The dataset offers a wide range of applications, especially in predictive analytics, business optimization, and forecasting:

    • Predictive Modeling: Use machine learning models such as regression, decision trees, or neural networks to predict daily revenue based on operational data.
    • Business Strategy Development: Analyze how changes in marketing spend, staff numbers, or operating hours can optimize revenue and improve efficiency.
    • Customer Insights: Identify patterns in customer behavior related to shop operations and external factors like foot traffic and marketing campaigns.
    • Resource Allocation: Determine optimal staffing levels and marketing budgets based on predicted sales, improving overall profitability.

    Real-World Applications in the Food & Beverage Industry

    For coffee shop owners, managers, and analysts in the food and beverage industry, this dataset provides an essential tool for refining daily operations and boosting profitability. Insights gained from this data can help:

    • Optimize Marketing Campaigns: Evaluate the effectiveness of daily or seasonal marketing campaigns on revenue.
    • Staff Scheduling: Predict busy days and ensure that the right number of employees are scheduled to maximize efficiency.
    • Revenue Forecasting: Provide accurate revenue projections that can assist with financial planning and decision-making.
    • Operational Efficiency: Discover the most profitable operating hours and adjust business hours accordingly.

    This dataset is also ideal for aspiring data scientists and machine learning practitioners looking to apply their skills to real-world business problems in the food and beverage sector.

    Conclusion

    The Coffee Shop Revenue Prediction Dataset is a versatile and comprehensive resource for understanding the dynamics of daily sales performance in coffee shops. With a focus on key operational factors, it is perfect for building predictive models, ...

  13. Median and (range) for FA, pennation angle, number of fibers, and fiber...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    • +1 more
    Updated May 26, 2015
    Cite
    Buck, Amanda K. W.; Damon, Bruce M.; Elder, Christopher P.; Ding, Zhaohua; Towse, Theodore F. (2015). Median and (range) for FA, pennation angle, number of fibers, and fiber length. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001943049
    Dataset updated
    May 26, 2015
    Authors
    Buck, Amanda K. W.; Damon, Bruce M.; Elder, Christopher P.; Ding, Zhaohua; Towse, Theodore F.
    Description
    Median and (range) for FA, pennation angle, number of fibers, and fiber length.
    * indicates a statistical difference (p = 0.009) from unsmoothed (0%) data for the group;
    ^ indicates a statistical difference (p = 0.0022) from unsmoothed (0%) data for the group;
    # indicates a statistical difference (p = 0.0043) from unsmoothed (0%) data for the group.
  14. South Range, MI Population Pyramid Dataset: Age Groups, Male and Female...

    • neilsberg.com
    csv, json
    Updated Sep 16, 2023
    + more versions
    Cite
    Neilsberg Research (2023). South Range, MI Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis [Dataset]. https://www.neilsberg.com/research/datasets/63632866-3d85-11ee-9abe-0aa64bf2eeb2/
    Available download formats: json, csv
    Dataset updated
    Sep 16, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Michigan, South Range
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Total Population for Age Groups, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the three variables, namely (a) male population, (b) female population, and (c) total population, we first analyzed and categorized the data for each age group. Ages between 0 and 85 were divided into roughly 5-year buckets, and all ages 85 and over were aggregated into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the data for the South Range, MI population pyramid, which represents the South Range population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey 5-Year estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.

    Key observations

    • Youth dependency ratio, which is the number of children aged 0-14 per 100 persons aged 15-64, for South Range, MI, is 16.9.
    • Old-age dependency ratio, which is the number of persons aged 65 or over per 100 persons aged 15-64, for South Range, MI, is 24.6.
    • Total dependency ratio for South Range, MI is 41.5.
    • Potential support ratio, which is the number of working-age persons (aged 15-64) per person aged 65 or over, for South Range, MI, is 4.1.
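    The reported figures are internally consistent: 16.9 + 24.6 = 41.5, and the potential support ratio is the reciprocal of the old-age ratio, 100 / 24.6 ≈ 4.1. A minimal sketch computing all four ratios from broad-band population totals (the function name and the example counts in the usage note are hypothetical):

```python
from typing import Dict


def dependency_ratios(
    children_0_14: float, working_15_64: float, elderly_65_plus: float
) -> Dict[str, float]:
    """Dependency ratios as defined in the key observations above."""
    if working_15_64 <= 0:
        raise ValueError("working-age population must be positive")
    youth = 100 * children_0_14 / working_15_64      # children per 100 working-age
    old_age = 100 * elderly_65_plus / working_15_64  # elderly per 100 working-age
    support = working_15_64 / elderly_65_plus if elderly_65_plus else float("inf")
    return {
        "youth": youth,
        "old_age": old_age,
        "total": youth + old_age,
        "potential_support": support,  # working-age persons per elderly person
    }
```

    For instance, hypothetical counts of 169 children, 1,000 working-age residents, and 246 elderly residents reproduce the published 16.9, 24.6, 41.5, and 4.1 after rounding to one decimal.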
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: The age group for the South Range population analysis. There are 18 expected values, as defined in the age groups section above.
    • Population (Male): The male population of South Range in the given age group.
    • Population (Female): The female population of South Range in the given age group.
    • Total Population: The total population of South Range in the given age group.
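    The 0-14, 15-64, and 65+ totals needed for the dependency ratios in the key observations can be derived from the 18 age groups. A minimal sketch, assuming rows of (Age Group, Total Population) whose labels match the age-group list exactly; the function names are illustrative:

```python
from typing import Dict, Iterable, Tuple

# Age-group labels as listed in the "Age groups" section above.
YOUTH = {"Under 5 years", "5 to 9 years", "10 to 14 years"}
WORKING_AGE = {f"{start} to {start + 4} years" for start in range(15, 65, 5)}
ELDERLY = {"65 to 69 years", "70 to 74 years", "75 to 79 years",
           "80 to 84 years", "85 years and over"}


def broad_band(age_group: str) -> str:
    """Map one of the 18 age groups to the 0-14 / 15-64 / 65+ bands."""
    if age_group in YOUTH:
        return "0-14"
    if age_group in WORKING_AGE:
        return "15-64"
    if age_group in ELDERLY:
        return "65+"
    raise ValueError(f"unknown age group: {age_group!r}")


def totals_by_band(rows: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum (age_group, total_population) rows into the three broad bands."""
    totals = {"0-14": 0, "15-64": 0, "65+": 0}
    for age_group, population in rows:
        totals[broad_band(age_group)] += population
    return totals
```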

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for South Range Population by Age. You can refer to it here.

  15. Stirred Reactor CFD ML-Dataset

    • kaggle.com
    zip
    Updated Dec 18, 2023
    Cite
    SimLab | AILab | SciSoftLab (2023). Stirred Reactor CFD ML-Dataset [Dataset]. https://www.kaggle.com/datasets/novalabs/stirred-reactor-cfd-ml-dataset
    Available download formats: zip (34666 bytes)
    Dataset updated
    Dec 18, 2023
    Authors
    SimLab | AILab | SciSoftLab
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset consists of reactor features such as reactor diameter, liquid height, number of immersed stirrers, number of blades, pitch angle, and rpm. The response variables are the median values of turbulent kinetic energy (k), turbulent dissipation rate (epsilon), strain rate (strainRate), and velocity magnitude (Umag).

    The values were extracted from CFD simulations that were set up and run online with CliqScale.R, an online stirred-reactor tool based on OpenFOAM (simpleFoam) that is developed and hosted by Novalabs AG in Zürich, Switzerland: https://novalabs.ch/cliqscaler/. CliqScale.R offers three mesh resolutions: coarse, medium, and fine. The current dataset was built with the coarse mesh. For a dataset with higher resolution, please contact us at support.csr@novalabs.ch.

    Dataset configuration:
    - Volume range: 1 L to 7000 L
    - 3 volumes per reactor
    - 4 rpm values per volume
    - 3 blade counts: [2, 3, 4]
    - 2 pitch angles: [45, 60]
    - 1 baffle configuration: [4 baffles]
    - Fluid: water at ~20 °C
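    If the listed factors are fully crossed (the card does not say so explicitly), each reactor contributes 3 × 4 × 3 × 2 × 1 = 72 simulations. A minimal sketch enumerating that grid; because the actual volumes and rpm values are not listed, they appear only as indices, and the names are illustrative:

```python
from itertools import product
from typing import Iterator, Tuple

# Factor levels per reactor, from the configuration list above.
VOLUMES_PER_REACTOR = 3   # actual volumes not listed: represented by index
RPMS_PER_VOLUME = 4       # actual rpm values not listed: represented by index
BLADE_COUNTS = (2, 3, 4)
PITCH_ANGLES_DEG = (45, 60)
BAFFLES = (4,)


def configurations() -> Iterator[Tuple[int, int, int, int, int]]:
    """Yield (volume_idx, rpm_idx, blades, pitch_deg, baffles) for a full factorial grid."""
    return product(
        range(VOLUMES_PER_REACTOR),
        range(RPMS_PER_VOLUME),
        BLADE_COUNTS,
        PITCH_ANGLES_DEG,
        BAFFLES,
    )
```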

    [Figure: reactor image]

  16. Recognition of Handwritten Digits

    • kaggle.com
    zip
    Updated Dec 10, 2019
    Cite
    hoang dang (2019). Recognition of Handwritten Digits [Dataset]. https://www.kaggle.com/hoandan/penbased-recognition-of-handwritten-digits
    Available download formats: zip (225652 bytes)
    Dataset updated
    Dec 10, 2019
    Authors
    hoang dang
    Description

    Source:

    E. Alpaydin, Fevzi Alimoglu, Department of Computer Engineering, Bogazici University, 80815 Istanbul, Turkey (alpaydin '@' boun.edu.tr)

    Data Set Information:

    We create a digit database by collecting 250 samples from 44 writers. The samples written by 30 writers are used for training, cross-validation and writer dependent testing, and the digits written by the other 14 are used for writer independent testing. This database is also available in the UNIPEN format.

    We use a WACOM PL-100V pressure-sensitive tablet with an integrated LCD display and a cordless stylus. The input and display areas are located in the same place. Attached to the serial port of an Intel 486-based PC, it allows us to collect handwriting samples. The tablet sends $x$ and $y$ tablet coordinates and pressure level values of the pen at fixed time intervals (sampling rate) of 100 milliseconds.

    These writers are asked to write 250 digits in random order inside boxes of 500 by 500 tablet pixel resolution. Subjects are monitored only during the first entry screens. Each screen contains five boxes with the digits to be written displayed above. Subjects are told to write only inside these boxes. If they make a mistake or are unhappy with their writing, they are instructed to clear the content of a box by using an on-screen button. The first ten digits are ignored because most writers are not familiar with this type of input device, but subjects are not aware of this.

    In our study, we use only the $(x, y)$ coordinate information; the stylus pressure level values are ignored. First, we apply normalization to make our representation invariant to translations and scale distortions. The raw data that we capture from the tablet consist of integer values between 0 and 500 (the tablet input box resolution). The new coordinates are such that the coordinate with the maximum range varies between 0 and 100. Usually $y$ spans this full range and $x$ stays within a narrower part of it, since most characters are taller than they are wide.
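    A minimal sketch of this normalization, assuming it translates each digit so its bounding box starts at 0 and scales both axes by the same factor (the description implies a single scale set by the larger range but does not give the exact procedure); the function name is illustrative:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize(points: Sequence[Point], size: float = 100.0) -> List[Point]:
    """Translate and uniformly scale a trajectory into the [0, size] box.

    The axis with the larger range is stretched to span exactly [0, size];
    the other axis uses the same scale factor, so the aspect ratio is kept.
    """
    if not points:
        return []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y)
    if span == 0:  # a single dot: nothing to scale
        return [(0.0, 0.0) for _ in points]
    scale = size / span
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]
```

    The published attributes are integers in 0..100, so the values are rounded at some point; where that happens is not stated.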

    In order to train and test our classifiers, we need to represent digits as constant-length feature vectors. A commonly used technique leading to good results is resampling the $(x_t, y_t)$ points. Temporal resampling (points regularly spaced in time) or spatial resampling (points regularly spaced in arc length) can be used here. Raw point data are already regularly spaced in time, but the distance between them is variable. Previous research showed that spatial resampling to obtain a constant number of regularly spaced points on the trajectory yields much better performance, because it provides a better alignment between points. Our resampling algorithm uses simple linear interpolation between pairs of points. The resampled digits are represented as a sequence of $T$ points $(x_t, y_t)_{t=1}^{T}$, regularly spaced in arc length, as opposed to the input sequence, which is regularly spaced in time.

    So, the input vector size is $2T$, twice the number of resampled points. We considered spatial resampling to $T = 8, 12, 16$ points in our experiments and found that $T = 8$ gave the best trade-off between accuracy and complexity.
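    The resampling and flattening steps can be sketched as follows: linear interpolation along cumulative arc length yields $T$ equally spaced points, which are flattened into a $2T$-long vector. Placing the first and last samples on the trajectory's endpoints is an assumption, as are the function names:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def resample_by_arc_length(points: Sequence[Point], t: int = 8) -> List[Point]:
    """Return t points regularly spaced in arc length along the pen trajectory."""
    if t < 2 or not points:
        raise ValueError("need at least one point and t >= 2")
    # Cumulative arc length at each input point.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]
    if total == 0:  # degenerate trajectory: a single dot
        return [tuple(points[0])] * t
    result: List[Point] = []
    seg = 0
    for k in range(t):
        target = total * k / (t - 1)
        # Advance to the segment that contains the target distance.
        while seg < len(points) - 2 and cumulative[seg + 1] < target:
            seg += 1
        seg_len = cumulative[seg + 1] - cumulative[seg]
        frac = 0.0 if seg_len == 0 else (target - cumulative[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        # Linear interpolation between the two segment endpoints.
        result.append((x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
    return result


def feature_vector(points: Sequence[Point], t: int = 8) -> List[float]:
    """Flatten t resampled points into a 2*t vector (x1, y1, ..., xt, yt)."""
    return [coord for point in resample_by_arc_length(points, t) for coord in point]
```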

    Attribute Information:

    All input attributes are integers in the range 0..100. The last attribute is the class code 0..9.
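    With $T = 8$, each row should hold 16 coordinate attributes followed by the class code. A minimal parsing sketch, assuming comma-separated rows (the delimiter and the function name are assumptions):

```python
from typing import List, Tuple


def parse_row(line: str, n_features: int = 16) -> Tuple[List[int], int]:
    """Split one comma-separated row into (features, class label).

    Assumes n_features = 2*T = 16 input columns (T = 8) followed by the label.
    """
    values = [int(v) for v in line.strip().split(",") if v.strip()]
    if len(values) != n_features + 1:
        raise ValueError(f"expected {n_features + 1} values, got {len(values)}")
    features, label = values[:-1], values[-1]
    # Check against the documented ranges: attributes 0..100, class 0..9.
    if not all(0 <= v <= 100 for v in features) or not 0 <= label <= 9:
        raise ValueError("value outside the documented range")
    return features, label
```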

  17. Dataset of pictures of a single hand with different finger count

    • zenodo.org
    zip
    Updated Jun 20, 2020
    Cite
    Unknown; Unknown (2020). Dataset of pictures of a single hand with different finger count [Dataset]. http://doi.org/10.5281/zenodo.3901659
    Available download formats: zip
    Dataset updated
    Jun 20, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Unknown; Unknown
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of pictures of a single hand showing different finger counts, from 0 to 5.

    The dataset is already divided into three sets:

    1. train: Training set.
    2. test: Testing set.
    3. val: Validation set.

    Images are labeled with the finger count they represent, in a range from 0 to 5.

    This dataset is intended to be used for a university assignment: https://github.com/cabellocarlos/keras-cnn

  18. Data from: MNIST Handwritten Digits Dataset

    • kaggle.com
    zip
    Updated May 15, 2025
    Cite
    Ghanshyam Saini (2025). MNIST Handwritten Digits Dataset [Dataset]. https://www.kaggle.com/datasets/ghnshymsaini/mnist-handwritten-digits-dataset/versions/1
    Available download formats: zip (29605861 bytes)
    Dataset updated
    May 15, 2025
    Authors
    Ghanshyam Saini
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    MNIST Handwritten Digits Dataset (Organized by Folder)

    This dataset provides the classic MNIST handwritten digits dataset, a foundational resource for image classification in machine learning. It contains a training set of 60,000 examples and a test set of 10,000 examples of grayscale images of handwritten digits (0 through 9).

    Dataset Structure:

    The uploaded data is organized within a main folder named mnist_png, which contains the following subfolders:

    • train: This folder contains the training set images. Upon navigating into the train folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders corresponds to a digit class (e.g., the folder named 0 contains images of the digit zero, the folder named 1 contains images of the digit one, and so on). The images within these subfolders are grayscale handwritten digit images in a common image format (e.g., PNG).

    • test: This folder contains the test set images. Similar to the train folder, upon navigating into the test folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders contains the corresponding test images for that digit class.

    Content of the Data:

    Each image in the MNIST dataset is a 28x28 pixel grayscale image of a handwritten digit (0-9). The pixel values typically range from 0 (black) to 255 (white).

    How to Use This Dataset:

    1. Download the main MNIST folder (or the archive containing it) and extract its contents.
    2. Navigate into the mnist_png folder.
    3. The train and test subfolders contain the image data, organized by digit class. You can directly use this folder structure with image data loaders that support directory-based organization. The name of the subfolder will correspond to the digit label.
    4. The train folder provides the images you can use to train your machine learning models.
    5. The test folder provides a separate set of images that you can use to evaluate the performance of your trained models on unseen data.
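    The folder-per-class layout described in the steps above can be turned into (path, label) pairs with the Python standard library. This is a minimal sketch, assuming the images end in .png and that a split folder (e.g. mnist_png/train) is passed in; index_image_folder is a hypothetical helper, not part of the dataset.

```python
from pathlib import Path


def index_image_folder(split_dir: Path) -> list[tuple[Path, int]]:
    """Return (image path, digit label) pairs for a split folder such as mnist_png/train."""
    samples: list[tuple[Path, int]] = []
    for class_dir in sorted(split_dir.iterdir()):
        # Each class folder is named after its digit, e.g. "0" ... "9".
        if class_dir.is_dir() and class_dir.name.isdigit():
            label = int(class_dir.name)
            for image_path in sorted(class_dir.glob("*.png")):
                samples.append((image_path, label))
    return samples
```

    Deep learning libraries offer loaders for the same layout, such as torchvision.datasets.ImageFolder in PyTorch and tf.keras.utils.image_dataset_from_directory in TensorFlow; both derive class indices from the sorted subfolder names, which for folders 0 through 9 yields the digit itself.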

    Citation:

    The MNIST dataset is a well-established resource. It was created by Yann LeCun, Corinna Cortes, and Christopher J. C. Burges from NIST handwriting data, and it is commonly cited via LeCun et al. (1998), "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE. There is no separate paper for this PNG folder layout, so cite the original dataset alongside the specific papers or implementations you are building on.

    Data Contribution:

    Thank you for downloading this image-based organization of the MNIST dataset. By structuring the images into class-specific folders within the train and test directories, I aim to provide a user-friendly format for those working on handwritten digit recognition tasks. This structure aligns well with many image data loading utilities and workflows.

    If you find this folder structure clear, well-organized, and useful for your projects, please consider giving it an upvote after downloading. Your feedback and appreciation are valuable and encourage further contributions to the Kaggle community. Thank you!

  19. Grass Range, MT Population Pyramid Dataset: Age Groups, Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 22, 2025
    Neilsberg Research (2025). Grass Range, MT Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/grass-range-mt-population-by-age/
    Available download formats: json, csv
    Dataset updated
    Feb 22, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Grass Range, Montana
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Total Population for Age Groups, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, and 9 more
    Measurement technique
    The data presented in this dataset are derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) male population, (b) female population, and (c) total population, we first analyzed and categorized the data for each age group. Ages 0 to 84 were divided into 5-year buckets, and all ages 85 and over were aggregated into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the data for the Grass Range, MT population pyramid, which represents the distribution of the Grass Range population across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Larger numbers in the younger age groups (the base of the pyramid) suggest population growth, whereas larger numbers in the older age groups (the top of the pyramid) suggest declining birth rates. The dataset can also be used to compute the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.

    Key observations

    • Youth dependency ratio, which is the number of children aged 0-14 per 100 persons aged 15-64, for Grass Range, MT, is 17.1.
    • Old-age dependency ratio, which is the number of persons aged 65 or over per 100 persons aged 15-64, for Grass Range, MT, is 160.0.
    • Total dependency ratio, which is the sum of the youth and old-age dependency ratios, for Grass Range, MT, is 177.1.
    • Potential support ratio, which is the number of working-age persons (aged 15-64) per person aged 65 or over, for Grass Range, MT, is 0.6.
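    The ratios above follow directly from the age-group totals. The sketch below assumes the totals are available as a mapping from the age-group labels listed under Content to total population; the helper names lower_bound and dependency_ratios, and the counts used to exercise them, are hypothetical and are not Grass Range figures.

```python
import math
import re


def lower_bound(age_group: str) -> int:
    """Return the youngest age covered by a label such as '5 to 9 years'."""
    if age_group.startswith("Under"):
        return 0  # "Under 5 years" starts at age 0
    match = re.match(r"\d+", age_group)
    if match is None:
        raise ValueError(f"unrecognized age group: {age_group!r}")
    return int(match.group())


def dependency_ratios(totals: dict[str, int]) -> dict[str, float]:
    """Compute dependency ratios from the total population of each age group."""
    youth = working = elderly = 0
    for group, count in totals.items():
        start = lower_bound(group)
        if start < 15:
            youth += count  # ages 0-14
        elif start < 65:
            working += count  # ages 15-64
        else:
            elderly += count  # ages 65 and over
    if working == 0:
        raise ValueError("no working-age population; ratios are undefined")
    youth_ratio = 100 * youth / working
    old_age_ratio = 100 * elderly / working
    return {
        "youth": youth_ratio,
        "old_age": old_age_ratio,
        "total": youth_ratio + old_age_ratio,
        # Working-age persons per person aged 65+; infinite if there are no elderly.
        "potential_support": working / elderly if elderly else math.inf,
    }
```

    Because the age groups split exactly at ages 15 and 65, classifying each group by its lower bound matches the 0-14, 15-64, and 65+ ranges in the definitions. The potential support ratio also equals 100 divided by the old-age dependency ratio: 100 / 160.0 = 0.625, which rounds to the reported 0.6.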
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over
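    The bucketing rule behind these 18 groups can be written as a small function. This is a hypothetical helper for mapping an individual age to its group label, assuming the label wording shown above; it is not part of the published dataset.

```python
def age_group(age: int) -> str:
    """Map an age in years to its 5-year group label, with 85+ as one open-ended group."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= 85:
        return "85 years and over"
    if age < 5:
        return "Under 5 years"
    start = (age // 5) * 5  # round down to the start of the 5-year bucket
    return f"{start} to {start + 4} years"
```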

    Variables / Data Columns

    • Age Group: The age group for the Grass Range population analysis. There are 18 expected values, as defined in the age groups section above.
    • Population (Male): The male population of Grass Range in the given age group.
    • Population (Female): The female population of Grass Range in the given age group.
    • Total Population: The total population of Grass Range in the given age group.
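    Because Total Population should equal the sum of the male and female columns for each age group, a quick consistency check is a useful first step after download. This sketch assumes the CSV header uses exactly the column names listed above and that counts contain no thousands separators; check_totals is a hypothetical helper.

```python
import csv
import io


def check_totals(csv_text: str) -> list[str]:
    """Return the age groups whose male + female count differs from the total."""
    mismatches: list[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        male = int(row["Population (Male)"])
        female = int(row["Population (Female)"])
        total = int(row["Total Population"])
        if male + female != total:
            mismatches.append(row["Age Group"])
    return mismatches
```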

    Good to know

    Margin of Error

    The data in this dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main Grass Range Population by Age dataset.

  20. 400k Augmented MNIST: Extended Handwritten Digits

    • kaggle.com
    zip
    Updated Mar 26, 2025
    Alexandre Le Mercier (2025). 400k Augmented MNIST: Extended Handwritten Digits [Dataset]. https://www.kaggle.com/datasets/alexandrelemercier/400k-augmented-mnist-extended-handwritten-digits
    Available download formats: zip (359213486 bytes)
    Dataset updated
    Mar 26, 2025
    Authors
    Alexandre Le Mercier
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Overview

    The 400k Augmented MNIST dataset is an extended version of the classic MNIST handwritten digits dataset. By applying a variety of augmentation techniques, I have increased the number of training images to 400,000 (roughly 40,000 per digit label). This large and diverse training set is intended to improve the robustness and generalization of models trained on it, making them less prone to overfitting and potentially more resilient to adversarial perturbations.

    Dataset Structure

    The dataset is organized into two main directories:

    • Augmented MNIST Training Set (400k):
      This directory contains 10 subdirectories, one for each digit label ("Label 0" through "Label 9"). Each subdirectory holds the corresponding JPEG images generated via augmentation. These images have been produced using techniques such as random rotation, shear, translation, scaling, reflection, spatial padding, Ben Graham transformation, Gaussian noise, salt-and-pepper noise, and random text overlay.
    • MNIST Validation Set (4k):
      This directory also contains subdirectories "Label 0" to "Label 9". However, the validation set consists solely of the original MNIST images (approximately 400 per label) that were not used for augmentation. This allows you to evaluate model performance on natural, unaltered digit images, providing a clear benchmark for generalization.
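    A quick way to confirm the split sizes described above (roughly 40,000 training images and roughly 400 validation images per label) is to count the files in each "Label N" folder. This sketch assumes the folder names match the "Label 0" to "Label 9" pattern and that images use a .jpg or .jpeg extension; count_per_label is a hypothetical helper.

```python
import re
from pathlib import Path

# Matches class folders such as "Label 0" ... "Label 9".
LABEL_DIR = re.compile(r"Label (\d)")


def count_per_label(split_dir: Path) -> dict[int, int]:
    """Count JPEG images in each 'Label N' subfolder, keyed by digit label."""
    counts: dict[int, int] = {}
    for class_dir in split_dir.iterdir():
        match = LABEL_DIR.fullmatch(class_dir.name)
        if class_dir.is_dir() and match:
            label = int(match.group(1))
            counts[label] = sum(
                1 for p in class_dir.iterdir() if p.suffix.lower() in {".jpg", ".jpeg"}
            )
    return dict(sorted(counts.items()))
```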

    How to Use This Dataset

    1. Training:
      Use the augmented training set to train your deep learning models. The 400k images offer a wide variety of conditions, helping your model learn robust features that generalize well.
    2. Validation:
      Evaluate your models on the validation set, which contains only the original MNIST images. This will help you measure performance on “natural” digits, ensuring that improvements in robustness do not come at the expense of real-world accuracy.
    3. Flexibility:
      You can also experiment with mixed training (combining augmented and original images) to study how different training strategies affect model robustness and accuracy.

    Augmentation Techniques Applied

    The following augmentation functions were used to generate the extended dataset:

    • Random Rotation: Randomly rotates images within a specified angle range.
    • Random Shear: Applies slight shearing transformations.
    • Random Translation: Shifts images horizontally and vertically.
    • Random Scale: Zooms in or out on the images.
    • Ben Graham Transform: Enhances image contrast and clarity using a weighted Gaussian blur.
    • Random Gaussian Noise: Adds Gaussian noise to simulate sensor or environmental disturbances.
    • Random Salt-and-Pepper Noise: Introduces random pixel-level corruption.

    A random number of transformations (between 1 and 6, in a random order) is applied to each image, with the goal of creating a diverse and challenging training set.
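    The random-subset, random-order scheme described above can be sketched in pure Python. This is an illustration, not the author's code: it implements only four of the techniques mentioned above (translation, horizontal reflection, Gaussian noise, and salt-and-pepper noise) with hypothetical parameter values, so the number of transformations per image is capped at four rather than six.

```python
import random

Image = list[list[int]]  # grayscale image, pixel values 0-255


def clamp(value: float) -> int:
    """Round and clip a pixel value into the 0-255 range."""
    return max(0, min(255, round(value)))


def random_translation(img: Image, rng: random.Random, max_shift: int = 2) -> Image:
    """Shift the image by up to max_shift pixels, filling vacated pixels with 0."""
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def reflection(img: Image, rng: random.Random) -> Image:
    """Mirror the image horizontally (rng is unused but keeps one signature)."""
    return [row[::-1] for row in img]


def random_gaussian_noise(img: Image, rng: random.Random, sigma: float = 20.0) -> Image:
    """Add zero-mean Gaussian noise to every pixel."""
    return [[clamp(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def random_salt_and_pepper(img: Image, rng: random.Random, amount: float = 0.02) -> Image:
    """Set a small random fraction of pixels to pure black or pure white."""
    out = [row[:] for row in img]
    for row in out:
        for x in range(len(row)):
            r = rng.random()
            if r < amount / 2:
                row[x] = 0
            elif r < amount:
                row[x] = 255
    return out


TRANSFORMS = [random_translation, reflection, random_gaussian_noise, random_salt_and_pepper]


def augment(img: Image, rng: random.Random, max_transforms: int = 6) -> Image:
    """Apply a random number of distinct transforms in a random order."""
    k = rng.randint(1, min(max_transforms, len(TRANSFORMS)))
    for transform in rng.sample(TRANSFORMS, k):  # random subset, random order
        img = transform(img, rng)
    return img
```

    Each transform returns a new image, so the original digit is left untouched and can be augmented repeatedly with different random draws.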

    Citation

    If you use this dataset in your research, please cite it as follows:

    @misc{alexandre_le_mercier_2025,
      title={400k Augmented MNIST: Extended Handwritten Digits},
      url={https://www.kaggle.com/ds/6967763},
      DOI={10.34740/KAGGLE/DS/6967763},
      publisher={Kaggle},
      author={Alexandre Le Mercier},
      year={2025}
    }
    

    License

    This dataset is under the Apache 2.0 license.

    Contact

    For any questions or issues regarding this dataset, please send a message in the "Discussions" or "Suggestions" sections of the Kaggle dataset page.

    Good luck and happy coding! 🚀

Eric J. Bigelow; Steven T. Piantadosi (2018). Large Dataset of Generalization Patterns in the Number Game [Dataset]. http://doi.org/10.7910/DVN/A8ZWLF

Large Dataset of Generalization Patterns in the Number Game

5 scholarly articles cite this dataset (View in Google Scholar)
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 10, 2018
Dataset provided by
Harvard Dataverse
Authors
Eric J. Bigelow; Steven T. Piantadosi
License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

272,700 two-alternative forced-choice responses in a simple numerical task modeled after Tenenbaum (1999, 2000), collected from 606 Amazon Mechanical Turk workers. Subjects were shown sets of 1 to 4 numbers from the range 1 to 100 (e.g. {12, 16}) and asked which other numbers were likely to belong to the same set (e.g. 1, 5, 2, 98). Their generalization patterns reflect both rule-like (e.g. “even numbers,” “powers of two”) and distance-based (e.g. numbers near 50) generalization. The dataset is available for further analysis of these simple, intuitive inferences, for the development of hands-on modeling instruction, and for attempts to understand how probability and rules interact in human cognition.
