100+ datasets found

Large Dataset of Generalization Patterns in the Number Game
dataverse.harvard.edu
search.dataone.org
Updated Aug 10, 2018
Cite
Eric J. Bigelow; Steven T. Piantadosi (2018). Large Dataset of Generalization Patterns in the Number Game [Dataset]. http://doi.org/10.7910/DVN/A8ZWLF
Unique identifier
https://doi.org/10.7910/DVN/A8ZWLF
Dataset updated
Aug 10, 2018
Dataset provided by
Harvard Dataverse
Authors
Eric J. Bigelow; Steven T. Piantadosi
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
272,700 two-alternative forced choice responses in a simple numerical task modeled after Tenenbaum (1999, 2000), collected from 606 Amazon Mechanical Turk workers. Subjects were shown sets of numbers of length 1 to 4 drawn from the range 1 to 100 (e.g. {12, 16}) and asked which other numbers were likely to belong to that set (e.g. 1, 5, 2, 98). Their generalization patterns reflect both rule-like (e.g. “even numbers,” “powers of two”) and distance-based (e.g. numbers near 50) generalization. This data set is available for further analysis of these simple and intuitive inferences, for developing hands-on modeling instruction, and for attempts to understand how probability and rules interact in human cognition.
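
To make the interaction between rules and probability concrete, here is a minimal sketch of Bayesian generalization using the size principle associated with Tenenbaum's number game. The hypothesis space, the uniform prior, and the function names are assumptions chosen for illustration; they are not taken from the dataset or from the authors' analysis.

```python
from fractions import Fraction

NUMBERS = range(1, 101)

# Hypothetical, hand-picked hypothesis space (not part of the dataset).
HYPOTHESES = {
    "even numbers": {n for n in NUMBERS if n % 2 == 0},
    "multiples of 4": {n for n in NUMBERS if n % 4 == 0},
    "powers of two": {2 ** k for k in range(7)},  # 1, 2, 4, ..., 64
    "numbers near 50": set(range(40, 61)),
}


def posterior(observed):
    """Posterior over hypotheses, assuming a uniform prior."""
    scores = {}
    for name, members in HYPOTHESES.items():
        if all(x in members for x in observed):
            # Size principle: each example has likelihood 1 / |hypothesis|.
            scores[name] = Fraction(1, len(members)) ** len(observed)
        else:
            scores[name] = Fraction(0)
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the observed set")
    return {name: score / total for name, score in scores.items()}


def p_belongs(candidate, observed):
    """Probability that `candidate` belongs to the set that produced `observed`."""
    post = posterior(observed)
    return float(sum(p for name, p in post.items() if candidate in HYPOTHESES[name]))
```

With the observed set [16, 8, 2, 64], the smaller "powers of two" hypothesis dominates the larger "even numbers" hypothesis, so 32 receives a much higher probability than 10.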

MNIST-100

kaggle.com

zip

Updated Jul 25, 2023

Cite

Marcin Wierzbiński (2023). MNIST-100 [Dataset]. https://www.kaggle.com/datasets/martininf1n1ty/mnist100

Explore at:

zip (23452456 bytes)

Dataset updated

Jul 25, 2023

Authors

Marcin Wierzbiński

License

http://www.gnu.org/licenses/lgpl-3.0.html

Description

The MNIST-100 dataset is a variation of the original MNIST dataset, consisting of 100 handwritten numbers extracted from the MNIST dataset. Unlike the traditional MNIST dataset, which contains 60,000 training images of digits from 0 to 9, the MNIST-100 dataset focuses on 100 numbers.

Dataset Overview:
- Dataset Name: MNIST-100
- Total Number of Images: train: 60000, test: 1000
- Classes: 100 (Numbers from 00 to 99)
- Image Size: 28x56 pixels (grayscale)

Data Collection: The MNIST-100 dataset was created by randomly selecting 10 unique digits from the original MNIST dataset. For each selected digit, 10 representative images were extracted, resulting in a total of 100 images. These images were carefully chosen to represent a diverse range of handwriting styles for each digit.

Each image in the dataset is labeled with its corresponding numbers, ranging from 00 to 99, making it suitable for classification tasks. Researchers and practitioners can use this dataset to train and evaluate machine learning algorithms and neural networks for digit recognition and classification.
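
The label tables on this page list training labels as 0 to 99 and test labels as 00 to 99, so a small helper that normalizes both forms can be useful. The sketch below is illustrative only: the function names are hypothetical, and reading a two-digit label as a tens digit and a units digit is an assumption, not something the dataset description states.

```python
def format_label(number):
    """Return the two-character label ("00" to "99") for an integer class."""
    if not 0 <= number <= 99:
        raise ValueError("MNIST-100 labels range from 00 to 99")
    return f"{number:02d}"


def split_label(label):
    """Split a label such as "41" or 41 into (tens, units).

    Assumption: the two characters of the label are the two digits shown.
    """
    number = int(label)
    if not 0 <= number <= 99:
        raise ValueError("MNIST-100 labels range from 00 to 99")
    return divmod(number, 10)
```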

Please note that the Modified MNIST-100 dataset is not intended to replace the original MNIST dataset but serves as a complementary resource for specific applications requiring a smaller and more focused subset of the MNIST data.

Overall, the MNIST-100 dataset offers a compact and representative collection of 100 handwritten numbers, providing a convenient tool for experimentation and learning in computer vision and pattern recognition.

Label Distribution for training set:

Label	Occurrences	Label	Occurrences	Label	Occurrences
0	561	34	629	68	606
1	687	35	540	69	582
2	582	36	588	70	566
3	633	37	619	71	659
4	588	38	584	72	572
5	544	39	609	73	682
6	582	40	570	74	627
7	615	41	679	75	598
8	584	42	544	76	605
9	567	43	567	77	602
10	641	44	574	78	595
11	780	45	555	79	586
12	720	46	550	80	569
13	699	47	614	81	628
14	630	48	614	82	578
15	627	49	595	83	622
16	684	50	505	84	569
17	713	51	583	85	540
18	743	52	512	86	557
19	706	53	555	87	628
20	527	54	504	88	562
21	710	55	488	89	625
22	586	56	531	90	600
23	584	57	556	91	700
24	568	58	497	92	622
25	530	59	520	93	622
26	612	60	556	94	591
27	627	61	682	95	557
28	618	62	594	96	580
29	619	63	539	97	640
30	622	64	610	98	577
31	684	65	514	99	563
32	606	66	587
33	592	67	655

Test data:

Label	Occurrences	Label	Occurrences	Label	Occurrences
00	96	34	100	68	90
01	108	35	91	69	92
02	91	36	107	70	102
03	96	37	112	71	116
04	75	38	97	72	101
05	85	39	96	73	106
06	88	40	103	74	98
07	96	41	123	75 ...

Landmarks Dataset for sign recognition numbers
kaggle.com
zip
Updated Nov 4, 2022
Cite
Akshat Mittu (2022). Landmarks Dataset for sign recognition numbers [Dataset]. https://www.kaggle.com/datasets/akshatmittu/landmarks-dataset-for-sign-recognition-numbers
Explore at:
zip (50385 bytes)
Dataset updated
Nov 4, 2022
Authors
Akshat Mittu
Description
This dataset was created from images of hand signs. The hand landmarks detected in each image were turned into the attributes of the dataset, which contains all 21 landmarks with each coordinate (x, y, z) and 5 classes (1, 2, 3, 4, 5).

You can add more classes to the dataset by running the following code. Make sure to either create an empty dataset or append to the existing one, and to set the file path correctly.

```python
import os

import cv2
import mediapipe as mp
import pandas as pd

mp_hands = mp.solutions.hands
rows = []  # one dict of landmark coordinates per detected hand

# The inputs are unrelated still images, so run detection on every image
# (static_image_mode=True) instead of tracking between frames.
with mp_hands.Hands(static_image_mode=True, max_num_hands=1,
                    min_detection_confidence=0.8) as hands:
    for t in range(1, 6):  # class labels 1..5; images live in data/<label>/
        path = 'data/' + str(t) + '/'
        for file_name in os.listdir(path):
            image = cv2.imread(path + file_name)
            if image is None:  # skip files OpenCV cannot read
                continue
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            results = hands.process(image)
            if not results.multi_hand_landmarks:
                continue
            for hand_landmarks in results.multi_hand_landmarks:
                row = {'label': t}
                for i in range(21):
                    landmark = hand_landmarks.landmark[i]
                    name = mp_hands.HandLandmark(i).name
                    for axis, value in zip(('x', 'y', 'z'),
                                           (landmark.x, landmark.y, landmark.z)):
                        row[name + '_' + axis] = value
                rows.append(row)

# DataFrame.append was removed in pandas 2.0, so build the frame in one step.
df = pd.DataFrame(rows)
```
Dataset for The effects of a number line intervention on calculation skills
researchdata.edu.au
figshare.mq.edu.au
Updated May 18, 2023
Cite
Saskia Kohnen; Rebecca Bull; Carola Ruiz Hornblas (2023). Dataset for The effects of a number line intervention on calculation skills [Dataset]. http://doi.org/10.25949/22799717.V1
Unique identifier
https://doi.org/10.25949/22799717.V1
Dataset updated
May 18, 2023
Dataset provided by
Macquarie University
Authors
Saskia Kohnen; Rebecca Bull; Carola Ruiz Hornblas
Description

Study information

The sample included in this dataset represents five children who participated in a number line intervention study. Originally six children were included in the study, but one of them fulfilled the criterion for exclusion after missing several consecutive sessions. Thus, their data is not included in the dataset.

All participants were attending Year 1 of primary school at an independent school in New South Wales, Australia. To be eligible to participate, children had to present with low mathematics achievement by performing at or below the 25th percentile in the Maths Problem Solving and/or Numerical Operations subtests from the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Children were excluded from participating if, as reported by their parents, they had any other diagnosed disorders such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy or uncorrected sensory disorders.

The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase varied between two and three measurement points, the treatment phase varied between four and seven measurement points, and all participants had 1 post-treatment measurement point.

The measurement points were distributed across participants as follows:

Participant 1 – 3 baseline, 6 treatment, 1 post-treatment

Participant 3 – 2 baseline, 7 treatment, 1 post-treatment

Participant 5 – 2 baseline, 5 treatment, 1 post-treatment

Participant 6 – 3 baseline, 4 treatment, 1 post-treatment

Participant 7 – 2 baseline, 5 treatment, 1 post-treatment

In each session across all three phases children were assessed in their performance on a number line estimation task, a single-digit computation task, a multi-digit computation task, a dot comparison task and a number comparison task. Furthermore, during the treatment phase, all children completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.

Measures

Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line is presented in the middle of the screen, and the target number is presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers included two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed in all phases of the study. Trained items were assessed independent of the intervention during baseline and post-treatment phases, and performance on the intervention is used to index performance on the trained set during the treatment phase. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error: PAE = (|estimated number − target number| / scale of number line) × 100.
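
As an illustration of the accuracy index, the sketch below computes percent absolute error for a single estimate and the mean over a set of items. The function names and the default scale of 100 (the length of the 0-100 line) are assumptions for this example, not part of the dataset.

```python
def percent_absolute_error(estimate, target, scale=100.0):
    """PAE = |estimate - target| / scale * 100."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return abs(estimate - target) * 100.0 / scale


def mean_pae(pairs, scale=100.0):
    """Mean PAE over (estimate, target) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (estimate, target) pair is required")
    return sum(percent_absolute_error(e, t, scale) for e, t in pairs) / len(pairs)
```

For example, placing 50 at position 40 on the 0-100 line gives a PAE of 10.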

Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions present the lowest addend first (e.g., 3 + 5) and half of the additions present the highest addend first (e.g., 6 + 3). This task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen accompanied by a sound and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.

Multi-digit computational estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options represent the correct result. Participants were asked to select the option that was closest to the correct result. In half of the items the calculation involved two double-digit numbers, and in the other half one double and one single digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below. The calculations remained on the screen until participants responded by clicking on one of the options on the screen. Participants did not receive performance-based feedback. Performance on this task is measured by item-based accuracy.

Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. Both quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the larger one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison) the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots were kept constant following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.

The Number Line Intervention

During the intervention sessions, participants estimated the position of 30 Arabic numbers on a 0-100 bounded number line. As a form of feedback, within each item, the participant's estimate remained visible, and the correct position of the target number appeared on the number line. When the estimate's PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”; when PAE was between 2.5 and 5, the message read “Well done, so close!”; and when PAE was higher than 5, the message read “Good try!” Numbers were presented in random order.
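
The feedback rule above can be sketched as a small function. The function name is hypothetical, and treating the boundary values 2.5 and 5 as part of the middle band is an assumption, since the description does not say how exact boundary values were handled.

```python
def feedback_message(pae):
    """Map a percent absolute error to the on-screen feedback message."""
    if pae < 0:
        raise ValueError("PAE cannot be negative")
    if pae < 2.5:
        return "Excellent job"
    if pae <= 5:  # assumption: 2.5 and 5 both fall in the middle band
        return "Well done, so close!"
    return "Good try!"
```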

Variables in the dataset

Age = age in ‘years, months’ at the start of the study

Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)

Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

Math_Problem_Solving_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed of three parts. The first part refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point of the treatment phase, and post1 to the first measurement point of the post-treatment phase.

The second part of the variable name refers to the task, as follows:

DC = dot comparison

SDC = single-digit computation

NLE_UT = number line estimation (untrained set)

NLE_T = number line estimation (trained set)

CE = multidigit computational estimation

NC = number comparison

The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).

Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task during the second measurement point of the baseline phase and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the Treatment phase.
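
The naming scheme can be unpacked programmatically. The following is a minimal sketch that splits a variable name into phase, session, task, and measure; the function name and the case-insensitive matching (the description writes "post1" in lowercase but "Base1" and "Treat1" capitalized) are assumptions.

```python
import re

# Phase + session, task code, measure, as listed in the variable description.
_VARIABLE_PATTERN = re.compile(
    r"^(Base|Treat|Post)(\d+)_(DC|SDC|NLE_UT|NLE_T|CE|NC)_(acc|pae)$",
    re.IGNORECASE,
)


def parse_variable_name(name):
    """Return (phase, session, task, measure) for a task-performance variable."""
    match = _VARIABLE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a task-performance variable name: {name!r}")
    phase, session, task, measure = match.groups()
    return phase, int(session), task, measure
```

For instance, parse_variable_name("Treat3_NLE_UT_pae") yields the treatment phase, session 3, the untrained number line set, and the percent absolute error measure.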
TIGER/Line Shapefile, Current, County, Hamilton County, NE, Address...
catalog.data.gov
gimi9.com
Updated Aug 7, 2025
Cite
U.S. Department of Commerce, U.S. Census Bureau, Geography Division (Point of Contact) (2025). TIGER/Line Shapefile, Current, County, Hamilton County, NE, Address Range-Feature [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-current-county-hamilton-county-ne-address-range-feature
Dataset updated
Aug 7, 2025
Dataset provided by
United States Census Bureau (http://census.gov/)
Area covered
Hamilton County
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Range Features shapefile contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe. Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address range is associated with the edge. This shapefile contains a record for each address range to street name combination. Address ranges associated with more than one street name are also represented by multiple coincident address range feature edge records. Note that this shapefile includes all unsuppressed address ranges, whereas the All Lines shapefile (edges.shp) only includes the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefiles contain potential address ranges, not individual addresses. The address ranges in the TIGER/Line shapefiles are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.
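
To illustrate the definition of an address range, the sketch below enumerates the potential structure numbers between a first and last number for a given parity. The function name and the parity codes ("O" for odd, "E" for even, "B" for both) are assumptions made for this example, and it handles purely numeric structure numbers only.

```python
def potential_structure_numbers(first, last, parity="B"):
    """List potential structure numbers from `first` to `last`, inclusive.

    The result follows the direction of the range, so a range coded from a
    higher to a lower number is returned in descending order.
    """
    if parity not in ("O", "E", "B"):
        raise ValueError("parity must be 'O', 'E', or 'B'")
    step = 1 if last >= first else -1
    numbers = range(first, last + step, step)
    if parity == "O":
        return [n for n in numbers if n % 2 == 1]
    if parity == "E":
        return [n for n in numbers if n % 2 == 0]
    return list(numbers)
```

These are potential numbers only: as the description notes, a structure need not exist at every number in the range.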
Global Country Information Dataset 2023
kaggle.com
zip
Updated Jul 8, 2023
Cite
Nidula Elgiriyewithana ⚡ (2023). Global Country Information Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023
Explore at:
zip (24063 bytes)
Dataset updated
Jul 8, 2023
Authors
Nidula Elgiriyewithana ⚡
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description

This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

Key Features

Country: Name of the country.

Density (P/Km2): Population density measured in persons per square kilometer.

Abbreviation: Abbreviation or code representing the country.

Agricultural Land (%): Percentage of land area used for agricultural purposes.

Land Area (Km2): Total land area of the country in square kilometers.

Armed Forces Size: Size of the armed forces in the country.

Birth Rate: Number of births per 1,000 population per year.

Calling Code: International calling code for the country.

Capital/Major City: Name of the capital or major city.

CO2 Emissions: Carbon dioxide emissions in tons.

CPI: Consumer Price Index, a measure of inflation and purchasing power.

CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.

Currency_Code: Currency code used in the country.

Fertility Rate: Average number of children born to a woman during her lifetime.

Forested Area (%): Percentage of land area covered by forests.

Gasoline_Price: Price of gasoline per liter in local currency.

GDP: Gross Domestic Product, the total value of goods and services produced in the country.

Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.

Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.

Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.

Largest City: Name of the country's largest city.

Life Expectancy: Average number of years a newborn is expected to live.

Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.

Minimum Wage: Minimum wage level in local currency.

Official Language: Official language(s) spoken in the country.

Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.

Physicians per Thousand: Number of physicians per thousand people.

Population: Total population of the country.

Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.

Tax Revenue (%): Tax revenue as a percentage of GDP.

Total Tax Rate: Overall tax burden as a percentage of commercial profits.

Unemployment Rate: Percentage of the labor force that is unemployed.

Urban Population: Percentage of the population living in urban areas.

Latitude: Latitude coordinate of the country's location.

Longitude: Longitude coordinate of the country's location.
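
Several of these features are related by simple formulas. As one example, population density in persons per square kilometer can be recomputed from the Population and Land Area (Km2) features, which gives a quick consistency check on the Density (P/Km2) column. The function name below is hypothetical, and the numbers in the usage note are made-up values, not rows from the dataset.

```python
def population_density(population, land_area_km2):
    """Persons per square kilometer, computed from population and land area."""
    if land_area_km2 <= 0:
        raise ValueError("land area must be positive")
    return population / land_area_km2
```

With an illustrative population of 1,000,000 and a land area of 50,000 km2, the function returns 20.0 persons per km2.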

Potential Use Cases

Analyze population density and land area to study spatial distribution patterns.

Investigate the relationship between agricultural land and food security.

Examine carbon dioxide emissions and their impact on climate change.

Explore correlations between economic indicators such as GDP and various socio-economic factors.

Investigate educational enrollment rates and their implications for human capital development.

Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.

Study labor market dynamics through indicators such as labor force participation and unemployment rates.

Investigate the role of taxation and its impact on economic development.

Explore urbanization trends and their social and environmental consequences.

Data Source: This dataset was compiled from multiple data sources

Prime Number Source Code with Dataset
figshare.com
zip
Updated Oct 12, 2024
Cite
Ayman Mostafa (2024). Prime Number Source Code with Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27215508.v1
Explore at:
zip
Unique identifier
https://doi.org/10.6084/m9.figshare.27215508.v1
Dataset updated
Oct 12, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ayman Mostafa
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This paper addresses the computational methods and challenges associated with prime number generation, a critical component in encryption algorithms for ensuring data security. Generating prime numbers efficiently is a challenge in various domains, including cryptography, number theory, and computer science. The search for more effective algorithms is driven by the increasing demand for secure communication and data storage and by the need for efficient algorithms to solve complex mathematical problems. Our goal is to address this challenge by presenting two novel algorithms for generating prime numbers: one that generates primes up to a given limit and another that generates primes within a specified range. These algorithms are founded on the formulas of odd-composed numbers, which allows them to improve on the performance of existing prime number generation algorithms. Our experimental results show that the proposed algorithms outperform well-established prime number generation algorithms such as Miller-Rabin, Sieve of Atkin, Sieve of Eratosthenes, and Sieve of Sundaram in mean execution time. More notably, our algorithms can provide prime numbers from range to range with commendable performance. This gain in performance and adaptability can benefit applications that depend on prime numbers, from cryptographic systems to distributed computing. By providing an efficient and flexible method for generating prime numbers, the proposed algorithms can help develop more secure and reliable communication systems, enable faster computations in number theory, and support advanced computer science and mathematics research.
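
For orientation, the two tasks described above (primes up to a limit and primes within a range) can be sketched with the Sieve of Eratosthenes, one of the baselines the description names. This is not the authors' odd-composite-based method, which is not specified here; the function names are hypothetical.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out multiples of p, starting at p * p.
            is_prime[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if is_prime[n]]


def primes_in_range(low, high):
    """Segmented sieve: all primes in [low, high], inclusive."""
    low = max(low, 2)
    if high < low:
        return []
    segment = bytearray([1]) * (high - low + 1)
    for p in primes_up_to(math.isqrt(high)):
        # First multiple of p inside the segment, but never p itself.
        start = max(p * p, ((low + p - 1) // p) * p)
        for multiple in range(start, high + 1, p):
            segment[multiple - low] = 0
    return [low + i for i, flag in enumerate(segment) if flag]
```

The range version only sieves the requested window, using base primes up to the square root of the upper bound, so it avoids materializing every number below `low`.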
TIGER/Line Shapefile, Current, County, DuPage County, IL, Address...
catalog.data.gov
Updated Aug 8, 2025
Cite
U.S. Department of Commerce, U.S. Census Bureau, Geography Division (Point of Contact) (2025). TIGER/Line Shapefile, Current, County, DuPage County, IL, Address Range-Feature [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-current-county-dupage-county-il-address-range-feature
Dataset updated
Aug 8, 2025
Dataset provided by
United States Census Bureau (http://census.gov/)
Area covered
DuPage County, Illinois
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Range Features shapefile contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe. Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address range is associated with the edge. This shapefile contains a record for each address range to street name combination. Address ranges associated with more than one street name are also represented by multiple coincident address range feature edge records. Note that this shapefile includes all unsuppressed address ranges, whereas the All Lines shapefile (edges.shp) only includes the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefiles contain potential address ranges, not individual addresses. The address ranges in the TIGER/Line shapefiles are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.

HaDR: Dataset for hands instance segmentation

kaggle.com

zip

Updated Mar 7, 2023

Cite

Ales Vysocky (2023). HaDR: Dataset for hands instance segmentation [Dataset]. https://www.kaggle.com/datasets/alevysock/hadr-dataset-for-hands-instance-segmentation

Explore at:

zip (10662295286 bytes)

Dataset updated

Mar 7, 2023

Authors

Ales Vysocky

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically

Description

If you use this dataset for your work, please cite the related papers: A. Vysocky, S. Grushko, T. Spurny, R. Pastor and T. Kot, Generating Synthetic Depth Image Dataset for Industrial Applications of Hand Localisation, in IEEE Access, 2022, doi: 10.1109/ACCESS.2022.3206948.

S. Grushko, A. Vysocký, J. Chlebek, P. Prokop, HaDR: Applying Domain Randomization for Generating Synthetic Multimodal Dataset for Hand Instance Segmentation in Cluttered Industrial Environments. preprint in arXiv, 2023, https://doi.org/10.48550/arXiv.2304.05826

The HaDR dataset is a multimodal dataset designed for human-robot gesture-based interaction research, consisting of RGB and Depth frames, with binary masks for each hand instance (i1, i2, single class data). The dataset is entirely synthetic, generated using the Domain Randomization technique in CoppeliaSim 3D. The dataset can be used to train Deep Learning models to recognize hands using either a single modality (RGB or depth) or both simultaneously. The training-validation split comprises 95K and 22K samples, respectively, with annotations provided in COCO format. The instances are uniformly distributed across the image boundaries. The vision sensor captures depth and color images of the scene, with the depth pixel values scaled into a single channel 8-bit grayscale image in the range [0.2, 1.0] m.

The following aspects of the scene were randomly varied during generation of the dataset:
• Number, colors, textures, scales and types of distractor objects selected from a set of 3D models of general tools and geometric primitives. A special type of distractor – an articulated dummy without hands (for instance-free samples).
• Hand gestures (9 options).
• Hand models’ positions and orientations.
• Texture and surface properties (diffuse, specular and emissive properties) and number (from none to 2) of the object of interest, as well as its background.
• Number and locations of directional light sources (from 1 to 4), in addition to a planar light for ambient illumination.

The sample resolution is set to 320×256, encoded in lossless PNG format, and contains only right hand meshes (we suggest using Flip augmentations during training), with a maximum of two instances per sample.

Test dataset (real camera images): Test dataset containing 706 images was captured using a real RGB-D camera (RealSense L515) in a cluttered and unstructured industrial environment. The dataset comprises various scenarios with diverse lighting conditions, backgrounds, obstacles, number of hands, and different types of work gloves (red, green, white, yellow, no gloves) with varying sleeve lengths. The dataset is assumed to have only one user, and the maximum number of hand instances per sample was limited to two. The dataset was manually labelled, and we provide hand instance segmentation COCO annotations in instances_hands_full.json (separately for train and val) and full arm instance annotations in instances_arms_full.json. The sample resolution was set to 640×480, and depth images were encoded in the same way as those of the synthetic dataset.
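
The depth encoding can be sketched as a mapping between metric depth and 8-bit pixel values. The description gives only the range [0.2, 1.0] m and the 8-bit target, so the linear mapping (0.2 m to 0, 1.0 m to 255), the clamping of out-of-range depths, and the function names below are all assumptions.

```python
DEPTH_MIN_M = 0.2  # nearest depth in the stated range
DEPTH_MAX_M = 1.0  # farthest depth in the stated range


def encode_depth(depth_m):
    """Map a depth in metres to an 8-bit value (assumed linear, clamped)."""
    clamped = min(max(depth_m, DEPTH_MIN_M), DEPTH_MAX_M)
    fraction = (clamped - DEPTH_MIN_M) / (DEPTH_MAX_M - DEPTH_MIN_M)
    return min(255, round(fraction * 255))


def decode_depth(pixel_value):
    """Map an 8-bit value back to metres under the same assumed mapping."""
    if not 0 <= pixel_value <= 255:
        raise ValueError("pixel value must be in 0..255")
    return DEPTH_MIN_M + pixel_value / 255 * (DEPTH_MAX_M - DEPTH_MIN_M)
```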

Channel-wise normalization and standardization parameters for datasets

Dataset	Mean (R, G, B, D)	STD (R, G, B, D)
Train	98.173, 95.456, 93.858, 55.872	67.539, 67.194, 67.796, 47.284
Validation	99.321, 97.284, 96.318, 58.189	67.814, 67.518, 67.576, 47.186
Test	123.675, 116.28, 103.53, 35.3792	58.395, 57.12, 57.375, 45.978
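
A minimal sketch of how the table's parameters might be applied: each channel value has the channel mean subtracted and is divided by the channel standard deviation. The constants are the Train row from the table; the function name and the per-pixel tuple interface are assumptions chosen to keep the example dependency-free.

```python
# Train-split statistics from the table, in (R, G, B, D) order.
TRAIN_MEAN = (98.173, 95.456, 93.858, 55.872)
TRAIN_STD = (67.539, 67.194, 67.796, 47.284)


def standardize_pixel(pixel, mean=TRAIN_MEAN, std=TRAIN_STD):
    """Standardize one (R, G, B, D) pixel channel by channel."""
    if len(pixel) != 4:
        raise ValueError("expected four channel values: R, G, B, D")
    return tuple((value - m) / s for value, m, s in zip(pixel, mean, std))
```

In practice the same arithmetic would usually be applied to whole image arrays at once; using the Train statistics for validation and test inputs as well is a common convention, not something the description prescribes.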

🏭 Predicting Manufacturing Defects Dataset
kaggle.com
zip
Updated Jun 17, 2024
Cite
Rabie El Kharoua (2024). 🏭 Predicting Manufacturing Defects Dataset [Dataset]. https://www.kaggle.com/datasets/rabieelkharoua/predicting-manufacturing-defects-dataset
Explore at:
zip (371525 bytes)
Dataset updated
Jun 17, 2024
Authors
Rabie El Kharoua
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Introduction

This dataset provides insights into factors influencing defect rates in a manufacturing environment. Each record represents various metrics crucial for predicting high or low defect occurrences in production processes.

Variables Description

Production Metrics

ProductionVolume: Number of units produced per day. - Data Type: Integer. - Range: 100 to 1000 units/day.

ProductionCost: Cost incurred for production per day. - Data Type: Float. - Range: $5000 to $20000.

Supply Chain and Logistics

SupplierQuality: Quality ratings of suppliers. - Data Type: Float (%). - Range: 80% to 100%.

DeliveryDelay: Average delay in delivery. - Data Type: Integer (days). - Range: 0 to 5 days.

Quality Control and Defect Rates

DefectRate: Defects per thousand units produced. - Data Type: Float. - Range: 0.5 to 5.0 defects.

QualityScore: Overall quality assessment. - Data Type: Float (%). - Range: 60% to 100%.

Maintenance and Downtime

MaintenanceHours: Hours spent on maintenance per week. - Data Type: Integer. - Range: 0 to 24 hours.

DowntimePercentage: Percentage of production downtime. - Data Type: Float (%). - Range: 0% to 5%.

Inventory Management

InventoryTurnover: Ratio of inventory turnover. - Data Type: Float. - Range: 2 to 10.

StockoutRate: Rate of inventory stockouts. - Data Type: Float (%). - Range: 0% to 10%.

Workforce Productivity and Safety

WorkerProductivity: Productivity level of the workforce. - Data Type: Float (%). - Range: 80% to 100%.

SafetyIncidents: Number of safety incidents per month. - Data Type: Integer. - Range: 0 to 10 incidents.

Energy Consumption and Efficiency

EnergyConsumption: Energy consumed in kWh. - Data Type: Float. - Range: 1000 to 5000 kWh.

EnergyEfficiency: Efficiency factor of energy usage. - Data Type: Float. - Range: 0.1 to 0.5.

Additive Manufacturing

AdditiveProcessTime: Time taken for additive manufacturing. - Data Type: Float (hours). - Range: 1 to 10 hours.

AdditiveMaterialCost: Cost of additive materials per unit. - Data Type: Float ($). - Range: $100 to $500.

Target Variable

DefectStatus: Predicted defect status. - Data Type: Binary (0 for Low Defects, 1 for High Defects).

Defect Instances

The dataset emphasizes defect instances because they occur rarely in practice. Non-defect instances are also included, so the two classes are imbalanced; consider balancing the dataset before applying machine learning techniques.
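
One simple way to balance the classes is random oversampling, sketched below in plain Python. This is only an illustration under assumptions: rows are taken to be dictionaries keyed by column name, the helper `oversample_minority` is hypothetical, and the toy rows are not values from the dataset.

```python
import random


def oversample_minority(rows, label_key="DefectStatus", seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match in size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the largest class size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Toy rows for illustration only; these are not values from the dataset.
toy = [{"DefectStatus": 1}] * 8 + [{"DefectStatus": 0}] * 2
balanced = oversample_minority(toy)
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.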

Data Conclusion

This dataset encompasses a comprehensive collection of metrics vital for predicting defect rates in manufacturing operations. It includes production volumes, supply chain quality, quality control assessments, maintenance schedules, inventory management details, workforce productivity metrics, energy consumption patterns, additive manufacturing specifics, and more.

Dataset Usage and Attribution Notice

This dataset, shared by Rabie El Kharoua, is original and has never been shared before. It is made available under the CC BY 4.0 license, allowing anyone to use the dataset in any form as long as proper citation is given to the author. A DOI is provided for proper referencing. Please note that duplication of this work within Kaggle is not permitted.

Exclusive Synthetic Dataset

This dataset is synthetic and was generated for educational purposes, making it ideal for data science and machine learning projects. It is an original dataset, owned by Mr. Rabie El Kharoua, and has not been previously shared. You are free to use it under the license outlined on the data card. The dataset is offered without any guarantees. Details about the data provider will be shared soon.

Marathi and Maharashtrian Ornaments Dataset

kaggle.com

zip

Updated Jul 29, 2025

Cite

Tushar Kute (2025). Marathi and Maharashtrian Ornaments Dataset [Dataset]. https://www.kaggle.com/datasets/tusharkute/marathi-and-maharashtrian-ornamants-dataset/code

Explore at:

Available download formats: zip (8971 bytes)

Dataset updated

Jul 29, 2025

Authors

Tushar Kute

License

Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

This dataset comprises 953 synthetically generated entries detailing various traditional Marathi ornaments. It is designed to provide a structured collection of common features associated with these unique pieces of jewelry, often worn in Maharashtra, India.

Purpose: The primary purpose of this dataset is to serve as a foundational resource for:

Educational Projects: Students and enthusiasts can use it to learn about data handling, analysis, and visualization.

Machine Learning Exploration: Researchers can explore classification or regression tasks, for instance, predicting the type of ornament based on its physical properties or vice-versa.

Jewelry Domain Studies: Individuals interested in traditional Indian jewelry can gain insights into the typical characteristics of these ornaments.

Data Generation Practice: It can serve as an example for understanding how synthetic datasets can be created for specific domains.

Content & Generation: The dataset was created programmatically by defining plausible ranges and distributions for each feature based on general knowledge of these ornaments. While synthetic, the values aim to reflect realistic characteristics for each ornament type, acknowledging that actual jewelry pieces will have unique variations. For example:

Weight, Length/Height, Width: Ranges were set to represent typical sizes and weights.
Number of Components/Units & Stones/Pearls: These features vary significantly based on the ornament's intricate design, from single-unit pieces like 'Nath' to multi-component necklaces like 'Thushi' or 'Mohan Mala'.
Carat Weight of Stones: Applied only to ornaments that typically feature stones or pearls.
Gold Purity: Reflects common gold purities used in Indian jewelry (e.g., 20K, 21K, 22K, 23K, 24K). Silver purity (e.g., 80-95%) is assigned for 'Jodvi'.
Primary Material: Predominantly 'Gold' for most ornaments, with 'Silver' for 'Jodvi'.

This dataset offers a starting point for analyses where real-world data might be scarce or difficult to collect.

File Information

File Name: marathi_ornaments_dataset.csv
Number of Rows: 953
Number of Columns: 8
Approximate File Size: ~60 KB (will vary slightly based on exact content and line endings)

Column Descriptor

Here's a detailed description for each column in the marathi_ornaments_dataset.csv file:

Ornament Class

  Description: The traditional Marathi name of the jewelry item. This is the categorical target variable representing different types of ornaments.

  Data Type: String (Categorical)

  Possible Values: Nath, Thushi, Kolhapuri Saaj, Mohan Mala, Laxmi Haar, Tanmani, Chinchpeti, Bakuli Haar, Surya Haar, Bugadi, Kudya, Bajuband, Tode, Patlya, Mangalsutra, Jodvi, Kambarpatta

Weight (grams)

  Description: The approximate weight of the ornament in grams.
  Data Type: Float
  Units: grams (g)
  Range: Varies significantly by ornament type (e.g., Nath would be lighter, Laxmi Haar or Kambarpatta would be heavier).

Length/Height (cm)

  Description: The approximate length (for necklaces, bracelets) or height (for earrings, nose rings) of the ornament in centimeters.
  Data Type: Float
  Units: centimeters (cm)
  Range: Varies by ornament type.

Width (cm)

  Description: The approximate width of the ornament in centimeters.
  Data Type: Float
  Units: centimeters (cm)
  Range: Varies by ornament type and design.

Number of Components/Units

  Description: The total count of distinct, often repeated, design elements or units that make up the ornament. For intricate necklaces, this can be high.
  Data Type: Integer
  Range: 1 to ~1000 (especially for fine 'Thushi' beads).

Number of Stones/Pearls

  Description: The count of stones (e.g., diamonds, rubies, emeralds) or pearls embedded in or attached to the ornament.
  Data Type: Integer
  Range: 0 to ~50 (many traditional designs have no stones, some have many).

Carat Weight of Stones

  Description: The total approximate carat weight of all stones present in the ornament. This value is 0.0 if Number of Stones/Pearls is 0.
  Data Type: Float
  Units: Carats (ct)
  Range: 0.0 to ~1.0 (or higher for very elaborate pieces).

Gold Purity (Karat)

  Description: The purity of the primary gold material used, expressed in Karats. For 'Jodvi', which are traditionally silver, this represents silver purity as a percentage (even though labeled 'Gold Purity (Karat)' for consistency in column headers).

  Data Type: Integer
  Units: Karat (K) for gold, Percentage (%) for silver (for Jodvi).
  Possible Values: 20, 21, 22, 23, 24 for Gold. 80 to 95 for Silver (specifically for Jodvi).
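
Because this column mixes karats (gold) and percentages (silver, for 'Jodvi'), it may help to convert both to a common scale before analysis. The sketch below assumes the conversion rule described above; the function name `purity_fraction` is hypothetical and not part of the dataset.

```python
def purity_fraction(ornament_class, purity_value):
    """Convert the mixed-unit purity column to a fraction between 0 and 1."""
    if ornament_class == "Jodvi":
        # Silver purity is stored as a percentage (80 to 95).
        return purity_value / 100.0
    # Gold purity is stored in karats, where 24K is pure gold.
    return purity_value / 24.0


gold = purity_fraction("Thushi", 22)
silver = purity_fraction("Jodvi", 90)
```

Keying the rule on the ornament class follows the column description; keying it on the Primary Material column instead would be an equally reasonable choice.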

Primary Material

  Des...

Coffee Shop Daily Revenue Prediction Dataset
kaggle.com
zip
Updated Feb 7, 2025
Cite
Himel Sarder (2025). Coffee Shop Daily Revenue Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/himelsarder/coffee-shop-daily-revenue-prediction-dataset
Explore at:
Available download formats: zip (30259 bytes)
Dataset updated
Feb 7, 2025
Authors
Himel Sarder
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Overview

This dataset contains 2,000 rows of data from coffee shops, offering detailed insights into factors that influence daily revenue. It includes key operational and environmental variables that provide a comprehensive view of how business activities and external conditions affect sales performance. Designed for use in predictive analytics and business optimization, this dataset is a valuable resource for anyone looking to understand the relationship between customer behavior, operational decisions, and revenue generation in the food and beverage industry.

Columns & Variables

The dataset features a variety of columns that capture the operational details of coffee shops, including customer activity, store operations, and external factors such as marketing spend and location foot traffic.

Number of Customers Per Day

The total number of customers visiting the coffee shop on any given day.

Range: 50 - 500 customers.

Average Order Value ($)

The average dollar amount spent by each customer during their visit.

Range: $2.50 - $10.00.

Operating Hours Per Day

The total number of hours the coffee shop is open for business each day.

Range: 6 - 18 hours.

Number of Employees

The number of employees working on a given day. This can influence service speed, customer satisfaction, and ultimately, sales.

Range: 2 - 15 employees.

Marketing Spend Per Day ($)

The amount of money spent on marketing campaigns or promotions on any given day.

Range: $10 - $500 per day.

Location Foot Traffic (people/hour)

The number of people passing by the coffee shop per hour, a variable indicative of the shop's location and its potential to attract customers.

Range: 50 - 1000 people per hour.

Target Variable

Daily Revenue ($)

This is the dependent variable representing the total revenue generated by the coffee shop each day.

It is calculated as a combination of customer visits, average spending, and other operational factors like marketing spend and staff availability.

Range: $200 - $10,000 per day.
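
The description does not give the exact formula behind Daily Revenue. As a hedged starting point, a naive baseline multiplies customer count by average order value, and the remainder can then be examined against the other factors. Both function names below are hypothetical, and the inputs are illustrative values chosen from within the documented ranges rather than real rows.

```python
def baseline_revenue(customers_per_day, average_order_value):
    """Naive estimate: every customer spends the average order value."""
    return customers_per_day * average_order_value


def residual(actual_revenue, customers_per_day, average_order_value):
    """Part of the revenue the naive baseline does not explain."""
    return actual_revenue - baseline_revenue(customers_per_day, average_order_value)


# Illustrative inputs from within the documented ranges, not real rows.
estimate = baseline_revenue(200, 5.0)
unexplained = residual(1200.0, 200, 5.0)
```

Relating the residual to marketing spend, staffing, or foot traffic is one way to see how much those variables add beyond the baseline.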

Data Distribution & Insights

The dataset spans a wide variety of operational scenarios, from small neighborhood coffee shops with limited traffic to larger, high-traffic locations with extensive marketing budgets. This variety allows for exploring different predictive modeling strategies. Key insights that can be derived from the data include:

The effect of marketing spend on daily revenue.

The correlation between customer count and daily sales.

The relationship between staffing levels and revenue generation.

The influence of foot traffic and operating hours on customer behavior.

Use Cases & Applications

The dataset offers a wide range of applications, especially in predictive analytics, business optimization, and forecasting:

Predictive Modeling: Use machine learning models such as regression, decision trees, or neural networks to predict daily revenue based on operational data.

Business Strategy Development: Analyze how changes in marketing spend, staff numbers, or operating hours can optimize revenue and improve efficiency.

Customer Insights: Identify patterns in customer behavior related to shop operations and external factors like foot traffic and marketing campaigns.

Resource Allocation: Determine optimal staffing levels and marketing budgets based on predicted sales, improving overall profitability.

Real-World Applications in the Food & Beverage Industry

For coffee shop owners, managers, and analysts in the food and beverage industry, this dataset provides an essential tool for refining daily operations and boosting profitability. Insights gained from this data can help:

Optimize Marketing Campaigns: Evaluate the effectiveness of daily or seasonal marketing campaigns on revenue.

Staff Scheduling: Predict busy days and ensure that the right number of employees are scheduled to maximize efficiency.

Revenue Forecasting: Provide accurate revenue projections that can assist with financial planning and decision-making.

Operational Efficiency: Discover the most profitable operating hours and adjust business hours accordingly.

This dataset is also ideal for aspiring data scientists and machine learning practitioners looking to apply their skills to real-world business problems in the food and beverage sector.

Conclusion

The Coffee Shop Revenue Prediction Dataset is a versatile and comprehensive resource for understanding the dynamics of daily sales performance in coffee shops. With a focus on key operational factors, it is perfect for building predictive models, ...
Median and (range) for FA, pennation angle, number of fibers, and fiber...
datasetcatalog.nlm.nih.gov
plos.figshare.com
+1 more
Updated May 26, 2015
Cite
Buck, Amanda K. W.; Damon, Bruce M.; Elder, Christopher P.; Ding, Zhaohua; Towse, Theodore F. (2015). Median and (range) for FA, pennation angle, number of fibers, and fiber length. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001943049
Explore at:
Dataset updated
May 26, 2015
Authors
Buck, Amanda K. W.; Damon, Bruce M.; Elder, Christopher P.; Ding, Zhaohua; Towse, Theodore F.
Description
indicates a statistical difference (p = 0.009) from unsmoothed (0%) data for the group; ^ indicates a statistical difference (p = 0.0022) from unsmoothed (0%) data for the group; # indicates a statistical difference (p = 0.0043) from unsmoothed (0%) data for the group. Median and (range) for FA, pennation angle, number of fibers, and fiber length.
South Range, MI Population Pyramid Dataset: Age Groups, Male and Female...
neilsberg.com
csv, json
Updated Sep 16, 2023
+ more versions
Cite
Neilsberg Research (2023). South Range, MI Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis [Dataset]. https://www.neilsberg.com/research/datasets/63632866-3d85-11ee-9abe-0aa64bf2eeb2/
Explore at:
Available download formats: json, csv
Dataset updated
Sep 16, 2023
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Michigan, South Range
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Total Population for Age Groups, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the three variables, namely (a) male population, (b) female population, and (c) total population, we first analyzed and categorized the data for each age group. Ages between 0 and 85 were divided into buckets of roughly 5 years each; ages over 85 were aggregated into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the data for the South Range, MI population pyramid, which represents the South Range population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey 5-Year estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.

Key observations

Youth dependency ratio, which is the number of children aged 0-14 per 100 persons aged 15-64, for South Range, MI, is 16.9.

Old-age dependency ratio, which is the number of persons aged 65 or over per 100 persons aged 15-64, for South Range, MI, is 24.6.

Total dependency ratio for South Range, MI is 41.5.

Potential support ratio, which is the number of working-age persons (aged 15-64) per person aged 65 or over, for South Range, MI, is 4.1.
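
The four ratios above can be derived from three aggregated counts, as in the sketch below. The function name `dependency_ratios` and the example counts are hypothetical; the counts are chosen for easy arithmetic and are not taken from the dataset.

```python
def dependency_ratios(youth, working_age, elderly):
    """Compute the four ratios from counts of ages 0-14, 15-64, and 65+."""
    if working_age <= 0 or elderly <= 0:
        raise ValueError("working_age and elderly must be positive")
    youth_ratio = 100.0 * youth / working_age
    old_age_ratio = 100.0 * elderly / working_age
    return {
        "youth": youth_ratio,
        "old_age": old_age_ratio,
        "total": youth_ratio + old_age_ratio,
        "potential_support": working_age / elderly,
    }


# Hypothetical counts for illustration; not taken from the dataset.
ratios = dependency_ratios(youth=100, working_age=500, elderly=125)
```

With the dataset itself, the three inputs would come from summing the Total Population column over the matching age groups.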

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Variables / Data Columns

Age Group: The age group for the South Range population analysis. There are 18 expected values, defined above in the age groups section.

Population (Male): The male population of South Range for the given age group.

Population (Female): The female population of South Range for the given age group.

Total Population: The total population of South Range for the given age group.

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for South Range Population by Age.
Stirred Reactor CFD ML-Dataset
kaggle.com
zip
Updated Dec 18, 2023
Cite
SimLab | AILab | SciSoftLab (2023). Stirred Reactor CFD ML-Dataset [Dataset]. https://www.kaggle.com/datasets/novalabs/stirred-reactor-cfd-ml-dataset
Explore at:
Available download formats: zip (34666 bytes)
Dataset updated
Dec 18, 2023
Authors
SimLab | AILab | SciSoftLab
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset consists of reactor features such as reactor diameter, liquid height, number of immersed stirrers, number of blades, pitch angle, rpm, etc. The response variables are the median values of turbulent kinetic energy (k), turbulent dissipation rate (epsilon), strain rate (strainRate) and velocity magnitude (Umag).

The values were extracted from CFD simulations that were set up and run online with CliqScale.R, a stirred-reactor online tool based on OpenFOAM (simpleFoam) that is developed and hosted by Novalabs AG in Zürich, Switzerland: https://novalabs.ch/cliqscaler/. CliqScale.R offers three mesh resolutions: coarse, medium, and fine. The current dataset was built with the coarse mesh. For a dataset with higher resolution, please contact us through support.csr@novalabs.ch.

Dataset configuration:
- Volume range: 1 L – 7000 L
- 3 volumes per reactor
- 4 rpm values per volume
- 3 blade counts [2, 3, 4]
- 2 pitch angles [45, 60]
- 1 baffle configuration [4 baffles]
- Fluid: water @ ~20 °C

Reactor image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10105022%2F2c0fa3d909aacb2dbdd879276b46ce50%2Freactor.png?generation=1703064710518069&alt=media
Recognition of Handwritten Digits
kaggle.com
zip
Updated Dec 10, 2019
Cite
hoang dang (2019). Recognition of Handwritten Digits [Dataset]. https://www.kaggle.com/hoandan/penbased-recognition-of-handwritten-digits
Explore at:
Available download formats: zip (225652 bytes)
Dataset updated
Dec 10, 2019
Authors
hoang dang
Description
Source:

E. Alpaydin, Fevzi Alimoglu
Department of Computer Engineering
Bogazici University, 80815 Istanbul, Turkey
alpaydin '@' boun.edu.tr

Data Set Information:

We create a digit database by collecting 250 samples from 44 writers. The samples written by 30 writers are used for training, cross-validation and writer dependent testing, and the digits written by the other 14 are used for writer independent testing. This database is also available in the UNIPEN format.

We use a WACOM PL-100V pressure sensitive tablet with an integrated LCD display and a cordless stylus. The input and display areas are located in the same place. Attached to the serial port of an Intel 486 based PC, it allows us to collect handwriting samples. The tablet sends $x$ and $y$ tablet coordinates and pressure level values of the pen at fixed time intervals (sampling rate) of 100 milliseconds.

These writers are asked to write 250 digits in random order inside boxes of 500 by 500 tablet pixel resolution. Subjects are monitored only during the first entry screens. Each screen contains five boxes with the digits to be written displayed above. Subjects are told to write only inside these boxes. If they make a mistake or are unhappy with their writing, they are instructed to clear the content of a box by using an on-screen button. The first ten digits are ignored because most writers are not familiar with this type of input device, but subjects are not aware of this.

In our study, we use only ($x, y$) coordinate information. The stylus pressure level values are ignored. First we apply normalization to make our representation invariant to translations and scale distortions. The raw data that we capture from the tablet consist of integer values between 0 and 500 (tablet input box resolution). The new coordinates are such that the coordinate which has the maximum range varies between 0 and 100. Usually $x$ stays in this range, since most characters are taller than they are wide.
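
The normalization described above can be sketched as follows. This is a minimal illustration, assuming a single scale factor is applied to both axes so the aspect ratio is preserved; the function name `normalize_stroke` is hypothetical, and whether the original procedure also centers the narrower axis is not stated.

```python
def normalize_stroke(points, target=100.0):
    """Translate to the origin and scale so the wider axis spans 0..target."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y)
    if span == 0:
        # A single repeated point has no extent; map it to the origin.
        return [(0.0, 0.0) for _ in points]
    scale = target / span
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]


# A tall, narrow stroke: y spans 400 tablet units, x spans only 100.
normalized = normalize_stroke([(100, 50), (150, 250), (200, 450)])
```

In this example the axis with the larger range fills 0 to 100 while the other axis covers only part of it, which matches the description of one coordinate varying over the full range.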

In order to train and test our classifiers, we need to represent digits as constant length feature vectors. A commonly used technique leading to good results is resampling the $(x_t, y_t)$ points. Temporal resampling (points regularly spaced in time) or spatial resampling (points regularly spaced in arc length) can be used here. Raw point data are already regularly spaced in time but the distance between them is variable. Previous research showed that spatial resampling to obtain a constant number of regularly spaced points on the trajectory yields much better performance, because it provides a better alignment between points. Our resampling algorithm uses simple linear interpolation between pairs of points. The resampled digits are represented as a sequence of $T$ points $(x_t, y_t)_{t=1}^{T}$, regularly spaced in arc length, as opposed to the input sequence, which is regularly spaced in time.

So, the input vector size is 2*T, two times the number of points resampled. We considered spatial resampling to T=8,12,16 points in our experiments and found that T=8 gave the best trade-off between accuracy and complexity.
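
Spatial resampling with linear interpolation can be sketched as below. This is an illustrative reimplementation under assumptions, not the authors' code: the function name `resample_by_arc_length` is hypothetical, and the first and last input points are assumed to be kept as the first and last output points.

```python
import math


def resample_by_arc_length(points, num_points=8):
    """Return num_points points evenly spaced in arc length along the polyline."""
    if len(points) < 2 or num_points < 2:
        raise ValueError("need at least two input points and two output points")
    # Cumulative arc length at each input point.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]
    if total == 0:
        return [(float(points[0][0]), float(points[0][1]))] * num_points
    resampled = []
    segment = 0
    for k in range(num_points):
        target = total * k / (num_points - 1)
        # Advance to the segment that contains the target arc length.
        while segment < len(points) - 2 and cumulative[segment + 1] < target:
            segment += 1
        seg_len = cumulative[segment + 1] - cumulative[segment]
        t = 0.0 if seg_len == 0 else (target - cumulative[segment]) / seg_len
        (x0, y0), (x1, y1) = points[segment], points[segment + 1]
        # Linear interpolation between the two surrounding input points.
        resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return resampled


# Unevenly spaced input along a straight line of length 10.
stroke = [(0, 0), (1, 0), (10, 0)]
resampled = resample_by_arc_length(stroke, num_points=3)
feature_vector = [coord for point in resampled for coord in point]
```

Flattening the resampled points gives a vector of length 2*T, so the default of `num_points=8` would produce 16 input attributes per digit.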

Attribute Information:

All input attributes are integers in the range 0..100. The last attribute is the class code 0..9
Dataset of pictures of a single hand with different finger count
zenodo.org
zip
Updated Jun 20, 2020
Cite
Unknown; Unknown (2020). Dataset of pictures of a single hand with different finger count [Dataset]. http://doi.org/10.5281/zenodo.3901659
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3901659
Dataset updated
Jun 20, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Unknown; Unknown
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset of pictures of a single hand with different finger-count from zero to 5.

The dataset is already divided in 3 sets:

train: Training set.

test: Testing set.

val: Validation set.

Images are tagged with the finger count they show, in a range from 0 to 5.

This dataset is intended to be used for a university assignment: https://github.com/cabellocarlos/keras-cnn
Data from: MNIST Handwritten Digits Dataset
kaggle.com
zip
Updated May 15, 2025
Cite
Ghanshyam Saini (2025). MNIST Handwritten Digits Dataset [Dataset]. https://www.kaggle.com/datasets/ghnshymsaini/mnist-handwritten-digits-dataset/versions/1
Explore at:
Available download formats: zip (29605861 bytes)
Dataset updated
May 15, 2025
Authors
Ghanshyam Saini
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
MNIST Handwritten Digits Dataset (Organized by Folder)

This dataset provides the classic MNIST handwritten digits dataset, a foundational resource for image classification in machine learning. It contains a training set of 60,000 examples and a test set of 10,000 examples of grayscale images of handwritten digits (0 through 9).

Dataset Structure:

The uploaded data is organized within a main folder named mnist_png, which contains the following subfolders:

train: This folder contains the training set images. Upon navigating into the train folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders corresponds to a digit class (e.g., the folder named 0 contains images of the digit zero, the folder named 1 contains images of the digit one, and so on). The images within these subfolders are grayscale handwritten digit images in a common image format (e.g., PNG).

test: This folder contains the test set images. Similar to the train folder, upon navigating into the test folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders contains the corresponding test images for that digit class.

Content of the Data:

Each image in the MNIST dataset is a 28x28 pixel grayscale image of a handwritten digit (0-9). The pixel values typically range from 0 (black) to 255 (white).

How to Use This Dataset:

Download the main MNIST folder (or the archive containing it) and extract its contents.

Navigate into the mnist_png folder.

The train and test subfolders contain the image data, organized by digit class. You can directly use this folder structure with image data loaders that support directory-based organization. The name of the subfolder will correspond to the digit label.

The train folder provides the images you can use to train your machine learning models.

The test folder provides a separate set of images that you can use to evaluate the performance of your trained models on unseen data.
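
A directory-based loader for this layout can be sketched with the standard library alone. The helper name `list_labeled_images` is hypothetical, and the `*.png` pattern is an assumption based on the description's "e.g., PNG"; adjust it if the files use another extension.

```python
from pathlib import Path


def list_labeled_images(split_dir):
    """Return sorted (image_path, label) pairs for a folder of digit subfolders."""
    samples = []
    for class_dir in sorted(Path(split_dir).iterdir()):
        # Only subfolders whose name is a digit (0 through 9) are class folders.
        if class_dir.is_dir() and class_dir.name.isdigit():
            label = int(class_dir.name)
            for image_path in sorted(class_dir.glob("*.png")):
                samples.append((image_path, label))
    return samples
```

After extraction, calling `list_labeled_images("mnist_png/train")` would yield the training pairs, and the same call on `mnist_png/test` the evaluation pairs, with each label taken from the subfolder name.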

Citation:

The MNIST dataset is a well-established resource. It was assembled by Yann LeCun, Corinna Cortes, and Christopher J. C. Burges from NIST handwriting samples, and it is usually cited via LeCun, Bottou, Bengio, and Haffner (1998), "Gradient-based learning applied to document recognition." There is no separate paper for this folder-based image format, so cite the source relevant to the specific papers or implementations you are referencing.

Data Contribution:

Thank you for downloading this image-based organization of the MNIST dataset. By structuring the images into class-specific folders within the train and test directories, I aim to provide a user-friendly format for those working on handwritten digit recognition tasks. This structure aligns well with many image data loading utilities and workflows.

If you find this folder structure clear, well-organized, and useful for your projects, please consider giving it an upvote after downloading. Your feedback and appreciation are valuable and encourage further contributions to the Kaggle community. Thank you!
Grass Range, MT Population Pyramid Dataset: Age Groups, Male and Female...
neilsberg.com
csv, json
Updated Feb 22, 2025
+ more versions
Cite
Neilsberg Research (2025). Grass Range, MT Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/grass-range-mt-population-by-age/
Explore at:
Available download formats: json, csv
Dataset updated
Feb 22, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Grass Range, Montana
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Total Population for Age Groups, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) male population, (b) female population, and (c) total population, we first analyzed and categorized the data for each age group. Ages between 0 and 85 were divided into buckets of roughly 5 years each; ages over 85 were aggregated into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the data for the Grass Range, MT population pyramid, which represents the Grass Range population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.

Key observations

Youth dependency ratio, which is the number of children aged 0-14 per 100 persons aged 15-64, for Grass Range, MT, is 17.1.

Old-age dependency ratio, which is the number of persons aged 65 or over per 100 persons aged 15-64, for Grass Range, MT, is 160.0.

Total dependency ratio for Grass Range, MT is 177.1.

Potential support ratio, which is the number of working-age persons (aged 15-64) per person aged 65 or over, for Grass Range, MT, is 0.6.
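The four ratios in the key observations follow directly from three age-band totals. The sketch below shows the arithmetic; the function name and the sample counts are hypothetical (the counts were picked only so that the output matches the published ratios and are not the actual ACS figures for Grass Range).

```python
def dependency_ratios(young, working, old):
    """Compute dependency ratios from three age-band population totals.

    young   -- persons aged 0-14
    working -- persons aged 15-64
    old     -- persons aged 65 or over
    """
    if working <= 0 or old <= 0:
        raise ValueError("working-age and 65+ populations must be positive")
    youth = 100.0 * young / working      # children per 100 working-age persons
    old_age = 100.0 * old / working      # persons 65+ per 100 working-age persons
    return {
        "youth": youth,
        "old_age": old_age,
        "total": youth + old_age,
        "potential_support": working / old,  # working-age persons per person 65+
    }


# Hypothetical counts, chosen for illustration only (not real ACS data).
ratios = dependency_ratios(young=6, working=35, old=56)
for name, value in ratios.items():
    print(f"{name}: {value:.1f}")
```

Note that the total dependency ratio is the sum of the youth and old-age ratios, and the potential support ratio is 100 divided by the old-age ratio.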

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Variables / Data Columns

Age Group: This column displays the age group for the Grass Range population analysis. There are 18 expected values, defined above in the Age groups section.

Population (Male): This column shows the male population of Grass Range for the selected age group.

Population (Female): This column shows the female population of Grass Range for the selected age group.

Total Population: This column shows the total population of Grass Range for the selected age group.

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Grass Range Population by Age. You can refer to it here
400k Augmented MNIST: Extended Handwritten Digits
kaggle.com
zip
Updated Mar 26, 2025
Cite
Alexandre Le Mercier (2025). 400k Augmented MNIST: Extended Handwritten Digits [Dataset]. https://www.kaggle.com/datasets/alexandrelemercier/400k-augmented-mnist-extended-handwritten-digits
Explore at:
zip (359213486 bytes)
Available download formats
Dataset updated
Mar 26, 2025
Authors
Alexandre Le Mercier
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Overview

The 400k Augmented MNIST dataset is an extended version of the classic MNIST handwritten digits dataset. By applying a variety of augmentation techniques, I have increased the number of training images to 400,000 - roughly 40,000 per digit label. This large and diverse training set is designed to significantly improve the robustness and generalization of models trained on it, making them less susceptible to overfitting and more resilient against adversarial perturbations.

Dataset Structure

The dataset is organized into two main directories:

Augmented MNIST Training Set (400k):
This directory contains 10 subdirectories, one for each digit label ("Label 0" through "Label 9"). Each subdirectory holds the corresponding JPEG images generated via augmentation. These images have been produced using techniques such as random rotation, shear, translation, scaling, reflection, spatial padding, Ben Graham transformation, Gaussian noise, salt-and-pepper noise, and random text overlay.

MNIST Validation Set (4k):
This directory also contains subdirectories "Label 0" to "Label 9". However, the validation set consists solely of the original MNIST images (approximately 400 per label) that were not used for augmentation. This allows you to evaluate model performance on natural, unaltered digit images, providing a clear benchmark for generalization.
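The folder layout described above maps directly to (image path, label) pairs. The following is a minimal sketch of collecting them, assuming the subdirectories are named exactly "Label 0" through "Label 9"; the function name, the accepted file suffixes, and the root path in the usage comment are illustrative assumptions rather than part of the dataset's documentation.

```python
from pathlib import Path

# Assumption: images are stored as JPEG (stated for the training set);
# PNG is accepted as well in case the validation images use it.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root):
    """Return a list of (path, digit) pairs found under root/'Label <digit>'."""
    root = Path(root)
    samples = []
    for digit in range(10):
        label_dir = root / f"Label {digit}"
        if not label_dir.is_dir():
            continue  # tolerate missing label folders
        for path in sorted(label_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, digit))
    return samples


# Usage (hypothetical local path to one of the two main directories):
# train_samples = list_labeled_images("data/augmented_mnist_training_set")
```

The resulting list can be passed to whichever image loader or training framework you prefer.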

How to Use This Dataset

Training:
Use the augmented training set to train your deep learning models. The 400k images offer a wide variety of conditions, helping your model learn robust features that generalize well.

Validation:
Evaluate your models on the validation set, which contains only the original MNIST images. This will help you measure performance on “natural” digits, ensuring that improvements in robustness do not come at the expense of real-world accuracy.

Flexibility:
You can also experiment with mixed training (combining augmented and original images) to study how different training strategies affect model robustness and accuracy.

Augmentation Techniques Applied

The following augmentation functions were used to generate the extended dataset:

Random Rotation: Randomly rotates images within a specified angle range.

Random Shear: Applies slight shearing transformations.

Random Translation: Shifts images horizontally and vertically.

Random Scale: Zooms in or out on the images.

Ben Graham Transform: Enhances image contrast and clarity using a weighted Gaussian blur.

Random Gaussian Noise: Adds Gaussian noise to simulate sensor or environmental disturbances.

Random Salt-and-Pepper Noise: Introduces random pixel-level corruption.

A random number of transformations (between 1 and 6, in a random order) is applied to each image, with the goal of creating a diverse and challenging training set.
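The selection scheme described above (a random number of transformations applied in a random order) can be sketched as follows. This is not the dataset author's code: the three stand-in transforms, their parameters, and the choice to sample without replacement are assumptions made for illustration, and images are represented as nested lists of floats in [0, 1] to keep the example dependency-free.

```python
import random


def gaussian_noise(img, rng, sigma=0.1):
    """Add zero-mean Gaussian noise and clamp pixels to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


def salt_and_pepper(img, rng, amount=0.05):
    """Set roughly `amount` of the pixels to pure black or pure white."""
    out = []
    for row in img:
        new_row = []
        for p in row:
            u = rng.random()
            new_row.append(0.0 if u < amount / 2 else 1.0 if u > 1 - amount / 2 else p)
        out.append(new_row)
    return out


def random_translation(img, rng, max_shift=3):
    """Shift the image by up to max_shift pixels, filling vacated pixels with 0."""
    h, w = len(img), len(img[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    return [[img[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else 0.0
             for x in range(w)] for y in range(h)]


def apply_random_augmentations(img, transforms, rng, min_ops=1, max_ops=6):
    """Apply between min_ops and max_ops distinct transforms in random order."""
    max_ops = min(max_ops, len(transforms))  # cannot pick more than are available
    k = rng.randint(min_ops, max_ops)
    chosen = rng.sample(transforms, k)       # k distinct transforms, shuffled
    for transform in chosen:
        img = transform(img, rng)
    return img, [t.__name__ for t in chosen]


rng = random.Random(0)
image = [[0.0] * 28 for _ in range(28)]
image[14][14] = 1.0
augmented, applied = apply_random_augmentations(
    image, [gaussian_noise, salt_and_pepper, random_translation], rng)
print(applied)
```

With the full pool of seven transformations listed above, the `max_ops=6` default would match the stated range of 1 to 6 operations per image.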

Citation

If you use this dataset in your research, please cite it as follows:

@misc{alexandre_le_mercier_2025,
  title={400k Augmented MNIST: Extended Handwritten Digits},
  url={https://www.kaggle.com/ds/6967763},
  DOI={10.34740/KAGGLE/DS/6967763},
  publisher={Kaggle},
  author={Alexandre Le Mercier},
  year={2025}
}

License

This dataset is under the Apache 2.0 license.

Contact

For any questions or issues regarding this dataset, please send a message in the "Discussions" or "Suggestions" sections of the Kaggle dataset page.

Good luck and happy coding! 🚀


Eric J. Bigelow; Steven T. Piantadosi (2018). Large Dataset of Generalization Patterns in the Number Game [Dataset]. http://doi.org/10.7910/DVN/A8ZWLF

Large Dataset of Generalization Patterns in the Number Game

Explore at:

5 scholarly articles cite this dataset (View in Google Scholar)

Label	Occurrences	Label	Occurrences	Label	Occurrences
0	561	34	629	68	606
1	687	35	540	69	582
2	582	36	588	70	566
3	633	37	619	71	659
4	588	38	584	72	572
5	544	39	609	73	682
6	582	40	570	74	627
7	615	41	679	75	598
8	584	42	544	76	605
9	567	43	567	77	602
10	641	44	574	78	595
11	780	45	555	79	586
12	720	46	550	80	569
13	699	47	614	81	628
14	630	48	614	82	578
15	627	49	595	83	622
16	684	50	505	84	569
17	713	51	583	85	540
18	743	52	512	86	557
19	706	53	555	87	628
20	527	54	504	88	562
21	710	55	488	89	625
22	586	56	531	90	600
23	584	57	556	91	700
24	568	58	497	92	622
25	530	59	520	93	622
26	612	60	556	94	591
27	627	61	682	95	557
28	618	62	594	96	580
29	619	63	539	97	640
30	622	64	610	98	577
31	684	65	514	99	563
32	606	66	587
33	592	67	655


