Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mechanical Parts Dataset
The dataset consists of a total of 2,250 images downloaded from various internet platforms. Among the images in the dataset, 714 contain bearings, 632 contain bolts, 616 contain gears, and 586 contain nuts. A total of 10,597 manual annotations were made across the dataset: 2,099 labels belong to the bearing class, 2,734 to the bolt class, 2,662 to the gear class, and 3,102 to the nut class.
Folder Content
The dataset is split 80% train, 10% validation, and 10% test. In the "Mechanical Parts Dataset" folder there are three separate folders: "train", "test" and "val". Each of these three folders contains folders named "images" and "labels". Images are kept in the "images" folder and label information is kept in the "labels" folder.
Finally, inside the folder there is a YAML file named "mech_parts_data" for the YOLO algorithm. This file contains the number of classes and the class names.
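For reference, a minimal sketch of reading such a data file in Python (the exact file extension and the "nc"/"names" keys follow the usual YOLOv5 convention and are assumptions, not confirmed by the dataset):

```python
# Hedged sketch: reading the dataset's YOLO data file with PyYAML.
# The file name and the "nc"/"names" keys follow the common YOLOv5
# convention; both are assumptions, not confirmed by the dataset.
import yaml

with open("mech_parts_data.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["nc"])     # expected: 4 classes
print(cfg["names"])  # expected: bearing, bolt, gear, nut (order assumed)
```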
Images and Labels
The dataset was prepared in accordance with the YOLOv5 algorithm.
For example, the label information for the image named "2a0xhkr_jpg.rf.45a11bf63c40ad6e47da384fdf6bb7a1.jpg" is stored in a txt file with the same name. Each line of the txt file holds one annotation in the form "class x_center y_center width height".
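As an illustration, a small Python sketch that parses such a label file (assuming the usual YOLO convention of coordinates normalized to [0, 1]):

```python
# Minimal sketch of parsing one YOLOv5 label file. Each line holds
# "class x_center y_center width height"; in the YOLO convention the
# coordinates are normalized to [0, 1].
def parse_yolo_labels(path):
    boxes = []
    with open(path) as f:
        for line in f:
            cls, xc, yc, w, h = line.split()
            boxes.append((int(cls), float(xc), float(yc), float(w), float(h)))
    return boxes

boxes = parse_yolo_labels("2a0xhkr_jpg.rf.45a11bf63c40ad6e47da384fdf6bb7a1.txt")
```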
Update 05.01.2023
***Pascal VOC and COCO JSON formats have been added.***
Related paper: https://doi.org/10.5281/zenodo.7496767
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a dataset of images of 50 types of car parts. It includes a train set, a test set and a validation set. There are 50 classes of car parts...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains English words in column B. For each word, the other columns give its frequency (fre), length (len), part of speech (PS), the number of undergraduate students who marked it difficult (difficult_ug), and the number of postgraduate students who marked it difficult (difficult_pg). The dataset has a total of 5,368 unique words. 680 words were marked as difficult by undergraduate students and 151 by postgraduate students; the remaining 4,537 words are easy, i.e., not marked as difficult by either undergraduate or postgraduate students. A hyphen (-) in the difficult_ug column means the word was not present in the text circulated to undergraduate students; likewise, a hyphen in the difficult_pg column marks words not present in the text circulated to postgraduate students. The data was collected from students of Jammu and Kashmir (a Union Territory of India), latitude and longitude (32.2778° N, 75.3412° E).
The attached files are described as follows:
The dataset_english CSV file is the original dataset containing the English words along with their length, frequency, part of speech, and the number of undergraduate and postgraduate students who marked each word as difficult.
The dataset_numerical CSV file contains the original dataset with string fields transformed into numerical form.
The "English language difficulty level measurement - Questionnaire (1-6) & PG1,PG2,PG3,PG4" .docx files contain the questionnaires supplied to college and university students, who were asked to underline difficult words in the English text.
The IGNOU English.zip file contains the Indira Gandhi National Open University (IGNOU) English textbooks for undergraduate and postgraduate students. The texts for the above questionnaires were taken from these IGNOU English textbooks.
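For illustration, a minimal pandas sketch for loading the word list (column names follow the description above; the exact CSV headers are an assumption):

```python
# Hedged sketch: loading dataset_english.csv with pandas. Column names
# follow the description above; exact header spellings are assumptions.
import pandas as pd

df = pd.read_csv("dataset_english.csv")
# A hyphen (-) marks words absent from the circulated text; coerce to NaN.
ug = pd.to_numeric(df["difficult_ug"], errors="coerce")
print((ug > 0).sum(), "words marked difficult by undergraduates")
```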
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper presents a benchmark dataset called EO4WildFires: a multi-sensor (multispectral Sentinel-2, Synthetic Aperture Radar (SAR) Sentinel-1, and NASA POWER meteorological parameters) time-series dataset spanning 45 countries, which can be used for developing machine learning and deep learning methods for estimating the area a forest wildfire might cover.
This novel EO4WildFires dataset is annotated using EFFIS (the European Forest Fire Information System) as the forest fire detection and size estimation data source. A total of 31,742 wildfire events were gathered from 2018 to 2022. For each event, Sentinel-2 (multispectral), Sentinel-1 (SAR) and meteorological data are assembled into a single data cube. The meteorological parameters included in the data cube are: ratio of actual partial pressure of water vapor to the partial pressure at saturation, average temperature, bias-corrected average total precipitation, average wind speed, fraction of land covered by snowfall, percent of root zone soil wetness, snow depth, snow precipitation, and percent of soil moisture.
The main problem this dataset is designed to address is forecasting severity before wildfires occur. The dataset is not used to predict wildfire events, but rather the severity (size of the area damaged by fire) of a wildfire event, should one happen in a specific place under the current and historical forest status, as recorded from multispectral and SAR images and meteorological data.
Using the data cube for the collected wildfire events, the EO4WildFires dataset is used in three (3) preliminary experiments to evaluate the contributing factors for wildfire severity prediction. The first experiment evaluates wildfire size using only the meteorological parameters, the second utilizes both the multispectral and SAR parts of the dataset, while the third exploits all dataset parts. In each experiment, machine learning models are developed and their accuracy is evaluated.
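As a rough illustration of the first experiment, a hedged sketch of a severity regression on meteorological features only (the file name, feature columns, target name, and model choice are illustrative assumptions, not the paper's actual setup):

```python
# Hedged sketch of the meteorology-only experiment: regressing wildfire
# severity (burned-area size) on the meteorological parameters. File and
# column names, and the model choice, are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("eo4wildfires_meteo.csv")    # hypothetical tabular export
X = df.drop(columns=["burned_area"])           # meteorological features
y = df["burned_area"]                          # severity target
scores = cross_val_score(GradientBoostingRegressor(), X, y, cv=5, scoring="r2")
print("Cross-validated R^2:", scores.mean())
```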
We provide the generated dataset used for supervised machine learning in the related article. The data are in tabular format and contain all principal components and ground-truth labels per tissue type. The tissue type codes used are: C1 for kidney, C2 for skin, and C3 for colon. 'PC' stands for principal component. For feature extraction specifications, please see the original design in the related article. Features were extracted independently for each tissue type.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This is a minimal version of the DynaBench dataset, containing the first 5% of the data. The full dataset is available at https://professor-x.de/dynabench
Abstract:
Previous work on learning physical systems from data has focused on high-resolution grid-structured measurements. However, real-world knowledge of such systems (e.g. weather data) relies on sparsely scattered measuring stations. In this paper, we introduce a novel simulated benchmark dataset, DynaBench, for learning dynamical systems directly from sparsely scattered data without prior knowledge of the equations. The dataset focuses on predicting the evolution of a dynamical system from low-resolution, unstructured measurements. We simulate six different partial differential equations covering a variety of physical systems commonly used in the literature and evaluate several machine learning models, including traditional graph neural networks and point cloud processing models, with the task of predicting the evolution of the system. The proposed benchmark dataset is expected to advance the state of the art as an out-of-the-box, easy-to-use tool for evaluating models in a setting where only unstructured low-resolution observations are available. The benchmark is available at https://professor-x.de/dynabench.
Technical Info
The dataset is split into 42 parts (6 equations x 7 combinations of resolution/structure). Each part can be downloaded separately and contains 7000 simulations of the given equation at the given resolution and structure. The simulations are grouped into chunks of 500 simulations saved in the HDF5 file format. Each chunk contains the variable "data", where the values of the simulated system are stored, as well as the variable "points", where the coordinates at which the system has been observed are stored. For more details visit the DynaBench website at https://professor-x.de/dynabench/. The dataset is best used via the dynabench Python package available at https://pypi.org/project/dynabench/.
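For example, one chunk can be inspected with h5py as follows (the file name below is hypothetical; only the "data" and "points" variables are documented above):

```python
# Minimal sketch of reading one DynaBench chunk with h5py. The file
# name is hypothetical; "data" and "points" are the documented variables.
import h5py

with h5py.File("advection_chunk_0.h5", "r") as f:
    data = f["data"][:]      # values of the simulated system
    points = f["points"][:]  # coordinates of the observation points
    print(data.shape, points.shape)
```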
This package contains data in a portion of northern Nevada, the extent of the 'Nevada Machine Learning Project' (DE-EE0008762). Slip tendency (TS) and dilation tendency (TD) were calculated for all the faults in the Nevada ML study area. TS is the ratio between the shear and normal components of the stress tensor acting on a fault plane. TD is the ratio of the components of the stress tensor normal to a fault plane. Faults with higher TD are relatively more likely to dilate and host open, conductive fractures. Faults with higher TS are relatively more likely to slip, and these fractures may be propped open and conductive. These values of TS and TD were used to update a map surface from the Nevada Geothermal Machine Learning Project (DE-FOA-0001956) that used less reliable estimates for TS and TD. The new map surface was generated using the same procedure as the old surface, just with the new TS and TD data values.
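For reference, a hedged sketch of the commonly used definitions of TS and TD from the fault-mechanics literature (these are the standard formulas, not code from this project):

```python
# Hedged sketch: standard slip-tendency and dilation-tendency formulas
# from the fault-mechanics literature, not code from this project.
# tau: shear stress resolved on the fault plane; sigma_n: normal stress
# on the plane; sigma_1 >= sigma_3 are the principal stresses.
def slip_tendency(tau, sigma_n):
    return tau / sigma_n

def dilation_tendency(sigma_n, sigma_1, sigma_3):
    return (sigma_1 - sigma_n) / (sigma_1 - sigma_3)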
Our Cinematic Dataset is a carefully curated collection of audio files with rich metadata, providing a wealth of information for machine learning applications such as generative AI music, Music Information Retrieval (MIR), and source separation. This dataset is specifically created to capture the rich and expressive quality of cinematic music, making it an ideal training environment for AI models. Including chords, instrumentation, key, tempo, and timestamps, it is an invaluable resource for those looking to push the boundaries of AI in the field of audio innovation.
Strings, brass, woodwinds, and percussion are among the instruments used in the orchestral ensemble, a staple of film music. Strings, including violins, cellos, and double basses, are vital for communicating emotion, while brass instruments, such as trumpets and trombones, add grandeur and passion. Woodwinds, such as flutes and clarinets, give texture and nuance, while percussion instruments bring rhythm and impact. The careful arrangement of these parts produces distinct cinematic soundscapes, making the genre excellent for teaching AI models to recognize and reproduce complex musical patterns.
Training models on this dataset provides a unique opportunity to explore the complexities of cinematic composition. The dataset's emphasis on important cinematic components, along with cinematic music's natural emotional storytelling ability, provides a solid platform for AI models to learn and compose music that captures the essence of engaging storylines. As AI continues to push creative boundaries, this Cinematic Music Dataset is a valuable tool for anybody looking to harness the compelling power of music in the digital environment.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
southwestern Pakistan) and its four regions of interest
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Human body detection system using artificial intelligence and machine learning: deep learning, OpenCV, Python, and its libraries. It is essentially an object detection system, extended for application in the medical industry.
The dataset was created to simulate data related to the predictive maintenance of critical components in construction machinery, such as cranes, excavators, and bulldozers. It contains 1,000 records, each representing a unique component, with the following attributes:
1) Component_ID: A unique identifier for each component, formatted as CMP0001 to CMP1000.
2) Component_Type: The type of component, categorized into three types: Engine, Hydraulic Cylinder, and Gear.
3) Vibration: A numerical feature indicating the vibration level of the component, measured in arbitrary units between 0.1 and 5.0.
4) Temperature: The operating temperature of the component, recorded in degrees Celsius within a range of 40 to 100.
5) Pressure: The pressure exerted on the component, measured in psi, ranging from 50 to 300.
6) Operating_Hours: The total number of hours the component has been in operation, ranging from 0 to 5,000.
7) Remaining_Useful_Life (RUL): The estimated number of hours left before the component fails, randomly assigned within a range of 50 to 1,000.
This dataset aims to support the development and testing of machine learning models for Remaining Useful Life (RUL) prediction. It mimics real-world scenarios where sensor data is collected and analyzed to optimize maintenance schedules, reduce downtime, and improve operational efficiency. The features are designed to allow exploratory data analysis and advanced feature engineering for predictive maintenance tasks.
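As a usage illustration, a hedged sketch of a baseline RUL model on these attributes (the CSV file name is hypothetical; column names follow the attribute list above):

```python
# Hedged sketch: baseline Remaining_Useful_Life regression on the
# described attributes. The CSV file name is hypothetical; column
# names follow the attribute list above.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("construction_components.csv")   # hypothetical file name
X = pd.get_dummies(df[["Component_Type", "Vibration", "Temperature",
                       "Pressure", "Operating_Hours"]])
y = df["Remaining_Useful_Life"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)
print("Held-out R^2:", model.score(X_test, y_test))
```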
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This is the readme for the supplemental data for our ICDAR 2019 paper.
You can read our paper via IEEE here: https://ieeexplore.ieee.org/document/8978202
If you found this dataset useful, please consider citing our paper:
@inproceedings{DBLP:conf/icdar/MorrisTE19,
author = {David Morris and
Peichen Tang and
Ralph Ewerth},
title = {A Neural Approach for Text Extraction from Scholarly Figures},
booktitle = {2019 International Conference on Document Analysis and Recognition,
{ICDAR} 2019, Sydney, Australia, September 20-25, 2019},
pages = {1438--1443},
publisher = {{IEEE}},
year = {2019},
url = {https://doi.org/10.1109/ICDAR.2019.00231},
doi = {10.1109/ICDAR.2019.00231},
timestamp = {Tue, 04 Feb 2020 13:28:39 +0100},
biburl = {https://dblp.org/rec/conf/icdar/MorrisTE19.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
This work was financially supported by the German Federal Ministry of Education and Research (BMBF) and European Social Fund (ESF) (InclusiveOCW project, no. 01PE17004).
We used different sources of data for testing, validation, and training. Our testing set was assembled from the work by Böschen et al. that we cited. We excluded the DeGruyter dataset and used it as our validation dataset.
These datasets contain a readme with license information. Further information about the associated project can be found in the authors' published work we cited: https://doi.org/10.1007/978-3-319-51811-4_2
The DeGruyter dataset does not include the labeled images due to license restrictions. As of writing, the images can still be downloaded from DeGruyter via the links in the readme. Note that depending on what program you use to strip the images out of the PDF they are provided in, you may have to re-number the images.
We used label_generator's generated dataset, which the author made available on a requester-pays amazon s3 bucket. We also used the Multi-Type Web Images dataset, which is mirrored here.
We have made our code available in code.zip. We will upload code, announce further news, and field questions via the GitHub repo.
Our text detection network is adapted from Argman's EAST implementation. The EAST/checkpoints/ours subdirectory contains the trained weights we used in the paper.
We used a tesseract script to run text extraction from detected text rows. This is inside our code code.tar as text_recognition_multipro.py.
We used a Java program provided by Falk Böschen, adapted to our file structure. We included this as evaluator.jar.
Parameter sweeps are automated by param_sweep.rb. This file also shows how to invoke all of these components.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "ElectroCom61" dataset contains 2071 annotated images of electronic components sourced from the Electronic Lab Support Room, the United International University (UIU). This dataset was specifically designed to facilitate the development and validation of machine learning models for the real-time detection of electronic components. To mimic real-world scenarios and enhance the robustness of models trained on this data, images were captured under varied lighting conditions and against diverse backgrounds. Each electronic component was photographed from multiple angles, and following collection, images were standardized through auto-orientation and resized to 640x640 pixels, introducing some degree of stretching. The dataset is organized into 61 distinct classes of commonly used electronic components. The dataset were split into training (70%), validation (20%), and test (10%) sets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the article (Ahmer, M., Sandin, F., Marklund, P. et al., 2022), we have investigated the effective use of sensors in a bearing ring grinder for failure classification in the condition-based maintenance context. The proposed methodology combines domain knowledge of process monitoring and condition monitoring to successfully achieve failure mode prediction with high accuracy using only a few key sensors. This enables manufacturing equipment to take advantage of advanced data processing and machine learning techniques.
The grinding machine is of type SGB55 from Lidköping Machine Tools and is used to produce the functional raceway surface of inner rings of the SKF-6210 deep groove ball bearing. Additional vibration, acoustic emission, force, and temperature sensors are installed to monitor machine condition while producing bearing components under different operating conditions. Data is sampled from the sensors as well as the machine's numerical controller during operation. Selected parts are measured for produced quality.
Ahmer, M., Sandin, F., Marklund, P., Gustafsson, M., & Berglund, K. (2022). Failure mode classification for condition-based maintenance in a bearing ring grinding machine. In The International Journal of Advanced Manufacturing Technology (Vol. 122, pp. 1479–1495). https://doi.org/10.1007/s00170-022-09930-6
The files are of three categories and are grouped in zipped folders. The PDF file named "readme_data_description.pdf" describes the content of the files in the folders. The "lib" folder includes information on libraries for reading the .tdms data files in Matlab or Python.
The raw time-domain sensor signal data are grouped in seven main folders named after each test run, e.g. "test_1" ... "test_7". Each test includes seven dressing cycles named e.g. "dresscyc_1" ... "dresscyc_7". Each dressing cycle includes .tdms files for fifteen rings, one per individual grinding cycle. The columns of both the "Analogue" and "Digital" channels are described in the "readme_data_description.pdf" file. The machine and process parameters used for the tests, as sampled from the machine's control system (numerical controller), are compiled for all test runs in a single file "process_data.csv" in the folder "proc_param"; the column description is available in "readme_data_description.pdf" under "Process Parameters". The measured quality data (nine quality parameters, normalized) of the selected produced parts are recorded in the file "measured_quality_param.csv" under the folder "quality"; the description of the quality parameters is available in "readme_data_description.pdf". The quality parameter disposition based on the actual acceptance tolerances for the process step is presented in the file "quality_disposition.csv" under the folder "quality".
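For convenience, a minimal Python sketch of opening one grinding-cycle file with the npTDMS library (the file path is hypothetical; channel groups follow readme_data_description.pdf):

```python
# Minimal sketch of loading one .tdms file in Python with npTDMS
# (pip install npTDMS). The path is hypothetical; the "Analogue" and
# "Digital" channel groups are described in readme_data_description.pdf.
from nptdms import TdmsFile

tdms_file = TdmsFile.read("test_1/dresscyc_1/ring_01.tdms")  # hypothetical path
for group in tdms_file.groups():
    for channel in group.channels():
        print(group.name, channel.name, len(channel[:]))
```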
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data was collected by a Kinect V2 as a set of X, Y, Z coordinates at 60 fps during 6 different yoga-inspired back stretches. There are 541 files in the dataset, each containing position and velocity data for 25 body joints. These joints are: Head, Neck, SpineShoulder, SpineMid, SpineBase, ShoulderRight, ShoulderLeft, HipRight, HipLeft, ElbowRight, WristRight, HandRight, HandTipRight, ThumbRight, ElbowLeft, WristLeft, HandLeft, HandTipLeft, ThumbLeft, KneeRight, AnkleRight, FootRight, KneeLeft, AnkleLeft, FootLeft.
The program used to record this data was adapted from Thomas Sanchez Langeling's skeleton recording code. The program recorded data for each body part as a separate file, repeated for each exercise. Each body part for a specific exercise is stored in a distinct folder. These folders are named with the convention subjNumber_stretchName_trialNumber, where subjNumber ranges from 0 to 8, stretchName is one of Mermaid, Seated, Sumo, Towel, Wall, or Y, and trialNumber ranges from 0 to 9, representing the repetition number.
The coordinates have their origin centered at the subject's upper chest. Data collection was standardized to the following conditions:
1) Kinect placed at a height of 2 ft 3 in.
2) Subject consistently positioned 6.5 ft away from the camera with their chest facing the camera.
3) Each participant completed 10 repetitions of each stretch before continuing.
Data was collected from the following population:
* Adults ages 18-21
* Females: 4
* Males: 5
The following pre-processing occurred at the time of data collection. Velocity data was calculated using a discrete derivative with a spacing of 5 frames, chosen to reduce the sensitivity of the velocity function: v[n] = (x[n] - x[n-5]) / 5. This was applied to all body parts and all axes individually.
Related manuscript: Capella, B., Subramanian, D., Klatzky, R., & Siewiorek, D. Action Pose Recognition from 3D Camera Data Using Inter-frame and Inter-joint Dependencies. Preprint at link in references.
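For clarity, a small sketch of the velocity pre-processing described above (array names are illustrative):

```python
# Hedged sketch of the velocity pre-processing described above: a
# discrete derivative with a spacing of 5 frames, applied per joint
# and per axis. The input array name is illustrative.
import numpy as np

def velocity(x, spacing=5):
    """v[n] = (x[n] - x[n - spacing]) / spacing, for n >= spacing."""
    v = np.zeros_like(x, dtype=float)
    v[spacing:] = (x[spacing:] - x[:-spacing]) / spacing
    return v
```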
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset used to train and evaluate the CNN and KNN machine learning techniques for the ReDraw paper, published in IEEE Transactions on Software Engineering in 2018.
Link to ReDraw Paper: https://arxiv.org/abs/1802.02312
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
See the official website: https://autovi.utc.fr
Modern industrial production lines must be equipped with robust defect inspection modules able to withstand high product variability. This means that in an industrial production context, new, as-yet-unknown defects may appear and must therefore be identified.
On industrial production lines, the typology of potential defects is vast (texture, part failure, logical defects, etc.). Inspection systems must therefore be able to detect non-listed defects, i.e. defects not yet observed at the time the inspection system was developed. Solving this problem requires research and development of unsupervised AI algorithms on real-world data.
Renault Group and the Université de technologie de Compiègne (Roberval and Heudiasyc Laboratories) have jointly developed the Automotive Visual Inspection Dataset (AutoVI), the purpose of which is to be used as a scientific benchmark to compare and develop advanced unsupervised anomaly detection algorithms under real production conditions. The images were acquired on Renault Group's automotive production lines, in a genuine industrial production line environment, with variations in brightness and lighting on constantly moving components. This dataset is representative of actual data acquisition conditions on automotive production lines.
The dataset contains 3950 images, split into 1530 training images and 2420 testing images.
The evaluation code can be found at https://github.com/phcarval/autovi_evaluation_code.
Disclaimer
All defects shown were intentionally created on Renault Group's production lines for the purpose of producing this dataset. The images were examined and labeled by Renault Group experts, and all defects were corrected after shooting.
License
Copyright © 2023-2024 Renault Group
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/.
To use the data in a way that falls under the commercial use clause of the license, please contact us.
Attribution
Please use the following for citing the dataset in scientific work:
Carvalho, P., Lafou, M., Durupt, A., Leblanc, A., & Grandvalet, Y. (2024). The Automotive Visual Inspection Dataset (AutoVI): A Genuine Industrial Production Dataset for Unsupervised Anomaly Detection [Dataset]. https://doi.org/10.5281/zenodo.10459003
Contact
If you have any questions or remarks about this dataset, please contact us at philippe.carvalho@utc.fr, meriem.lafou@renault.com, alexandre.durupt@utc.fr, antoine.leblanc@renault.com, yves.grandvalet@utc.fr.
Changelog
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset is an extremely challenging set of over 3,000 original stair images captured and crowdsourced from over 500 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
Optimized for Generative AI, Visual Question Answering, Image Classification, and LMM development, this dataset provides a strong basis for achieving robust model performance.
Annotation formats available: COCO, YOLO, PASCAL-VOC, TFRecord.
The images in this dataset are exclusively owned by Data Cluster Labs and were not downloaded from the internet. To access a larger portion of the training dataset for research and commercial purposes, a license can be purchased. Contact us at sales@datacluster.ai. Visit www.datacluster.ai to learn more.
https://spdx.org/licenses/CC0-1.0.html
Malaria is the leading cause of death in the African region. Data mining can help extract valuable knowledge from available data in the healthcare sector. This makes it possible to train models to predict patient health faster than in clinical trials. Implementations of various machine learning algorithms such as K-Nearest Neighbors, Bayes' theorem, Logistic Regression, Support Vector Machines, and Multinomial Naive Bayes (MNB) have been applied to malaria datasets in public hospitals, but there are still limitations in modeling using the Multinomial Naive Bayes algorithm. This study applies the MNB model to explore the relationship between 15 relevant attributes of public hospital data. The goal is to examine how the dependency between attributes affects the performance of the classifier. MNB creates a transparent and reliable graphical representation between attributes with the ability to predict new situations. The model (MNB) has 97% accuracy. It is concluded that this model is outperformed by the GNB classifier, which has 100% accuracy, and the RF classifier, which also has 100% accuracy.
Methods
Prior to data collection, the researcher was guided by all ethical training certifications on data collection and the right to confidentiality and privacy, as overseen by an Institutional Review Board (IRB). Data was collected from the manual archives of hospitals purposively selected using a stratified sampling technique, transformed into electronic form, and stored in a MySQL database called malaria. Each patient file was extracted and reviewed for signs and symptoms of malaria, then checked for laboratory confirmation of the diagnosis. The data was divided into two tables: the first, called data1, contains data for use in phase 1 of the classification, while the second, data2, contains data for use in phase 2 of the classification.
Data Source Collection
The malaria incidence dataset was obtained from public hospitals for the period 2017 to 2021. These are the data used for modeling and analysis, keeping in mind the geographical location and socio-economic factors available for patients inhabiting those areas. Naive Bayes (Multinomial) is the model used to analyze the collected data for malaria disease prediction and grading.
Data Preprocessing:
Data preprocessing shall be done to remove noise and outliers.
Transformation:
The data shall be transformed from analog to electronic records.
Data Partitioning
The collected data will be divided into two portions: one portion shall be extracted as a training set, while the other will be used for testing. The training portion taken from one table stored in the database shall be called training set 1, while the training portion taken from the other table shall be called training set 2.
The dataset was split into two parts for the purpose of this research: 70% for training and 30% for testing. Then, using the MNB classification algorithm implemented in Python, the models were trained on the training sample. The resulting models were tested on the remaining 30% of the data, and the results were compared with those of the other machine learning models using standard metrics.
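A hedged sketch of this split and training step with scikit-learn follows (file, feature, and label names are illustrative, not from the study):

```python
# Hedged sketch of the 70/30 split and Multinomial Naive Bayes training
# described above, using scikit-learn. File, feature, and label names
# are illustrative; MNB expects non-negative, count-like features.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("malaria.csv")                 # hypothetical file name
X = df.drop(columns=["label"])                  # the 15 attributes
y = df["label"]                                 # positive / negative
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)
model = MultinomialNB().fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```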
Classification and prediction:
Based on the nature of the variables in the dataset, this study uses Naive Bayes (Multinomial) classification in two phases: classification phase 1 and classification phase 2. The operation of the framework is illustrated as follows:
i. Data collection and preprocessing shall be done.
ii. Preprocessed data shall be stored in training set 1 and training set 2. These datasets shall be used during classification.
iii. The test dataset shall be stored in the database.
iv. Part of the test dataset shall be classified using classifier 1 and the remaining part shall be classified with classifier 2, as follows:
Classifier phase 1: Classifies records into positive or negative classes. If the patient has malaria, the patient is classified as positive (P); if the patient does not have malaria, the patient is classified as negative (N).
Classifier phase 2: Classifies only the records labeled positive by classifier 1, assigning them to the complicated or uncomplicated class. The classifier will also capture data on environmental factors, genetics, gender and age, and cultural and socio-economic variables. The system will be designed so that values must be supplied for the core parameters used as determining factors.
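A hedged sketch of this two-phase cascade follows (training data here are random placeholders; only the control flow mirrors the description above):

```python
# Hedged sketch of the two-phase classification: classifier 1 separates
# positive (P) from negative (N); classifier 2 grades positives as
# complicated or uncomplicated. Data below are random placeholders.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
X1 = rng.integers(0, 5, size=(100, 15))                    # phase-1 training data
y1 = rng.choice(["P", "N"], size=100)
X2 = rng.integers(0, 5, size=(60, 15))                     # phase-2 training data
y2 = rng.choice(["complicated", "uncomplicated"], size=60)
clf1 = MultinomialNB().fit(X1, y1)
clf2 = MultinomialNB().fit(X2, y2)

def grade(sample):
    """Phase 1 first; phase 2 only for positives."""
    if clf1.predict(sample)[0] == "N":
        return "negative"
    return clf2.predict(sample)[0]

print(grade(rng.integers(0, 5, size=(1, 15))))
```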