100+ datasets found
  1. Mechanical Parts Dataset 2022

    • zenodo.org
    Updated Jan 5, 2023
    Cite
    Mübarek Mazhar Çakır (2023). Mechanical Parts Dataset 2022 [Dataset]. http://doi.org/10.5281/zenodo.7504801
    Explore at:
    Dataset updated
    Jan 5, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mübarek Mazhar Çakır
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mechanical Parts Dataset

    The dataset consists of 2250 images downloaded from various internet platforms: 714 images contain bearings, 632 contain bolts, 616 contain gears, and 586 contain nuts. In total, 10,597 labels were created manually: 2099 for the bearing class, 2734 for the bolt class, 2662 for the gear class, and 3102 for the nut class.

    Folder Content

    The dataset is split into three parts: 80% train, 10% validation, and 10% test. The "Mechanical Parts Dataset" folder contains three subfolders, "train", "test", and "val". Each of these holds an "images" folder, where the images are kept, and a "labels" folder, where the label information is kept.

    Finally, the folder contains a YAML file named "mech_parts_data" for the YOLO algorithm. This file lists the number of classes and the class names.

    Images and Labels

    The dataset was prepared in accordance with the YOLOv5 algorithm.
    For example, the label information for the image named "2a0xhkr_jpg.rf.45a11bf63c40ad6e47da384fdf6bb7a1.jpg" is stored in the .txt file of the same name. Each line of the .txt file describes one bounding box (in normalized coordinates) as: "class x_center y_center width height".
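    The label format above can be read with a few lines of code. This is an illustrative sketch, not part of the dataset: the parsing helper and the class-index order are assumptions (the authoritative class order is in the "mech_parts_data" YAML file).

```python
# Minimal sketch: parse one line of a YOLO-format label file,
# "class x_center y_center width height", values normalized to [0, 1].
# The class order below is a hypothetical assumption; check the dataset's
# YAML file for the real mapping.
CLASS_NAMES = ["bearing", "bolt", "gear", "nut"]

def parse_yolo_label(line: str) -> dict:
    """Split a single annotation line into named fields."""
    cls, x_center, y_center, width, height = line.split()
    return {
        "class": CLASS_NAMES[int(cls)],
        "x_center": float(x_center),
        "y_center": float(y_center),
        "width": float(width),
        "height": float(height),
    }

label = parse_yolo_label("2 0.5103 0.4821 0.2500 0.3125")
print(label["class"], label["x_center"])
```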

    Update 05.01.2023

    Pascal VOC and COCO JSON formats have been added.

    Related paper: doi.org/10.5281/zenodo.7496767

  2. Spare Parts Time Series Indian Dataset

    • ieee-dataport.org
    Updated May 30, 2025
    Cite
    Karthik Prakash (2025). Spare Parts Time Series Indian Dataset [Dataset]. https://ieee-dataport.org/documents/spare-parts-time-series-indian-dataset
    Explore at:
    Dataset updated
    May 30, 2025
    Authors
    Karthik Prakash
    Description

    steel

  3. 50 Types of Car Parts - Image Classification

    • gts.ai
    • kaggle.com
    json
    Updated Mar 20, 2024
    + more versions
    Cite
    GTS (2024). 50 Types of Car Parts -Image Classification [Dataset]. https://gts.ai/dataset-download/50-types-of-car-parts-image-classification/
    Explore at:
    Available download formats: json
    Dataset updated
    Mar 20, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a dataset of images of 50 types of car parts. It includes a train set, a test set and a validation set. There are 50 classes of car parts...

  4. Data from: Dataset for classifying English words into difficulty levels by...

    • data.mendeley.com
    Updated Oct 24, 2023
    + more versions
    Cite
    Nisar Kangoo (2023). Dataset for classifying English words into difficulty levels by undergraduate and postgraduate students [Dataset]. http://doi.org/10.17632/p2wrs7hm4z.4
    Explore at:
    Dataset updated
    Oct 24, 2023
    Authors
    Nisar Kangoo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains English words in column B. For each word, the other columns contain its frequency (fre), length (len), part of speech (PS), the number of undergraduate students who marked it difficult (difficult_ug), and the number of postgraduate students who marked it difficult (difficult_pg). The dataset has a total of 5368 unique words: 680 were marked as difficult by undergraduate students and 151 by postgraduate students; the remaining 4537 words are easy, i.e., not marked as difficult by either undergraduate or postgraduate students. A hyphen (-) in the difficult_ug column means the word was not present in the text circulated to undergraduate students; likewise, a hyphen (-) in the difficult_pg column means the word was not present in the text circulated to postgraduate students. The data was collected from students in Jammu and Kashmir (a Union Territory of India), at latitude and longitude 32.2778° N, 75.3412° E.
    The attached files are as follows. The dataset_english CSV file is the original dataset containing the English words together with their length, frequency, part of speech, and the number of undergraduate and postgraduate students who marked each word as difficult. The dataset_numerical CSV file contains the original dataset with the string fields transformed into numerical values. The "English language difficulty level measurement - Questionnaire" (1-6) and PG1, PG2, PG3, PG4 .docx files contain the questionnaires supplied to students of college and university to underline difficult words in the English text. The IGNOU English.zip file contains the Indira Gandhi National Open University (IGNOU) English textbooks for graduation and postgraduation students; the texts for the above questionnaires were taken from these textbooks.

  5. Data from: EO4WildFires: An Earth Observation multi-sensor, time-series...

    • zenodo.org
    zip
    Updated Jul 15, 2024
    Cite
    Dimitris Sykas; Dimitris Zografakis; Konstantinos Demestichas; Constantina Costopoulou; Pavlos Kosmidis (2024). EO4WildFires: An Earth Observation multi-sensor, time-series machine-learning-ready benchmark dataset for wildfire impact prediction [Dataset]. http://doi.org/10.5281/zenodo.7762564
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dimitris Sykas; Dimitris Zografakis; Konstantinos Demestichas; Constantina Costopoulou; Pavlos Kosmidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Earth
    Description

    This paper presents a benchmark dataset called EO4WildFires: a multi-sensor (multispectral: Sentinel-2; synthetic-aperture radar (SAR): Sentinel-1; meteorological parameters: NASA Power) time-series dataset spanning 45 countries, which can be used for developing machine learning and deep learning methods for estimating the area that a forest wildfire might cover.

    This novel EO4WildFires dataset is annotated using EFFIS (European Forest Fire Information System) as forest fire detection and size estimation data source. A total of 31,742 wildfire events are gathered from 2018 to 2022. For each event, Sentinel-2 (multispectral), Sentinel-1 (SAR) and meteorological data are assembled into a single data cube. The meteorological parameters that are included in the data cube are: ratio of actual partial pressure of water vapor to the partial pressure at saturation, average temperature, bias corrected average total precipitation, average wind speed, fraction of land covered by snowfall, percent of root zone soil wetness, snow depth, snow precipitation, as well as percent of soil moisture.

    The main problem this dataset is designed to address is severity forecasting before wildfires occur. The dataset is not used to predict wildfire events, but rather to predict the severity (the size of the area damaged by fire) of a wildfire event, should one happen in a specific place, given the current and historical forest status as recorded in multispectral and SAR images and meteorological data.

    Using the data cube for the collected wildfire events, the EO4WildFires dataset is used to realize three (3) different preliminary experiments, in order to evaluate the contributing factors for wildfire severity prediction. The first experiment evaluates wildfire size using only the meteorological parameters, the second one utilizes both the multispectral and SAR parts of the dataset, while the third exploits all dataset parts. In each experiment, machine learning models are developed, and their accuracy is evaluated.

  6. Multi-Laboratory Hematoxylin and Eosin Staining Variance Supervised Machine...

    • search.dataone.org
    Updated Nov 8, 2023
    + more versions
    Cite
    Prezja, Fabi; Ilkka Pölönen; Sami Äyrämö; Pekka Ruusuvuori; Teijo Kuopio (2023). Multi-Laboratory Hematoxylin and Eosin Staining Variance Supervised Machine Learning Dataset [Dataset]. http://doi.org/10.7910/DVN/5YNF3B
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Prezja, Fabi; Ilkka Pölönen; Sami Äyrämö; Pekka Ruusuvuori; Teijo Kuopio
    Description

    We provide the generated dataset used for supervised machine learning in the related article. The data are in tabular format and contain all principal components and ground-truth labels per tissue type. The tissue type codes are C1 for kidney, C2 for skin, and C3 for colon; 'PC' stands for principal component. For feature extraction specifications, please see the original design in the related article. Features were extracted independently for each tissue type.

  7. DynaBench: A benchmark dataset for learning dynamical systems from...

    • zenodo.org
    • data.niaid.nih.gov
    tar
    Updated Oct 31, 2023
    Cite
    Andrzej Dulny; Andreas Hotho; Anna Krause (2023). DynaBench: A benchmark dataset for learning dynamical systems from low-resolution data (minimal) [Dataset]. http://doi.org/10.1007/978-3-031-43412-9_26
    Explore at:
    Available download formats: tar
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrzej Dulny; Andreas Hotho; Anna Krause
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This is a minimal version of the DynaBench dataset, containing the first 5% of the data. The full dataset is available at https://professor-x.de/dynabench

    Abstract:

    Previous work on learning physical systems from data has focused on high-resolution grid-structured measurements. However, real-world knowledge of such systems (e.g. weather data) relies on sparsely scattered measuring stations. In this paper, we introduce a novel simulated benchmark dataset, DynaBench, for learning dynamical systems directly from sparsely scattered data without prior knowledge of the equations. The dataset focuses on predicting the evolution of a dynamical system from low-resolution, unstructured measurements. We simulate six different partial differential equations covering a variety of physical systems commonly used in the literature and evaluate several machine learning models, including traditional graph neural networks and point cloud processing models, with the task of predicting the evolution of the system. The proposed benchmark dataset is expected to advance the state of the art as an out-of-the-box, easy-to-use tool for evaluating models in a setting where only unstructured low-resolution observations are available. The benchmark is available at https://professor-x.de/dynabench.

    Technical Info

    The dataset is split into 42 parts (6 equations x 7 combinations of resolution/structure). Each part can be downloaded separately and contains 7000 simulations of the given equation at the given resolution and structure. The simulations are grouped into chunks of 500 simulations saved in the hdf5 file format. Each chunk contains the variable "data", where the values of the simulated system are stored, as well as the variable "points", where the coordinates at which the system has been observed are stored. For more details visit the DynaBench website at https://professor-x.de/dynabench/. The dataset is best used as part of the dynabench python package available at https://pypi.org/project/dynabench/.
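    The chunk layout described above ("data" plus "points" in an hdf5 file) can be sketched with h5py. The array shapes below are invented for the demo; consult the DynaBench documentation for the real dimensions of each resolution/structure combination.

```python
import h5py
import numpy as np

# Illustrative sketch of the chunk layout described above: each hdf5 chunk
# holds a "data" variable (simulated system values) and a "points" variable
# (coordinates at which the system was observed). We write a dummy chunk
# with made-up shapes, then read it back the way a consumer would.
with h5py.File("chunk_demo.h5", "w") as f:
    f.create_dataset("data", data=np.random.rand(500, 16, 225))   # sims x time x points
    f.create_dataset("points", data=np.random.rand(500, 225, 2))  # sims x points x (x, y)

with h5py.File("chunk_demo.h5", "r") as f:
    data = f["data"][:]
    points = f["points"][:]
print(data.shape, points.shape)
```

    In practice the dynabench python package mentioned above wraps this access, so reading the raw hdf5 yourself is only needed for custom pipelines.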

  8. USGS Contributions to the Nevada Geothermal Machine Learning Project...

    • catalog.data.gov
    • datasets.ai
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). USGS Contributions to the Nevada Geothermal Machine Learning Project (DE-FOA-0001956): Slip and Dilation Tendency Data [Dataset]. https://catalog.data.gov/dataset/usgs-contributions-to-the-nevada-geothermal-machine-learning-project-de-foa-0001956-slip-a
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Nevada
    Description

    This package contains data in a portion of northern Nevada, the extent of the ‘Nevada Machine Learning Project’ (DE-EE0008762). Slip tendency (TS) and dilation tendency (TD) were calculated for all the faults in the Nevada ML study area. TS is the ratio between the shear components of the stress tensor and the normal components of the stress tensor acting on a fault plane. TD is the ratio of all the components of the stress tensor that are normal to a fault plane. Faults with higher TD are relatively more likely to dilate and host open, conductive fractures. Faults with higher TS are relatively more likely to slip, and these fractures may be propped open and conductive. These values of TS and TD were used to update a map surface from the Nevada Geothermal Machine Learning Project (DE-FOA-0001956) that used less reliable estimates for TS and TD. The new map surface was generated using the same procedure as the old surface, just with the new TS and TD data values.

  9. Cinematic Dataset for AI-Generated Music (Machine Learning (ML) Data)

    • datarade.ai
    .json, .csv, .xls
    Updated Feb 10, 2024
    + more versions
    Cite
    Rightsify (2024). Cinematic Dataset for AI-Generated Music (Machine Learning (ML) Data) [Dataset]. https://datarade.ai/data-products/cinematic-dataset-for-ai-generated-music-machine-learning-m-rightsify
    Explore at:
    Available download formats: .json, .csv, .xls
    Dataset updated
    Feb 10, 2024
    Dataset authored and provided by
    Rightsify
    Area covered
    Uzbekistan, Switzerland, Botswana, United Arab Emirates, Kuwait, State of, Chile, Cook Islands, Antigua and Barbuda, Malaysia
    Description

    Our Cinematic Dataset is a carefully selected collection of audio files with rich metadata, providing a wealth of information for machine learning applications such as generative AI music, Music Information Retrieval (MIR), and source separation. This dataset is specifically created to capture the rich and expressive quality of cinematic music, making it an ideal training environment for AI models. This dataset, which includes chords, instrumentation, key, tempo, and timestamps, is an invaluable resource for those looking to push AI's bounds in the field of audio innovation.

    Strings, brass, woodwinds, and percussion are among the instruments used in the orchestral ensemble, which is a staple of film music. Strings, including violins, cellos, and double basses, are vital for communicating emotion, while brass instruments, such as trumpets and trombones, contribute to vastness and passion. Woodwinds, such as flutes and clarinets, give texture and nuance, while percussion instruments bring rhythm and impact. The careful arrangement of these parts produces distinct cinematic soundscapes, making the genre excellent for teaching AI models to recognize and duplicate complicated musical patterns.

    Training models on this dataset provides a unique opportunity to explore the complexities of cinematic composition. The dataset's emphasis on important cinematic components, along with cinematic music's natural emotional storytelling ability, provides a solid platform for AI models to learn and compose music that captures the essence of engaging storylines. As AI continues to push creative boundaries, this Cinematic Music Dataset is a valuable tool for anybody looking to harness the compelling power of music in the digital environment.

  10. Multispectral Dataset for parts of the coastal area of Gwadar

    • ieee-dataport.org
    Updated Mar 19, 2024
    Cite
    Jiaqiyu Zhan (2024). Multispectral Dataset for parts of the coastal area of Gwadar [Dataset]. https://ieee-dataport.org/documents/multispectral-dataset-parts-coastal-area-gwadar-pakistan
    Explore at:
    Dataset updated
    Mar 19, 2024
    Authors
    Jiaqiyu Zhan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Gwadar, Pakistan
    Description

    southwestern Pakistan) and its four regions of interest

  11. Body Parts Detection Dataset

    • universe.roboflow.com
    zip
    Updated Mar 28, 2025
    Cite
    Kishans Project (2025). Body Parts Detection Dataset [Dataset]. https://universe.roboflow.com/kishans-project/body-parts-detection-kqq6b/model/5
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 28, 2025
    Dataset authored and provided by
    Kishans Project
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Body Parts Bounding Boxes
    Description

    Human body detection system using artificial intelligence and machine learning: deep learning, OpenCV, Python, and their libraries. It is essentially an object detection system, extended for use in the medical industry.

  12. Machine RUL Data

    • kaggle.com
    Updated Jan 12, 2025
    Cite
    Tetsuya Sasaki (2025). Machine RUL Data [Dataset]. https://www.kaggle.com/datasets/sasakitetsuya/machine-rul-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 12, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tetsuya Sasaki
    Description

    The dataset was created to simulate data related to the predictive maintenance of critical components in construction machinery, such as cranes, excavators, and bulldozers. It contains 1,000 records, each representing a unique component, with the following attributes:

    1) Component_ID: a unique identifier for each component, formatted as CMP0001 to CMP1000.
    2) Component_Type: the type of component, categorized into three types: Engine, Hydraulic Cylinder, and Gear.
    3) Vibration: a numerical feature indicating the vibration level of the component, measured in arbitrary units between 0.1 and 5.0.
    4) Temperature: the operating temperature of the component, recorded in degrees Celsius within a range of 40 to 100.
    5) Pressure: the pressure exerted on the component, measured in psi, ranging from 50 to 300.
    6) Operating_Hours: the total number of hours the component has been in operation, ranging from 0 to 5,000.
    7) Remaining_Useful_Life (RUL): the estimated number of hours left before the component fails, randomly assigned within a range of 50 to 1,000.

    This dataset aims to support the development and testing of machine learning models for Remaining Useful Life (RUL) prediction. It mimics real-world scenarios where sensor data is collected and analyzed to optimize maintenance schedules, reduce downtime, and improve operational efficiency. The features are designed to allow exploratory data analysis and advanced feature engineering for predictive maintenance tasks.
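    A schema like the one above is easy to mock for pipeline testing before downloading the real data. This is a hedged sketch: the value ranges come from the description, but the sampling code and the exact column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Generate a few synthetic records matching the described schema
# (Component_ID, Component_Type, Vibration, Temperature, Pressure,
# Operating_Hours, Remaining_Useful_Life). Purely illustrative.
rng = np.random.default_rng(0)
n = 5
df = pd.DataFrame({
    "Component_ID": [f"CMP{i:04d}" for i in range(1, n + 1)],
    "Component_Type": rng.choice(["Engine", "Hydraulic Cylinder", "Gear"], n),
    "Vibration": rng.uniform(0.1, 5.0, n).round(2),          # arbitrary units
    "Temperature": rng.uniform(40, 100, n).round(1),          # degrees Celsius
    "Pressure": rng.uniform(50, 300, n).round(1),             # psi
    "Operating_Hours": rng.integers(0, 5001, n),
    "Remaining_Useful_Life": rng.integers(50, 1001, n),       # hours
})
print(df.head())
```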

  13. Data from: A Neural Approach for Text Extraction from Scholarly Figures

    • data.uni-hannover.de
    zip
    Updated Jan 20, 2022
    Cite
    TIB (2022). A Neural Approach for Text Extraction from Scholarly Figures [Dataset]. https://data.uni-hannover.de/dataset/a-neural-approach-for-text-extraction-from-scholarly-figures
    Explore at:
    Available download formats: zip (798357692 bytes)
    Dataset updated
    Jan 20, 2022
    Dataset authored and provided by
    TIB
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    A Neural Approach for Text Extraction from Scholarly Figures

    This is the readme for the supplemental data for our ICDAR 2019 paper.

    You can read our paper via IEEE here: https://ieeexplore.ieee.org/document/8978202

    If you found this dataset useful, please consider citing our paper:

    @inproceedings{DBLP:conf/icdar/MorrisTE19,
     author  = {David Morris and
            Peichen Tang and
            Ralph Ewerth},
     title   = {A Neural Approach for Text Extraction from Scholarly Figures},
     booktitle = {2019 International Conference on Document Analysis and Recognition,
            {ICDAR} 2019, Sydney, Australia, September 20-25, 2019},
     pages   = {1438--1443},
     publisher = {{IEEE}},
     year   = {2019},
     url    = {https://doi.org/10.1109/ICDAR.2019.00231},
     doi    = {10.1109/ICDAR.2019.00231},
     timestamp = {Tue, 04 Feb 2020 13:28:39 +0100},
     biburl  = {https://dblp.org/rec/conf/icdar/MorrisTE19.bib},
     bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    

    This work was financially supported by the German Federal Ministry of Education and Research (BMBF) and European Social Fund (ESF) (InclusiveOCW project, no. 01PE17004).

    Datasets

    We used different sources of data for testing, validation, and training. Our testing set was assembled from the datasets used in the cited work by Böschen et al. We excluded the DeGruyter dataset from it and used that as our validation dataset.

    Testing

    These datasets contain a readme with license information. Further information about the associated project can be found in the authors' published work we cited: https://doi.org/10.1007/978-3-319-51811-4_2

    Validation

    The DeGruyter dataset does not include the labeled images due to license restrictions. As of writing, the images can still be downloaded from DeGruyter via the links in the readme. Note that depending on what program you use to strip the images out of the PDF they are provided in, you may have to re-number the images.

    Training

    We used label_generator's generated dataset, which the author made available on a requester-pays amazon s3 bucket. We also used the Multi-Type Web Images dataset, which is mirrored here.

    Code

    We have made our code available in code.zip. We will upload code, announce further news, and field questions via the github repo.

    Our text detection network is adapted from Argman's EAST implementation. The EAST/checkpoints/ours subdirectory contains the trained weights we used in the paper.

    We used a tesseract script to run text extraction from detected text rows. This is inside our code.tar as text_recognition_multipro.py.

    We used a Java program provided by Falk Böschen and adapted it to our file structure. We included this as evaluator.jar.

    Parameter sweeps are automated by param_sweep.rb. This file also shows how to invoke all of these components.

  14. Data from: ElectroCom61: A Multiclass Dataset for Detection of Electronic...

    • data.mendeley.com
    Updated May 10, 2024
    + more versions
    Cite
    Md Faiyaz Abdullah Sayeedi Faiyaz (2024). ElectroCom61: A Multiclass Dataset for Detection of Electronic Components [Dataset]. http://doi.org/10.17632/6scy6h8sjz.1
    Explore at:
    Dataset updated
    May 10, 2024
    Authors
    Md Faiyaz Abdullah Sayeedi Faiyaz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The "ElectroCom61" dataset contains 2071 annotated images of electronic components sourced from the Electronic Lab Support Room at the United International University (UIU). This dataset was specifically designed to facilitate the development and validation of machine learning models for the real-time detection of electronic components. To mimic real-world scenarios and enhance the robustness of models trained on this data, images were captured under varied lighting conditions and against diverse backgrounds. Each electronic component was photographed from multiple angles, and following collection, images were standardized through auto-orientation and resized to 640x640 pixels, introducing some degree of stretching. The dataset is organized into 61 distinct classes of commonly used electronic components and was split into training (70%), validation (20%), and test (10%) sets.

  15. Data from: Dataset Concerning the Process Monitoring and Condition...

    • researchdata.se
    • demo.researchdata.se
    Updated Feb 5, 2024
    + more versions
    Cite
    Muhammad Ahmer (2024). Dataset Concerning the Process Monitoring and Condition Monitoring Data of a Bearing Ring Grinder [Dataset]. http://doi.org/10.5878/331q-3p13
    Explore at:
    Available download formats: 12 files (sizes in bytes: 2292608068, 4763, 2287181724, 2718481588, 444145, 2274483682, 19803, 5846872141, 2293611666, 32229, 907, 2329474777)
    Dataset updated
    Feb 5, 2024
    Dataset provided by
    Luleå University of Technology
    Authors
    Muhammad Ahmer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the article (Ahmer, M., Sandin, F., Marklund, P. et al., 2022), we have investigated the effective use of sensors in a bearing ring grinder for failure classification in the condition-based maintenance context. The proposed methodology combines domain knowledge of process monitoring and condition monitoring to successfully achieve failure mode prediction with high accuracy using only a few key sensors. This enables manufacturing equipment to take advantage of advanced data processing and machine learning techniques.

    The grinding machine is of type SGB55 from Lidköping Machine Tools and is used to produce functional raceway surface of inner rings of type SKF-6210 deep groove ball bearing. Additional sensors like vibration, acoustic emission, force, and temperature sensors are installed to monitor machine condition while producing bearing components under different operating conditions. Data is sampled from sensors as well as the machine's numerical controller during operation. Selected parts are measured for the produced quality.

    Ahmer, M., Sandin, F., Marklund, P., Gustafsson, M., & Berglund, K. (2022). Failure mode classification for condition-based maintenance in a bearing ring grinding machine. In The International Journal of Advanced Manufacturing Technology (Vol. 122, pp. 1479–1495). https://doi.org/10.1007/s00170-022-09930-6

    The files are of three categories and are grouped in zipped folders. The PDF file named "readme_data_description.pdf" describes the content of the files in the folders. The "lib" folder includes information on the libraries needed to read the .tdms data files in Matlab or Python.

    The raw time-domain sensor signal data are grouped into seven main folders named after each test run, e.g. "test_1" ... "test_7". Each test includes seven dressing cycles named e.g. "dresscyc_1" ... "dresscyc_7", and each dressing cycle includes .tdms files for fifteen rings, one per individual grinding cycle. The column descriptions for both the "Analogue" and "Digital" channels are given in "readme_data_description.pdf". The machine and process parameters used for the tests, as sampled from the machine's control system (numerical controller), are compiled for all test runs in a single file, "process_data.csv", in the folder "proc_param"; the column description is available in "readme_data_description.pdf" under "Process Parameters". The measured quality data (nine quality parameters, normalized) of the selected produced parts are recorded in the file "measured_quality_param.csv" under the folder "quality"; the description of the quality parameters is available in "readme_data_description.pdf". The quality parameter disposition, based on the actual acceptance tolerances for the process step, is presented in the file "quality_disposition.csv" under the folder "quality".

  16. 3D Kinect Total Body Database for Back Stretches

    • kilthub.cmu.edu
    txt
    Updated May 30, 2023
    Cite
    Blake Capella; Deepak Subramanian; Roberta Klatzky; Daniel Siewiorek (2023). 3D Kinect Total Body Database for Back Stretches [Dataset]. http://doi.org/10.1184/R1/7999364.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Carnegie Mellon University
    Authors
    Blake Capella; Deepak Subramanian; Roberta Klatzky; Daniel Siewiorek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data was collected with a Kinect V2 as a set of X, Y, Z coordinates at 60 fps during 6 different yoga-inspired back stretches. There are 541 files in the dataset, each containing position and velocity for 25 body joints: Head, Neck, SpineShoulder, SpineMid, SpineBase, ShoulderRight, ShoulderLeft, HipRight, HipLeft, ElbowRight, WristRight, HandRight, HandTipRight, ThumbRight, ElbowLeft, WristLeft, HandLeft, HandTipLeft, ThumbLeft, KneeRight, AnkleRight, FootRight, KneeLeft, AnkleLeft, FootLeft.

    The recording program was adapted from Thomas Sanchez Langeling's skeleton recording code. It recorded the data for each body part as a separate file, repeated for each exercise; each body part for a specific exercise is stored in a distinct folder. These folders are named with the convention subjNumber_stretchName_trialNumber, where:

    • subjNumber ranges from 0 to 8
    • stretchName is one of: Mermaid, Seated, Sumo, Towel, Wall, Y
    • trialNumber ranges from 0 to 9 and represents the repetition number

    The coordinates were chosen to have an origin centered at the subject's upper chest. Data collection was standardized to the following conditions:

    1) The Kinect was placed at a height of 2 ft 3 in.
    2) The subject was consistently positioned 6.5 ft away from the camera, chest facing the camera.
    3) Each participant completed 10 repetitions of each stretch before continuing.

    Data was collected from the following population:

    • Adults ages 18-21
    • Females: 4
    • Males: 5

    The following pre-processing occurred at the time of data collection. Velocity data was calculated using a discrete derivative with a spacing of 5 frames, chosen to reduce the sensitivity of the velocity function: v[n] = (x[n] - x[n-5])/5. This is done for all body parts and all axes individually.

    Related manuscript: Capella, B., Subramanian, D., Klatzky, R., & Siewiorek, D. Action Pose Recognition from 3D Camera Data Using Inter-frame and Inter-joint Dependencies. Preprint at link in references.
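    The 5-frame discrete derivative described above can be sketched as follows (a minimal illustration, not the authors' original code; `discrete_velocity` is a hypothetical name):

```python
import numpy as np

def discrete_velocity(x, spacing=5):
    """Discrete derivative v[n] = (x[n] - x[n - spacing]) / spacing.

    Applied per body part and per axis individually; the first `spacing`
    samples have no frame 5 steps back to difference against, so they
    are left at 0 here.
    """
    x = np.asarray(x, dtype=float)
    v = np.zeros_like(x)
    v[spacing:] = (x[spacing:] - x[:-spacing]) / spacing
    return v

# A joint moving 1 unit per frame has a constant velocity of 1.
positions = np.arange(20.0)
velocities = discrete_velocity(positions)
```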

  17. The ReDraw Dataset: A Set of Android Screenshots, GUI Metadata, and Labeled...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Jan 24, 2020
    Cite
    Bernal-Cardenas, Carlos (2020). The ReDraw Dataset: A Set of Android Screenshots, GUI Metadata, and Labeled Images of GUI Components [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2530276
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Poshyvanyk, Denys
    Bonett, Richard
    Moran, Kevin
    Bernal-Cardenas, Carlos
    Curcio, Michael
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset used to train and evaluate the CNN and KNN machine learning techniques for the ReDraw paper, published in IEEE Transactions on Software Engineering in 2018.

    Link to ReDraw Paper: https://arxiv.org/abs/1802.02312

  18. The Automotive Visual Inspection Dataset (AutoVI): A Genuine Industrial...

    • zenodo.org
    • autovi.utc.fr
    • +1 more
    bin, txt, zip
    Updated Jun 5, 2024
    + more versions
    Cite
    Philippe Carvalho; Meriem Lafou; Alexandre Durupt; Antoine Leblanc; Yves Grandvalet (2024). The Automotive Visual Inspection Dataset (AutoVI): A Genuine Industrial Production Dataset for Unsupervised Anomaly Detection [Dataset]. http://doi.org/10.5281/zenodo.10459003
    Explore at:
    Available download formats: zip, txt, bin
    Dataset updated
    Jun 5, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Philippe Carvalho; Meriem Lafou; Alexandre Durupt; Antoine Leblanc; Yves Grandvalet
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    See the official website: https://autovi.utc.fr

    Modern industrial production lines must be equipped with robust defect inspection modules able to withstand high product variability. In an industrial production context, new, not-yet-known defects may appear and must be identified.

    On industrial production lines, the typology of potential defects is vast (texture, part failure, logical defects, etc.). Inspection systems must therefore be able to detect non-listed defects, i.e. defects not yet observed when the inspection system was developed. Solving this problem requires research and development of unsupervised AI algorithms on real-world data.

    Renault Group and the Université de technologie de Compiègne (Roberval and Heudiasyc Laboratories) have jointly developed the Automotive Visual Inspection Dataset (AutoVI), the purpose of which is to be used as a scientific benchmark to compare and develop advanced unsupervised anomaly detection algorithms under real production conditions. The images were acquired on Renault Group's automotive production lines, in a genuine industrial production line environment, with variations in brightness and lighting on constantly moving components. This dataset is representative of actual data acquisition conditions on automotive production lines.

    The dataset contains 3950 images, split into 1530 training images and 2420 testing images.

    The evaluation code can be found at https://github.com/phcarval/autovi_evaluation_code.

    Disclaimer
    All defects shown were intentionally created on Renault Group's production lines for the purpose of producing this dataset. The images were examined and labeled by Renault Group experts, and all defects were corrected after shooting.

    License
    Copyright © 2023-2024 Renault Group

    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/.

    To use the data in a way that falls under the commercial-use clause of the license, please contact us.

    Attribution
    Please use the following for citing the dataset in scientific work:

    Carvalho, P., Lafou, M., Durupt, A., Leblanc, A., & Grandvalet, Y. (2024). The Automotive Visual Inspection Dataset (AutoVI): A Genuine Industrial Production Dataset for Unsupervised Anomaly Detection [Dataset]. https://doi.org/10.5281/zenodo.10459003

    Contact
    If you have any questions or remarks about this dataset, please contact us at philippe.carvalho@utc.fr, meriem.lafou@renault.com, alexandre.durupt@utc.fr, antoine.leblanc@renault.com, yves.grandvalet@utc.fr.

    Changelog

    • v1.0.0
      • Cropped engine_wiring, pipe_clip and pipe_staple images
      • Reduced tank_screw, underbody_pipes and underbody_screw image sizes
    • v0.1.1
      • Added ground truth segmentation maps
      • Fixed categorization of some images
      • Added new defect categories
      • Removed tube_fastening and kitting_cart
      • Removed duplicates in pipe_clip
  19. Stairs Image Dataset | Parts of House | Indoor

    • kaggle.com
    Updated Sep 25, 2022
    Cite
    DataCluster Labs (2022). Stairs Image Dataset | Parts of House | Indoor [Dataset]. https://www.kaggle.com/datasets/dataclusterlabs/stairs-image-dataset/suggestions?status=pending&yourSuggestions=true
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Sep 25, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DataCluster Labs
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset was collected by DataCluster Labs. To download the full dataset or to submit a request for new data collection needs, please email: sales@datacluster.ai

    This dataset is an extremely challenging set of over 3,000 original stair images, captured and crowdsourced from more than 500 urban and rural areas, with each image manually reviewed and verified by computer vision professionals at DataCluster Labs.

    Optimized for Generative AI, Visual Question Answering, Image Classification, and LMM development, this dataset provides a strong basis for achieving robust model performance.

    Dataset Features

    • Dataset size : 3,000+ images
    • Captured by : 500+ crowdsource contributors
    • Resolution : 100% of images are HD or above (1920x1080 and up)
    • Location : captured across 500+ cities in India
    • Diversity : various lighting conditions (day, night), varied distances, viewpoints, etc.
    • Device used : captured using mobile phones in 2020-2022
    • Usage : stair detection, stair edge detection, computer vision, etc.

    Available Annotation formats

    COCO, YOLO, PASCAL-VOC, Tf-Record

    The images in this dataset are exclusively owned by DataCluster Labs and were not downloaded from the internet. To access a larger portion of the training dataset for research and commercial purposes, a license can be purchased. Contact us at sales@datacluster.ai or visit www.datacluster.ai to learn more.

  20. Malaria disease and grading system dataset from public hospitals reflecting...

    • data.niaid.nih.gov
    • datadryad.org
    • +1 more
    zip
    Updated Nov 10, 2023
    Cite
    Temitope Olufunmi Atoyebi; Rashidah Funke Olanrewaju; N. V. Blamah; Emmanuel Chinanu Uwazie (2023). Malaria disease and grading system dataset from public hospitals reflecting complicated and uncomplicated conditions [Dataset]. http://doi.org/10.5061/dryad.4xgxd25gn
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 10, 2023
    Dataset provided by
    Nasarawa State University
    Authors
    Temitope Olufunmi Atoyebi; Rashidah Funke Olanrewaju; N. V. Blamah; Emmanuel Chinanu Uwazie
    License

    CC0 1.0 Universal: https://spdx.org/licenses/CC0-1.0.html

    Description

    Malaria is the leading cause of death in the African region. Data mining can help extract valuable knowledge from available data in the healthcare sector, making it possible to train models that predict patient health faster than clinical trials can. Implementations of various machine learning algorithms such as K-Nearest Neighbors, Bayes Theorem, Logistic Regression, Support Vector Machines, and Multinomial Naive Bayes (MNB) have been applied to malaria datasets in public hospitals, but there are still limitations in modeling with the multinomial Naive Bayes algorithm. This study applies the MNB model to explore the relationship between 15 relevant attributes of public hospital data. The goal is to examine how the dependency between attributes affects the performance of the classifier. MNB creates a transparent and reliable graphical representation between attributes with the ability to predict new situations. The MNB model reaches 97% accuracy; by comparison, the GNB and RF classifiers each reach 100% accuracy.

    Methods

    Prior to data collection, the researcher was guided by ethical training certification on data collection and the right to confidentiality and privacy, under Institutional Review Board (IRB) oversight. Data was collected from the manual archives of hospitals purposively selected using a stratified sampling technique, transformed to electronic form, and stored in a MySQL database called malaria. Each patient file was extracted and reviewed for signs and symptoms of malaria, then checked against the laboratory confirmation result from diagnosis. The data was divided into two tables: data1, containing data for phase 1 of the classification, and data2, containing data for phase 2.

    Data Source Collection

    The malaria incidence dataset was obtained from public hospitals and covers 2017 to 2021. These are the data used for modeling and analysis, taking into account the geographical location and socio-economic factors available for patients inhabiting those areas. Multinomial Naive Bayes is the model used to analyze the collected data for malaria disease prediction and grading.

    Data Preprocessing: preprocessing was done to remove noise and outliers. Transformation: the data was transformed from analog to electronic records.

    Data Partitioning

    The collected data was divided into two portions: one portion was extracted as a training set, while the other was used for testing. The training portions were taken from the two database tables and are called training set 1 and training set 2. The dataset was split into 70% for training and 30% for testing. Using MNB classification algorithms implemented in Python, the models were trained on the training sample, tested on the remaining 30%, and the results compared with other machine learning models using standard metrics.

    Classification and Prediction

    Based on the nature of the variables in the dataset, this study uses multinomial Naive Bayes classification in two phases. The framework operates as follows: (i) data is collected and preprocessed; (ii) preprocessed data is stored in training set 1 and training set 2, which are used during classification; (iii) the test dataset is stored in a test database; (iv) part of the test set is classified with classifier 1 and the remainder with classifier 2, as follows:

    Classifier phase 1: classifies records into positive or negative classes. A patient with malaria is classified as positive (P); a patient without malaria is classified as negative (N).

    Classifier phase 2: classifies only the records labeled positive by classifier 1, further separating them into complicated and uncomplicated class labels. The classifier also captures data on environmental factors, genetics, gender and age, and cultural and socio-economic variables. The system is designed so that the core parameters, as determining factors, supply their values.
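    The 70/30 split and phase-1 MNB training step can be sketched with scikit-learn (a minimal illustration on synthetic placeholder features, not the study's actual hospital data or attributes):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Placeholder data: 200 patients, 15 count-valued attributes
# (the real dataset's attributes are not reproduced here).
rng = np.random.default_rng(42)
X = rng.integers(0, 5, size=(200, 15))
y = rng.integers(0, 2, size=200)  # phase 1: 1 = positive (P), 0 = negative (N)

# 70% training / 30% testing, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42)

clf = MultinomialNB().fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```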

Cite
Mübarek Mazhar Çakır; Mübarek Mazhar Çakır (2023). Mechanical Parts Dataset 2022 [Dataset]. http://doi.org/10.5281/zenodo.7504801

Mechanical Parts Dataset 2022

Explore at:
Dataset updated
Jan 5, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mübarek Mazhar Çakır; Mübarek Mazhar Çakır
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Mechanical Parts Dataset

The dataset consists of a total of 2250 images downloaded from various internet platforms: 714 images with bearings, 632 with bolts, 616 with gears, and 586 with nuts. In total, 10597 manual labels were created: 2099 for the bearing class, 2734 for the bolt class, 2662 for the gear class, and 3102 for the nut class.

Folder Content

The dataset is split 80% train, 10% validation, and 10% test. The "Mechanical Parts Dataset" folder contains three subfolders: "train", "test" and "val". Each of these contains an "images" folder and a "labels" folder; images are kept in "images" and the label files in "labels".

Finally, the folder contains a YAML file named "mech_parts_data" for the YOLO training pipeline. This file holds the number of classes and the class names.
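A YOLOv5-style data file of this kind might look like the following sketch (the paths and exact key layout are assumptions; the four class names come from the dataset description):

```yaml
# mech_parts_data.yaml (assumed layout, YOLOv5 dataset config)
train: Mechanical Parts Dataset/train/images
val: Mechanical Parts Dataset/val/images
test: Mechanical Parts Dataset/test/images

nc: 4                              # number of classes
names: [bearing, bolt, gear, nut]  # class names
```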

Images and Labels

The dataset was prepared in accordance with the YOLOv5 algorithm.
For example, the label information for the image named "2a0xhkr_jpg.rf.45a11bf63c40ad6e47da384fdf6bb7a1.jpg" is stored in a txt file with the same name. Each line of the txt file has the form "class x_center y_center width height".
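Each label line can be decoded as follows (a minimal sketch; `yolo_to_corners` is a hypothetical helper, assuming YOLO's usual convention of box values normalized to [0, 1]):

```python
def yolo_to_corners(line, img_w, img_h):
    """Parse one YOLO label line ("class x_center y_center width height",
    all box values normalized to [0, 1]) into a class id and a
    pixel-space (x1, y1, x2, y2) corner box."""
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return int(cls), (x1, y1, x2, y2)

# A box centered in a 100x100 image, covering half of each dimension.
cls_id, box = yolo_to_corners("2 0.5 0.5 0.5 0.5", 100, 100)
```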

Update 05.01.2023

Pascal VOC and COCO JSON formats have been added.

Related paper: doi.org/10.5281/zenodo.7496767
