The original contributions presented in the study are included in the article and online through the TAME Toolkit, available at: https://uncsrp.github.io/Data-Analysis-Training-Modules/, with underlying code and datasets available in the parent UNC-SRP GitHub website (https://github.com/UNCSRP). This dataset is associated with the following publication: Roell, K., L. Koval, R. Boyles, G. Patlewicz, C. Ring, C. Rider, C. Ward-Caviness, D. Reif, I. Jaspers, R. Fry, and J. Rager. Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research. Frontiers in Toxicology. Frontiers, Lausanne, SWITZERLAND, 4: 893924, (2022).
Overview
This dataset consists of images of various hand and power tools, specifically designed to aid in training and improving AI-based object recognition systems.
Tool recognition is a crucial component of modern machine learning applications, enabling automated identification in industrial settings, augmented reality, and inventory management. Training machine learning models on diverse tool images can help improve accuracy, especially in real-world environments where lighting conditions, angles, and backgrounds vary significantly.
This dataset contains images of multiple distinct tool types, with each tool category featuring hundreds of labeled samples captured from different perspectives.
Content:
1. Screwdriver AI Dataset - 853 Images
2. Pliers AI Dataset - 918 Images
3. Wrench AI Dataset - 803 Images
4. Hammer AI Dataset - 799 Images
5. Hand Saw AI Dataset - 781 Images
6. Electric Drill AI Dataset - 840 Images
7. Paint Roller AI Dataset - 533 Images
8. Nail AI Dataset - 675 Images
9. Drill Bit AI Dataset - 865 Images
10. Glue Gun AI Dataset - 692 Images
11. Voltage Tester AI Dataset - 890 Images
12. Axe AI Dataset - 643 Images
13. Tape Measure Dataset - 733 Images
14. Nail Gun Dataset - 691 Images
15. Putty Knife Dataset - 888 Images
16. Heat Gun Dataset - 728 Images
17. Level Tool Dataset - 758 Images
18. Paint Brush Dataset - 787 Images
19. Angle Grinder Dataset - 786 Images
20. Utility Knife Dataset - 894 Images
21. Soldering Iron Dataset - 904 Images
22. Extension Cord Dataset - 900 Images
23. Staple Gun Dataset - 370 Images
24. Digital Caliper Dataset - 617 Images
25. Clamp Dataset - 609 Images
26. Concrete Mixer Dataset - 487 Images
27. Ladder Dataset - 754 Images
28. Caulking Gun Dataset - 733 Images
29. Crowbar Dataset - 522 Images
30. Float Tool Dataset - 309 Images
31. Hoe Tool Dataset - 416 Images
32. Generator Dataset - 284 Images
33. Safety Helmet Dataset - 696 Images
34. Safety Glasses Dataset - 520 Images
35. Paint Spray Gun Dataset - 196 Images
36. Construction Earmuffs Dataset - 328 Images
37. Rubber Boots Dataset - 663 Images
38. Tool belt Dataset - 272 Images
39. Respirator Mask N95 Dataset - 425 Images
40. Paint Respirator Dataset - 373 Images
41. Infrared Digital Thermometer Dataset - 363 Images
42. Corner Trowel Dataset - 309 Images
43. Fire Extinguisher Dataset - 972 Images
44. Plastic Bucket Dataset - 945 Images
45. Paint Tray Dataset - 495 Images
46. Masking Tape Roll Dataset - 449 Images
47. Safety Vest Dataset - 658 Images
48. Digital Torque Wrench Dataset - 270 Images
49. Pry Bar Dataset - 725 Images
50. Chisel Dataset - 963 Images
51. Flashlight Dataset - 956 Images
52. Stainless Steel Wire Brush Dataset - 912 Images
53. Traffic Cone Dataset - 992 Images
54. Jigsaw Dataset - 613 Images
55. Ear Plugs Dataset - 986 Images
56. Wheel Barrow Dataset - 930 Images
57. Rubber Gloves Dataset - 989 Images
58. Trowel Dataset - 952 Images
59. Measuring Wheel Dataset - 978 Images
60. File Tool Dataset - 986 Images
61. Tape Measure Dataset - 988 Images
62. Jack Hammer Dataset - 549 Images
63. LED Light Bulb Dataset - 986 Images
64. Air Compressors Dataset - 981 Images
65. Machete Dataset - 797 Images
66. Kneepads Dataset - 987 Images
67. First AID Kit Dataset - 983 Images
68. Heavy Duty Vacuum Cleaner Dataset - 962 Images
69. Tubing Cutter Dataset - 793 Images
70. Hacksaw Dataset - 968 Images
71. Utility Torch Dataset - 627 Images
72. Welding Gloves Dataset - 993 Images
73. Moisture Meter Dataset - 719 Images
74. Palm Sander Dataset - 802 Images
75. Jack Plane Dataset - 706 Images
76. Mallet Dataset - 970 Images
77. Plunger Dataset - 956 Images
78. Head Flashlight Dataset - 979 Images
79. Stud Crimper Dataset - 746 Images
80. Extension Spring Dataset - 964 Images
81. Bearing Dataset - 969 Images
82. Metal Nut Dataset - 990 Images
83. Tin Snips Dataset - 988 Images
84. Power Socket Dataset - 937 Images
85. Wirecutter Dataset - 878 Images
86. Stud Finder Dataset - 967 Images
87. Socket Wrench Dataset - 996 Images
88. Stainless Steel Washer Dataset - 989 Images
89. Ball Valve Dataset - 973 Images
90. Pipe Sealing Tape Dataset - 993 Images
91. Layout Square Dataset - 958 Images
92. Tweezers Dataset - 805 Images
93. Brick Dataset - 853 Images
94. Rake Dataset - 978 Images
95. Hose Clamp Dataset - 882 Images
96. Metal Detector Dataset - 1293 Images
97. Hex Key Wrench Dataset - 1966 Images
98. Grommet Pliers Dataset - 1222 Images
99. Hose Dataset - 1507 Images
100. Ventilation Ducts - 703 Images
101. Zip Tie Dataset - 1914 Images
102. Step Drill Bit Dataset - 1884 Images
103. Flat Drill Bit Dataset - 1885 Images
104. Ratchet Dataset - 2236 Images
105. Nut Driver Dataset - 1889 Images
106. Post Hole Digger Dataset - 131 Images
107. Spade Shovel Dataset - 711 Images
108. Tenon Saw Dataset - 1603 Images
109. Awl tool Dataset - 1465 Images
110. Ruler tool Dataset - 701 Images
111. Circular File Tool Dataset - 1934 Images
112. Anvil Tool Dataset - 528 Images
113. Car Jack Dataset - 678 Ima...
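The per-category counts in the content listing vary widely, so it can help to load them into a structure before planning training splits. The following is a minimal sketch, assuming the listing keeps the "N. Name - K Images" pattern shown above; the function name `parse_listing` and the three-item sample string are illustrative, not part of the dataset.

```python
import re

# Matches entries such as "1. Screwdriver AI Dataset - 853 Images".
ITEM = re.compile(r"(\d+)\.\s+(.+?)\s+-\s+(\d+)\s+Images")


def parse_listing(text):
    """Return (index, category name, image count) tuples found in the listing text."""
    return [(int(i), name.strip(), int(n)) for i, name, n in ITEM.findall(text)]


# Short sample copied from the start of the listing (not the full content list).
sample = (
    "1. Screwdriver AI Dataset - 853 Images "
    "2. Pliers AI Dataset - 918 Images "
    "3. Wrench AI Dataset - 803 Images"
)

items = parse_listing(sample)
total_images = sum(count for _, _, count in items)
smallest = min(items, key=lambda item: item[2])

print(total_images)  # total images across the parsed categories
print(smallest)      # category with the fewest images in the sample
```

Entries whose count is truncated in the listing (such as the final one) do not match the pattern and are skipped rather than guessed.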
Introducing a comprehensive and openly accessible dataset designed for researchers and data scientists in the field of artificial intelligence. This dataset encompasses a collection of over 4,000 AI tools, meticulously categorized into more than 50 distinct categories. This valuable resource has been generously shared by its owner, TasticAI, and is freely available for various purposes such as research, benchmarking, market surveys, and more.
Dataset Overview: The dataset provides an extensive repository of AI tools, each accompanied by a wealth of information to facilitate your research endeavors. Here is a brief overview of the key components:
- AI Tool Name: Each AI tool is listed with its name, providing an easy reference point for users to identify specific tools within the dataset.
- Description: A concise one-line description is provided for each AI tool. This description offers a quick glimpse into the tool's purpose and functionality.
- AI Tool Category: The dataset is thoughtfully organized into more than 50 distinct categories, ensuring that you can easily locate AI tools that align with your research interests or project needs. Whether you are working on natural language processing, computer vision, machine learning, or other AI subfields, you will find a dedicated category.
- Images: Visual representation is crucial for understanding and identifying AI tools. To aid your exploration, the dataset includes images associated with each tool, allowing for quick recognition and visual association.
- Website Links: Accessing more detailed information about a specific AI tool is effortless, as direct links to the tool's respective website or documentation are provided. This feature enables researchers and data scientists to delve deeper into the tools that pique their interest.
Utilization and Benefits: This openly shared dataset serves as a valuable resource for various purposes:
- Research: Researchers can use this dataset to identify AI tools relevant to their studies, facilitating faster literature reviews, comparative analyses, and the exploration of cutting-edge technologies.
- Benchmarking: The extensive collection of AI tools allows for comprehensive benchmarking, enabling you to evaluate and compare tools within specific categories or across categories.
- Market Surveys: Data scientists and market analysts can utilize this dataset to gain insights into the AI tool landscape, helping them identify emerging trends and opportunities within the AI market.
- Educational Purposes: Educators and students can leverage this dataset for teaching and learning about AI tools, their applications, and the categorization of AI technologies.
Conclusion: In summary, this openly shared dataset from TasticAI, featuring over 4,000 AI tools categorized into more than 50 categories, represents a valuable asset for researchers, data scientists, and anyone interested in the field of artificial intelligence. Its easy accessibility, detailed information, and versatile applications make it an indispensable resource for advancing AI research, benchmarking, market analysis, and more. Explore the dataset at https://tasticai.com and unlock the potential of this rich collection of AI tools for your projects and studies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this project, we aim to annotate car images captured on highways. The annotated data will be used to train machine learning models for various computer vision tasks, such as object detection and classification.
For this project, we will be using Roboflow, a powerful platform for data annotation and preprocessing. Roboflow simplifies the annotation process and provides tools for data augmentation and transformation.
Roboflow offers data augmentation capabilities, such as rotation, flipping, and resizing. These augmentations can help improve the model's robustness.
Once the data is annotated and augmented, Roboflow allows us to export the dataset in various formats suitable for training machine learning models, such as YOLO, COCO, or TensorFlow Record.
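To make the link between flipping and the exported labels concrete, here is a minimal sketch of what a horizontal flip does to one bounding box in the YOLO text format, where each line holds `class x_center y_center width height` with coordinates normalized to the range 0 to 1. Roboflow applies its augmentations for you; the helper name `flip_yolo_box_horizontally` is hypothetical and shown only for illustration.

```python
def flip_yolo_box_horizontally(label_line):
    """Mirror one YOLO-format label line left to right.

    Only the x center changes (x -> 1 - x); the class id, y center,
    width, and height stay the same.
    """
    class_id, x_center, y_center, width, height = label_line.split()
    flipped_x = 1.0 - float(x_center)
    return (
        f"{class_id} {flipped_x:.6f} {float(y_center):.6f} "
        f"{float(width):.6f} {float(height):.6f}"
    )


# A car annotated left of center ends up right of center after the flip.
print(flip_yolo_box_horizontally("0 0.25 0.5 0.2 0.1"))
```

The image itself must be flipped with the same transform so that labels and pixels stay aligned.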
By completing this project, we will have a well-annotated dataset ready for training machine learning models. This dataset can be used for a wide range of applications in computer vision, including car detection and tracking on highways.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
A curated collection of ultra-high-resolution Ferrari car images, scraped from WSupercars.com and organized by model. This dataset is suited to machine learning, computer vision, and creative applications such as wallpaper generators, AR design tools, and synthetic data modeling. All images are native 3840×2160 resolution, suitable for both research and visual content creation.
📌 Educational and research use only — All images are copyright of their respective owners.
Folder: ferrari_images/
- Subfolders by car model (e.g., f80, 812, sf90)
- Each folder contains multiple ultra-HD wallpapers (3840×2160)
- Car Model Classification – Train AI to recognize different Ferrari models
- Vision Tasks – Use for super-resolution, enhancement, detection, and segmentation
- Generative Models – Ideal input for GANs, diffusion models, or neural style transfer
- Wallpaper & Web Apps – Populate high-quality visual content for websites or mobile platforms
- Fine-Tuning Vision Models – Compatible with CNNs, ViTs, and transformer architectures
- Self-Supervised Learning – Leverage unlabeled images for contrastive training methods
- Game/Simulation Prototyping – Use as visual references or placeholders in 3D environments
- AR & Design Tools – Integrate into automotive mockups, design UIs, or creative workflows
- This release includes only Ferrari vehicle images
- All images are native UHD (3840×2160), with no duplicates or downscaled versions
- Novitec-tuned models are included both in the novitec/ folder and within their respective model folders (e.g., 296/, sf90/) for convenience.
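For model classification, the folder layout described above can be turned into (image path, label) pairs by using each subfolder name as the label. The sketch below assumes that layout and assumes the images use common extensions such as .jpg or .png, which the description does not state. It skips novitec/ by default because those images are also present in their model folders and would otherwise be counted twice.

```python
from pathlib import Path

# Assumption: file extensions are not specified in the dataset description.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labeled_images(root, skip=("novitec",)):
    """Return (image_path, model_label) pairs, one label per model subfolder."""
    samples = []
    for model_dir in sorted(Path(root).iterdir()):
        # Ignore stray files and any folder listed in `skip`.
        if not model_dir.is_dir() or model_dir.name in skip:
            continue
        for image_path in sorted(model_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image_path, model_dir.name))
    return samples


# Example call, assuming the archive was extracted into the working directory:
# samples = list_labeled_images("ferrari_images")
```

Passing `skip=()` keeps the novitec/ folder as its own class if tuner recognition is the goal.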
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In this study, we introduce PFASorptionML, a novel machine learning (ML) tool developed to predict solid–liquid distribution coefficients (Kd) for per- and polyfluoroalkyl substances (PFAS) in soils. Leveraging a data set of 1,274 Kd entries for PFAS in soils and sediments, including compounds such as trifluoroacetate, cationic, and zwitterionic PFAS, and neutral fluorotelomer alcohols, the model incorporates PFAS-specific properties such as molecular weight, hydrophobicity, and pKa, alongside soil characteristics like pH, texture, organic carbon content, and cation exchange capacity. Sensitivity analysis reveals that molecular weight, hydrophobicity, and organic carbon content are the most significant factors influencing sorption behavior, while charge density and mineral soil fraction have comparatively minor effects. The model demonstrates high predictive performance, with RPD values exceeding 3.16 across validation data sets, outperforming existing tools in accuracy and scope. Notably, PFAS chain length and functional group variability significantly influence Kd, with longer chain lengths and higher hydrophobicity positively correlating with Kd. By integrating location-specific soil repository data, the model enables the generation of spatial Kd maps for selected PFAS species. These capabilities are implemented in the online platform PFASorptionML, providing researchers and practitioners with a valuable resource for conducting environmental risk assessments of PFAS contamination in soils.
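For readers unfamiliar with RPD, it is commonly defined as the ratio of performance to deviation: the standard deviation of the observed values divided by the root-mean-square error of the predictions. The sketch below uses that common definition with the sample standard deviation; whether the study used exactly this variant is an assumption, and the numbers are made up for illustration rather than taken from the PFAS data.

```python
import math
import statistics


def rpd(observed, predicted):
    """Ratio of performance to deviation: stdev(observed) / RMSE(observed, predicted)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    # Assumption: sample standard deviation (n - 1 denominator).
    return statistics.stdev(observed) / rmse


# Illustrative values only (e.g., log-transformed Kd), not from the study.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.5]
print(round(rpd(observed, predicted), 2))
```

Higher values mean the prediction error is small relative to the spread of the observations.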
https://dataintelo.com/privacy-and-policy
The global market size for automated data annotation tools was valued at approximately USD 1.2 billion in 2023, and it is projected to reach around USD 6.8 billion by 2032, exhibiting a CAGR of 20.2% during the forecast period. This market is witnessing rapid growth primarily driven by the increasing demand for high-quality data sets to train various machine learning and artificial intelligence models.
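The growth rate can be checked against the two endpoint figures with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes a nine-year span from 2023 to 2032 and uses the rounded endpoint values quoted above; under those assumptions the implied rate comes out at roughly 21.3%, a little above the stated 20.2%, which may reflect rounding of the endpoints or a different base year in the report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Rounded figures from the paragraph above (USD billions); nine-year span assumed.
implied_rate = cagr(1.2, 6.8, 2032 - 2023)
print(f"{implied_rate:.1%}")
```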
One of the primary growth factors for this market is the escalating need for automation in data preparation tasks, which occupy a significant amount of time and resources. Automated data annotation tools streamline the labor-intensive process of labeling data, ensuring quicker and more accurate results. The rising adoption of artificial intelligence and machine learning across various industries such as healthcare, automotive, and finance is propelling the demand for these tools, as they play a critical role in enhancing the efficiency and efficacy of AI models.
Another significant factor contributing to the market's growth is the continuous advancements in technology, such as the integration of machine learning, natural language processing, and computer vision in data annotation tools. These technological enhancements enable more sophisticated and precise data labeling, which is essential for improving the performance of AI applications. Moreover, the growing availability of large data sets and the need for effective data management solutions are further driving the market forward.
The rise in partnerships and collaborations among key market players to develop innovative data annotation solutions is also a notable growth factor. Companies are increasingly investing in research and development activities to introduce advanced tools that cater to the diverse needs of different industry verticals. This collaborative approach not only helps in expanding the product portfolio but also enhances the overall market presence of the companies involved.
Regionally, North America holds a significant share of the automated data annotation tool market, driven by the early adoption of cutting-edge technologies and the presence of major tech giants in the region. However, the Asia Pacific region is anticipated to exhibit the highest growth rate during the forecast period, owing to the rapid industrialization, increasing investments in AI infrastructure, and the growing focus on digital transformation initiatives across various sectors.
The automated data annotation tool market, segmented by component into software and services, reveals distinct trends and preferences in the industry. The software segment is expected to dominate the market due to the increasing adoption of advanced data annotation software solutions that offer robust features, including automated labeling, quality control, and integration capabilities. These software solutions are crucial for organizations looking to enhance their AI and machine learning models' performance by providing accurate and consistent data annotations.
On the other hand, the services segment is also witnessing substantial growth, driven by the rising demand for professional services such as consulting, implementation, and maintenance. Organizations often require expert assistance to effectively deploy and manage data annotation tools, ensuring they derive maximum value from their investments. Service providers offer tailored solutions to meet the specific needs of different industries, thereby driving the growth of this segment.
The continuous innovation and development in software solutions are further propelling the growth of the software segment. Companies are focusing on enhancing the capabilities of their annotation tools by incorporating advanced technologies such as machine learning algorithms and natural language processing. These advancements enable more accurate and efficient data labeling processes, which are essential for training high-performing AI models.
In addition, the integration of data annotation tools with other enterprise systems, such as data management platforms and analytics solutions, is further driving the adoption of software solutions. This integration allows organizations to streamline their data workflows and improve overall productivity. The growing need for scalable and flexible data annotation solutions is also contributing to the dominance of the software segment in the market.
Overall, both software and ser
With the advent of new spectroscopic surveys from ground and space, observing up to hundreds of millions of galaxies, spectra classification will become overwhelming for standard analysis techniques. To prepare for this challenge, we introduce a family of deep learning tools to classify features in one-dimensional spectra. As the first application of these Galaxy Spectra neural Networks (GaSNets), we focus on tools specialized in identifying emission lines from strongly lensed star-forming galaxies in the eBOSS spectra. We first discuss the training and testing of these networks and define a threshold probability, PL, of 95% for the high-quality event detection. Then, using a previous set of spectroscopically selected strong lenses from eBOSS, confirmed with the Hubble Space Telescope (HST), we estimate a completeness of ~80% as the fraction of lenses recovered above the adopted PL. We finally apply the GaSNets to ~1.3M eBOSS spectra to collect the first list of ~430 new high-quality candidates identified with deep learning from spectroscopy and visually graded as highly probable real events. A preliminary check against ground-based observations tentatively shows that this sample has a confirmation rate of 38%, in line with previous samples selected with standard (no deep learning) classification tools and confirmed by the HST. This first test shows that machine learning can be efficiently extended to feature recognition in the wavelength space, which will be crucial for future surveys like 4MOST, DESI, Euclid, and the China Space Station Telescope. Cone search capability for table J/other/RAA/22.F5014/appena (New high quality (HQ) candidates from GaSNets)
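The completeness figure quoted above is the fraction of already confirmed lenses that the networks score above the adopted threshold. A minimal sketch of that calculation follows, assuming each confirmed lens has a single network probability and that "above" means at least the threshold; the probabilities are invented for illustration and are not GaSNets outputs.

```python
def completeness(probabilities, threshold=0.95):
    """Fraction of known true events whose probability is at or above the threshold."""
    recovered = sum(1 for p in probabilities if p >= threshold)
    return recovered / len(probabilities)


# Hypothetical probabilities for five confirmed lenses.
known_lens_probabilities = [0.99, 0.98, 0.97, 0.96, 0.50]
print(completeness(known_lens_probabilities))
```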
https://www.gnu.org/licenses/gpl-3.0.html
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Parameters that gave the best results on Dataset1 and Dataset2 are denoted by (1) and (2) respectively. The models chosen on Dataset1 were also used for Dataset1-nd.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The ECG Images Dataset of Cardiac Patients is an extensive collection of electrocardiogram (ECG) images designed to aid research and advancements in the field of cardiovascular medicine. This dataset provides a wealth of data that can be utilized for various analyses, including the development of diagnostic tools and the study of different cardiac conditions.
The dataset is organized into four main categories, each representing different cardiac conditions:
ECG Images of Myocardial Infarction Patients
- Number of Images: 240
- Total Dimensions: 240x12 (total of 2880 images)
- Description: These images are from patients diagnosed with myocardial infarction (MI), commonly known as a heart attack. The images reflect the ECG patterns typically associated with this critical condition.
ECG Images of Patients with Abnormal Heartbeat
- Number of Images: 233
- Total Dimensions: 233x12 (total of 2796 images)
- Description: This category includes ECG images from patients exhibiting abnormal heartbeat patterns. Such patterns may indicate a range of arrhythmias or other cardiac issues, providing crucial data for diagnostic and research purposes.
ECG Images of Patients with a History of Myocardial Infarction
- Number of Images: 172
- Total Dimensions: 172x12 (total of 2064 images)
- Description: These images come from patients who have a documented history of myocardial infarction. They offer insights into the long-term effects and recovery patterns associated with heart attacks.
Normal Person ECG Images
- Number of Images: 284
- Total Dimensions: 284x12 (total of 3408 images)
- Description: This category features ECG images from individuals with no known cardiac issues, serving as a baseline for comparison with pathological cases.
The dataset is available for download from Kaggle and is provided in a compressed file of approximately 194 MB.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Credit scorecards are essential tools for banks to assess the creditworthiness of loan applicants. While advanced machine learning models like XGBoost and random forest often outperform traditional logistic regression in predictive accuracy, their lack of interpretability hinders their adoption in practice. This study bridges the gap between research and practice by developing a novel framework for constructing interpretable credit scorecards using Shapley values. We apply this framework to two credit datasets, discretizing numerical variables and utilizing one-hot encoding to facilitate model development. Shapley values are then employed to derive credit scores for each predictor variable group in XGBoost, random forest, LightGBM, and CatBoost models. Our results demonstrate that this approach yields credit scorecards with interpretability comparable to logistic regression while maintaining superior predictive accuracy. This framework offers a practical and effective solution for credit practitioners seeking to leverage the power of advanced models without sacrificing transparency and regulatory compliance.
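As background on the Shapley values mentioned above, the sketch below computes exact Shapley values for a tiny three-feature example by averaging each feature's marginal contribution over all orderings. It illustrates the general concept only and is not the study's framework; the toy `score` function, its point values, and the feature names are hypothetical, and real models would rely on approximations since the number of orderings grows factorially.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {player: 0.0 for player in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            with_player = coalition | {player}
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return {player: total / len(orderings) for player, total in totals.items()}


# Hypothetical scoring rule: fixed points per feature plus an interaction bonus.
POINTS = {"income": 30, "age": 20, "debt": -15}


def score(features):
    bonus = 10 if {"income", "age"} <= features else 0
    return sum(POINTS[f] for f in features) + bonus


contributions = shapley_values(list(POINTS), score)
print(contributions)
```

The interaction bonus is split evenly between the two features that create it, and the contributions sum to the score of the full feature set, which is what makes per-variable point allocations of this kind add up.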
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General statistics on the four linearly interpolated datasets used in this study. (m ± s) denotes mean ± standard deviation.