13 datasets found
  1. Data from: Development of the InTelligence And Machine LEarning (TAME)...

    • catalog.data.gov
    Updated Oct 31, 2022
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research [Dataset]. https://catalog.data.gov/dataset/development-of-the-intelligence-and-machine-learning-tame-toolkit-for-introductory-data-sc
    Explore at:
    Dataset updated
    Oct 31, 2022
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The original contributions presented in the study are included in the article and online through the TAME Toolkit, available at: https://uncsrp.github.io/Data-Analysis-Training-Modules/, with underlying code and datasets available in the parent UNC-SRP GitHub website (https://github.com/UNCSRP). This dataset is associated with the following publication: Roell, K., L. Koval, R. Boyles, G. Patlewicz, C. Ring, C. Rider, C. Ward-Caviness, D. Reif, I. Jaspers, R. Fry, and J. Rager. Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research. Frontiers in Toxicology. Frontiers, Lausanne, SWITZERLAND, 4: 893924, (2022).

  2. OORT - Object Detection Data | Diverse Home Repair Tools Dataset for Machine...

    • datarade.ai
    .jpeg, .jpg, .png
    Updated Jul 8, 2025
    Cite
    OORT (2025). OORT - Object Detection Data | Diverse Home Repair Tools Dataset for Machine Learning | 125 Categories, 100K+ Data Points [Dataset]. https://datarade.ai/data-products/diverse-tools-image-dataset-for-machine-learning-oort
    Explore at:
    Available download formats: .jpeg, .jpg, .png
    Dataset updated
    Jul 8, 2025
    Dataset authored and provided by
    OORT
    Area covered
    Ethiopia, Iran (Islamic Republic of), France, Tokelau, Lao People's Democratic Republic, Guernsey, Ukraine, Taiwan, Nauru, French Guiana
    Description

    Overview

    This dataset consists of images of various hand and power tools, specifically designed to aid in training and improving AI-based object recognition systems.

    Tool recognition is a crucial component of modern machine learning applications, enabling automated identification in industrial settings, augmented reality, and inventory management. Training machine learning models on diverse tool images can help improve accuracy, especially in real-world environments where lighting conditions, angles, and backgrounds vary significantly.

    This dataset contains images of multiple distinct tool types, with each tool category featuring hundreds of labeled samples captured from different perspectives.

    Content: 1. Screwdriver AI Dataset - 853 Images 2. Pliers AI Dataset - 918 Images 3. Wrench AI Dataset - 803 Images 4. Hammer AI Dataset - 799 Images 5. Hand Saw AI Dataset - 781 Images 6. Electric Drill AI Dataset - 840 Images 7. Paint Roller AI Dataset - 533 Images 8. Nail AI Dataset - 675 Images 9. Drill Bit AI Dataset - 865 Images 10. Glue Gun AI Dataset - 692 Images 11. Voltage Tester AI Dataset - 890 Images 12. Axe AI Dataset - 643 Images 13. Tape Measure Dataset - 733 Images 14. Nail Gun Dataset - 691 Images 15. Putty Knife Dataset - 888 Images 16. Heat Gun Dataset - 728 Images 17. Level Tool Dataset - 758 Images 18. Paint Brush Dataset - 787 Images 19. Angle Grinder Dataset - 786 Images 20. Utility Knife Dataset - 894 Images 21. Soldering Iron Dataset - 904 Images 22. Extension Cord Dataset - 900 Images 23. Staple Gun Dataset - 370 Images 24. Digital Caliper Dataset - 617 Images 25. Clamp Dataset - 609 Images 26. Concrete Mixer Dataset - 487 Images 27. Ladder Dataset - 754 Images 28. Caulking Gun Dataset - 733 Images 29. Crowbar Dataset - 522 Images 30. Float Tool Dataset - 309 Images 31. Hoe Tool Dataset - 416 Images 32. Generator Dataset - 284 Images 33. Safety Helmet Dataset - 696 Images 34. Safety Glasses Dataset - 520 Images 35. Paint Spray Gun Dataset - 196 Images 36. Construction Earmuffs Dataset - 328 Images 37. Rubber Boots Dataset - 663 Images 38. Tool belt Dataset - 272 Images 39. Respirator Mask N95 Dataset - 425 Images 40. Paint Respirator Dataset - 373 Images 41. Infrared Digital Thermometer Dataset - 363 Images 42. Corner Trowel Dataset - 309 Images 43. Fire Extinguisher Dataset - 972 Images 44. Plastic Bucket Dataset - 945 Images 45. Paint Tray Dataset - 495 Images 46. Masking Tape Roll Dataset - 449 Images 47. Safety Vest Dataset - 658 Images 48. Digital Torque Wrench Dataset - 270 Images 49. Pry Bar Dataset - 725 Images 50. Chisel Dataset - 963 Images 51. Flashlight Dataset - 956 Images 52. Stainless Steel Wire Brush Dataset - 912 Images 53. Traffic Cone Dataset - 992 Images 54. Jigsaw Dataset - 613 Images 55. Ear Plugs Dataset - 986 Images 56. Wheel Barrow Dataset - 930 Images 57. Rubber Gloves Dataset - 989 Images 58. Trowel Dataset - 952 Images 59. Measuring Wheel Dataset - 978 Images 60. File Tool Dataset - 986 Images 61. Tape Measure Dataset - 988 Images 62. Jack Hammer Dataset - 549 Images 63. LED Light Bulb Dataset - 986 Images 64. Air Compressors Dataset - 981 Images 65. Machete Dataset - 797 Images 66. Kneepads Dataset - 987 Images 67. First AID Kit Dataset - 983 Images 68. Heavy Duty Vacuum Cleaner Dataset - 962 Images 69. Tubing Cutter Dataset - 793 Images 70. Hacksaw Dataset - 968 Images 71. Utility Torch Dataset - 627 Images 72. Welding Gloves Dataset - 993 Images 73. Moisture Meter Dataset - 719 Images 74. Palm Sander Dataset - 802 Images 75. Jack Plane Dataset - 706 Images 76. Mallet Dataset - 970 Images 77. Plunger Dataset - 956 Images 78. Head Flashlight Dataset - 979 Images 79. Stud Crimper Dataset - 746 Images 80. Extension Spring Dataset - 964 Images 81. Bearing Dataset - 969 Images 82. Metal Nut Dataset - 990 Images 83. Tin Snips Dataset - 988 Images 84. Power Socket Dataset - 937 Images 85. Wirecutter Dataset - 878 Images 86. Stud Finder Dataset - 967 Images 87. Socket Wrench Dataset - 996 Images 88. Stainless Steel Washer Dataset - 989 Images 89. Ball Valve Dataset - 973 Images 90. Pipe Sealing Tape Dataset - 993 Images 91. Layout Square Dataset - 958 Images 92. Tweezers Dataset - 805 Images 93. Brick Dataset - 853 Images 94. 
Rake Dataset - 978 Images 95. Hose Clamp Dataset - 882 Images 96. Metal Detector Dataset - 1293 Images 97. Hex Key Wrench Dataset - 1966 Images 98. Grommet Pliers Dataset - 1222 Images 99. Hose Dataset - 1507 Images 100. Ventilation Ducts - 703 Images 101. Zip Tie Dataset - 1914 Images 102. Step Drill Bit Dataset - 1884 Images 103. Flat Drill Bit Dataset - 1885 Images 104. Ratchet Dataset - 2236 Images 105. Nut Driver Dataset - 1889 Images 106. Post Hole Digger Dataset - 131 Images 107. Spade Shovel Dataset - 711 Images 108. Tenon Saw Dataset - 1603 Images 109. Awl tool Dataset - 1465 Images 110. Ruler tool Dataset - 701 Images 111. Circular File Tool Dataset - 1934 Images 112. Anvil Tool Dataset - 528 Images 113. Car Jack Dataset - 678 Ima...

  3. AI TOOLS - Open Dataset - 4000 tools / 50 categories

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    BUREAU, Olivier (2023). AI TOOLS - Open Dataset - 4000 tools / 50 categories [Dataset]. http://doi.org/10.7910/DVN/QLSXZG
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    BUREAU, Olivier
    Description

    Introducing a comprehensive and openly accessible dataset designed for researchers and data scientists in the field of artificial intelligence. The dataset encompasses a collection of over 4,000 AI tools, categorized into more than 50 distinct categories. It has been shared by its owner, TasticAI, and is freely available for purposes such as research, benchmarking, market surveys, and more.

    Dataset Overview

    Each AI tool is accompanied by the following components:

    • AI Tool Name: Each tool is listed with its name, providing an easy reference point for identifying specific tools within the dataset.
    • Description: A concise one-line description of each tool, offering a quick glimpse into its purpose and functionality.
    • AI Tool Category: The tools are organized into more than 50 categories, so you can easily locate those that align with your research interests or project needs, whether in natural language processing, computer vision, machine learning, or other AI subfields.
    • Images: An image is included for each tool, allowing quick recognition and visual association.
    • Website Links: Direct links to each tool's website or documentation enable deeper exploration of tools of interest.

    Utilization and Benefits

    • Research: Identify AI tools relevant to your studies, enabling faster literature reviews, comparative analyses, and exploration of emerging technologies.
    • Benchmarking: Evaluate and compare tools within or across categories.
    • Market Surveys: Gain insight into the AI tool landscape and identify emerging trends and opportunities.
    • Educational Purposes: Teach and learn about AI tools, their applications, and the categorization of AI technologies.

    Conclusion

    In summary, this openly shared dataset from TasticAI, featuring over 4,000 AI tools in more than 50 categories, is a valuable asset for researchers, data scientists, and anyone interested in artificial intelligence. Explore the dataset at https://tasticai.com.
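    For a quick look at how such a catalog can be explored programmatically, the sketch below loads it with pandas. The Dataverse file format and column names are not stated above, so the file and column names used here are hypothetical placeholders.

```python
# Quick exploration sketch. File name and columns are hypothetical placeholders;
# adjust to the actual file published in the Dataverse record.
import pandas as pd

tools = pd.read_csv("ai_tools.csv")

# Expected fields per the description: tool name, one-line description,
# category, image, and website link.
print(tools.columns.tolist())
print(tools["category"].value_counts().head(10))          # largest of the 50+ categories
print(tools.sample(5)[["name", "category", "website"]])   # a few random entries
```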

  4. Car Highway Dataset

    • universe.roboflow.com
    zip
    Updated Sep 13, 2023
    Cite
    Sallar (2023). Car Highway Dataset [Dataset]. https://universe.roboflow.com/sallar/car-highway/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 13, 2023
    Dataset authored and provided by
    Sallar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Vehicles Bounding Boxes
    Description

    Car-Highway Data Annotation Project

    Introduction

    In this project, we aim to annotate car images captured on highways. The annotated data will be used to train machine learning models for various computer vision tasks, such as object detection and classification.

    Project Goals

    • Collect a diverse dataset of car images from highway scenes.
    • Annotate the dataset to identify and label cars within each image.
    • Organize and format the annotated data for machine learning model training.

    Tools and Technologies

    For this project, we will be using Roboflow, a powerful platform for data annotation and preprocessing. Roboflow simplifies the annotation process and provides tools for data augmentation and transformation.

    Annotation Process

    1. Upload the raw car images to the Roboflow platform.
    2. Use the annotation tools in Roboflow to draw bounding boxes around each car in the images.
    3. Label each bounding box with the corresponding class (e.g., car).
    4. Review and validate the annotations for accuracy.

    Data Augmentation

    Roboflow offers data augmentation capabilities, such as rotation, flipping, and resizing. These augmentations can help improve the model's robustness.

    Data Export

    Once the data is annotated and augmented, Roboflow allows us to export the dataset in various formats suitable for training machine learning models, such as YOLO, COCO, or TensorFlow Record.
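    As a concrete illustration of what a YOLO-format export contains, the sketch below parses the per-image label files (one line per bounding box: class id followed by normalized center coordinates, width, and height). The directory layout shown is hypothetical and should be adjusted to the actual export.

```python
# Minimal sketch: reading a YOLO-format export (one .txt label file per image).
# Paths are hypothetical; adjust to the actual Roboflow export structure.
from pathlib import Path

def load_yolo_labels(label_path: Path):
    """Parse one YOLO label file into (class_id, x_center, y_center, width, height) tuples.
    Coordinates are normalized to [0, 1] relative to the image size."""
    boxes = []
    for line in label_path.read_text().splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip malformed lines
        class_id = int(parts[0])
        x_c, y_c, w, h = map(float, parts[1:])
        boxes.append((class_id, x_c, y_c, w, h))
    return boxes

# Example: iterate over an exported train split (hypothetical directory layout).
for label_file in Path("car-highway/train/labels").glob("*.txt"):
    boxes = load_yolo_labels(label_file)
    print(label_file.stem, len(boxes), "annotated cars")
```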

    Milestones

    1. Data Collection and Preprocessing
    2. Annotation of Car Images
    3. Data Augmentation
    4. Data Export
    5. Model Training

    Conclusion

    By completing this project, we will have a well-annotated dataset ready for training machine learning models. This dataset can be used for a wide range of applications in computer vision, including car detection and tracking on highways.

  5. Ferrari Images Dataset (2025)

    • kaggle.com
    Updated Jul 6, 2025
    Cite
    Urvish Ahir (2025). Ferrari Images Dataset (2025) [Dataset]. https://www.kaggle.com/datasets/urvishahir/ferrari-dataset
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Urvish Ahir
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    🏎️ Ferrari Image Dataset (3840×2160 UHD)

    A curated collection of ultra-high-resolution Ferrari car images, scraped from WSupercars.com and neatly organized by model. This dataset is ideal for machine learning, computer vision, and creative applications such as wallpaper generators, AR design tools, and synthetic data modeling. All images are native 3840×2160 resolution, making them suitable for both research and visual content creation.

    📌 Educational and research use only — All images are copyright of their respective owners.

    📁 Dataset Overview:

    • Folder: ferrari_images/
    • Subfolders by car model (e.g., f80, 812, sf90)
    • Each folder contains multiple ultra-HD wallpapers (3840×2160)

    Use Cases:

    • Car Model Classification – Train AI to recognize different Ferrari models
    • Vision Tasks – Use for super-resolution, enhancement, detection, and segmentation
    • Generative Models – Ideal input for GANs, diffusion models, or neural style transfer
    • Wallpaper & Web Apps – Populate high-quality visual content for websites or mobile platforms
    • Fine-Tuning Vision Models – Compatible with CNNs, ViTs, and transformer architectures
    • Self-Supervised Learning – Leverage unlabeled images for contrastive training methods
    • Game/Simulation Prototyping – Use as visual references or placeholders in 3D environments
    • AR & Design Tools – Integrate into automotive mockups, design UIs, or creative workflows

    Notes:

    • This release includes only Ferrari vehicle images
    • All images are native UHD (3840×2160), with no duplicates or downscaled versions
    • Novitec-tuned models are included both in the novitec/ folder and within their respective model folders (e.g., 296/, sf90/) for convenience.
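
    Because the images are organized one folder per model, the collection can be loaded directly as a classification dataset. The sketch below uses torchvision's ImageFolder; the resize/crop sizes and batch size are arbitrary choices made here, not part of the dataset.

```python
# Minimal sketch: loading the model-per-folder layout (ferrari_images/<model>/*.jpg)
# as a classification dataset with torchvision. Sizes are illustrative.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(512),          # downsample the 3840x2160 originals for training
    transforms.CenterCrop(448),
    transforms.ToTensor(),
])

# Each subfolder name (e.g. f80, 812, sf90) becomes a class label.
dataset = datasets.ImageFolder("ferrari_images", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

print(dataset.classes)               # discovered Ferrari model names
images, labels = next(iter(loader))
print(images.shape, labels[:5])
```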
  6. Modeling PFAS Sorption in Soils Using Machine Learning

    • figshare.com
    • acs.figshare.com
    xlsx
    Updated Apr 11, 2025
    Cite
    Joel Fabregat-Palau; Amirhossein Ershadi; Michael Finkel; Anna Rigol; Miquel Vidal; Peter Grathwohl (2025). Modeling PFAS Sorption in Soils Using Machine Learning [Dataset]. http://doi.org/10.1021/acs.est.4c13284.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    ACS Publications
    Authors
    Joel Fabregat-Palau; Amirhossein Ershadi; Michael Finkel; Anna Rigol; Miquel Vidal; Peter Grathwohl
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    In this study, we introduce PFASorptionML, a novel machine learning (ML) tool developed to predict solid–liquid distribution coefficients (Kd) for per- and polyfluoroalkyl substances (PFAS) in soils. Leveraging a data set of 1,274 Kd entries for PFAS in soils and sediments, including compounds such as trifluoroacetate, cationic, and zwitterionic PFAS, and neutral fluorotelomer alcohols, the model incorporates PFAS-specific properties such as molecular weight, hydrophobicity, and pKa, alongside soil characteristics like pH, texture, organic carbon content, and cation exchange capacity. Sensitivity analysis reveals that molecular weight, hydrophobicity, and organic carbon content are the most significant factors influencing sorption behavior, while charge density and mineral soil fraction have comparatively minor effects. The model demonstrates high predictive performance, with RPD values exceeding 3.16 across validation data sets, outperforming existing tools in accuracy and scope. Notably, PFAS chain length and functional group variability significantly influence Kd, with longer chain lengths and higher hydrophobicity positively correlating with Kd. By integrating location-specific soil repository data, the model enables the generation of spatial Kd maps for selected PFAS species. These capabilities are implemented in the online platform PFASorptionML, providing researchers and practitioners with a valuable resource for conducting environmental risk assessments of PFAS contamination in soils.
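
    The description above lists the molecular and soil features driving Kd. As a rough illustration of that kind of tabular regression problem (not the authors' PFASorptionML model), the sketch below fits a generic gradient-boosted regressor; the file name and column names are hypothetical.

```python
# Illustrative sketch only: a generic gradient-boosted regression on features of the
# kind described above (molecular weight, hydrophobicity, pKa, soil pH, organic
# carbon, CEC). Column and file names are hypothetical; this is not PFASorptionML.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_excel("pfas_kd_dataset.xlsx")          # hypothetical file name
features = ["mol_weight", "log_kow", "pka", "soil_ph", "org_carbon_pct", "cec"]
X, y = df[features], df["log_kd"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, max_depth=3)
model.fit(X_train, y_train)

print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
print(dict(zip(features, model.feature_importances_.round(3))))  # rough sensitivity check
```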

  7. Automated Data Annotation Tool Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 16, 2024
    Cite
    Dataintelo (2024). Automated Data Annotation Tool Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/automated-data-annotation-tool-market
    Explore at:
    Available download formats: pdf, csv, pptx
    Dataset updated
    Oct 16, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Automated Data Annotation Tool Market Outlook



    The global market size for automated data annotation tools was valued at approximately USD 1.2 billion in 2023, and it is projected to reach around USD 6.8 billion by 2032, exhibiting a CAGR of 20.2% during the forecast period. This market is witnessing rapid growth primarily driven by the increasing demand for high-quality data sets to train various machine learning and artificial intelligence models.



    One of the primary growth factors for this market is the escalating need for automation in data preparation tasks, which occupy a significant amount of time and resources. Automated data annotation tools streamline the labor-intensive process of labeling data, ensuring quicker and more accurate results. The rising adoption of artificial intelligence and machine learning across various industries such as healthcare, automotive, and finance is propelling the demand for these tools, as they play a critical role in enhancing the efficiency and efficacy of AI models.



    Another significant factor contributing to the market's growth is the continuous advancements in technology, such as the integration of machine learning, natural language processing, and computer vision in data annotation tools. These technological enhancements enable more sophisticated and precise data labeling, which is essential for improving the performance of AI applications. Moreover, the growing availability of large data sets and the need for effective data management solutions are further driving the market forward.



    The rise in partnerships and collaborations among key market players to develop innovative data annotation solutions is also a notable growth factor. Companies are increasingly investing in research and development activities to introduce advanced tools that cater to the diverse needs of different industry verticals. This collaborative approach not only helps in expanding the product portfolio but also enhances the overall market presence of the companies involved.



    Regionally, North America holds a significant share of the automated data annotation tool market, driven by the early adoption of cutting-edge technologies and the presence of major tech giants in the region. However, the Asia Pacific region is anticipated to exhibit the highest growth rate during the forecast period, owing to the rapid industrialization, increasing investments in AI infrastructure, and the growing focus on digital transformation initiatives across various sectors.



    Component Analysis



    The automated data annotation tool market, segmented by component into software and services, reveals distinct trends and preferences in the industry. The software segment is expected to dominate the market due to the increasing adoption of advanced data annotation software solutions that offer robust features, including automated labeling, quality control, and integration capabilities. These software solutions are crucial for organizations looking to enhance their AI and machine learning models' performance by providing accurate and consistent data annotations.



    On the other hand, the services segment is also witnessing substantial growth, driven by the rising demand for professional services such as consulting, implementation, and maintenance. Organizations often require expert assistance to effectively deploy and manage data annotation tools, ensuring they derive maximum value from their investments. Service providers offer tailored solutions to meet the specific needs of different industries, thereby driving the growth of this segment.



    The continuous innovation and development in software solutions are further propelling the growth of the software segment. Companies are focusing on enhancing the capabilities of their annotation tools by incorporating advanced technologies such as machine learning algorithms and natural language processing. These advancements enable more accurate and efficient data labeling processes, which are essential for training high-performing AI models.



    In addition, the integration of data annotation tools with other enterprise systems, such as data management platforms and analytics solutions, is further driving the adoption of software solutions. This integration allows organizations to streamline their data workflows and improve overall productivity. The growing need for scalable and flexible data annotation solutions is also contributing to the dominance of the software segment in the market.



    Overall, both software and ser

  8. New lens candidates from GaSNets - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Jun 7, 2022
    Cite
    (2022). New lens candidates from GaSNets - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/a9349a93-2709-5d10-8d3b-6f6bedf7036b
    Explore at:
    Dataset updated
    Jun 7, 2022
    Description

    With the advent of new spectroscopic surveys from ground and space, observing up to hundreds of millions of galaxies, spectra classification will become overwhelming for standard analysis techniques. To prepare for this challenge, we introduce a family of deep learning tools to classify features in one-dimensional spectra. As the first application of these Galaxy Spectra neural Networks (GaSNets), we focus on tools specialized in identifying emission lines from strongly lensed star-forming galaxies in the eBOSS spectra. We first discuss the training and testing of these networks and define a threshold probability, PL, of 95% for the high-quality event detection. Then, using a previous set of spectroscopically selected strong lenses from eBOSS, confirmed with the Hubble Space Telescope (HST), we estimate a completeness of ~80% as the fraction of lenses recovered above the adopted PL. We finally apply the GaSNets to ~1.3M eBOSS spectra to collect the first list of ~430 new high-quality candidates identified with deep learning from spectroscopy and visually graded as highly probable real events. A preliminary check against ground-based observations tentatively shows that this sample has a confirmation rate of 38%, in line with previous samples selected with standard (no deep learning) classification tools and confirmed by the HST. This first test shows that machine learning can be efficiently extended to feature recognition in the wavelength space, which will be crucial for future surveys like 4MOST, DESI, Euclid, and the China Space Station Telescope. Cone search capability for table J/other/RAA/22.F5014/appena (New high quality (HQ) candidates from GaSNets)
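
    As an illustration of the kind of 1D-spectrum classifier described above (not the published GaSNet architecture), the sketch below defines a small 1D CNN and applies the 95% probability threshold mentioned in the abstract; the input length and layer sizes are assumptions.

```python
# Minimal sketch of a 1D CNN for binary spectrum classification (lens emission-line
# candidate vs. not). This is an illustration, not the GaSNet architecture; the
# input length (4000 pixels) and layer sizes are assumptions.
import torch
import torch.nn as nn

class SpectrumCNN(nn.Module):
    def __init__(self, n_pixels: int = 4000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (n_pixels // 16), 64), nn.ReLU(),
            nn.Linear(64, 1),                     # logit; sigmoid gives a probability
        )

    def forward(self, x):                          # x: (batch, 1, n_pixels)
        return self.classifier(self.features(x))

model = SpectrumCNN()
batch = torch.randn(8, 1, 4000)                    # stand-in for normalized 1D spectra
probs = torch.sigmoid(model(batch)).squeeze(1)
keep = probs > 0.95                                # the P_L = 95% threshold from the text
print(probs, keep)
```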

  9. HTRU2

    • figshare.com
    zip
    Updated Apr 1, 2016
    Cite
    Robert Lyon (2016). HTRU2 [Dataset]. http://doi.org/10.6084/m9.figshare.3080389.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 1, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Robert Lyon
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description
    1. Overview

    HTRU2 is a data set which describes a sample of pulsar candidates collected during the High Time Resolution Universe Survey (South) [1]. Pulsars are a rare type of neutron star that produce radio emission detectable here on Earth. They are of considerable scientific interest as probes of space-time, the inter-stellar medium, and states of matter (see [2] for more uses). As pulsars rotate, their emission beam sweeps across the sky, and when this crosses our line of sight it produces a detectable pattern of broadband radio emission. As pulsars rotate rapidly, this pattern repeats periodically. Pulsar search therefore involves looking for periodic radio signals with large radio telescopes. Each pulsar produces a slightly different emission pattern, which varies slightly with each rotation (see [2] for an introduction to pulsar astrophysics). A potential signal detection, known as a 'candidate', is therefore averaged over many rotations of the pulsar, as determined by the length of an observation. In the absence of additional info, each candidate could potentially describe a real pulsar. However, in practice almost all detections are caused by radio frequency interference (RFI) and noise, making legitimate signals hard to find.

    Machine learning tools are now being used to automatically label pulsar candidates to facilitate rapid analysis. Classification systems in particular are being widely adopted (see [4,5,6,7,8,9]), which treat the candidate data sets as binary classification problems. Here the legitimate pulsar examples are a minority positive class, and spurious examples the majority negative class. At present multi-class labels are unavailable, given the costs associated with data annotation.

    The data set shared here contains 16,259 spurious examples caused by RFI/noise, and 1,639 real pulsar examples. These examples have all been checked by human annotators. Each candidate is described by 8 continuous variables. The first four are simple statistics obtained from the integrated pulse profile (folded profile). This is an array of continuous variables that describe a longitude-resolved version of the signal that has been averaged in both time and frequency (see [3] for more details). The remaining four variables are similarly obtained from the DM-SNR curve (again see [3]). They are summarised below:

    1. Mean of the integrated profile.
    2. Standard deviation of the integrated profile.
    3. Excess kurtosis of the integrated profile.
    4. Skewness of the integrated profile.
    5. Mean of the DM-SNR curve.
    6. Standard deviation of the DM-SNR curve.
    7. Excess kurtosis of the DM-SNR curve.
    8. Skewness of the DM-SNR curve.

    HTRU 2 Summary: 17,898 total examples; 1,639 positive examples; 16,259 negative examples.

    The data is presented in two formats: CSV and ARFF (used by the WEKA data mining tool). Candidates are stored in both files in separate rows. Each row lists the variables first, and the class label is the final entry. The class labels used are 0 (negative) and 1 (positive). Please note that the data contains no positional information or other astronomical details. It is simply feature data extracted from candidate files using the PulsarFeatureLab tool (see [10]).

    2. Citing our work

    If you use the dataset in your work please cite us using the DOI of the dataset, and the paper: R. J. Lyon, B. W. Stappers, S. Cooper, J. M. Brooke, J. D. Knowles, "Fifty Years of Pulsar Candidate Selection: From simple filters to a new principled real-time classification approach", MNRAS, 2016.

    3. Acknowledgements

    This data was obtained with the support of grant EP/I028099/1 for the University of Manchester Centre for Doctoral Training in Computer Science, from the UK Engineering and Physical Sciences Research Council (EPSRC). The raw observational data was collected by the High Time Resolution Universe Collaboration using the Parkes Observatory, funded by the Commonwealth of Australia and managed by the CSIRO.

    4. References

    [1] M. J. Keith et al., "The High Time Resolution Universe Pulsar Survey - I. System Configuration and Initial Discoveries", 2010, Monthly Notices of the Royal Astronomical Society, vol. 409, pp. 619-627. DOI: 10.1111/j.1365-2966.2010.17325.x
    [2] D. R. Lorimer and M. Kramer, "Handbook of Pulsar Astronomy", Cambridge University Press, 2005.
    [3] R. J. Lyon, "Why Are Pulsars Hard To Find?", PhD Thesis, University of Manchester, 2015.
    [4] R. J. Lyon et al., "Fifty Years of Pulsar Candidate Selection: From simple filters to a new principled real-time classification approach", Monthly Notices of the Royal Astronomical Society, submitted.
    [5] R. P. Eatough et al., "Selection of radio pulsar candidates using artificial neural networks", Monthly Notices of the Royal Astronomical Society, vol. 407, no. 4, pp. 2443-2450, 2010.
    [6] S. D. Bates et al., "The high time resolution universe pulsar survey vi. an artificial neural network and timing of 75 pulsars", Monthly Notices of the Royal Astronomical Society, vol. 427, no. 2, pp. 1052-1065, 2012.
    [7] D. Thornton, "The High Time Resolution Radio Sky", PhD thesis, University of Manchester, Jodrell Bank Centre for Astrophysics, School of Physics and Astronomy, 2013.
    [8] K. J. Lee et al., "PEACE: pulsar evaluation algorithm for candidate extraction - a software package for post-analysis processing of pulsar survey candidates", Monthly Notices of the Royal Astronomical Society, vol. 433, no. 1, pp. 688-694, 2013.
    [9] V. Morello et al., "SPINN: a straightforward machine learning solution to the pulsar candidate selection problem", Monthly Notices of the Royal Astronomical Society, vol. 443, no. 2, pp. 1651-1662, 2014.
    [10] R. J. Lyon, "PulsarFeatureLab", 2015, https://dx.doi.org/10.6084/m9.figshare.1536472.v1.
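
    Since the data ship as a plain CSV with eight feature columns followed by a 0/1 class label, a baseline classifier is straightforward to set up. The sketch below is one such baseline; the column names are descriptive labels chosen here for readability, not part of the file.

```python
# Minimal sketch: loading the HTRU2 CSV (8 feature columns followed by a 0/1 label,
# no header row) and fitting a simple classifier. Adjust the file name if the
# archive names it differently.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

cols = [
    "profile_mean", "profile_std", "profile_kurtosis", "profile_skew",
    "dm_snr_mean", "dm_snr_std", "dm_snr_kurtosis", "dm_snr_skew", "label",
]
df = pd.read_csv("HTRU_2.csv", header=None, names=cols)

X, y = df.drop(columns="label"), df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)  # stratify: positives are a small minority

clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```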
  10. Overview of tested hyperparameters.

    • plos.figshare.com
    xls
    Updated Sep 24, 2024
    Cite
    Constance Creux; Farida Zehraoui; François Radvanyi; Fariza Tahi (2024). Overview of tested hyperparameters. [Dataset]. http://doi.org/10.1371/journal.pcbi.1012446.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Constance Creux; Farida Zehraoui; François Radvanyi; Fariza Tahi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Parameters that gave the best results on Dataset1 and Dataset2 are denoted by (1) and (2) respectively. The models chosen on Dataset1 were also used for Dataset1-nd.

  11. ECG Images Dataset of Cardiac Patients

    • kaggle.com
    Updated Aug 28, 2024
    Cite
    Evil Spirit05 (2024). ECG Images Dataset of Cardiac Patients [Dataset]. https://www.kaggle.com/datasets/evilspirit05/ecg-analysis
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 28, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Evil Spirit05
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description
    The ECG Images Dataset of Cardiac Patients is an extensive collection of electrocardiogram (ECG) images designed to aid research and advancements in the field of cardiovascular medicine. This dataset provides a wealth of data that can be utilized for various analyses, including the development of diagnostic tools and the study of different cardiac conditions.
    

    Dataset Overview

    The dataset is organized into four main categories, each representing different cardiac conditions:
    

    ECG Images of Myocardial Infarction Patients

    • Number of Images: 240
    • Total Dimensions: 240x12 (total of 2880 images)
    • Description: These images are from patients diagnosed with myocardial infarction (MI), commonly known as a heart attack. The images reflect the ECG patterns typically associated with this critical condition.

    ECG Images of Patients with Abnormal Heartbeat

    • Number of Images: 233
    • Total Dimensions: 233x12 (total of 2796 images)
    • Description: This category includes ECG images from patients exhibiting abnormal heartbeat patterns. Such patterns may indicate a range of arrhythmias or other cardiac issues, providing crucial data for diagnostic and research purposes.

    ECG Images of Patients with a History of Myocardial Infarction

    • Number of Images: 172
    • Total Dimensions: 172x12 (total of 2064 images)
    • Description: These images come from patients who have a documented history of myocardial infarction. They offer insights into the long-term effects and recovery patterns associated with heart attacks.

    Normal Person ECG Images

    • Number of Images: 284
    • Total Dimensions: 284x12 (total of 3408 images)
    • Description: This category features ECG images from individuals with no known cardiac issues, serving as a baseline for comparison with pathological cases.


    Applications

    The ECG Images Dataset is a valuable resource for various applications, including:

    • Machine Learning and AI Models: Train and validate models for ECG classification, anomaly detection, and predictive analytics.
    • Cardiac Research: Investigate patterns and features of different cardiac conditions to improve diagnostic methods and patient outcomes.
    • Diagnostic Tool Development: Create automated systems for detecting and interpreting ECG abnormalities.

    Download

    The dataset is available for download from Kaggle and is provided in a compressed file of approximately 194 MB.
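
    Given the one-folder-per-category organization described above, the images can be loaded directly for classification. The sketch below uses tf.keras.utils.image_dataset_from_directory; the root folder name and image size are assumptions, since the exact directory names inside the archive are not stated here.

```python
# Minimal sketch: loading the four ECG image categories for classification, assuming
# the archive unpacks into one folder per class (folder names are hypothetical).
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "ecg_images",                 # e.g. ecg_images/MI, ecg_images/abnormal_heartbeat, ...
    validation_split=0.2,
    subset="training",
    seed=42,
    image_size=(224, 224),
    batch_size=32,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "ecg_images",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=(224, 224),
    batch_size=32,
)

print(train_ds.class_names)       # the four cardiac categories listed above
```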
    
  12. Overview of the literature.

    • plos.figshare.com
    xls
    Updated Aug 12, 2024
    Cite
    Rivalani Hlongwane; Kutlwano Ramabao; Wilson Mongwe (2024). Overview of the literature. [Dataset]. http://doi.org/10.1371/journal.pone.0308718.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Aug 12, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Rivalani Hlongwane; Kutlwano Ramabao; Wilson Mongwe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Credit scorecards are essential tools for banks to assess the creditworthiness of loan applicants. While advanced machine learning models like XGBoost and random forest often outperform traditional logistic regression in predictive accuracy, their lack of interpretability hinders their adoption in practice. This study bridges the gap between research and practice by developing a novel framework for constructing interpretable credit scorecards using Shapley values. We apply this framework to two credit datasets, discretizing numerical variables and utilizing one-hot encoding to facilitate model development. Shapley values are then employed to derive credit scores for each predictor variable group in XGBoost, random forest, LightGBM, and CatBoost models. Our results demonstrate that this approach yields credit scorecards with interpretability comparable to logistic regression while maintaining superior predictive accuracy. This framework offers a practical and effective solution for credit practitioners seeking to leverage the power of advanced models without sacrificing transparency and regulatory compliance.
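
    A rough sketch of the general idea follows, using the shap library with XGBoost: SHAP values computed for the one-hot columns are summed back to their source variable groups. This only loosely mirrors the described framework; the paper's exact scorecard-point derivation is not reproduced here, and the input file and columns are hypothetical.

```python
# Illustrative sketch: fit a tree ensemble on one-hot encoded, discretized predictors
# and aggregate SHAP values back to each original variable group. Not the paper's
# exact scorecard construction; file and column names are hypothetical.
import pandas as pd
import xgboost as xgb
import shap

df = pd.read_csv("credit_data.csv")                       # hypothetical input file
y = df.pop("default_flag")
X = pd.get_dummies(df, prefix_sep="__").astype(float)     # one-hot encode binned variables

model = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
model.fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = pd.DataFrame(explainer.shap_values(X), columns=X.columns)

# Sum dummy-column contributions back to their source variable (the "group").
groups = shap_values.T.groupby(lambda c: c.split("__")[0]).sum().T
print(groups.abs().mean().sort_values(ascending=False))   # average contribution per variable
```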

  13. Dataset overview.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Amédée Roy; Sophie Lanco Bertrand; Ronan Fablet (2023). Dataset overview. [Dataset]. http://doi.org/10.1371/journal.pcbi.1009890.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Amédée Roy; Sophie Lanco Bertrand; Ronan Fablet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General statistics on the four linearly-interpolated datasets used in this study. (m ± s) denotes the mean and standard deviation, respectively.
