100+ datasets found
  1. MASC dataset

    • kaggle.com
    zip
    Updated Nov 19, 2023
    Cite
    ali.a.ahmed (2023). MASC dataset [Dataset]. https://www.kaggle.com/datasets/alihmed/masc-dataset
    Available download formats: zip (608,346,649 bytes)
    Authors
    ali.a.ahmed
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The MASC dataset is the foundation for developing machine-learning models to detect and classify these screen types. Advantages of the MASC dataset, composed of mobile application screens collected by the author, can be summarized as follows:

    • Large and diverse app screen sample: the MASC dataset includes over 7,000 unique mobile app screens from various apps and activity types, so it can support the development of robust ML models for mobile app screen classification and serve as a benchmark for developing and evaluating new ML models in this domain.
    • Realistic data: screens collected from actual Android apps via the Rico platform represent real-world designs, aiding models' generalization to real apps.
    • Improved app accessibility: identifying common screen patterns can offer insights to enhance accessibility features, benefiting users with disabilities.
    • Enhanced user experience: understanding mobile app screen types can lead to better user interface design, improving the overall user experience.
    • Broad applicability: many potential applications can be built on the MASC dataset, including UI captioning and semantic tagging, user-friendly designs with explanations, intelligent tutorials, and enhanced design search features.

  2. Dataset of books called Machine learning in Java : helpful techniques to...

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Machine learning in Java : helpful techniques to design, build, and deploy powerful machine learning applications in Java [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Machine+learning+in+Java+%3A+helpful+techniques+to+design%2C+build%2C+and+deploy+powerful+machine+learning+applications+in+Java
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is Machine learning in Java : helpful techniques to design, build, and deploy powerful machine learning applications in Java. It features 7 columns including author, publication date, language, and book publisher.

  3. Massachusetts Buildings Dataset

    • kaggle.com
    zip
    Updated Sep 25, 2020
    Cite
    Balraj Ashwath (2020). Massachusetts Buildings Dataset [Dataset]. https://www.kaggle.com/balraj98/massachusetts-buildings-dataset
    Available download formats: zip (1,600,758,175 bytes)
    Authors
    Balraj Ashwath
    Area covered
    Massachusetts
    Description

    Context

    Building segmentation from aerial imagery is a challenging task. Obstruction from nearby trees, shadows of adjacent buildings, varying rooftop texture and color, and varying building shapes and dimensions are among the challenges that hinder present-day models in segmenting sharp building boundaries. High-quality aerial imagery datasets facilitate comparisons of existing methods and lead to increased interest in aerial imagery applications in the machine learning and computer vision communities.

    Content

    The Massachusetts Buildings Dataset consists of 151 aerial images of the Boston area, each 1500 × 1500 pixels covering an area of 2.25 square kilometers; the entire dataset thus covers roughly 340 square kilometers. The data is split into a training set of 137 images, a test set of 10 images, and a validation set of 4 images. The target maps were obtained by rasterizing building footprints from the OpenStreetMap project, with the data restricted to regions with an average omission noise level of roughly 5% or less. Collecting this large amount of high-quality building footprint data was possible because the City of Boston contributed footprints for the entire city to the OpenStreetMap project. The dataset covers mostly urban and suburban areas, and buildings of all sizes, including individual houses and garages, are included in the labels. The imagery was released by the state of Massachusetts and is rescaled to a resolution of 1 pixel per square meter. Target maps for the test and validation portions of the dataset were hand-corrected to make the evaluations more accurate.
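    As a rough illustration of the rasterization step described above, the sketch below burns building-footprint polygons into a binary target mask using Pillow. The polygon coordinates and output file name are made up for illustration; they are not taken from the dataset.

```python
# Minimal sketch: rasterize building-footprint polygons into a binary
# target mask, in the spirit of how the dataset's target maps were
# produced from OpenStreetMap footprints. Coordinates are hypothetical.
from PIL import Image, ImageDraw

TILE = 1500  # dataset images are 1500 x 1500 px at 1 pixel per square meter

footprints = [  # hypothetical building outlines in pixel coordinates
    [(100, 100), (300, 100), (300, 250), (100, 250)],
    [(600, 400), (750, 400), (750, 600), (600, 600)],
]

mask = Image.new("L", (TILE, TILE), 0)   # 0 = background
draw = ImageDraw.Draw(mask)
for poly in footprints:
    draw.polygon(poly, fill=255)         # 255 = building

mask.save("target_mask.png")
```

    In the real dataset the footprints come from georeferenced OpenStreetMap data, so an affine transform from map coordinates to pixel coordinates would be applied before drawing.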

    Refer to this thesis for more information.

    Acknowledgements

    This dataset is derived from Volodymyr Mnih's original Massachusetts Buildings Dataset. The Massachusetts Roads and Massachusetts Buildings datasets were introduced in Chapter 6 of his PhD thesis. If you use this dataset for research purposes, please include the following citation in any resulting publications:

    @phdthesis{MnihThesis,
      author = {Volodymyr Mnih},
      title  = {Machine Learning for Aerial Image Labeling},
      school = {University of Toronto},
      year   = {2013}
    }

    Inspiration

    Rapid advances in Image Understanding using Computer Vision techniques have brought us many state-of-the-art deep learning models across various benchmark datasets. Can we better address the challenges faced by the current models in segmenting buildings from aerial images using the latest methods? Do state-of-the-art methods from other benchmarks work equally well on this data? Does engineering features specific to buildings datasets allow us to build better models?

    Go to Massachusetts Roads Dataset.

  4. 80K+ Construction Site Images | AI Training Data | Machine Learning (ML)...

    • datarade.ai
    Cite
    Data Seeds, 80K+ Construction Site Images | AI Training Data | Machine Learning (ML) data | Object & Scene Detection | Global Coverage [Dataset]. https://datarade.ai/data-products/50k-construction-site-images-ai-training-data-machine-le-data-seeds
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset authored and provided by
    Data Seeds
    Area covered
    Russian Federation, United Arab Emirates, Senegal, Guatemala, Tunisia, Swaziland, Grenada, Venezuela (Bolivarian Republic of), Kenya, Peru
    Description

    This dataset features over 80,000 high-quality images of construction sites sourced from photographers worldwide. Built to support AI and machine learning applications, it delivers richly annotated and visually diverse imagery capturing real-world construction environments, machinery, and processes.

    Key Features:

    1. Comprehensive Metadata: the dataset includes full EXIF data such as aperture, ISO, shutter speed, and focal length. Each image is annotated with construction phase, equipment types, safety indicators, and human activity context, making it ideal for object detection, site monitoring, and workflow analysis. Popularity metrics based on performance on our proprietary platform are also included.

    2. Unique Sourcing Capabilities: images are collected through a proprietary gamified platform, with competitions focused on industrial, construction, and labor themes. Custom datasets can be generated within 72 hours to target specific scenarios, such as building types, stages (excavation, framing, finishing), regions, or safety compliance visuals.

    3. Global Diversity: sourced from contributors in over 100 countries, the dataset reflects a wide range of construction practices, materials, climates, and regulatory environments. It includes residential, commercial, industrial, and infrastructure projects from both urban and rural areas.

    4. High-Quality Imagery: includes a mix of wide-angle site overviews, close-ups of tools and equipment, drone shots, and candid human activity. Resolution varies from standard to ultra-high-definition, supporting both macro and contextual analysis.

    5. Popularity Scores: each image is assigned a popularity score based on its performance in GuruShots competitions. These scores provide insight into visual clarity, engagement value, and human interest, useful for safety-focused or user-facing AI models.

    6. AI-Ready Design: this dataset is structured for training models in real-time object detection (e.g., helmets, machinery), construction progress tracking, material identification, and safety compliance. It's compatible with standard ML frameworks used in construction tech.

    7. Licensing & Compliance: fully compliant with privacy, labor, and workplace imagery regulations. Licensing is transparent and ready for commercial or research deployment.

    Use Cases:
    1. Training AI for safety compliance monitoring and PPE detection.
    2. Powering progress tracking and material usage analysis tools.
    3. Supporting site mapping, autonomous machinery, and smart construction platforms.
    4. Enhancing augmented reality overlays and digital twin models for construction planning.

    This dataset provides a comprehensive, real-world foundation for AI innovation in construction technology, safety, and operational efficiency. Custom datasets are available on request. Contact us to learn more!

  5. US_House_Price_Prediction

    • kaggle.com
    zip
    Updated Jun 29, 2023
    Cite
    roopeshk.r (2023). US_House_Price_Prediction [Dataset]. https://www.kaggle.com/datasets/roopeshbharatwajkr/us-house-price-prediction
    Available download formats: zip (24,484 bytes)
    Authors
    roopeshk.r
    License

    CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    The US property dataset consists of information related to various properties. Each row in the dataset represents a specific property listing. The dataset includes several columns that provide details about the properties.

  6. AI Agent Generating Tool Prompt Library

    • kaggle.com
    zip
    Updated Dec 18, 2024
    Cite
    Canstralian (2024). AI Agent Generating Tool Prompt Library [Dataset]. https://www.kaggle.com/datasets/canstralian/web-application-development-ai-prompts
    Available download formats: zip (4,201 bytes)
    Authors
    Canstralian
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for "AI Agent Generating Tool & Debugging Prompt Library" 🤖⚙️

    Dataset Details 📚

    • Dataset Name: AI Agent Generating Tool & Debugging Prompt Library
    • Dataset Description:
      This dataset includes a collection of prompts focused on building and debugging AI-driven tools, including creating self-improving AI agents and debugging prompts for Python projects. The dataset is designed for use in fine-tuning models related to code generation, debugging, and software engineering tasks. It is particularly useful for developers working with AI-powered tools for improving code functionality and accuracy. 🛠️💡

      Key Features:

      • 🤖 Build AI agents capable of self-improvement.
      • ⚙️ Focus on debugging, automated testing, and code optimization.
      • 🔄 Prompts aimed at improving software engineering workflows.

    Files 📁

    • prompts.csv: A CSV file containing a structured collection of prompts for AI agent generation and debugging tasks.
    • prompts.json: The same collection of prompts in JSON format, suitable for machine learning applications.
    • README.md: Overview and additional information about the dataset and its usage.
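    For illustration, a hedged sketch of loading the two prompt files with Python's standard csv and json modules. The dataset card does not document the column names, so any field names you access (e.g. a "prompt" column) are assumptions to verify against the actual files.

```python
# Sketch: load prompts.csv and prompts.json as listed above.
# Field/column names inside the files are not documented here and
# must be checked against the real dataset.
import csv
import json

def load_prompts_csv(path):
    """Return a list of dicts, one per CSV row (header row as keys)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def load_prompts_json(path):
    """Return the parsed JSON structure (a list of prompt records)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```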

    Tags 🏷️

    ai 🤖, machine learning 🧠, python 🐍, repl.it 🖥️, debugging 🐞, prompt generation 📝, automated testing 🧪, ai agent 🛠️, self-improving ai 🔄, code optimization ⚡, unit testing ✅, software engineering 👨‍💻, application projects 🏗️

    Task Categories 🗂️

    code generation 💻, text generation ✍️, machine translation 🌐, text summarization 📑, text classification 🏷️, question answering ❓, natural language processing 📚, software development 👨‍💻, automated testing 🧪

    Size Category 📏

    small 📉

    License 📜

    CC BY-SA 4.0 ✍️

    Citation 🏆

    If you use this dataset in your research, please cite it as follows:

    AI Agent Generating Tool & Debugging Prompt Library. [Your Name/Organization]. Available at: [Dataset Link] 🌐

  7. Two residential districts datasets from Kielce, Poland for building semantic...

    • scidb.cn
    Updated Sep 29, 2022
    Cite
    Agnieszka Łysak (2022). Two residential districts datasets from Kielce, Poland for building semantic segmentation task [Dataset]. http://doi.org/10.57760/sciencedb.02955
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Science Data Bank
    Authors
    Agnieszka Łysak
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Poland, Kielce
    Description

    Today, deep neural networks are widely used in many computer vision problems, including those involving geographic information systems (GIS) data. This type of data is commonly used for urban analyses and spatial planning. We used orthophotographic images of two residential districts from Kielce, Poland for research including automatic urban sprawl analysis with a Transformer-based neural network. Orthophotomaps were obtained from the Kielce GIS portal. The map was then manually masked into building and building-surroundings classes. Finally, the orthophotomap and corresponding classification mask were simultaneously divided into small tiles, a common preprocessing approach for the learning phase of machine learning algorithms. The data contains two original orthophotomaps, from the Wietrznia and Pod Telegrafem residential districts, with corresponding masks, as well as their tiled versions, ready to serve as training data for machine learning models.

    A Transformer-based neural network was trained on the Wietrznia dataset for semantic segmentation of the tiles into building and surroundings classes. The trained model was then applied to the Pod Telegrafem dataset to test its generalization ability. The efficiency of the model was satisfying, so it can be used for automatic semantic building segmentation. The tiling process can then be reversed and the complete classification mask retrieved. This mask can be used for building-area calculations and urban sprawl monitoring, if the research were repeated for GIS data over a wider time horizon.

    Since the dataset was collected from the Kielce GIS portal, as part of the Polish Main Office of Geodesy and Cartography data resource, it may be used only for non-profit and non-commercial purposes, in private or scientific applications, under the law "Ustawa z dnia 4 lutego 1994 r. o prawie autorskim i prawach pokrewnych (Dz.U. z 2006 r. nr 90 poz 631 z późn. zm.)". There are no other legal or ethical considerations limiting reuse potential.

    Data information:
    • wietrznia_2019.jpg - orthophotomap of Wietrznia district - used for model's training, as an explanatory image
    • wietrznia_2019.png - classification mask of Wietrznia district - used for model's training, as a target image
    • wietrznia_2019_validation.jpg - one image from Wietrznia district - used for model's validation during the training phase
    • pod_telegrafem_2019.jpg - orthophotomap of Pod Telegrafem district - used for model's evaluation after the training phase
    • wietrznia_2019 - folder with wietrznia_2019.jpg (image) and wietrznia_2019.png (annotation) divided into 810 tiles (512 x 512 pixels each); tiles with no information were manually removed so the training data would contain only informative tiles - tiles were presented to the model during training (images and annotations for fitting the model to the data)
    • wietrznia_2019_validation - folder with wietrznia_2019_validation.jpg divided into 16 tiles (256 x 256 pixels each) - tiles were presented to the model during training (images for validating the model's efficiency); not part of the training data
    • pod_telegrafem_2019 - folder with pod_telegrafem.jpg divided into 196 tiles (256 x 256 pixels each) - tiles were presented to the model during inference (images for evaluating the model's robustness)

    The dataset was created as described below. Firstly, the orthophotomaps were collected from the Kielce Geoportal (https://gis.kielce.eu), which offers a recent map from April 2019. It is an orthophotomap with a resolution of 5 x 5 pixels, constructed from a plane flight at 700 meters above ground, taken with a camera for vertical photos. Downloading was done via WMS in the open-source QGIS software (https://www.qgis.org) as a 1:500 scale map, then converted to a 1200 dpi PNG image. Secondly, the map of the Wietrznia residential district was manually labelled, also in QGIS, in the same scope as the orthophotomap. Annotation was based on land cover map information, also obtained from the Kielce Geoportal. There are two classes: residential building and surroundings. The second map, from the Pod Telegrafem district, was not annotated, since it was used in the testing phase and imitates the situation where no annotation exists for new data presented to the model. Next, the images were converted to RGB JPG images, and the annotation map was converted to an 8-bit grayscale PNG image. Finally, the Wietrznia data files were tiled into 512 x 512 pixel tiles using the Python PIL library. Tiles with no or relatively little information (only or mostly white background) were manually removed, so from the 29113 x 15938 pixel orthophotomap, only 810 tiles with corresponding annotations were kept, ready to train a machine learning model for the semantic segmentation task. The Pod Telegrafem orthophotomap was tiled without manual removal, so from the 7168 x 7168 pixel orthophotomap, 197 tiles of 256 x 256 pixels were created. There was also an image of one residential building, used for the model's validation during the training phase; it was not part of the training data but was part of the Wietrznia residential area. It was a 2048 x 2048 pixel orthophotomap, tiled into 16 tiles of 256 x 256 pixels each.
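    The tiling step described above can be sketched with the Python PIL (Pillow) library the authors mention. The function below is a generic non-overlapping tiler; the output naming scheme is an assumption, while the file names in the commented example come from the dataset listing.

```python
# Sketch of the tiling step: cut an orthophotomap (or its mask) into
# fixed-size, non-overlapping tiles. Output naming is an assumption.
from PIL import Image

def tile_image(path, tile_size, out_prefix):
    """Save non-overlapping tile_size x tile_size tiles; return the count."""
    img = Image.open(path)
    w, h = img.size
    count = 0
    for top in range(0, h - tile_size + 1, tile_size):
        for left in range(0, w - tile_size + 1, tile_size):
            box = (left, top, left + tile_size, top + tile_size)
            img.crop(box).save(f"{out_prefix}_{count:04d}.png")
            count += 1
    return count

# Tiling image and mask with identical geometry keeps tile N of the
# image aligned with tile N of the annotation:
# tile_image("wietrznia_2019.jpg", 512, "img")
# tile_image("wietrznia_2019.png", 512, "ann")
```

    Uninformative tiles (mostly white background) would then still need to be filtered out, which the authors did manually.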

  8. 200K+ Landmark Images | AI Training Data | Annotated imagery data for AI |...

    • datarade.ai
    Cite
    Data Seeds, 200K+ Landmark Images | AI Training Data | Annotated imagery data for AI | Object & Scene Detection | Global Coverage [Dataset]. https://datarade.ai/data-products/120k-landmark-images-ai-training-data-annotated-imagery-data-seeds
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset authored and provided by
    Data Seeds
    Area covered
    French Guiana, Rwanda, Jamaica, Kosovo, Egypt, Lesotho, Wallis and Futuna, Greece, American Samoa, Niger
    Description

    This dataset features over 200,000 high-quality images of historical and cultural landmarks sourced from photographers worldwide. Designed to support AI and machine learning applications, it provides a diverse and richly annotated collection of landmark imagery.

    Key Features:

    1. Comprehensive Metadata: the dataset includes full EXIF data, detailing camera settings such as aperture, ISO, shutter speed, and focal length. Additionally, each image is pre-annotated with object and scene detection metadata, making it ideal for tasks like classification, detection, and segmentation. Popularity metrics, derived from engagement on our proprietary platform, are also included.

    2. Unique Sourcing Capabilities: the images are collected through a proprietary gamified platform for photographers. Competitions focused on landmark photography ensure fresh, relevant, and high-quality submissions. Custom datasets can be sourced on-demand within 72 hours, allowing for specific requirements such as particular landmarks or geographic regions to be met efficiently.

    3. Global Diversity: photographs have been sourced from contributors in over 100 countries, ensuring a vast array of landmark types and cultural contexts. The images feature varied settings, including historical monuments, iconic structures, natural landmarks, and urban architecture, providing an unparalleled level of diversity.

    4. High-Quality Imagery: the dataset includes images with resolutions ranging from standard to high-definition to meet the needs of various projects. Both professional and amateur photography styles are represented, offering a mix of artistic and practical perspectives suitable for a variety of applications.

    5. Popularity Scores: each image is assigned a popularity score based on its performance in GuruShots competitions. This unique metric reflects how well the image resonates with a global audience, offering an additional layer of insight for AI models focused on user preferences or engagement trends.

    6. AI-Ready Design: this dataset is optimized for AI applications, making it ideal for training models in tasks such as image recognition, classification, and segmentation. It is compatible with a wide range of machine learning frameworks and workflows, ensuring seamless integration into your projects.

    7. Licensing & Compliance: the dataset complies fully with data privacy regulations and offers transparent licensing for both commercial and academic use.

    Use Cases:
    1. Training AI systems for landmark recognition and geolocation.
    2. Enhancing navigation and travel AI applications.
    3. Building datasets for educational, tourism, and augmented reality tools.
    4. Supporting cultural heritage preservation and analysis through AI-powered solutions.

    This dataset offers a comprehensive, diverse, and high-quality resource for training AI and ML models, tailored to deliver exceptional performance for your projects. Customizations are available to suit specific project needs. Contact us to learn more!

  9. FruitNet: Indian Fruits Dataset with quality (Good, Bad & Mixed quality)

    • data.mendeley.com
    Updated Mar 8, 2022
    Cite
    Kailas PATIL (2022). FruitNet: Indian Fruits Dataset with quality (Good, Bad & Mixed quality) [Dataset]. http://doi.org/10.17632/b6fftwbr2v.3
    Authors
    Kailas PATIL
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    High-quality images of fruits are required to solve the fruit classification and recognition problem, and a neat, clean dataset is the elementary requirement for building machine learning models. With this objective we created a dataset of six popular Indian fruits, named "FruitNet". The dataset consists of 14,700+ high-quality images of 6 different classes of fruits in processed format. The images are divided into 3 sub-folders: 1) good quality fruits, 2) bad quality fruits, and 3) mixed quality fruits. Each sub-folder contains images of the 6 fruits, i.e. apple, banana, guava, lime, orange, and pomegranate. A mobile phone with a high-resolution camera was used to capture the images, which were taken against different backgrounds and in different lighting conditions. The proposed dataset can be used for training, testing, and validation of fruit classification or recognition models.
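    A minimal sketch of indexing this quality/class folder layout in Python. The folder and file-extension names below are assumptions inferred from the description; the dataset's actual names may differ.

```python
# Sketch: index a FruitNet-style tree of quality sub-folders, each holding
# one folder per fruit class. Folder names here are assumptions.
from pathlib import Path

QUALITIES = ["good_quality", "bad_quality", "mixed_quality"]
FRUITS = ["apple", "banana", "guava", "lime", "orange", "pomegranate"]

def index_dataset(root):
    """Return {(quality, fruit): sorted list of image paths}."""
    root = Path(root)
    index = {}
    for quality in QUALITIES:
        for fruit in FRUITS:
            index[(quality, fruit)] = sorted((root / quality / fruit).glob("*.jpg"))
    return index
```

    Such an index maps directly onto train/test/validation splits per class, as the description suggests.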

    [The related article is available at https://www.sciencedirect.com/science/article/pii/S2352340921009616. Cite the article as: V. Meshram, K. Patil, FruitNet: Indian fruits image dataset with quality for machine learning applications, Data in Brief, Volume 40, 2022, 107686, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2021.107686]

  10. DataSheet1_Systematic Review of Deep Learning and Machine Learning for...

    • frontiersin.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Sina Ardabili; Leila Abdolalizadeh; Csaba Mako; Bernat Torok; Amir Mosavi (2023). DataSheet1_Systematic Review of Deep Learning and Machine Learning for Building Energy.docx [Dataset]. http://doi.org/10.3389/fenrg.2022.786027.s001
    Available download formats: docx
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Sina Ardabili; Leila Abdolalizadeh; Csaba Mako; Bernat Torok; Amir Mosavi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Building energy (BE) management plays an essential role in urban sustainability and smart cities. Recently, novel data science and data-driven technologies have shown significant progress in analyzing energy consumption and energy demand datasets for smarter energy management. Machine learning (ML) and deep learning (DL) methods and applications, in particular, have been promising for the advancement of accurate and high-performance energy models. The present study provides a comprehensive review of ML- and DL-based techniques applied to BE systems and further evaluates their performance. Through a systematic review and a comprehensive taxonomy, the advances of ML- and DL-based techniques are carefully investigated, and the promising models are introduced. For energy demand forecasting, the hybrid and ensemble methods fall in the high-robustness range, SVM-based methods in the good-robustness range, ANN-based methods in the medium-robustness range, and linear regression models in the low-robustness range. For energy consumption forecasting, DL-based, hybrid, and ensemble-based models provided the highest robustness scores; ANN, SVM, and single ML models provided good to medium robustness; and LR-based models provided lower robustness scores. For energy load forecasting, hybrid and ensemble-based models provided higher robustness scores, DL-based and SVM-based techniques provided good robustness scores, ANN-based techniques provided medium scores, and LR-based models provided lower scores.

  11. 160K+ Pattern Images | AI Training Data | Annotated imagery data for AI |...

    • datarade.ai
    Updated Dec 8, 2019
    Cite
    Data Seeds (2019). 160K+ Pattern Images | AI Training Data | Annotated imagery data for AI | Object & Scene Detection | Global Coverage [Dataset]. https://datarade.ai/data-products/100k-pattern-images-ai-training-data-annotated-imagery-d-data-seeds
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset authored and provided by
    Data Seeds
    Area covered
    Morocco, Uruguay, Singapore, Faroe Islands, Guyana, Lesotho, Timor-Leste, Netherlands, Sint Eustatius and Saba, Micronesia (Federated States of)
    Description

    This dataset features over 160,000 high-quality images of patterns sourced from photographers worldwide. Designed to support AI and machine learning applications, it provides a diverse and richly annotated collection of pattern imagery.

    Key Features:

    1. Comprehensive Metadata: the dataset includes full EXIF data, detailing camera settings such as aperture, ISO, shutter speed, and focal length. Additionally, each image is pre-annotated with object and scene detection metadata, making it ideal for tasks like classification, detection, and segmentation. Popularity metrics, derived from engagement on our proprietary platform, are also included.

    2. Unique Sourcing Capabilities: the images are collected through a proprietary gamified platform for photographers. Competitions focused on pattern photography ensure fresh, relevant, and high-quality submissions. Custom datasets can be sourced on-demand within 72 hours, allowing for specific requirements such as particular pattern types (e.g., geometric, organic, textile) or stylistic preferences to be met efficiently.

    3. Global Diversity: photographs have been sourced from contributors in over 100 countries, ensuring a vast array of visual patterns captured in various cultural, architectural, and natural contexts. The images feature varied environments, including fabric textures, wallpapers, cityscapes, fractals, and abstract art, offering a rich visual spectrum for training and analysis.

    4. High-Quality Imagery: the dataset includes images with resolutions ranging from standard to high-definition to meet the needs of various projects. Both professional and amateur photography styles are represented, offering a mix of artistic and practical perspectives suitable for a variety of applications.

    5. Popularity Scores: each image is assigned a popularity score based on its performance in GuruShots competitions. This unique metric reflects how well the image resonates with a global audience, offering an additional layer of insight for AI models focused on user preferences or engagement trends.

    6. AI-Ready Design: this dataset is optimized for AI applications, making it ideal for training models in tasks such as pattern recognition, style classification, and image generation. It is compatible with a wide range of machine learning frameworks and workflows, ensuring seamless integration into your projects.

    7. Licensing & Compliance: the dataset complies fully with data privacy regulations and offers transparent licensing for both commercial and academic use.

    Use Cases:
    1. Training AI systems for visual pattern recognition and classification.
    2. Enhancing fashion and interior design models through textile and decorative pattern analysis.
    3. Building datasets for generative models and style transfer applications.
    4. Supporting research in visual perception, cultural studies, and computational aesthetics.

    This dataset offers a comprehensive, diverse, and high-quality resource for training AI and ML models, tailored to deliver exceptional performance for your projects. Customizations are available to suit specific project needs. Contact us to learn more!

  12. Classification of structural building damage grades from multi-temporal...

    • heidata.uni-heidelberg.de
    text/x-python, zip
    Updated Jul 20, 2023
    Cite
    Vivien Zahs; Vivien Zahs; Katharina Anders; Julia Kohns; Alexander Stark; Bernhard Höfle; Bernhard Höfle; Katharina Anders; Julia Kohns; Alexander Stark (2023). Classification of structural building damage grades from multi-temporal photogrammetric point clouds using a machine learning model trained on virtual laser scanning data [Data and Source Code] [Dataset]. http://doi.org/10.11588/DATA/D3WZID
    Explore at:
    zip(121042315), text/x-python(3843), text/x-python(2422), zip(284705851), text/x-python(4299), zip(10181624901), zip(879751204)Available download formats
    Dataset updated
    Jul 20, 2023
    Dataset provided by
    heiDATA
    Authors
    Vivien Zahs; Vivien Zahs; Katharina Anders; Julia Kohns; Alexander Stark; Bernhard Höfle; Bernhard Höfle; Katharina Anders; Julia Kohns; Alexander Stark
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/D3WZID

    Description

    Automatic damage assessment by analysing UAV-derived 3D point clouds provides fast information on the damage situation after an earthquake. However, the assessment of different damage grades is challenging given the variety in damage characteristics and limited transferability of methods to other geographic regions or data sources. We present a novel change-based approach to automatically assess multi-class building damage from real-world point clouds using a machine learning model trained on virtual laser scanning (VLS) data. Therein, we (1) identify object-specific point cloud-based change features, (2) extract changed building parts using k-means clustering, (3) train a random forest machine learning model with VLS data based on object-specific change features, and (4) use the classifier to assess building damage in real-world photogrammetric point clouds. We evaluate the classifier with respect to its capacity to classify three damage grades (heavy, extreme, destruction) in pre-event and post-event point clouds of an earthquake in L’Aquila (Italy). Using object-specific change features derived from bi-temporal point clouds, our approach is transferable with respect to multi-source input point clouds used for model training (VLS) and application (real-world photogrammetry). We further achieve geographic transferability by using simulated training data which characterises damage grades across different geographic regions. The model yields high multi-target classification accuracies (overall accuracy: 92.0%–95.1%). Classification performance improves only slightly when using real-world region-specific training data (3% higher overall accuracies). We consider our approach especially relevant for applications where timely information on the damage situation is required and sufficient real-world training data is not available. 
    This dataset includes 3D building models (building_models.zip) representing the target damage grades of this study (no damage, heavy damage, extreme damage, destruction), and the Python source code (code.zip) used in this study to (1) generate simulated multi-temporal 3D point clouds using HELIOS++ (https://github.com/3dgeo-heidelberg/helios), (2) extract damaged building parts using k-means clustering, (3) compute object-specific geometric change features per building, and (4) train a multi-target random forest classifier to classify buildings into four damage grades based on object-specific change features.
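
    The processing chain described above — k-means extraction of changed building parts followed by a random forest over object-specific change features — can be sketched with synthetic stand-in data (all feature values, cluster counts, and class separations below are invented for illustration; they are not taken from the dataset):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for per-point change magnitudes of one building:
# k-means (k=2) separates "changed" from "unchanged" points, mimicking step (2).
point_changes = np.concatenate([rng.normal(0.02, 0.01, 500),   # stable facade
                                rng.normal(0.80, 0.10, 120)])  # collapsed part
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    point_changes.reshape(-1, 1))
changed = point_changes[labels == labels[point_changes.argmax()]]
print(f"points in 'changed' cluster: {changed.size}")

# Synthetic object-specific change features for 400 buildings over 4 damage
# grades (hypothetical columns: mean change, max change, changed-area ratio).
X = rng.normal(size=(400, 3)) + np.repeat(np.arange(4), 100)[:, None]
y = np.repeat(np.arange(4), 100)  # 0 = no damage ... 3 = destruction

# Step (4): multi-class random forest over the per-building features.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

    In the study the training features come from simulated (VLS) point clouds and the classifier is applied to real-world photogrammetric clouds; the sketch only shows the clustering-plus-classification shape of the pipeline.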

  13. 2.7M+ Flower Images | AI Training Data | Annotated imagery data for AI |...

    • datarade.ai
    Updated Aug 28, 2019
    Cite
    Data Seeds (2019). 2.7M+ Flower Images | AI Training Data | Annotated imagery data for AI | Object & Scene Detection | Global Coverage [Dataset]. https://datarade.ai/data-products/1-5m-flower-images-ai-training-data-annotated-imagery-da-data-seeds
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
    Dataset updated
    Aug 28, 2019
    Dataset authored and provided by
    Data Seeds
    Area covered
    Norfolk Island, Montenegro, Slovakia, Guatemala, Åland Islands, United Kingdom, Fiji, French Southern Territories, Kazakhstan, Virgin Islands (U.S.)
    Description

    This dataset features over 2,700,000 high-quality images of flowers sourced from photographers worldwide. Designed to support AI and machine learning applications, it provides a diverse and richly annotated collection of flower imagery.

    Key Features: 1. Comprehensive Metadata: the dataset includes full EXIF data, detailing camera settings such as aperture, ISO, shutter speed, and focal length. Additionally, each image is pre-annotated with object and scene detection metadata, making it ideal for tasks like classification, detection, and segmentation. Popularity metrics, derived from engagement on our proprietary platform, are also included.

    2. Unique Sourcing Capabilities: the images are collected through a proprietary gamified platform for photographers. Competitions focused on flower photography ensure fresh, relevant, and high-quality submissions. Custom datasets can be sourced on-demand within 72 hours, allowing for specific requirements such as particular flower species or geographic regions to be met efficiently.

    3. Global Diversity: photographs have been sourced from contributors in over 100 countries, ensuring a vast array of flower species, colors, and environmental settings. The images feature varied contexts, including natural habitats, gardens, bouquets, and urban landscapes, providing an unparalleled level of diversity.

    4. High-Quality Imagery: the dataset includes images with resolutions ranging from standard to high-definition to meet the needs of various projects. Both professional and amateur photography styles are represented, offering a mix of artistic and practical perspectives suitable for a variety of applications.

    5. Popularity Scores: each image is assigned a popularity score based on its performance in GuruShots competitions. This unique metric reflects how well the image resonates with a global audience, offering an additional layer of insight for AI models focused on user preferences or engagement trends.

    6. AI-Ready Design: this dataset is optimized for AI applications, making it ideal for training models in tasks such as image recognition, classification, and segmentation. It is compatible with a wide range of machine learning frameworks and workflows, ensuring seamless integration into your projects.

    7. Licensing & Compliance: the dataset complies fully with data privacy regulations and offers transparent licensing for both commercial and academic use.

    Use Cases 1. Training AI systems for plant recognition and classification. 2. Enhancing agricultural AI models for plant health assessment and species identification. 3. Building datasets for educational tools and augmented reality applications. 4. Supporting biodiversity and conservation research through AI-powered analysis.

    This dataset offers a comprehensive, diverse, and high-quality resource for training AI and ML models, tailored to deliver exceptional performance for your projects. Customizations are available to suit specific project needs. Contact us to learn more!

  14. 25M+ Images | AI Training Data | Annotated imagery data for AI | Object &...

    • datarade.ai
    + more versions
    Cite
    Data Seeds, 25M+ Images | AI Training Data | Annotated imagery data for AI | Object & Scene Detection | Global Coverage [Dataset]. https://datarade.ai/data-products/15m-images-ai-training-data-annotated-imagery-data-for-a-data-seeds
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
    Dataset authored and provided by
    Data Seeds
    Area covered
    Cabo Verde, Honduras, Bulgaria, Iraq, Macedonia (the former Yugoslav Republic of), Venezuela (Bolivarian Republic of), Botswana, United Republic of, China, Sierra Leone
    Description

    This dataset features over 25,000,000 high-quality general-purpose images sourced from photographers worldwide. Designed to support a wide range of AI and machine learning applications, it offers a richly diverse and extensively annotated collection of everyday visual content.

    Key Features: 1. Comprehensive Metadata: the dataset includes full EXIF data, detailing camera settings such as aperture, ISO, shutter speed, and focal length. Additionally, each image is pre-annotated with object and scene detection metadata, making it ideal for tasks like classification, detection, and segmentation. Popularity metrics, derived from engagement on our proprietary platform, are also included.

    2. Unique Sourcing Capabilities: the images are collected through a proprietary gamified platform for photographers. Competitions spanning various themes ensure a steady influx of diverse, high-quality submissions. Custom datasets can be sourced on-demand within 72 hours, allowing for specific requirements—such as themes, subjects, or scenarios—to be met efficiently.

    3. Global Diversity: photographs have been sourced from contributors in over 100 countries, covering a wide range of human experiences, cultures, environments, and activities. The dataset includes images of people, nature, objects, animals, urban and rural life, and more—captured across different times of day, seasons, and lighting conditions.

    4. High-Quality Imagery: the dataset includes images with resolutions ranging from standard to high-definition to meet the needs of various projects. Both professional and amateur photography styles are represented, offering a balance of realism and creativity across visual domains.

    5. Popularity Scores: each image is assigned a popularity score based on its performance in GuruShots competitions. This unique metric reflects how well the image resonates with a global audience, offering an additional layer of insight for AI models focused on aesthetics, engagement, or content curation.

    6. AI-Ready Design: this dataset is optimized for AI applications, making it ideal for training models in general image recognition, multi-label classification, content filtering, and scene understanding. It integrates easily with leading machine learning frameworks and pipelines.

    7. Licensing & Compliance: the dataset complies fully with data privacy regulations and offers transparent licensing for both commercial and academic use.

    Use Cases: 1. Training AI models for general-purpose image classification and tagging. 2. Enhancing content moderation and visual search systems. 3. Building foundational datasets for large-scale vision-language models. 4. Supporting research in computer vision, multimodal AI, and generative modeling.

    This dataset offers a comprehensive, diverse, and high-quality resource for training AI and ML models across a wide array of domains. Customizations are available to suit specific project needs. Contact us to learn more!

  15. Job Dataset

    • kaggle.com
    zip
    Updated Sep 17, 2023
    Cite
    Ravender Singh Rana (2023). Job Dataset [Dataset]. https://www.kaggle.com/datasets/ravindrasinghrana/job-description-dataset
    Explore at:
    zip(479575920 bytes)Available download formats
    Dataset updated
    Sep 17, 2023
    Authors
    Ravender Singh Rana
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Job Dataset

    This dataset provides a comprehensive collection of synthetic job postings to facilitate research and analysis in the field of job market trends, natural language processing (NLP), and machine learning. Created for educational and research purposes, this dataset offers a diverse set of job listings across various industries and job types.

    Descriptions for each of the columns in the dataset:

    1. Job Id: A unique identifier for each job posting.
    2. Experience: The required or preferred years of experience for the job.
    3. Qualifications: The educational qualifications needed for the job.
    4. Salary Range: The range of salaries or compensation offered for the position.
    5. Location: The city or area where the job is located.
    6. Country: The country where the job is located.
    7. Latitude: The latitude coordinate of the job location.
    8. Longitude: The longitude coordinate of the job location.
    9. Work Type: The type of employment (e.g., full-time, part-time, contract).
    10. Company Size: The approximate size or scale of the hiring company.
    11. Job Posting Date: The date when the job posting was made public.
    12. Preference: Special preferences or requirements for applicants (e.g., male only, female only, or both).
    13. Contact Person: The name of the contact person or recruiter for the job.
    14. Contact: Contact information for job inquiries.
    15. Job Title: The job title or position being advertised.
    16. Role: The role or category of the job (e.g., software developer, marketing manager).
    17. Job Portal: The platform or website where the job was posted.
    18. Job Description: A detailed description of the job responsibilities and requirements.
    19. Benefits: Information about benefits offered with the job (e.g., health insurance, retirement plans).
    20. Skills: The skills or qualifications required for the job.
    21. Responsibilities: Specific responsibilities and duties associated with the job.
    22. Company Name: The name of the hiring company.
    23. Company Profile: A brief overview of the company's background and mission.
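
    A minimal pandas sketch of working with a few of these columns; the in-memory rows, the salary-string format, and the `job_descriptions.csv` file name are assumptions for illustration, not taken from the dataset:

```python
import pandas as pd

# Tiny in-memory stand-in covering a subset of the columns listed above;
# with the real file you would instead call pd.read_csv("job_descriptions.csv").
df = pd.DataFrame({
    "Job Id": [1, 2],
    "Salary Range": ["$56K-$116K", "$61K-$104K"],  # assumed format
    "Work Type": ["Full-Time", "Contract"],
    "Job Title": ["Software Developer", "Marketing Manager"],
})

# Split the "Salary Range" string into numeric lower/upper bounds (in $K),
# which turns the text column into something usable for salary modeling.
bounds = df["Salary Range"].str.extract(r"\$(\d+)K-\$(\d+)K")
df["Salary Min (K)"] = bounds[0].astype(int)
df["Salary Max (K)"] = bounds[1].astype(int)

print(df[["Job Title", "Salary Min (K)", "Salary Max (K)"]])
```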

    Potential Use Cases:

    • Building predictive models to forecast job market trends.
    • Enhancing job recommendation systems for job seekers.
    • Developing NLP models for resume parsing and job matching.
    • Analyzing regional job market disparities and opportunities.
    • Exploring salary prediction models for various job roles.

    Acknowledgements:

    We would like to express our gratitude to the Python Faker library for its invaluable contribution to the dataset generation process. Additionally, we appreciate the guidance provided by ChatGPT in fine-tuning the dataset, ensuring its quality, and adhering to ethical standards.

    Note:

    Please note that the job postings are fictional and for illustrative purposes only. The dataset is not suitable for real-world applications and should be used only within the scope of research and experimentation. You can also reach me via email at: rrana157@gmail.com

  16. ARKOMA: The Dataset to Build Neural Networks-Based Inverse Kinematics for...

    • ieee-dataport.org
    • data.mendeley.com
    Updated Nov 13, 2023
    + more versions
    Cite
    Arif Nugroho (2023). ARKOMA: The Dataset to Build Neural Networks-Based Inverse Kinematics for NAO Robot Arms [Dataset]. https://ieee-dataport.org/documents/arkoma-dataset-build-neural-networks-based-inverse-kinematics-nao-robot-arms
    Explore at:
    Dataset updated
    Nov 13, 2023
    Authors
    Arif Nugroho
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    validation dataset

  17. Arabic Speech Commands Dataset

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Apr 5, 2021
    Cite
    Abdulkader Ghandoura; Abdulkader Ghandoura (2021). Arabic Speech Commands Dataset [Dataset]. http://doi.org/10.5281/zenodo.4662481
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 5, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Abdulkader Ghandoura; Abdulkader Ghandoura
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Arabic Speech Commands Dataset

    This dataset is designed to help train simple machine learning models that serve educational and research purposes in the speech recognition domain, mainly for keyword spotting tasks.

    Dataset Description

    Our dataset is a list of pairs (x, y), where x is the input speech signal and y is the corresponding keyword. The final dataset consists of 12000 such pairs, comprising 40 keywords. Each audio file is one second in length, sampled at 16 kHz. We have 30 participants, each of whom recorded 10 utterances of each keyword. Therefore, we have 300 audio files for each keyword in total (30 * 10 * 40 = 12000), and the total size of all the recorded keywords is ~384 MB. The dataset also contains several background noise recordings obtained from various natural sources of noise. We saved these audio files in a separate folder named background_noise, with a total size of ~49 MB.

    Dataset Structure

    There are 40 folders, each of which represents one keyword and contains 300 files. The first eight digits of each file name identify the contributor, while the last two digits identify the round number. For example, the file path rotate/00000021_NO_06.wav indicates that the contributor with the ID 00000021 pronounced the keyword rotate for the 6th time.

    Data Split

    We recommend using the provided CSV files in your experiments. We kept 60% of the dataset for training, 20% for validation, and the remaining 20% for testing. In our split method, we guarantee that all recordings of a certain contributor are within the same subset.
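
    The file-naming convention and the contributor-disjoint split described above are easy to check programmatically. A small sketch (the `_NO_` separator in the regex is generalized from the single documented example path, and the ID sets at the bottom are made up for illustration):

```python
import re

def parse_path(path):
    """Parse the documented convention: the first eight digits of the file
    name identify the contributor, the last two digits the round number."""
    keyword, fname = path.split("/")
    m = re.fullmatch(r"(\d{8})_NO_(\d{2})\.wav", fname)
    if m is None:
        raise ValueError(f"unexpected file name: {fname}")
    return keyword, m.group(1), int(m.group(2))

# The example from the dataset description: contributor 00000021
# pronounced the keyword "rotate" for the 6th time.
keyword, contributor, round_no = parse_path("rotate/00000021_NO_06.wav")
print(keyword, contributor, round_no)  # rotate 00000021 6

# The recommended split keeps all recordings of a contributor in one subset,
# so a sanity check on the provided CSVs is that the contributor-ID sets of
# train/validation/test are pairwise disjoint (IDs below are hypothetical).
train_ids, val_ids, test_ids = {"00000001", "00000002"}, {"00000003"}, {"00000004"}
assert train_ids.isdisjoint(val_ids) and train_ids.isdisjoint(test_ids)
```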

    License

    This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. For more details, see the LICENSE file in this folder.

    Citations

    If you want to use the Arabic Speech Commands dataset in your work, please cite it as:

    @article{arabicspeechcommandsv1,
      author    = {Ghandoura, Abdulkader and Hjabo, Farouk and Al Dakkak, Oumayma},
      title     = {Building and Benchmarking an Arabic Speech Commands Dataset for Small-Footprint Keyword Spotting},
      journal   = {Engineering Applications of Artificial Intelligence},
      year      = {2021},
      publisher = {Elsevier}
    }

  18. QFlow lite dataset: A machine-learning approach to the charge states in...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    txt
    Updated Jun 2, 2023
    Cite
    Justyna P. Zwolak; Sandesh S. Kalantre; Xingyao Wu; Stephen Ragole; Jacob M. Taylor (2023). QFlow lite dataset: A machine-learning approach to the charge states in quantum dot experiments [Dataset]. http://doi.org/10.1371/journal.pone.0205844
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Justyna P. Zwolak; Sandesh S. Kalantre; Xingyao Wu; Stephen Ragole; Jacob M. Taylor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    Over the past decade, machine learning techniques have revolutionized how research and science are done, from designing new materials and predicting their properties to data mining and analysis to assisting drug discovery to advancing cybersecurity. Recently, we added to this list by showing how a machine learning algorithm (a so-called learner) combined with an optimization routine can assist experimental efforts in the realm of tuning semiconductor quantum dot (QD) devices. Among other applications, semiconductor quantum dots are a candidate system for building quantum computers. In order to employ QDs, one needs to tune the devices into a desirable configuration suitable for quantum computing. While current experiments adjust the control parameters heuristically, such an approach does not scale with the increasing size of the quantum dot arrays required for even near-term quantum computing demonstrations. Establishing a reliable protocol for tuning QD devices that does not rely on the gross-scale heuristics developed by experimentalists is thus of great importance.

    Materials and methods

    To implement the machine learning-based approach, we constructed a dataset of simulated QD device characteristics, such as the conductance and the charge sensor response versus the applied electrostatic gate voltages. The gate voltages are the experimental ‘knobs’ for tuning the device into useful regimes. Here, we describe the methodology for generating the dataset, as well as its validation in training convolutional neural networks.

    Results and discussion

    From 200 training sets sampled randomly from the full dataset, we show that the learner’s accuracy in recognizing the state of a device is ≈ 96.5% when using either current-based or charge-sensor-based training. The spread in accuracy over our 200 training sets is 0.5% and 1.8% for current- and charge-sensor-based data, respectively. In addition, we also introduce a tool that enables other researchers to use this approach for further research: QFlow lite—a Python-based mini-software suite that uses the dataset to train neural networks to recognize the state of a device and differentiate between states in experimental data. This work gives the definitive reference for the new dataset that will help enable researchers to use it in their experiments or to develop new machine learning approaches and concepts.
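
    As an illustration of the state-recognition idea only (not the paper's actual CNN architecture or the QFlow data), the following sketch trains a small fully connected classifier on synthetic 8×8 "sensor maps" in which one class carries extra periodic structure — a crude stand-in for distinguishing device states from voltage-scan images:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

def make_map(double_dot):
    """Synthetic 8x8 'charge-sensor map': a smooth gradient for class 0,
    plus an added checkerboard pattern for class 1. Real QFlow maps have
    far richer structure; this is purely illustrative."""
    base = np.add.outer(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
    if double_dot:
        base = base + 0.5 * (np.indices((8, 8)).sum(0) % 2)
    return (base + rng.normal(0, 0.05, (8, 8))).ravel()

X = np.stack([make_map(i % 2) for i in range(400)])
y = np.arange(400) % 2  # 0 = "single-dot-like", 1 = "double-dot-like"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

    The released dataset and the QFlow lite suite provide real simulated maps and trained models; the sketch only conveys the map-in, state-label-out framing.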

  19. Architectural Design Decisions for the Machine Learning Workflow: Dataset...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 16, 2024
    + more versions
    Cite
    Stephen John Warnett; Stephen John Warnett; Uwe Zdun; Uwe Zdun (2024). Architectural Design Decisions for the Machine Learning Workflow: Dataset and Code [Dataset]. http://doi.org/10.5281/zenodo.5730291
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Stephen John Warnett; Stephen John Warnett; Uwe Zdun; Uwe Zdun
    License

    http://www.apache.org/licenses/LICENSE-2.0

    Description

    Title: Architectural Design Decisions for the Machine Learning Workflow: Dataset and Code

    Authors: Stephen John Warnett; Uwe Zdun

    About: This is the dataset and code artifact for the article entitled "Architectural Design Decisions for the Machine Learning Workflow".

    Contents: The "_generated" directory contains the generated results, including latex files with tables for use in publications and the Architectural Design Decision model in textual and graphical form. "Generators" contains Python applications that can be run to generate the above. "Metamodels" contains a Python file with type definitions. "Sources_coding" contains our source codings and audit trail. "Add_models" contains the Python implementation of our model and source codings. Finally, "appendix" contains a detailed description of our research method.

    Article Abstract: Bringing machine learning models to production is challenging as it is often fraught with uncertainty and confusion, partially due to the disparity between software engineering and machine learning practices, but also due to knowledge gaps on the level of the individual practitioner. We conducted a qualitative investigation into the architectural decisions faced by practitioners as documented in gray literature based on Straussian Grounded Theory and modeled current practices in machine learning. Our novel Architectural Design Decision model is based on current practitioner understanding of the topic and helps bridge the gap between science and practice, foster scientific understanding of the subject, and support practitioners via the integration and consolidation of the myriad decisions they face. We describe a subset of the Architectural Design Decisions that were modeled, discuss uses for the model, and outline areas in which further research may be pursued.

    Objective: This article aims to study current practitioner understanding of architectural concepts associated with data processing, model building, and Automated Machine Learning (AutoML) within the context of the machine learning workflow.

    Method: Applying Straussian Grounded Theory to gray literature sources containing practitioner views on machine learning practices, we studied methods and techniques currently applied by practitioners in the context of machine learning solution development and gained valuable insights into the software engineering and architectural state of the art as applied to ML.

    Results: Our study resulted in a model of Architectural Design Decisions, practitioner practices, and decision drivers in the field of software engineering and software architecture for machine learning.

    Conclusions: The resulting Architectural Design Decisions model can help researchers better understand practitioners' needs and the challenges they face, and guide their decisions based on existing practices. The study also opens new avenues for further research in the field, and the design guidance provided by our model can also help reduce design effort and risk. In future work, we plan on using our findings to provide automated design advice to machine learning engineers.

  20. recruitment dataset

    • kaggle.com
    zip
    Updated Feb 17, 2025
    Cite
    Surendra Kumar Nellore (2025). recruitment dataset [Dataset]. https://www.kaggle.com/datasets/surendra365/recruitement-dataset
    Explore at:
    zip(1658648 bytes)Available download formats
    Dataset updated
    Feb 17, 2025
    Authors
    Surendra Kumar Nellore
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    In today’s competitive job market, companies receive numerous applications for each job posting, making it challenging to efficiently screen and shortlist candidates. This dataset is designed to facilitate research and development in resume screening, job matching, and recruitment analytics. It can be used to build machine learning models for applicant-job matching, automate resume parsing, and analyze hiring trends.

    Dataset Overview

    This dataset contains applicant details, resumes, job descriptions, and matching labels to assess how well a candidate fits a specific job role. It can be used to explore factors affecting job selection, identify biases in hiring, and improve applicant tracking systems.

    Data Sources & Collection

    The dataset was compiled from synthetic and publicly available job application data. It is structured to resemble real-world hiring scenarios, making it useful for data science and HR analytics projects. The resumes and job descriptions are either anonymized, synthesized, or derived from publicly accessible recruitment data.

    Columns Description

    Job Applicant Name – Full name of the applicant.
    Age – Applicant’s age.
    Gender – Applicant’s gender identity.
    Race – Racial background of the applicant.
    Ethnicity – Ethnic identity of the applicant.
    Resume – Text content of the applicant’s resume, including skills, experience, and education.
    Job Roles – The job positions for which the applicant applied.
    Job Description – A detailed description of the job role, including required skills, responsibilities, and qualifications.
    Best Match – A label or score indicating how well the applicant matches the job role based on qualifications and experience.

    Inspiration & Use Cases

    This dataset is useful for: ✅ Building AI-powered resume-screening models to automate candidate selection. ✅ Developing job recommendation systems that suggest the best roles for applicants. ✅ Analyzing hiring trends & biases in recruitment based on age, gender, or ethnicity. ✅ Training NLP models for resume parsing and job description understanding.
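
    A common baseline for the resume–job matching use case is TF-IDF vectorization plus cosine similarity. A minimal sketch with hypothetical resume and job-description snippets (not drawn from the dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical text snippets standing in for the Resume and
# Job Description columns.
resumes = [
    "Python machine learning pandas scikit-learn data analysis",
    "retail sales customer service cash handling inventory",
]
jobs = [
    "Data Scientist: Python, machine learning, statistical analysis",
    "Store Associate: customer service, sales, inventory management",
]

# Fit one shared vocabulary, then score every resume against every job.
vec = TfidfVectorizer().fit(resumes + jobs)
sim = cosine_similarity(vec.transform(resumes), vec.transform(jobs))

# Each resume's best-matching job role by cosine similarity; on real data
# this ranking would be compared against the Best Match label.
for i, row in enumerate(sim):
    print(f"resume {i} -> job {row.argmax()} (score {row.max():.2f})")
```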

    Potential Applications

    AI-based Applicant Tracking Systems (ATS)
    HR Analytics & Hiring Bias Studies
    Resume-Job Matching Algorithms
    Data-Driven Career Counseling

    🚀 We encourage data scientists, recruiters, and HR tech enthusiasts to explore this dataset and build innovative solutions!
