MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
## Overview
Yolo Find Text is a dataset for object detection tasks - it contains Text annotations for 290 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Educational Tools: This computer vision model can be integrated into educational software, assisting students in learning numerical and symbolic comprehension. For instance, recognizing and categorizing handwritten digits and symbols to facilitate learning mathematics.
Optical Character Recognition (OCR): This model can be used to recognize and categorize digits and symbols in scanned documents or photos, aiding in digitization and data extraction purposes.
Handwriting Recognition Systems: It can be applied in handwriting recognition systems to identify and categorize unique handwritten digits or characters, supporting automated evaluation or data entry.
Accessibility Applications: It can support the creation of tools for visually impaired individuals, by recognizing text and symbols in physical documents and producing spoken output.
Automated Testing Applications: In a testing or examination scenario, the model can automatically grade multiple-choice tests or quizzes by recognizing and categorizing filled answer bubbles or handwritten digits/symbols.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset comprises a total of 190 images used to create a license plate recognition model for Saudi Arabian number plates. First, we create bounding box annotations for Saudi Arabian license plate images using the YOLO Bounding Box Annotation Tool (YBAT); the annotations are saved as .xml files (demo: https://youtu.be/k-d1OFHeikg). Then, we implement a Faster R-CNN model with a ResNet-50 backbone using PyTorch. The model is trained to detect and localize various components of the license plate, including Arabic and Latin characters, numbers, and the KSA logo.
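As an illustration of that setup (not the project's actual training script), a minimal PyTorch/torchvision sketch might look like the following; NUM_CLASSES is a placeholder for the dataset's component classes plus background:

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Placeholder: one entry per plate component (Arabic/Latin characters,
# digits, KSA logo, ...) plus one for background.
NUM_CLASSES = 40

# COCO-pretrained Faster R-CNN with a ResNet-50 FPN backbone.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Swap the box predictor head for the license-plate component classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Smoke test on a dummy image; real inputs are plate photos scaled to [0, 1].
model.eval()
with torch.no_grad():
    prediction = model([torch.rand(3, 480, 640)])[0]
print(prediction["boxes"].shape, prediction["labels"].shape)
```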
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Game Character Customization: Utilize the "game" computer vision model to recognize and differentiate between player characters in video games for customization purposes. Users can select various outfits, skins or colors for their Hero, Hero Red, and Hero Blue characters based on the identification of game classes.
In-Game Advertising and Sponsorships: The model can assist game developers and marketers in identifying specific game characters for dynamic in-game advertising or targeted sponsorships, by determining whether the character on screen is Hero, Hero Red, or Hero Blue.
eSports Analytics and Insights: Leverage the "game" model for real-time analytics and insights in the eSports industry by tracking each class of Hero on the screen during a live-streamed or recorded gaming session. This can help teams and coaches with monitoring character performance, gameplay strategies, and time management.
Accessibility Enhancements: Develop assistive technologies that utilize the model to narrate or describe scenes and characters to visually impaired gamers, by recognizing the Hero, Hero Red, and Hero Blue characters on screen during gameplay.
Content Filtering and Parental Controls: Implement content filtering and parental control mechanisms that can identify specific game classes and characters. Parents can use these features to filter or block games based on the presence of certain character classes like Hero, Hero Red, or Hero Blue to maintain age-appropriate gaming experiences.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Educational Tools: The RMAI model can be used to develop applications or tools to teach children or adults about letters and numbers. By scanning real-life objects or text, it can identify the mentioned classes and further enhance the learning experience.
Identification of License Plate Numbers: The model can be employed in surveillance software to identify vehicle license plates. Despite the model not being explicitly trained for this purpose, the ability to recognize the mentioned numeral and letter classes may be sufficient for basic applications.
Robot Navigation: The reference image suggests potential for robot navigation use. Robots could use this model to read numbers and letters in their environment, which could be used in synchronizing tasks or following specified routes in a warehouse or factory setting.
Accessibility Tools: The model can be used to develop applications for visually impaired people to read and comprehend written material. This can range from reading books, recognizing signs, or identifying different objects that have numbers or letters on them.
Data Sorting: In an office or warehouse setting, this model could be used to sort packages, files or items based on numbers and letters. This will help in increasing efficiency and reducing potential errors in the process.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
To train an Optical Character Recognition (OCR) model, a comprehensive dataset is essential. This dataset serves as the foundation for the model's learning process, enabling it to recognize and decipher various fonts, styles, and languages.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Improved CAPTCHA Technologies: Use the model to implement more sophisticated and advanced CAPTCHAs for online platforms. It would help distinguish humans from bots, leading to better system security.
Data Entry Automation: Deploying this model can help automate data entry processes by decoding CAPTCHA-like handwritten or printed text to digital text.
Digital Archive Transcription: The model can be used to transcribe historical documents or books, which often feature letterings that resemble the jumbled nature of CAPTCHA images, into a digital format.
Improving Optical Character Recognition (OCR) Systems: This model could serve as a training tool for improving OCR systems in recognizing unconventional or distorted characters that often appear in CAPTCHA images.
Assisting Visually Impaired: Develop assistive technologies that could help visually impaired users navigate by converting printed text (that might be in CAPTCHA form due to print deformities) into speech or braille.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A simple dataset for benchmarking CreateML object detection models. The images are sampled from the COCO dataset, with eyes and nose bounding boxes added. It's not meant to be serious or useful in a real application; the purpose is to look at how long it takes to train CreateML models with varying dataset and batch sizes.
Training performance is affected by model configuration, dataset size, and batch configuration. Larger models and batches require more memory. I used the CreateML object detection project type to compare performance.
Hardware
- M1 MacBook Air: 8-core GPU, 4/4 CPU, 16 GB memory, 512 GB SSD
- M1 Max MacBook Pro: 24-core GPU, 2/8 CPU, 32 GB memory, 2 TB SSD
Small Dataset: Train 144, Valid 16, Test 8
Results

| batch | M1 ET | M1Max ET | peak mem (GB) |
|-------|:------|:---------|:--------------|
| 16    | 16    | 11       | 1.5           |
| 32    | 29    | 17       | 2.8           |
| 64    | 56    | 30       | 5.4           |
| 128   | 170   | 57       | 12            |
Larger Dataset: Train 301, Valid 29, Test 18
Results

| batch | M1 ET | M1Max ET | peak mem (GB) |
|-------|:------|:---------|:--------------|
| 16    | 21    | 10       | 1.5           |
| 32    | 42    | 17       | 3.5           |
| 64    | 85    | 30       | 8.4           |
| 128   | 281   | 54       | 16.5          |
CreateML Settings
For all tests, training was set to Full Network. I closed CreateML between each run to make sure memory issues didn't cause a slowdown. There is a bug in Monterey as of 11/2021 that leads to a memory leak, so I kept an eye on the memory usage; if it looked like there was a memory leak, I restarted macOS.
Observations
In general, the additional GPU cores and memory of the MacBook Pro reduce training time, and having more memory lets you train with larger datasets. On the M1 MacBook Air, the practical limit is 12 GB before memory pressure impacts performance; on the M1 Max MacBook Pro, it is 26 GB. To work around memory pressure, use smaller batch sizes.
On the larger dataset with batch size 128, the M1 Max is 5x faster than the MacBook Air. Keep in mind that a real dataset should have thousands of samples, like COCO or Pascal VOC; ideally, you want a dataset with 100K images for experimentation and millions for the real training. The M1 Max MacBook Pro is a cost-effective alternative to building a Windows/Linux workstation with an RTX 3090 24 GB. For most of 2021, the price of an RTX 3090 with 24 GB was around $3,000.00, which means an equivalent Windows workstation would cost about the same as the M1 Max MacBook Pro I used to run the benchmarks.
Full Network vs Transfer Learning
As of CreateML 3, training with Full Network doesn't fully utilize the GPU; I don't know why it works that way. You have to select Transfer Learning to fully use the GPU. The table below shows the results of transfer learning with the larger dataset. In general, the training time is faster and the loss is better.
| batch | ET min | Train Acc | Val Acc | Test Acc | Top IU Train | Top IU Valid | Top IU Test | Peak mem (GB) | loss  |
|-------|--------|-----------|---------|----------|--------------|--------------|-------------|---------------|-------|
| 16    | 4      | 75        | 19      | 12       | 78           | 23           | 13          | 1.5           | 0.41  |
| 32    | 8      | 75        | 21      | 10       | 78           | 26           | 11          | 2.76          | 0.02  |
| 64    | 13     | 75        | 23      | 8        | 78           | 24           | 9           | 5.3           | 0.017 |
| 128   | 25     | 75        | 22      | 13       | 78           | 25           | 14          | 8.4           | 0.012 |
Github Project
The source code and full results are up on GitHub: https://github.com/woolfel/createmlbench
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Title: Hollow Knight Object Detection for Reinforcement Learning Agent
Description:
This project focuses on developing an object detection model tailored to the popular game Hollow Knight. The goal is to detect and classify various in-game elements in real-time to create a dataset that powers a reinforcement learning (RL) agent. This agent will use the detected objects as inputs to interact with the game environment, make decisions, and achieve specific objectives such as defeating enemies, collecting items, and progressing through the game.
The object detection model will classify key elements in the game into 10 classes.
The object detection system will enable the RL agent to process and interpret the game environment, enabling intelligent decision-making.
Object Detection:
Develop a robust YOLO-based object detection model to identify and classify game elements from video frames.
Reinforcement Learning (RL):
Utilize the outputs of the object detection system (e.g., bounding boxes and class predictions) as the state inputs for an RL algorithm. The RL agent will learn to perform tasks such as defeating enemies, collecting items, and progressing through the game.
Dynamic Adaptation:
Begin training the RL agent with a limited dataset of annotated images, gradually expanding the dataset to improve model performance and adaptability as more scenarios are introduced.
Automation:
The ultimate goal is to automate the gameplay of Hollow Knight, enabling the agent to mimic human-like decision-making.
Object Detection Training:
Use Roboflow for data preprocessing, annotation, augmentation, and model training. Generate a YOLO-compatible dataset and fine-tune the model for detecting the 10 classes.
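As a rough sketch of this fine-tuning step (assuming the Ultralytics YOLO API; the checkpoint name, data.yaml path, and hyperparameters are placeholders):

```python
from ultralytics import YOLO

# Fine-tune a small pretrained checkpoint on the Roboflow-exported dataset.
# "data.yaml" is the YOLO-format descriptor (image paths + the 10 class names).
model = YOLO("yolov8n.pt")
model.train(data="data.yaml", epochs=100, imgsz=640)
```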
Reinforcement Learning Agent:
Implement a deep RL algorithm (e.g., Deep Q-Networks (DQN) or Proximal Policy Optimization (PPO)).
Feedback Loop:
The RL agent's actions will be fed back into the game, generating new frames that the object detection model processes, creating a closed loop for training and evaluation.
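One way this loop might be wired up is sketched below; grab_frame, send_action, the agent object, and the weights filename are hypothetical stand-ins for the game-capture, input-injection, and DQN/PPO components:

```python
import numpy as np
from ultralytics import YOLO

# "hollow_knight_best.pt" is a placeholder for the fine-tuned weights.
detector = YOLO("hollow_knight_best.pt")

def frame_to_state(frame, num_classes=10, max_objects=8):
    """Flatten YOLO detections into a fixed-size state vector for the agent."""
    boxes = detector(frame, verbose=False)[0].boxes
    state = np.zeros((max_objects, 5), dtype=np.float32)  # x1, y1, x2, y2, class
    for i, box in enumerate(boxes[:max_objects]):
        state[i, :4] = box.xyxyn[0].cpu().numpy()       # normalized corner coords
        state[i, 4] = float(box.cls[0]) / num_classes   # normalized class id
    return state.ravel()

# Hypothetical closed loop -- grab_frame(), agent, and send_action() stand in
# for the game capture, the DQN/PPO policy, and the input injection layer:
#
#   while True:
#       state = frame_to_state(grab_frame())
#       action = agent.act(state)
#       send_action(action)
```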
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Object Tracking in Robotics: This model can be used in a robotic picking and sorting system. The robot, equipped with a camera, can identify box-letters-number-shapes and appropriately sort them.
Intelligent Traffic Management: The system can recognize traffic signs (e.g., "STOP", "bullseye", number signs) for real-time traffic monitoring and control. It would especially be useful in autonomous vehicle navigation.
Education and Learning: The model can be used in educational apps or tools for helping children learn about numbers, letters, and shapes, by identifying and naming them within images.
Manufacturing Quality Control: In a manufacturing setting, the model can help identify and sort parts based on shapes, letters, and numbers, increasing efficiency and reducing error rates.
Interactive Gaming: In games, the model can be used to identify specified shapes, numbers, or obstacles, creating an interactive and immersive experience for the players.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Text Translation Applications: The "F_Text" computer vision model would benefit apps that translate text from one language to another and need to identify English definite and indefinite articles, operation keywords, names of places and institutes, or certain popular business terms.
Education and Learning Tools: This model could be used to develop an educational application for kids, where they are trained to identify different classes of English words, symbols, and numbers. The model could pull out words from various contexts and ask the students to categorize them, enhancing their language skills.
Urban Navigation Applications: Applications meant to help people navigate through areas (like Essex or Colchester, mentioned within the classes) could utilize this model to recognize those names in various contexts, such as on a road sign or a building name.
Access Control Systems: The model could be employed in a scenario where access to certain physical or digital spaces is outfitted with text-based barriers or captcha. Here, only a distinct sequence of these words or symbols would allow access.
Contextual Advertising: In the field of e-commerce or digital advertising, the model could be used to scan digital or physical text sources (like a book in the dataset), identify specific keywords or phrases related to a product or brand and trigger related advertisements. This would make advertising more contextual and personalized.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is designed to annotate the structural elements of academic papers. It aims to train models to recognize different parts of a paper. Each class corresponds to a text or graphical element commonly found in papers.
Text indicating the name(s) of the author(s), typically found near the beginning of a document.
Identify the text block containing the author names. It usually follows the title and may include affiliations. Do not include titles or titles of sections adjacent to author names.
Indicates a major division of the document, often labeled with a number and title.
Locate text labeled with "Chapter" followed by a number and title. Capture the entire heading, ensuring no unrelated text is included.
Symbols and numbers arranged to represent a mathematical concept.
Draw boxes around all mathematical expressions, excluding any accompanying text or numbers identifying the equations.
Numerals used to uniquely identify equations.
Identify numbers in parentheses next to equations. Do not include equation text or variables.
Visual content such as graphs, diagrams, or images.
Outline the entire graphical representation. Do not include captions or any surrounding text.
Text providing a description or explanation of a figure.
Identify the text directly associated with a figure below it. Ensure no unrelated figures or text are included.
Clarifications or additional details located at the bottom of a page.
Locate text at the page's bottom that refers back to a mark or reference in the main text. Exclude any unrelated content.
Headings at the start of a list, identifying its purpose or content.
Identify and label only the heading for lists in content sections. Do not include subsequent list items.
The detailed entries or points in a list.
Identify each item in a content list. Exclude list headings and any non-list content.
Numerical indication of the current page.
Locate numbers typically positioned at the top or bottom margins. Do not include text or symbols beside the numbers.
Blocks of text separated by spacing or indentation.
Enclose individual text blocks that form coherent sections. Ensure each paragraph is distinguished separately.
Bibliographic information found typically in a reference section.
Identify the full reference entries. Ensure each citation is clearly distinguished without overlap.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The Clash Royale Character Detector dataset is used to train object detection models to identify and distinguish various characters and objects from the game Clash Royale. These classes are separated into two categories, "Ally" and "Enemy," each with specific characters and objects. The annotation involves drawing bounding boxes around each character or object to train models for automatic detection. All the Allies wear blue, whereas the Enemies wear red. The top half of the field belongs to the enemy, whereas the bottom belongs to the ally.
The classes in the dataset include:
A muscular, blonde, shirtless video game character wearing blue.
Draw bounding boxes around the ally barbarian.
A wooden battering ram with metal rings around each end wearing blue.
Draw bounding boxes around the battle ram.
A small unit wi
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Automated Accessibility: Improve accessibility for visually impaired or disabled users who struggle with solving Captchas by providing alternative methods for user verification, such as audio or text-based questions.
CAPTCHA Improvement Research: Study the effectiveness of the current Captcha classes and develop more secure and user-friendly Captcha systems to prevent malicious automated bots from bypassing the security.
Educational Tool: Teach machine learning and computer vision students and enthusiasts how to train and test models in recognizing and classifying Captcha characters, with the emphasis on proper use and ethical considerations.
Data Entry Quality Assurance: Incorporate the "Break Captcha Life" model into data entry platforms to automatically verify Captcha solutions entered by human users, ensuring they have correctly solved the Captcha before submitting the form.
Security Stress Testing: Test the robustness of various online platforms' Captcha systems by using the "Break Captcha Life" model to evaluate their susceptibility to automated attacks while adhering to responsible disclosure and ethical hacking practices.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Video Game Development: Developers can use the "orange_bots" model to create more immersive and interactive games, especially for games that include 'bots' as part of their character roster. The model could help developers more easily create non-player characters (NPCs) and categorize them properly.
eSports Analysis: This model could be used to study and analyze gameplay in eSports, particularly in recognizing bot strategies and player interactions with bots. This data could then be used to improve game design, player training, or competitive strategies.
Content Moderation: For platforms hosting user-generated gaming content, the model can help identify the portions of the game that include bots. This can assist in moderating content, ensuring the fair play principles are adhered to, and identifying any bot-related cheating.
User-Generated Content Curation: The model can be used as a tool for curating user-generated content, like videos or streamed content featuring gameplay. By recognizing bots, videos could be correctly labeled and categorized for easier discovery.
Interactive Entertainment: This model could be employed in theme parks or virtual reality experiences for user interaction with bots. As users engage with virtual bots, their behaviors and responses can be analyzed to enhance the user experience.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is designed to annotate the structural elements of academic papers. It aims to train models to recognize different parts of a paper. Each class corresponds to a text or graphical element commonly found in papers.
Text indicating the name(s) of the author(s), typically found near the beginning of a document.
Identify the text block containing the author names. It usually follows the title and may include affiliations. Do not include titles, affiliations or titles of sections adjacent to author names.
Indicates a major division of the document, often labeled with a number and title.
Locate text labeled with "Chapter" followed by a number and title. Capture the entire heading, ensuring no unrelated text is included.
Symbols and numbers arranged to represent a mathematical concept.
Draw boxes around all mathematical expressions, excluding any accompanying text or numbers identifying the equations.
Numerals used to uniquely identify equations.
Identify numbers in parentheses next to equations. Do not include equation text or variables.
Visual content such as graphs, diagrams, code or images.
Outline the entire graphical representation. Do not include captions or any surrounding text.
Text providing a description or explanation above or below a figure.
Identify the text directly associated with a figure. Ensure no unrelated figures or text are included.
Clarifications or additional details located at the bottom of a page.
Locate text at the page's bottom that refers back to a mark or reference in the main text. Exclude any unrelated content.
Headings at the start of a content list, identifying its purpose or content. This may also be called a list of figures.
Identify and label only the heading for lists in content sections. Do not include subsequent list items.
The detailed entries or points in a list. These often summarize all figures in the paper.
Identify each item in a content list. Exclude list headings and any non-list content.
Numerical indication of the current page.
Locate numbers typically positioned at the top or bottom margins. Do not include text or symbols beside the numbers.
Blocks of text separated by spacing or indentation.
Enclose individual text blocks that form coherent sections. Ensure each paragraph is distinguished separately.
Bibliographic information typically found in a reference section.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project labels container codes on trucks. It can be used with optical character recognition (OCR) software to identify vehicles entering and exiting facilities or passing a checkpoint via a security camera feed or traffic cam.
The project includes several exported versions and fine-tuned models that can be used in the cloud or on an edge device.
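As a sketch of that pipeline (the weights filename and image path are placeholders, and pytesseract stands in for whatever OCR software is paired with the detector):

```python
import cv2
import pytesseract
from ultralytics import YOLO

model = YOLO("container-codes.pt")     # placeholder fine-tuned weights
frame = cv2.imread("gate_camera.jpg")  # placeholder security-camera frame

for box in model(frame, verbose=False)[0].boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    crop = cv2.cvtColor(frame[y1:y2, x1:x2], cv2.COLOR_BGR2GRAY)
    # PSM 7 tells Tesseract to treat the crop as a single line of text.
    code = pytesseract.image_to_string(crop, config="--psm 7").strip()
    print(code)
```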
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Retail Store Analytics: The "Emergent Object" model can be used in observing consumer behavior in retail environments by distinguishing between a human shopper and manikins. This could help glean insights into shopping patterns, time spent near certain displays, or it can be used to enhance security measures.
Virtual Reality: This model could be particularly useful in VR simulations. By being able to distinguish between human players and simulated characters (manikins), it can create a more immersive and interactive experience.
Film and Television Production: The model could be used in the production phase to differentiate between actors and manikins or props, helping in tracking shots, scene comprehension, and subsequent CGI implementation.
Advanced Driver Assistance Systems: The model can be used in vehicular technologies to identify pedestrians crossing the street. This can enhance the accuracy of driver assistance systems and contribute to safer driving.
Manikin-Based Training: In medical or emergency training scenarios where manikins are used to mimic real-life situations, the model can differentiate between learners/participants and manikins, helping in the evaluation of the training session.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "Taiwan License Plate Character Recognition Research" project focuses on identifying characters primarily based on Taiwan license plate fonts, coupled with license plate detection technology. Through our simple yet practical code, users can assemble a full license plate number according to the X-coordinate of the characters. The aim of this project is to optimize the license plate recognition process, enabling a faster, more accurate capture of license plate numbers.
Generated by GPT-4. Here are a few use cases for this project:
Automated Parking System: Utilize the "taiwan-license-plate-char-recognition-research" model to read and recognize license plates in parking lots, allowing for streamlined and automated entry/exit management and billing.
Traffic Surveillance and Enforcement: Integrate the model into traffic monitoring systems to identify traffic violators, such as speeding or running red lights, by capturing and recognizing their license plates, and assist law enforcement in issuing fines or citations.
Stolen Vehicle Detection: Leverage the model within police and security systems to identify stolen or flagged vehicles by matching their license plates in real-time with a database of reported stolen or wanted vehicles.
Intelligent Transportation System: Incorporate the model into smart city infrastructure for monitoring and predicting traffic flow, analyzing road conditions, and managing traffic signals, based on real-time vehicle count and license-plate identification.
Access Control and Security: Implement the model in gated communities, corporate campuses, or sensitive facilities to provide automated access control to authorized vehicles, enhancing security and convenience for residents, employees, and visitors.
Additional Explanation: The images in this project come from multiple different authors' projects. Prior to the creation of this dataset, we performed several preprocessing steps on the images.
If you have other questions or want to discuss this dataset, you can contact: https://t.me/jtx257