2 datasets found

Sanskrit Ocr Dataset
universe.roboflow.com
zip
Updated Jul 17, 2023
Cite
iitbresearchwork (2023). Sanskrit Ocr Dataset [Dataset]. https://universe.roboflow.com/iitbresearchwork/sanskrit-ocr/dataset/1
Available download formats: zip
Dataset updated: Jul 17, 2023
Dataset authored and provided by: iitbresearchwork
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured: Sanskrit Words Bounding Boxes
Description
Here are a few use cases for this project:

Digital Archiving: "Sanskrit OCR" can be used by libraries, universities or history enthusiasts to digitize ancient Sanskrit texts or manuscripts. This would enable easier preservation, access, and understanding of these invaluable pieces of human history and culture.

Education and Research: This model can be used by students and researchers studying Sanskrit literature, linguistics, or history. Once texts are digitized, users can search them quickly for specific terms.

Translation Services: Paired with translation software, "Sanskrit OCR" could supply machine-readable text for translating Sanskrit works into other languages, for example when translating ancient documents or preparing bilingual editions of Sanskrit literature.

Enhancing Accessibility for Blind Individuals: This model could be integrated into audiobook applications to convert scanned Sanskrit texts into speech, making this ancient literature more accessible to people with visual impairments.

Artificial Intelligence Training: Developers could use the "Sanskrit OCR" model to train other AI models for tasks involving recognition and understanding of classical languages or scripts, advancing linguistics work within AI.
Hindi Wake Words & Voice Commands Speech Data
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Hindi Wake Words & Voice Commands Speech Data [Dataset]. https://www.futurebeeai.com/dataset/wake-words-and-commands-dataset/wake-words-and-commands-hindi-india
Available download formats: wav
Dataset updated: Aug 1, 2022
Dataset provided by: FutureBeeAI
Authors: FutureBee AI
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by: FutureBeeAI
Description
Introduction
The Hindi Wake Word & Voice Command Dataset is expertly curated to support the training and development of voice-activated systems. This dataset includes a large collection of wake words and command phrases, essential for enabling seamless user interaction with voice assistants and other speech-enabled technologies. It’s designed to ensure accurate wake word detection and voice command recognition, enhancing overall system performance and user experience.
Speech Data
This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:
•Wake words alone
•Wake words followed by command phrases
Participant Diversity
•Speakers: 50 native Hindi speakers from the FutureBeeAI community
•Regions: Participants from various regions of India, ensuring broad coverage of accents and dialects
•Demographics: Ages 18–70; 60% male and 40% female participants

Recording Details
•Type: Scripted wake words and command phrases
•Duration: 1 to 15 seconds per clip
•Format: WAV, stereo, 16-bit, with sample rates ranging from 16 kHz to 48 kHz
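
The recording details above can be checked against a downloaded clip by reading its WAV header. The following is a minimal sketch using Python's standard `wave` module; the function name `describe_clip` and the constant names are assumptions made for illustration and are not part of any tooling shipped with the dataset.

```python
import wave

# Expected properties as described in the listing.
# All names below are hypothetical and chosen for this sketch only.
EXPECTED_CHANNELS = 2                     # stereo
EXPECTED_SAMPLE_WIDTH = 2                 # 16-bit = 2 bytes per sample
MIN_RATE_HZ, MAX_RATE_HZ = 16_000, 48_000
MIN_SECONDS, MAX_SECONDS = 1.0, 15.0


def describe_clip(path):
    """Read a WAV header and report whether it matches the listed format."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()   # bytes per sample
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    return {
        "channels": channels,
        "bit_depth": sample_width * 8,
        "sample_rate_hz": rate,
        "seconds": seconds,
        "matches_listing": (
            channels == EXPECTED_CHANNELS
            and sample_width == EXPECTED_SAMPLE_WIDTH
            and MIN_RATE_HZ <= rate <= MAX_RATE_HZ
            and MIN_SECONDS <= seconds <= MAX_SECONDS
        ),
    }
```

Calling `describe_clip("clip.wav")` on a local file (the path is a placeholder) returns a dictionary whose `matches_listing` flag is true only when the channel count, bit depth, sample rate, and duration all fall within the values listed above.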

Dataset Diversity
Wake Word Types
•Automobile Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Ok Ford, etc.
•Voice Assistant Wake Words: Hey Siri, Ok Google, Alexa, Hey Cortana, Hi Bixby, Hey Celia, etc.
•Home Appliance Wake Words: Hi LG, Ok LG, Hello Lloyd, and more

Command Types by Use Case
•Automobile: Play music, check directions, voice search, provide feedback, and more
•Voice Assistant: Ask general questions, make calls, control devices, shopping, manage calendars, and more
•Home Appliances: Control appliances, check status, set reminders/alarms, manage shopping lists, etc.

Recording Environments
•No background noise
•Background traffic noise
•People talking in the background
Speaking Pace
•Normal speed
•Fast speed
This diversity ensures robust training for real-world voice assistant applications.
Metadata
Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.
•Participant Metadata: Unique ID, age, gender, region, accent, dialect
•Recording Metadata: Transcript, environment, pace, device used, sample rate, bit depth, file format
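
The metadata fields above lend themselves to selecting subsets of clips, for example only those recorded with traffic noise. The listing does not say how the metadata is delivered (CSV, JSON, or otherwise), so the sketch below assumes it has already been loaded into a list of dictionaries. The key names, the sample records, and the `filter_clips` helper are all hypothetical.

```python
# Hypothetical records: key names and values are illustrative only and are
# not taken from the dataset's actual metadata files.
SAMPLE_RECORDS = [
    {"participant_id": "P01", "gender": "female",
     "environment": "no background noise", "pace": "normal",
     "sample_rate_hz": 16000, "transcript": "Hey Siri"},
    {"participant_id": "P01", "gender": "female",
     "environment": "background traffic noise", "pace": "fast",
     "sample_rate_hz": 16000, "transcript": "Ok Google"},
    {"participant_id": "P02", "gender": "male",
     "environment": "background traffic noise", "pace": "normal",
     "sample_rate_hz": 48000, "transcript": "Alexa"},
]


def filter_clips(records, **criteria):
    """Return the records whose fields equal every keyword criterion."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


noisy = filter_clips(SAMPLE_RECORDS, environment="background traffic noise")
noisy_fast = filter_clips(
    SAMPLE_RECORDS, environment="background traffic noise", pace="fast"
)
print(len(noisy), len(noisy_fast))  # prints: 2 1
```

Exact-match filtering keeps the sketch short; real metadata may need normalization (case, spelling of environment labels) before comparison.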

Use Cases & Applications
•Voice Assistant Activation: Train models to accurately detect and trigger based on wake words
•Smart Home Devices: Enable responsive voice control in smart appliances
•Automotive Voice Control: Power voice-based commands for navigation, entertainment, and system control