3 datasets found
  1. Bangor University's Fine Tuned WhisperCpp Model for Verbatim Welsh Language Spontaneous Speech Recognition

    • live.european-language-grid.eu
    Updated Oct 31, 2024
    Cite
    Language Technologies Unit (2024). Bangor University's Fine Tuned WhisperCpp Model for Verbatim Welsh Language Spontaneous Speech Recognition [Dataset]. https://live.european-language-grid.eu/catalogue/ld/23819
    Dataset updated
    Oct 31, 2024
    Dataset authored and provided by
    Language Technologies Unit
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Area covered
    Bangor
    Description

    This model is a version of the openai/whisper-base model, fine-tuned on transcriptions of Welsh-language spontaneous speech from the Banc Trawsgrifiadau Bangor (btb) dataset, together with read speech from Welsh Common Voice version 18 (cv) as additional training data, and then converted for use in whisper.cpp.

    whisper.cpp is a C/C++ port of Whisper that provides high-performance inference on hardware such as desktops, laptops, and mobile devices, making fully offline transcription possible.
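
    As a rough illustration (ours, not part of the dataset page), a converted ggml model like this one is typically driven through the whisper.cpp C API, as in the minimal sketch below. The model filename is an assumption, audio decoding is elided, and the input is assumed to already be 16 kHz mono float PCM as whisper.cpp requires:

      #include <cstdio>
      #include <vector>
      #include "whisper.h"

      int main() {
          // Load the fine-tuned ggml model (the filename is an assumption;
          // substitute the converted file distributed with this dataset).
          struct whisper_context * ctx = whisper_init_from_file_with_params(
              "ggml-model.bin", whisper_context_default_params());
          if (ctx == nullptr) return 1;

          // whisper.cpp consumes 16 kHz mono f32 samples; decoding audio is
          // out of scope here, so the buffer is left empty in this sketch.
          std::vector<float> pcm;

          // Greedy decoding, with the language forced to Welsh ("cy").
          whisper_full_params wparams =
              whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
          wparams.language = "cy";

          if (whisper_full(ctx, wparams, pcm.data(), (int) pcm.size()) == 0) {
              for (int i = 0; i < whisper_full_n_segments(ctx); ++i)
                  printf("%s\n", whisper_full_get_segment_text(ctx, i));
          }

          whisper_free(ctx);
          return 0;
      }

    Running entirely on the CPU in this way is what makes the smaller model attractive for offline use, at the accuracy cost noted below.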

    The model is smaller than the corresponding model intended for hosting on cloud GPU-based infrastructure, techiaith/whisper-large-v3-ft-btb-cv-cy, and is therefore not as accurate.

    It achieves the following word error rate (WER) and character error rate (CER) when transcribing Welsh-language spontaneous speech (both metrics are sketched just below the list):

    • WER: 62.76
    • CER: 27.70
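
    For reference (standard definitions, not from the dataset page; the figures above are presumably percentages), WER counts word-level substitutions S, deletions D, and insertions I against a reference transcript of N words:

      \mathrm{WER} = \frac{S + D + I}{N} \times 100

    CER is the same edit-distance calculation over characters rather than words, which is why it can be substantially lower than WER for the same output: a misrecognised word often still matches most of its reference characters.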

  2. AI-Based Transcription and Captioning Services Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 2, 2025
    Cite
    Market Report Analytics (2025). AI-Based Transcription and Captioning Services Report [Dataset]. https://www.marketreportanalytics.com/reports/ai-based-transcription-and-captioning-services-52532
    Available download formats: doc, pdf, ppt
    Dataset updated
    Apr 2, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The AI-based transcription and captioning services market is experiencing robust growth, driven by increasing demand for accessible content and rising adoption of AI-powered solutions across diverse sectors. The market, estimated at $2 billion in 2025, is projected to grow at a compound annual growth rate (CAGR) of 20% from 2025 to 2033, reaching an estimated $8 billion by 2033. This expansion is fueled by several factors. The media and entertainment industry is a major driver, leveraging AI for faster, more accurate transcription of audio and video content to improve accessibility and workflow efficiency. Online education and training platforms are also significant adopters, using AI captioning to cater to diverse learning styles and enhance accessibility for students. The rise of remote work and virtual meetings further boosts demand for real-time captioning and transcription. Technological advancements, including improvements in natural language processing (NLP) and speech recognition accuracy, are key enablers of this growth. The market is segmented by application (Media & Entertainment, Online Education & Training, Meetings & Conferences, Others) and type (Cloud-Based, On-Premises), with the cloud-based segment expected to dominate owing to its scalability, cost-effectiveness, and accessibility.

    Despite the positive outlook, market growth faces certain challenges. Data privacy concerns and the need for robust security measures to protect sensitive information remain significant hurdles. The accuracy of AI-based transcription, especially in noisy environments or with diverse accents, still requires improvement. The high initial investment cost of AI-powered solutions can also be a barrier, particularly for smaller organizations, although ongoing advances in AI technology and the increasing affordability of solutions are expected to ease these restraints over time. The competitive landscape is dynamic, with established players such as IBM and newer entrants built around OpenAI's Whisper vying for market share. Success hinges on delivering high-accuracy transcription, efficient user interfaces, and robust security while adapting to the evolving needs of diverse industries.
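
    As a quick arithmetic check of the projection (ours, not the report's), compounding the stated 2025 base at the stated CAGR over the eight years to 2033 gives

      \$2\,\mathrm{B} \times (1 + 0.20)^{8} \approx \$2\,\mathrm{B} \times 4.30 \approx \$8.6\,\mathrm{B},

    which is broadly consistent with the roughly $8 billion figure once rounding of the CAGR and endpoints is allowed for.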

  3. Live-WhisperX-526K

    • huggingface.co
    Updated Apr 23, 2025
    Cite
    Joya Chen (2025). Live-WhisperX-526K [Dataset]. https://huggingface.co/datasets/chenjoya/Live-WhisperX-526K
    Dataset updated
    Apr 23, 2025
    Authors
    Joya Chen
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for Live-WhisperX-526K

      Uses
    

    This dataset is used for training the LiveCC-7B-Instruct model. We only allow the use of this dataset for academic research and educational purposes. For the user prompts generated with OpenAI GPT-4o, we recommend that users check the OpenAI Usage Policy.

    Project Page: https://showlab.github.io/livecc
    Paper: https://huggingface.co/papers/2504.16030

      Data Sources
    

    After we finished the pre-training of LiveCC-7B-Base model… See the full description on the dataset page: https://huggingface.co/datasets/chenjoya/Live-WhisperX-526K.
